Talk to a Video
We can use Google Gemini's native video input capabilities to ask questions about a video. In this example, we upload a video file using Gemini's file API, and then pass the result using the GenericArtifact to the Agent. Note that because we are using Gemini-specific features, this will not work with other Prompt Drivers.
import time
import google.generativeai as genai
from griptape.artifacts import GenericArtifact, TextArtifact
from griptape.configs import Defaults
from griptape.configs.drivers import GoogleDriversConfig
from griptape.structures import Agent
Defaults.drivers_config = GoogleDriversConfig()
video_file = genai.upload_file(path="tests/resources/griptape-comfyui.mp4")
while video_file.state.name == "PROCESSING":
time.sleep(2)
video_file = genai.get_file(video_file.name)
if video_file.state.name == "FAILED":
raise ValueError(video_file.state.name)
agent = Agent(
input=[
GenericArtifact(video_file),
TextArtifact("Answer this question regarding the video: {{ args[0] }}"),
]
)
agent.run("Are there any scenes that show a character with earings?")
agent.run("What happens in the scene starting at 19 seconds?")
[07/15/24 11:27:17] INFO PromptTask 765a4a2833a34084b0077fb49c948ace
Input: genai.File({
'name': 'files/zzqllezrzmz7',
'display_name': 'griptape-comfyui.mp4',
'mime_type': 'video/mp4',
'sha256_hash': 'ODk5ZDIxMzQwMGZjYTJkNWU3OTY3YjgzZmUxNzg1ZTNmYzc2YTAxMzgxMWIzYWQyMTBjNzM4ODc5MjU1ZmFmNQ==',
'size_bytes': '4667824',
'state': 'ACTIVE',
'uri': 'https://generativelanguage.googleapis.com/v1beta/files/zzqllezrzmz7',
'video_metadata': {'video_duration': '36s'},
'create_time': '2024-07-15T18:27:14.692475Z',
'expiration_time': '2024-07-17T18:27:14.625351853Z',
'update_time': '2024-07-15T18:27:16.179456Z'})
Are there any scenes that show a character with earings?
[07/15/24 11:27:21] INFO PromptTask 765a4a2833a34084b0077fb49c948ace
Output: Yes, there are a few scenes that show characters with earrings:
* **0:10-0:15:** A woman with short, dark hair and facial tattoos is shown wearing large hoop earrings.
* **0:23:** A woman with short, dark hair and facial tattoos is shown wearing large hoop earrings.
Let me know if you have any other questions about the video!
INFO PromptTask 765a4a2833a34084b0077fb49c948ace
Input: genai.File({
'name': 'files/zzqllezrzmz7',
'display_name': 'griptape-comfyui.mp4',
'mime_type': 'video/mp4',
'sha256_hash': 'ODk5ZDIxMzQwMGZjYTJkNWU3OTY3YjgzZmUxNzg1ZTNmYzc2YTAxMzgxMWIzYWQyMTBjNzM4ODc5MjU1ZmFmNQ==',
'size_bytes': '4667824',
'state': 'ACTIVE',
'uri': 'https://generativelanguage.googleapis.com/v1beta/files/zzqllezrzmz7',
'video_metadata': {'video_duration': '36s'},
'create_time': '2024-07-15T18:27:14.692475Z',
'expiration_time': '2024-07-17T18:27:14.625351853Z',
'update_time': '2024-07-15T18:27:16.179456Z'})
What happens in the scene starting at 19 seconds?
[07/15/24 11:27:26] INFO PromptTask 765a4a2833a34084b0077fb49c948ace
Output: At 19 seconds, a futuristic, four-legged robotic vehicle descends from the sky in a misty forest. The vehicle resembles a mechanical
spider or insect, with a rounded central body and powerful-looking legs. It hovers slightly above the ground, emitting two bright beams of
light from its underside. The scene has a slightly eerie and mysterious atmosphere, with the fog and bare trees adding to the sense of
otherworldliness.