Talk to a Video
We can use Google Gemini's native video input capabilities to ask questions about a video. In this example, we upload a video file with Gemini's file API, wait for Gemini to finish processing it, and then pass the resulting file handle to the Agent wrapped in a GenericArtifact. Because this relies on Gemini-specific features, it will not work with other Prompt Drivers.
import time
from google.generativeai.files import get_file, upload_file
from griptape.artifacts import GenericArtifact, TextArtifact
from griptape.configs import Defaults
from griptape.configs.drivers import GoogleDriversConfig
from griptape.structures import Agent
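# Use the Google (Gemini) drivers for every structure created below.
Defaults.drivers_config = GoogleDriversConfig()

# Upload the video through Gemini's file API.
video_file = upload_file(path="tests/resources/griptape-comfyui.mp4")

# Poll until Gemini has finished processing the upload.
while video_file.state.name == "PROCESSING":
    time.sleep(2)
    video_file = get_file(video_file.name)

if video_file.state.name == "FAILED":
    raise ValueError(video_file.state.name)

# Pass the uploaded file handle and a question template as the Agent's input.
agent = Agent(
    input=[
        GenericArtifact(video_file),
        TextArtifact("Answer this question regarding the video: {{ args[0] }}"),
    ]
)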
agent.run("Are there any scenes that show a character with earings?")
agent.run("What happens in the scene starting at 19 seconds?")
[07/15/24 11:27:17] INFO PromptTask 765a4a2833a34084b0077fb49c948ace
Input: genai.File({
'name': 'files/zzqllezrzmz7',
'display_name': 'griptape-comfyui.mp4',
'mime_type': 'video/mp4',
'sha256_hash': 'ODk5ZDIxMzQwMGZjYTJkNWU3OTY3YjgzZmUxNzg1ZTNmYzc2YTAxMzgxMWIzYWQyMTBjNzM4ODc5MjU1ZmFmNQ==',
'size_bytes': '4667824',
'state': 'ACTIVE',
'uri': 'https://generativelanguage.googleapis.com/v1beta/files/zzqllezrzmz7',
'video_metadata': {'video_duration': '36s'},
'create_time': '2024-07-15T18:27:14.692475Z',
'expiration_time': '2024-07-17T18:27:14.625351853Z',
'update_time': '2024-07-15T18:27:16.179456Z'})
Are there any scenes that show a character with earings?
[07/15/24 11:27:21] INFO PromptTask 765a4a2833a34084b0077fb49c948ace
Output: Yes, there are a few scenes that show characters with earrings:
* **0:10-0:15:** A woman with short, dark hair and facial tattoos is shown wearing large hoop earrings.
* **0:23:** A woman with short, dark hair and facial tattoos is shown wearing large hoop earrings.
Let me know if you have any other questions about the video!
INFO PromptTask 765a4a2833a34084b0077fb49c948ace
Input: genai.File({
'name': 'files/zzqllezrzmz7',
'display_name': 'griptape-comfyui.mp4',
'mime_type': 'video/mp4',
'sha256_hash': 'ODk5ZDIxMzQwMGZjYTJkNWU3OTY3YjgzZmUxNzg1ZTNmYzc2YTAxMzgxMWIzYWQyMTBjNzM4ODc5MjU1ZmFmNQ==',
'size_bytes': '4667824',
'state': 'ACTIVE',
'uri': 'https://generativelanguage.googleapis.com/v1beta/files/zzqllezrzmz7',
'video_metadata': {'video_duration': '36s'},
'create_time': '2024-07-15T18:27:14.692475Z',
'expiration_time': '2024-07-17T18:27:14.625351853Z',
'update_time': '2024-07-15T18:27:16.179456Z'})
What happens in the scene starting at 19 seconds?
[07/15/24 11:27:26] INFO PromptTask 765a4a2833a34084b0077fb49c948ace
Output: At 19 seconds, a futuristic, four-legged robotic vehicle descends from the sky in a misty forest. The vehicle resembles a mechanical
spider or insect, with a rounded central body and powerful-looking legs. It hovers slightly above the ground, emitting two bright beams of
light from its underside. The scene has a slightly eerie and mysterious atmosphere, with the fog and bare trees adding to the sense of
otherworldliness.
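The polling loop in the example keeps waiting for as long as the file reports PROCESSING. A minimal sketch of the same wait with an upper time limit is shown below. The helper name `wait_for_file` and its parameters (`fetch`, `poll_seconds`, `timeout_seconds`, `sleep`, `clock`) are hypothetical and are not part of Griptape or the Google SDK; the only assumption about the file object is that it exposes `name` and `state.name`, as in the example.

```python
import time
from typing import Any, Callable


def wait_for_file(
    file: Any,
    fetch: Callable[[str], Any],
    *,
    poll_seconds: float = 2.0,
    timeout_seconds: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Any:
    """Poll until `file` leaves the PROCESSING state.

    `fetch` re-reads the file by name (hypothetical parameter; in the
    example above this role is played by `get_file`). `sleep` and `clock`
    are injectable so the helper can be exercised without real waiting.
    Raises TimeoutError if the deadline passes and ValueError if the
    file ends up in the FAILED state.
    """
    deadline = clock() + timeout_seconds
    while file.state.name == "PROCESSING":
        if clock() >= deadline:
            raise TimeoutError(
                f"{file.name} still PROCESSING after {timeout_seconds}s"
            )
        sleep(poll_seconds)
        file = fetch(file.name)

    if file.state.name == "FAILED":
        raise ValueError(f"Processing failed for {file.name}")
    return file
```

With the imports from the example, this could replace the loop as `video_file = wait_for_file(upload_file(path="tests/resources/griptape-comfyui.mp4"), get_file)`. The five-minute default is an arbitrary choice for illustration, not a documented limit.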