Talk to a Video

We can use Google Gemini's native video input capabilities to ask questions about a video. In this example, we upload a video file with Gemini's file API, wait for it to finish processing, and then wrap the returned file in a `GenericArtifact` that is passed to the Agent as input. Because this relies on Gemini-specific features, it will not work with other Prompt Drivers.

```python
import time

from google.generativeai.files import get_file, upload_file

from griptape.artifacts import GenericArtifact, TextArtifact
from griptape.configs import Defaults
from griptape.configs.drivers import GoogleDriversConfig
from griptape.structures import Agent

# Make Google's (Gemini) drivers the default for structures created below.
Defaults.drivers_config = GoogleDriversConfig()

# Upload the video through Gemini's file API, then poll until processing finishes.
video_file = upload_file(path="tests/resources/griptape-comfyui.mp4")
while video_file.state.name == "PROCESSING":
    time.sleep(2)
    video_file = get_file(video_file.name)

if video_file.state.name == "FAILED":
    raise ValueError(f"Video processing failed: {video_file.state.name}")

# The agent's input combines the uploaded video with a text prompt;
# {{ args[0] }} is filled in with the argument passed to agent.run().
agent = Agent(
    input=[
        GenericArtifact(video_file),
        TextArtifact("Answer this question regarding the video: {{ args[0] }}"),
    ]
)

agent.run("Are there any scenes that show a character with earrings?")
agent.run("What happens in the scene starting at 19 seconds?")
```

```
[07/15/24 11:27:17] INFO     PromptTask 765a4a2833a34084b0077fb49c948ace
                             Input: genai.File({
                                 'name': 'files/zzqllezrzmz7',
                                 'display_name': 'griptape-comfyui.mp4',
                                 'mime_type': 'video/mp4',
                                 'sha256_hash': 'ODk5ZDIxMzQwMGZjYTJkNWU3OTY3YjgzZmUxNzg1ZTNmYzc2YTAxMzgxMWIzYWQyMTBjNzM4ODc5MjU1ZmFmNQ==',
                                 'size_bytes': '4667824',
                                 'state': 'ACTIVE',
                                 'uri': 'https://generativelanguage.googleapis.com/v1beta/files/zzqllezrzmz7',
                                 'video_metadata': {'video_duration': '36s'},
                                 'create_time': '2024-07-15T18:27:14.692475Z',
                                 'expiration_time': '2024-07-17T18:27:14.625351853Z',
                                 'update_time': '2024-07-15T18:27:16.179456Z'})

                             Are there any scenes that show a character with earrings?
[07/15/24 11:27:21] INFO     PromptTask 765a4a2833a34084b0077fb49c948ace
                             Output: Yes, there are a few scenes that show characters with earrings:

                             * **0:10-0:15:** A woman with short, dark hair and facial tattoos is shown wearing large hoop earrings.
                             * **0:23:** A woman with short, dark hair and facial tattoos is shown wearing large hoop earrings.

                             Let me know if you have any other questions about the video!

                    INFO     PromptTask 765a4a2833a34084b0077fb49c948ace
                             Input: genai.File({
                                 'name': 'files/zzqllezrzmz7',
                                 'display_name': 'griptape-comfyui.mp4',
                                 'mime_type': 'video/mp4',
                                 'sha256_hash': 'ODk5ZDIxMzQwMGZjYTJkNWU3OTY3YjgzZmUxNzg1ZTNmYzc2YTAxMzgxMWIzYWQyMTBjNzM4ODc5MjU1ZmFmNQ==',
                                 'size_bytes': '4667824',
                                 'state': 'ACTIVE',
                                 'uri': 'https://generativelanguage.googleapis.com/v1beta/files/zzqllezrzmz7',
                                 'video_metadata': {'video_duration': '36s'},
                                 'create_time': '2024-07-15T18:27:14.692475Z',
                                 'expiration_time': '2024-07-17T18:27:14.625351853Z',
                                 'update_time': '2024-07-15T18:27:16.179456Z'})

                             What happens in the scene starting at 19 seconds?
[07/15/24 11:27:26] INFO     PromptTask 765a4a2833a34084b0077fb49c948ace
                             Output: At 19 seconds, a futuristic, four-legged robotic vehicle descends from the sky in a misty forest. The vehicle resembles a mechanical
                             spider or insect, with a rounded central body and powerful-looking legs. It hovers slightly above the ground, emitting two bright beams of
                             light from its underside. The scene has a slightly eerie and mysterious atmosphere, with the fog and bare trees adding to the sense of
                             otherworldliness.
```
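The upload step polls `get_file` every two seconds until the file leaves the `PROCESSING` state, with no upper bound on how long it waits. The following is a minimal sketch of the same polling pattern with a timeout added. It is written against plain callables so it runs without the Gemini SDK; the helper `wait_for_file`, its parameters, and the `FakeFile` stand-in are hypothetical illustrations, not part of Griptape or `google.generativeai`.

```python
import time
from dataclasses import dataclass
from typing import Callable, TypeVar

T = TypeVar("T")


def wait_for_file(
    file: T,
    refresh: Callable[[T], T],
    state_of: Callable[[T], str],
    timeout: float = 300.0,
    interval: float = 2.0,
) -> T:
    """Hypothetical helper: poll `refresh` until the file is no longer PROCESSING.

    Raises TimeoutError if processing exceeds `timeout` seconds and
    ValueError if the final state is FAILED.
    """
    deadline = time.monotonic() + timeout
    while state_of(file) == "PROCESSING":
        if time.monotonic() >= deadline:
            raise TimeoutError(f"File still processing after {timeout} seconds")
        time.sleep(interval)
        file = refresh(file)  # re-fetch the latest file state

    if state_of(file) == "FAILED":
        raise ValueError("File processing failed")
    return file


# Hypothetical stand-in for a Gemini file object, used only for this demo.
@dataclass
class FakeFile:
    state: str
    polls: int = 0


def fake_refresh(f: FakeFile) -> FakeFile:
    # Report ACTIVE once the file has been re-fetched twice.
    polls = f.polls + 1
    return FakeFile("ACTIVE" if polls >= 2 else "PROCESSING", polls)


ready = wait_for_file(FakeFile("PROCESSING"), fake_refresh, lambda f: f.state, interval=0.01)
print(ready.state)  # ACTIVE
```

Against the real SDK, the call would presumably take the form `wait_for_file(video_file, lambda f: get_file(f.name), lambda f: f.state.name)`, using the same `get_file` and `state.name` accessors as the example above.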