Talk to a Document

Some LLM providers, such as Anthropic and Amazon Bedrock, offer the ability to pass documents directly to the LLM.

In this example, we pass a PDF document to the Agent using Anthropic's document message content format. The Agent then uses the document to answer questions about the paper.

We use Task hooks to add a log filter before the Task runs and remove it afterward, so the large base64-encoded document content is truncated in the logs instead of being printed in full.

import base64
import logging

import requests

from griptape.artifacts import GenericArtifact, TextArtifact
from griptape.configs import Defaults
from griptape.configs.logging import TruncateLoggingFilter
from griptape.drivers import AnthropicPromptDriver
from griptape.structures import Agent
from griptape.tasks.base_task import BaseTask
from griptape.tasks.prompt_task import PromptTask

# Truncate logs to 100 characters to avoid printing the entire document
truncate_log_filter = TruncateLoggingFilter(max_log_length=100)


def on_before_run(_: BaseTask) -> None:
    logging.getLogger(Defaults.logging_config.logger_name).addFilter(truncate_log_filter)


def on_after_run(_: BaseTask) -> None:
    logging.getLogger(Defaults.logging_config.logger_name).removeFilter(truncate_log_filter)


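# Download the PDF and fail fast on HTTP errors rather than sending an error page to the model
response = requests.get("https://arxiv.org/pdf/1706.03762.pdf", timeout=30)
response.raise_for_status()
doc_bytes = response.content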

agent = Agent(
    tasks=[
        PromptTask(
            prompt_driver=AnthropicPromptDriver(model="claude-3-5-sonnet-20240620", max_attempts=0),
            on_before_run=on_before_run,
            on_after_run=on_after_run,
            input=[
                GenericArtifact(
                    {
                        "type": "document",
                        "source": {
                            "type": "base64",
                            "media_type": "application/pdf",
                            "data": base64.b64encode(doc_bytes).decode("utf-8"),
                        },
                    }
                ),
                TextArtifact("{{ args[0] }}"),
            ],
        )
    ],
)

agent.run("What is the title and who are the authors of this paper?")
[12/23/24 09:37:47] INFO     PromptTask cc77e4c193c84a5986a4e02e56614d6b
                             Input: Document: application/pdf

                             What is the title and who are the authors of this paper?
[12/23/24 09:37:57] INFO     PromptTask cc77e4c193c84a5986a4e02e56614d6b
                             Output: The title of this paper is "Attention Is All You Need" and the authors are:

                             Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia
                             Polosukhin.

                             The paper is from Google Brain, Google Research, and the University of Toronto. It introduces the Transformer model
                             architecture for sequence transduction tasks like machine translation.
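
To make the add-and-remove filter pattern from the hooks above more concrete, here is a minimal sketch that uses only Python's standard `logging` module. It is not Griptape's `TruncateLoggingFilter` implementation; the class `TruncatingFilter`, the helper `demo`, the logger name `truncation_demo`, and the `... [truncated]` suffix are all hypothetical names chosen for illustration. The sketch attaches a filter to a logger, logs an oversized message, detaches the filter, and logs again to show that later messages are left untouched.

```python
import logging


class TruncatingFilter(logging.Filter):
    """Hypothetical stand-in for a truncating log filter (not Griptape's code)."""

    def __init__(self, max_log_length: int = 100) -> None:
        super().__init__()
        self.max_log_length = max_log_length

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()
        if len(message) > self.max_log_length:
            # Overwrite the message and clear args so it is not formatted again.
            record.msg = message[: self.max_log_length] + "... [truncated]"
            record.args = ()
        # Always keep the record; only its text is shortened.
        return True


def demo() -> list[str]:
    """Log once with the filter attached and once without; return what was emitted."""
    captured: list[str] = []

    class ListHandler(logging.Handler):
        def emit(self, record: logging.LogRecord) -> None:
            captured.append(record.getMessage())

    logger = logging.getLogger("truncation_demo")  # assumed name for this sketch
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = ListHandler()
    logger.addHandler(handler)

    log_filter = TruncatingFilter(max_log_length=20)

    logger.addFilter(log_filter)     # the role on_before_run plays in the example
    logger.info("x" * 500)           # stands in for the large document payload
    logger.removeFilter(log_filter)  # the role on_after_run plays in the example
    logger.info("y" * 30)            # logged in full once the filter is gone

    logger.removeHandler(handler)
    return captured


if __name__ == "__main__":
    for line in demo():
        print(line)
```

Scoping the filter to the duration of the Task, rather than installing it permanently, keeps truncation from affecting unrelated log messages emitted elsewhere in the application.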