Talk to a Document
Some LLM providers, such as Anthropic and Amazon Bedrock, offer the ability to pass documents directly to the LLM.
In this example, we pass a PDF document to the Agent using Anthropic's document content block format. The Agent then uses the document to answer questions about the paper.
We use Task hooks to add a log filter before the run and remove it afterward, truncating log messages so the large base64-encoded document is not printed in full.
import base64
import logging

import requests

from griptape.artifacts import GenericArtifact, TextArtifact
from griptape.configs import Defaults
from griptape.configs.logging import TruncateLoggingFilter
from griptape.drivers import AnthropicPromptDriver
from griptape.structures import Agent
from griptape.tasks.base_task import BaseTask
from griptape.tasks.prompt_task import PromptTask

# Truncate logs to 100 characters to avoid printing the entire document
truncate_log_filter = TruncateLoggingFilter(max_log_length=100)


def on_before_run(_: BaseTask) -> None:
    logging.getLogger(Defaults.logging_config.logger_name).addFilter(truncate_log_filter)


def on_after_run(_: BaseTask) -> None:
    logging.getLogger(Defaults.logging_config.logger_name).removeFilter(truncate_log_filter)


# Download the paper ("Attention Is All You Need") as raw PDF bytes
doc_bytes = requests.get("https://arxiv.org/pdf/1706.03762.pdf").content

agent = Agent(
    tasks=[
        PromptTask(
            prompt_driver=AnthropicPromptDriver(model="claude-3-5-sonnet-20240620", max_attempts=0),
            on_before_run=on_before_run,
            on_after_run=on_after_run,
            input=[
                # The PDF, wrapped in Anthropic's base64 document content block
                GenericArtifact(
                    {
                        "type": "document",
                        "source": {
                            "type": "base64",
                            "media_type": "application/pdf",
                            "data": base64.b64encode(doc_bytes).decode("utf-8"),
                        },
                    }
                ),
                # The user's question, rendered from the run arguments
                TextArtifact("{{ args[0] }}"),
            ],
        )
    ],
)

agent.run("What is the title and who are the authors of this paper?")
[12/23/24 09:37:47] INFO PromptTask cc77e4c193c84a5986a4e02e56614d6b
Input: Document: application/pdf
What is the title and who are the authors of this paper?
[12/23/24 09:37:57] INFO PromptTask cc77e4c193c84a5986a4e02e56614d6b
Output: The title of this paper is "Attention Is All You Need" and the authors are:
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia
Polosukhin.
The paper is from Google Brain, Google Research, and the University of Toronto. It introduces the Transformer model
architecture for sequence transduction tasks like machine translation.
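Because the document artifact and the {{ args[0] }} template are both part of the task's input, the same agent can be asked follow-up questions; each run re-renders the input with the new question and sends the document again. A minimal usage sketch (the follow-up question below is illustrative):

# Ask the same agent another question about the document
agent.run("Which attention mechanisms does the paper describe?")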