Text to Speech
Overview
Text to Speech Drivers are used to build and execute API calls to audio generation models.
Provide a Driver to a Tool for use by an Agent, as each of the examples below does.
Text to Speech Drivers
Eleven Labs
The Eleven Labs Text to Speech Driver provides support for text-to-speech models hosted by Eleven Labs. This Driver supports configurations specific to Eleven Labs, like voice selection and output format.
Info
This driver requires the drivers-text-to-speech-elevenlabs extra.
import os

from griptape.drivers.text_to_speech.elevenlabs import ElevenLabsTextToSpeechDriver
from griptape.structures import Agent
from griptape.tools.text_to_speech.tool import TextToSpeechTool

# Configure the driver; the API key is read from the environment.
driver = ElevenLabsTextToSpeechDriver(
    api_key=os.environ["ELEVEN_LABS_API_KEY"],
    model="eleven_multilingual_v2",
    voice="Matilda",
)

# Wrap the driver in a tool that the Agent can call.
tool = TextToSpeechTool(
    text_to_speech_driver=driver,
)

Agent(tools=[tool]).run("Generate audio from this text: 'Hello, world!'")
[02/27/25 20:23:06] INFO     PromptTask cbd101c5ddf242c4a2526b217ccdc1aa
                             Input: Generate audio from this text: 'Hello, world!'
[02/27/25 20:23:08] INFO     Subtask e2e67690e1844462b0b202eed0bec355
                             Actions: [
                               {
                                 "tag": "call_AClrmPrfklt1V6Tj8xslZBOO",
                                 "name": "TextToSpeechTool",
                                 "path": "text_to_speech",
                                 "input": {
                                   "values": {
                                     "text": "Hello, world!"
                                   }
                                 }
                               }
                             ]
[02/27/25 20:23:10] INFO     Subtask e2e67690e1844462b0b202eed0bec355
                             Response: Audio, format: mp3, size: 19226 bytes
[02/27/25 20:23:11] INFO     PromptTask cbd101c5ddf242c4a2526b217ccdc1aa
                             Output: The audio for the text "Hello, world!" has been generated successfully.
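The Subtask response line in the log above summarizes the generated audio by format and size. If you want to check those values in a script, a minimal sketch follows. It assumes the summary always has the `Audio, format: <format>, size: <n> bytes` shape seen in these logs, which this page does not guarantee, and `parse_audio_summary` is a hypothetical helper, not part of Griptape.

```python
import re
from typing import Optional

# Matches the summary shape seen in the example logs.
# Assumption: this shape is inferred from the logs, not from a documented format.
_SUMMARY = re.compile(r"Audio, format: (?P<format>\w+), size: (?P<size>\d+) bytes")


def parse_audio_summary(line: str) -> Optional[tuple[str, int]]:
    """Return (format, size_in_bytes) from a log line, or None if it does not match."""
    match = _SUMMARY.search(line)
    if match is None:
        return None
    return match["format"], int(match["size"])


print(parse_audio_summary("Response: Audio, format: mp3, size: 19226 bytes"))
# prints ('mp3', 19226)
```

Returning `None` for non-matching lines keeps the helper safe to run over a whole log, where most lines carry no audio summary.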
OpenAI
The OpenAI Text to Speech Driver provides support for text-to-speech models hosted by OpenAI. This Driver supports configurations specific to OpenAI, like voice selection and output format.
from griptape.drivers.text_to_speech.openai import OpenAiTextToSpeechDriver
from griptape.structures import Agent
from griptape.tools.text_to_speech.tool import TextToSpeechTool

# With no arguments, the driver uses its default settings.
driver = OpenAiTextToSpeechDriver()

# Wrap the driver in a tool that the Agent can call.
tool = TextToSpeechTool(
    text_to_speech_driver=driver,
)

Agent(tools=[tool]).run("Generate audio from this text: 'Hello, world!'")
[02/27/25 20:26:31] INFO     PromptTask 318fd0d80db749e0ac2e23a1e6be94f8
                             Input: Generate audio from this text: 'Hello, world!'
[02/27/25 20:26:33] INFO     Subtask e53fe18c447e4325b0108ce8803ced58
                             Actions: [
                               {
                                 "tag": "call_nkTaeYE0DDVp7wpJMVqLWnSZ",
                                 "name": "TextToSpeechTool",
                                 "path": "text_to_speech",
                                 "input": {
                                   "values": {
                                     "text": "Hello, world!"
                                   }
                                 }
                               }
                             ]
[02/27/25 20:26:35] INFO     Subtask e53fe18c447e4325b0108ce8803ced58
                             Response: Audio, format: mp3, size: 15360 bytes
                    INFO     PromptTask 318fd0d80db749e0ac2e23a1e6be94f8
                             Output: The audio for the text "Hello, world!" has been generated successfully.
Azure OpenAI
The Azure OpenAI Text to Speech Driver provides support for text-to-speech models hosted in your Azure OpenAI instance. This Driver supports configurations specific to OpenAI, like voice selection and output format.
import os

from griptape.drivers.text_to_speech.openai import AzureOpenAiTextToSpeechDriver
from griptape.structures import Agent
from griptape.tools.text_to_speech.tool import TextToSpeechTool

# Configure the driver; the API key and endpoint are read from the environment.
driver = AzureOpenAiTextToSpeechDriver(
    api_key=os.environ["AZURE_OPENAI_API_KEY_4"],
    model="tts",
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT_4"],
)

# Wrap the driver in a tool that the Agent can call.
tool = TextToSpeechTool(
    text_to_speech_driver=driver,
)

Agent(tools=[tool]).run("Generate audio from this text: 'Hello, world!'")
[02/27/25 20:25:20] INFO     PromptTask 23d5174bdca740aba7003927df6825be
                             Input: Generate audio from this text: 'Hello, world!'
[02/27/25 20:25:22] INFO     Subtask f770f02a6d5d4148b658d4adc5939ce3
                             Actions: [
                               {
                                 "tag": "call_UApwJoyADFfmWc76D7F1wjVJ",
                                 "name": "TextToSpeechTool",
                                 "path": "text_to_speech",
                                 "input": {
                                   "values": {
                                     "text": "Hello, world!"
                                   }
                                 }
                               }
                             ]
[02/27/25 20:25:23] INFO     Subtask f770f02a6d5d4148b658d4adc5939ce3
                             Response: Audio, format: mp3, size: 14400 bytes
[02/27/25 20:25:24] INFO     PromptTask 23d5174bdca740aba7003927df6825be
                             Output: The audio for the text "Hello, world!" has been generated successfully.
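The Eleven Labs and Azure OpenAI examples read credentials with `os.environ[...]`, which raises a bare `KeyError` when a variable is unset. One way to fail earlier with a clearer message is sketched below. `require_env` is a hypothetical helper, not part of Griptape, and the optional `environ` argument exists only so the check can be exercised without touching the real environment.

```python
import os
from typing import Mapping, Optional


def require_env(*names: str, environ: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Return the named variables, or raise one error listing every missing name."""
    env = os.environ if environ is None else environ
    # Treat unset and empty values alike: neither is a usable credential.
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in names}


# Demonstration with a stand-in mapping; the values here are placeholders.
fake_env = {"AZURE_OPENAI_API_KEY_4": "placeholder-key"}
try:
    require_env("AZURE_OPENAI_API_KEY_4", "AZURE_OPENAI_ENDPOINT_4", environ=fake_env)
except RuntimeError as error:
    print(error)
# prints Missing environment variables: AZURE_OPENAI_ENDPOINT_4
```

In the Azure example, the returned values could then be passed as `api_key` and `azure_endpoint` in place of the direct `os.environ[...]` lookups.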