Text to Speech Drivers
Overview
Text to Speech Drivers are used to build and execute API calls to audio generation models.
Provide a Driver to a Tool for use by an Agent, as each of the examples below does.
Text to Speech Drivers
Eleven Labs
The Eleven Labs Text to Speech Driver provides support for text-to-speech models hosted by Eleven Labs. This Driver supports configurations specific to Eleven Labs, like voice selection and output format.
Info
This Driver requires the drivers-text-to-speech-elevenlabs extra.
import os

from griptape.drivers.text_to_speech.elevenlabs import ElevenLabsTextToSpeechDriver
from griptape.structures import Agent
from griptape.tools.text_to_speech.tool import TextToSpeechTool

# Configure the Driver with an API key, model, and voice.
driver = ElevenLabsTextToSpeechDriver(
    api_key=os.environ["ELEVEN_LABS_API_KEY"],
    model="eleven_multilingual_v2",
    voice="Matilda",
)

# Wrap the Driver in a Tool so an Agent can call it.
tool = TextToSpeechTool(
    text_to_speech_driver=driver,
)

Agent(tools=[tool]).run("Generate audio from this text: 'Hello, world!'")
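The example above reads its API key with os.environ[...], which raises a bare KeyError when the variable is not set. As a minimal sketch, assuming you would rather fail early with a readable message, the helper below checks the variables up front. The name require_env is hypothetical and is not part of Griptape; it uses only the Python standard library.

```python
import os


def require_env(*names: str) -> dict[str, str]:
    """Return the named environment variables, or raise listing every missing one.

    Hypothetical helper for illustration; not a Griptape API.
    """
    # Treat unset and empty variables the same way: both count as missing.
    missing = [name for name in names if not os.environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: os.environ[name] for name in names}
```

With this in place, the Driver could be given api_key=require_env("ELEVEN_LABS_API_KEY")["ELEVEN_LABS_API_KEY"] instead of indexing os.environ directly.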
OpenAI
The OpenAI Text to Speech Driver provides support for text-to-speech models hosted by OpenAI. This Driver supports configurations specific to OpenAI, like voice selection and output format.
from griptape.drivers.text_to_speech.openai import OpenAiTextToSpeechDriver
from griptape.structures import Agent
from griptape.tools.text_to_speech.tool import TextToSpeechTool

# No arguments are passed, so the Driver falls back to its defaults.
driver = OpenAiTextToSpeechDriver()

tool = TextToSpeechTool(
    text_to_speech_driver=driver,
)

Agent(tools=[tool]).run("Generate audio from this text: 'Hello, world!'")
Azure OpenAI
The Azure OpenAI Text to Speech Driver provides support for text-to-speech models hosted in your Azure OpenAI instance. This Driver supports configurations specific to OpenAI, like voice selection and output format.
import os

from griptape.drivers.text_to_speech.openai import AzureOpenAiTextToSpeechDriver
from griptape.structures import Agent
from griptape.tools.text_to_speech.tool import TextToSpeechTool

# Point the Driver at your Azure OpenAI deployment.
driver = AzureOpenAiTextToSpeechDriver(
    api_key=os.environ["AZURE_OPENAI_API_KEY_4"],
    model="tts",
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT_4"],
)

# Wrap the Driver in a Tool so an Agent can call it.
tool = TextToSpeechTool(
    text_to_speech_driver=driver,
)

Agent(tools=[tool]).run("Generate audio from this text: 'Hello, world!'")
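Because all three Drivers are passed to TextToSpeechTool in the same way, the choice of provider can be isolated in one place. The following is a rough sketch of that idea, not a Griptape feature: choose_provider and build_driver are hypothetical names, and the selection order (Eleven Labs, then Azure OpenAI, then OpenAI) is an arbitrary assumption. The import paths, environment variable names, and constructor arguments are copied from the examples above.

```python
import os
from typing import Mapping


def choose_provider(env: Mapping[str, str]) -> str:
    """Pick a provider name from the credentials present (hypothetical policy)."""
    if env.get("ELEVEN_LABS_API_KEY"):
        return "elevenlabs"
    if env.get("AZURE_OPENAI_API_KEY_4") and env.get("AZURE_OPENAI_ENDPOINT_4"):
        return "azure_openai"
    # Fall back to the OpenAI Driver, which the example above builds with no arguments.
    return "openai"


def build_driver(env: Mapping[str, str] = os.environ):
    """Construct the selected Driver with the same arguments as the examples above."""
    provider = choose_provider(env)
    # Imports are deferred so this module loads even if griptape is not installed.
    if provider == "elevenlabs":
        from griptape.drivers.text_to_speech.elevenlabs import ElevenLabsTextToSpeechDriver

        return ElevenLabsTextToSpeechDriver(
            api_key=env["ELEVEN_LABS_API_KEY"],
            model="eleven_multilingual_v2",
            voice="Matilda",
        )
    if provider == "azure_openai":
        from griptape.drivers.text_to_speech.openai import AzureOpenAiTextToSpeechDriver

        return AzureOpenAiTextToSpeechDriver(
            api_key=env["AZURE_OPENAI_API_KEY_4"],
            model="tts",
            azure_endpoint=env["AZURE_OPENAI_ENDPOINT_4"],
        )
    from griptape.drivers.text_to_speech.openai import OpenAiTextToSpeechDriver

    return OpenAiTextToSpeechDriver()
```

The result of build_driver() can then be handed to TextToSpeechTool(text_to_speech_driver=...) exactly as in the examples, so the Agent code stays the same whichever provider is selected.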