Skip to content

Text to Speech Drivers

Overview

Text to Speech Drivers are used to build and execute API calls to audio generation models.

Provide a Driver to a Tool for use by an Agent:

Text to Speech Drivers

Eleven Labs

The Eleven Labs Text to Speech Driver provides support for text-to-speech models hosted by Eleven Labs. This Driver supports configurations specific to Eleven Labs, like voice selection and output format.

Info

This driver requires the drivers-text-to-speech-elevenlabs extra.

import os

from griptape.drivers.text_to_speech.elevenlabs import ElevenLabsTextToSpeechDriver
from griptape.structures import Agent
from griptape.tools.text_to_speech.tool import TextToSpeechTool

driver = ElevenLabsTextToSpeechDriver(
    api_key=os.environ["ELEVEN_LABS_API_KEY"],
    model="eleven_multilingual_v2",
    voice="Matilda",
)

tool = TextToSpeechTool(
    text_to_speech_driver=driver,
)

Agent(tools=[tool]).run("Generate audio from this text: 'Hello, world!'")

OpenAI

The OpenAI Text to Speech Driver provides support for text-to-speech models hosted by OpenAI. This Driver supports configurations specific to OpenAI, like voice selection and output format.

from griptape.drivers.text_to_speech.openai import OpenAiTextToSpeechDriver
from griptape.structures import Agent
from griptape.tools.text_to_speech.tool import TextToSpeechTool

driver = OpenAiTextToSpeechDriver()

tool = TextToSpeechTool(
    text_to_speech_driver=driver,
)

Agent(tools=[tool]).run("Generate audio from this text: 'Hello, world!'")

Azure OpenAI

The Azure OpenAI Text to Speech Driver provides support for text-to-speech models hosted in your Azure OpenAI instance. This Driver supports configurations specific to OpenAI, like voice selection and output format.

import os

from griptape.drivers.text_to_speech.openai import AzureOpenAiTextToSpeechDriver
from griptape.structures import Agent
from griptape.tools.text_to_speech.tool import TextToSpeechTool

driver = AzureOpenAiTextToSpeechDriver(
    api_key=os.environ["AZURE_OPENAI_API_KEY_4"],
    model="tts",
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT_4"],
)

tool = TextToSpeechTool(
    text_to_speech_driver=driver,
)

Agent(tools=[tool]).run("Generate audio from this text: 'Hello, world!'")