
Audio Engines

Overview

Audio Generation Engines facilitate audio generation. Each Engine provides a run method that accepts the inputs required for its particular mode and passes the request to the configured Driver.
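
The relationship between an Engine and its Driver can be illustrated with a small, self-contained sketch. The classes below (SpeechDriver, FakeSpeechDriver, SketchTextToSpeechEngine) are hypothetical stand-ins written for this example and are not Griptape's implementation; they only show the shape of the delegation, where run checks its inputs and hands the request to whichever driver was configured.

```python
from dataclasses import dataclass
from typing import Protocol


class SpeechDriver(Protocol):
    """Hypothetical driver interface: turns prompts into raw audio bytes."""

    def synthesize(self, prompts: list[str]) -> bytes: ...


@dataclass
class FakeSpeechDriver:
    """Stand-in driver that returns the UTF-8 text instead of calling a speech API."""

    voice: str = "demo"

    def synthesize(self, prompts: list[str]) -> bytes:
        return " ".join(prompts).encode("utf-8")


@dataclass
class SketchTextToSpeechEngine:
    """Hypothetical engine: checks its inputs, then delegates to the driver."""

    driver: SpeechDriver

    def run(self, prompts: list[str]) -> bytes:
        if not prompts:
            raise ValueError("at least one prompt is required")
        # The engine itself does no synthesis; the configured driver does.
        return self.driver.synthesize(prompts)


engine = SketchTextToSpeechEngine(driver=FakeSpeechDriver())
audio = engine.run(["Hello, world!"])
```

Because the engine depends only on the driver's interface, swapping providers in this sketch means constructing the engine with a different driver object; the call to run stays the same.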

Text to Speech

The Text to Speech Engine synthesizes speech from text inputs.

import os

from griptape.drivers import ElevenLabsTextToSpeechDriver
from griptape.engines import TextToSpeechEngine


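# Configure the driver that talks to ElevenLabs.
# The API key is read from the ELEVEN_LABS_API_KEY environment variable.
driver = ElevenLabsTextToSpeechDriver(
    api_key=os.getenv("ELEVEN_LABS_API_KEY"),
    model="eleven_multilingual_v2",
    voice="Rachel",
)

# The engine passes each request to the configured driver.
engine = TextToSpeechEngine(
    text_to_speech_driver=driver,
)

# Synthesize speech for the given list of prompts.
engine.run(
    prompts=["Hello, world!"],
)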
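
Note that os.getenv returns None when the variable is unset, so a missing key may only surface once the driver attempts a request. A minimal sketch of a fail-fast check follows; require_env is a hypothetical helper introduced here for illustration and is not part of Griptape.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is missing or empty."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Hypothetical usage when constructing the driver:
#   api_key=require_env("ELEVEN_LABS_API_KEY")
```

Raising at configuration time keeps the error close to its cause rather than deferring it to the first synthesis call.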

Audio Transcription

The Audio Transcription Engine transcribes speech from audio inputs.

from griptape.drivers import OpenAiAudioTranscriptionDriver
from griptape.engines import AudioTranscriptionEngine
from griptape.loaders import AudioLoader
from griptape.utils import load_file


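# Configure the driver that sends audio to OpenAI's whisper-1 model.
driver = OpenAiAudioTranscriptionDriver(
    model="whisper-1"
)

# The engine passes each request to the configured driver.
engine = AudioTranscriptionEngine(
    audio_transcription_driver=driver,
)

# Load the WAV file as an audio artifact, then transcribe it.
audio_artifact = AudioLoader().load(load_file("tests/resources/sentences.wav"))
engine.run(audio_artifact)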
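
Before sending a file for transcription, it can help to confirm that it is a readable WAV file and to check its length. The sketch below uses only Python's standard wave module, which handles uncompressed PCM WAV files; describe_wav is a hypothetical helper for this example, independent of Griptape, and it assumes the input is in that format.

```python
import wave


def describe_wav(path: str) -> dict:
    """Read basic metadata from a PCM WAV file using the standard library."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "duration_seconds": frames / rate,
        }


# Hypothetical usage with the file from the example above:
#   info = describe_wav("tests/resources/sentences.wav")
```

wave.open raises wave.Error for files it cannot parse, which gives an early signal that the input is not a plain WAV file.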