Image Generation Drivers

Overview

Image Generation Drivers are used by image generation Engines to build and execute API calls to image generation models.

Provide a Driver when building an Engine, then pass it to a Tool for use by an Agent:

from griptape.structures import Agent
from griptape.engines import PromptImageGenerationEngine
from griptape.drivers import OpenAiImageGenerationDriver
from griptape.tools import PromptImageGenerationClient

driver = OpenAiImageGenerationDriver(
    model="dall-e-2",
)

engine = PromptImageGenerationEngine(image_generation_driver=driver)

agent = Agent(tools=[
    PromptImageGenerationClient(engine=engine),
])

agent.run("Generate a watercolor painting of a dog riding a skateboard")

Amazon Bedrock

The Amazon Bedrock Image Generation Driver provides multi-model access to image generation models hosted by Amazon Bedrock. This Driver manages API calls to the Bedrock API, while the specific Model Drivers below format the API requests and parse the responses.
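The split between a Driver that owns the API call and a Model Driver that owns the payload format can be illustrated with a minimal sketch. Every name here (`EchoModelDriver`, `TransportDriver`, `fake_invoke`) is hypothetical and is not Griptape's implementation; the network call is replaced by a local stand-in so the example runs offline.

```python
import base64
import json
from typing import Callable


class EchoModelDriver:
    """Hypothetical Model Driver: knows one model's request and response shapes."""

    def build_request(self, prompts: list[str]) -> dict:
        return {"prompt": ", ".join(prompts)}

    def parse_response(self, response: dict) -> bytes:
        return base64.b64decode(response["image_b64"])


class TransportDriver:
    """Hypothetical Driver: owns the API call and delegates formatting."""

    def __init__(
        self,
        model: str,
        model_driver: EchoModelDriver,
        invoke: Callable[[str, str], str],
    ) -> None:
        self.model = model
        self.model_driver = model_driver
        self.invoke = invoke  # stand-in for the real network call

    def generate(self, prompts: list[str]) -> bytes:
        body = json.dumps(self.model_driver.build_request(prompts))
        raw = self.invoke(self.model, body)
        return self.model_driver.parse_response(json.loads(raw))


def fake_invoke(model_id: str, body: str) -> str:
    """Fake endpoint that returns the prompt text as the 'image' bytes."""
    prompt = json.loads(body)["prompt"]
    return json.dumps({"image_b64": base64.b64encode(prompt.encode()).decode()})


driver = TransportDriver("example-model", EchoModelDriver(), fake_invoke)
image_bytes = driver.generate(["a dog", "riding a skateboard"])
```

Under this arrangement, supporting another model means adding a Model Driver, while the transport code stays unchanged.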

Bedrock Stable Diffusion Model Driver

The Bedrock Stable Diffusion Model Driver provides support for Stable Diffusion models hosted by Amazon Bedrock. This Model Driver supports configurations specific to Stable Diffusion, such as style preset, CLIP guidance preset, and sampler.

This Model Driver supports negative prompts. When provided (for example, when used with an image generation Engine configured with Negative Rulesets), the image generation request will include negatively weighted prompts describing features or characteristics to avoid in the resulting generation.
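As a rough illustration of what "negatively weighted prompts" can look like in a request, the sketch below builds a Stable Diffusion style request body. The helper `build_sdxl_request` is hypothetical, the weights of 1.0 and -1.0 are assumptions chosen for illustration, and the field names follow the Stability text-to-image request shape as commonly documented; the payload the Model Driver actually produces may differ.

```python
from typing import Optional


def build_sdxl_request(
    prompts: list[str],
    negative_prompts: Optional[list[str]] = None,
    style_preset: Optional[str] = None,
) -> dict:
    """Build a Stable Diffusion style request body (illustrative helper)."""
    # Positive prompts steer the image toward the described features.
    text_prompts = [{"text": p, "weight": 1.0} for p in prompts]
    # Negative prompts carry a negative weight to steer away from features.
    text_prompts += [{"text": p, "weight": -1.0} for p in (negative_prompts or [])]

    body: dict = {"text_prompts": text_prompts}
    if style_preset is not None:
        body["style_preset"] = style_preset
    return body


request = build_sdxl_request(
    ["a dog riding a skateboard"],
    negative_prompts=["blurry", "text"],
    style_preset="pixel-art",
)
```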

from griptape.structures import Agent
from griptape.tools import PromptImageGenerationClient
from griptape.engines import PromptImageGenerationEngine
from griptape.drivers import (
    AmazonBedrockImageGenerationDriver,
    BedrockStableDiffusionImageGenerationModelDriver,
)

model_driver = BedrockStableDiffusionImageGenerationModelDriver(
    style_preset="pixel-art",
)

driver = AmazonBedrockImageGenerationDriver(
    image_generation_model_driver=model_driver,
    model="stability.stable-diffusion-xl-v0",
)

engine = PromptImageGenerationEngine(image_generation_driver=driver)

agent = Agent(tools=[
    PromptImageGenerationClient(engine=engine),
])

agent.run("Generate an image of a dog riding a skateboard")

Bedrock Titan Image Generator Model Driver

The Bedrock Titan Image Generator Model Driver provides support for Titan Image Generator models hosted by Amazon Bedrock. This Model Driver supports configurations specific to Titan Image Generator, like quality, seed, and cfg_scale.

This Model Driver supports negative prompts. When provided (for example, when used with an image generation Engine configured with Negative Rulesets), the image generation request will include negatively weighted prompts describing features or characteristics to avoid in the resulting generation.
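For Titan, negative prompts are typically sent as a separate text field rather than as weighted entries. The sketch below assembles such a request body; `build_titan_request` is a hypothetical helper, the default values are assumptions, and the field names reflect the Titan Image Generator request shape as commonly documented, so treat it as an approximation of what the Model Driver sends.

```python
from typing import Optional


def build_titan_request(
    prompts: list[str],
    negative_prompts: Optional[list[str]] = None,
    quality: str = "standard",
    cfg_scale: float = 7.0,
    seed: Optional[int] = None,
) -> dict:
    """Build a Titan style text-to-image request body (illustrative helper)."""
    text_params = {"text": ", ".join(prompts)}
    if negative_prompts:
        # Features to avoid travel in their own field.
        text_params["negativeText"] = ", ".join(negative_prompts)

    config: dict = {"numberOfImages": 1, "quality": quality, "cfgScale": cfg_scale}
    if seed is not None:
        config["seed"] = seed

    return {
        "taskType": "TEXT_IMAGE",
        "textToImageParams": text_params,
        "imageGenerationConfig": config,
    }


request = build_titan_request(
    ["a watercolor painting of a dog riding a skateboard"],
    negative_prompts=["blurry", "low contrast"],
    seed=42,
)
```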

from griptape.structures import Agent
from griptape.tools import PromptImageGenerationClient
from griptape.engines import PromptImageGenerationEngine
from griptape.drivers import (
    AmazonBedrockImageGenerationDriver,
    BedrockTitanImageGenerationModelDriver,
)

model_driver = BedrockTitanImageGenerationModelDriver()

driver = AmazonBedrockImageGenerationDriver(
    image_generation_model_driver=model_driver,
    model="amazon.titan-image-generator-v1",
)

engine = PromptImageGenerationEngine(image_generation_driver=driver)

agent = Agent(tools=[
    PromptImageGenerationClient(engine=engine),
])

agent.run("Generate a watercolor painting of a dog riding a skateboard")

Azure OpenAI

The Azure OpenAI Image Generation Driver provides access to OpenAI models hosted by Azure. In addition to the configurations provided by the underlying OpenAI Driver, the Azure OpenAI Driver allows configuration of Azure-specific deployment values.

import os

from griptape.structures import Agent
from griptape.tools import PromptImageGenerationClient
from griptape.engines import PromptImageGenerationEngine
from griptape.drivers import AzureOpenAiImageGenerationDriver

driver = AzureOpenAiImageGenerationDriver(
    model="dall-e-3",
    azure_deployment=os.environ["AZURE_OPENAI_DALL_E_3_DEPLOYMENT_ID"],
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT_2"],
    api_key=os.environ["AZURE_OPENAI_API_KEY_2"],
)

engine = PromptImageGenerationEngine(image_generation_driver=driver)

agent = Agent(tools=[
    PromptImageGenerationClient(engine=engine),
])

agent.run("Generate a watercolor painting of a dog riding a skateboard")

Leonardo.Ai

The Leonardo Image Generation Driver enables image generation using models hosted by Leonardo.Ai.

This Driver supports configurations like model selection, image size, generation seed, and generation steps. For details on supported configuration parameters, see Leonardo.Ai's image generation documentation.

This Driver supports negative prompts. When provided (for example, when used with an image generation Engine configured with Negative Rulesets), the image generation request will include negatively weighted prompts describing features or characteristics to avoid in the resulting generation.

import os

from griptape.structures import Agent
from griptape.tools import PromptImageGenerationClient
from griptape.engines import PromptImageGenerationEngine
from griptape.drivers import LeonardoImageGenerationDriver

driver = LeonardoImageGenerationDriver(
    model=os.environ["LEONARDO_MODEL_ID"],
    api_key=os.environ["LEONARDO_API_KEY"],
    image_width=512,
    image_height=1024,
)

engine = PromptImageGenerationEngine(image_generation_driver=driver)

agent = Agent(tools=[
    PromptImageGenerationClient(engine=engine),
])

agent.run("Generate a watercolor painting of a dog riding a skateboard")

OpenAI

The OpenAI Image Generation Driver provides access to OpenAI image generation models. Like other OpenAI Drivers, the image generation Driver will implicitly load an API key from the OPENAI_API_KEY environment variable if one is not explicitly provided.
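The fallback order described here (explicit key first, then the environment) can be sketched as follows. `resolve_api_key` is a hypothetical helper written for illustration; it is not part of Griptape or the OpenAI client, and the error behavior is an assumption.

```python
import os
from typing import Optional


def resolve_api_key(
    explicit_key: Optional[str] = None,
    env_var: str = "OPENAI_API_KEY",
) -> str:
    """Return the explicit key if given, otherwise fall back to the environment."""
    if explicit_key:
        return explicit_key
    key = os.environ.get(env_var)
    if not key:
        raise ValueError(f"No API key provided and {env_var} is not set")
    return key


# An explicitly provided key always wins over the environment.
key = resolve_api_key("explicit-key")
```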

This Driver supports image generation configurations like style presets, image quality preference, and image size. For details on supported configuration values, see the OpenAI documentation.
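Because supported sizes differ by model, it can help to check a size before sending a request. The sketch below uses the size lists for `dall-e-2` and `dall-e-3` as published in OpenAI's documentation at the time of writing; verify them against the current documentation. `SUPPORTED_SIZES` and `check_image_size` are hypothetical names, and the Driver may perform its own validation.

```python
SUPPORTED_SIZES = {
    "dall-e-2": {"256x256", "512x512", "1024x1024"},
    "dall-e-3": {"1024x1024", "1792x1024", "1024x1792"},
}


def check_image_size(model: str, image_size: str) -> str:
    """Return image_size unchanged if the model accepts it, else raise ValueError."""
    allowed = SUPPORTED_SIZES.get(model)
    if allowed is None:
        raise ValueError(f"Unknown model: {model}")
    if image_size not in allowed:
        raise ValueError(
            f"{model} does not support {image_size}; choose one of {sorted(allowed)}"
        )
    return image_size


size = check_image_size("dall-e-2", "512x512")
```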

from griptape.structures import Agent
from griptape.tools import PromptImageGenerationClient
from griptape.engines import PromptImageGenerationEngine
from griptape.drivers import OpenAiImageGenerationDriver

driver = OpenAiImageGenerationDriver(
    model="dall-e-2",
    image_size="512x512",
)

engine = PromptImageGenerationEngine(image_generation_driver=driver)

agent = Agent(tools=[
    PromptImageGenerationClient(engine=engine),
])

agent.run("Generate a watercolor painting of a dog riding a skateboard")