

The Image Query Engine executes natural language queries on the contents of images. The Image Query Driver you pass to the Engine determines which provider and model answer the query.

All Image Query Drivers default to a max_tokens of 256. You can tune this value based on your use case and the Image Query Driver you are providing.

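```python
from griptape.drivers import OpenAiVisionImageQueryDriver
from griptape.engines import ImageQueryEngine
from griptape.loaders import ImageLoader

driver = OpenAiVisionImageQueryDriver(
    model="gpt-4-vision-preview",  # example model name; use any vision-capable OpenAI model
    max_tokens=256,  # the default; raise it if answers are cut off
)

engine = ImageQueryEngine(
    image_query_driver=driver,
)

# Read the raw image bytes and wrap them in an ImageArtifact.
with open("tests/resources/mountain.png", "rb") as f:
    image_artifact = ImageLoader().load(f.read())

# Query the image; the Engine forwards the query and images to the Driver.
engine.run("Describe the weather in the image", [image_artifact])
```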
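The Engine/Driver split described above can be illustrated offline, without an API key. The following is a minimal stdlib-only sketch, not Griptape's implementation: every class name ending in `Sketch` is hypothetical. It shows the two documented ideas. The Engine holds no provider logic and simply delegates to whichever driver it is given, and every driver starts with `max_tokens=256` unless you override it.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class ImageArtifactSketch:
    # Hypothetical stand-in for an image artifact: raw bytes plus a MIME type.
    value: bytes
    mime_type: str = "image/png"


@dataclass
class BaseImageQueryDriverSketch(ABC):
    # Mirrors the documented default: every driver starts at max_tokens=256.
    max_tokens: int = 256

    @abstractmethod
    def query(self, query: str, images: list[ImageArtifactSketch]) -> str:
        """Send the query and images to a provider and return its answer."""


@dataclass
class EchoImageQueryDriverSketch(BaseImageQueryDriverSketch):
    # Hypothetical offline driver: describes the request instead of calling a provider.
    def query(self, query: str, images: list[ImageArtifactSketch]) -> str:
        return f"{len(images)} image(s), max_tokens={self.max_tokens}: {query}"


@dataclass
class ImageQueryEngineSketch:
    # The engine only delegates; swapping the driver swaps the provider and model.
    image_query_driver: BaseImageQueryDriverSketch

    def run(self, query: str, images: list[ImageArtifactSketch]) -> str:
        return self.image_query_driver.query(query, images)


# Usage: same engine code, two driver configurations.
image = ImageArtifactSketch(value=b"\x89PNG\r\n\x1a\n")
default_engine = ImageQueryEngineSketch(image_query_driver=EchoImageQueryDriverSketch())
tuned_engine = ImageQueryEngineSketch(
    image_query_driver=EchoImageQueryDriverSketch(max_tokens=1024)
)

print(default_engine.run("Describe the weather in the image", [image]))
# 1 image(s), max_tokens=256: Describe the weather in the image
print(tuned_engine.run("Describe the weather in the image", [image]))
# 1 image(s), max_tokens=1024: Describe the weather in the image
```

In real code you would pass a concrete driver such as `OpenAiVisionImageQueryDriver` instead of the echo driver. Keeping the provider behind the driver means that changing providers or token limits never requires touching the code that calls the Engine.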