Eval Engine
Overview
The Eval Engine is used to evaluate the performance of an LLM's output against a given input. The engine returns a score between 0 and 1, along with a reason for the score.
Eval Engines require either criteria or evaluation_steps to be set. If criteria is set, Griptape will generate evaluation_steps for you. This is useful for getting started, but you may want to explicitly set evaluation_steps for more complex evaluations (see the sketch at the end of this section). Either criteria or evaluation_steps must be set, but not both.
import json

from griptape.engines import EvalEngine

# Provide a criteria string; Griptape generates evaluation_steps from it.
engine = EvalEngine(
    criteria="Determine whether the actual output is factually correct based on the expected output.",
)

score, reason = engine.evaluate(
    input="If you have a red house made of red bricks, a blue house made of blue bricks, what is a greenhouse made of?",
    expected_output="Glass",
    actual_output="Glass",
)

print("Eval Steps", json.dumps(engine.evaluation_steps, indent=2))
print(f"Score: {score}")
print(f"Reason: {reason}")
Eval Steps [
"Compare the actual output to the expected output to identify any discrepancies.",
"Verify the factual accuracy of the actual output by cross-referencing with the expected output.",
"Assess whether the actual output meets the criteria outlined in the expected output.",
"Determine if any information in the actual output contradicts the expected output."
]
Score: 1.0
Reason: The actual output 'Glass' matches the expected output 'Glass', with no discrepancies or contradictions.
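As noted above, evaluation_steps can be set explicitly instead of criteria. The following is a minimal sketch of that variant, assuming evaluation_steps accepts a list of plain-language steps and that evaluate is called the same way as in the criteria example; the specific steps shown here are illustrative, not generated by Griptape.

from griptape.engines import EvalEngine

# Sketch: supply evaluation_steps directly instead of a criteria string.
# These steps are illustrative examples, written by hand.
engine = EvalEngine(
    evaluation_steps=[
        "Check whether the actual output answers the question posed in the input.",
        "Compare the actual output against the expected output for factual agreement.",
        "Lower the score if the actual output contradicts the expected output.",
    ],
)

score, reason = engine.evaluate(
    input="If you have a red house made of red bricks, a blue house made of blue bricks, what is a greenhouse made of?",
    expected_output="Glass",
    actual_output="Glass",
)

print(f"Score: {score}")
print(f"Reason: {reason}")

Writing the steps yourself trades the convenience of generated steps for tighter control over exactly what the evaluation checks.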