Extraction Engines

Overview

Extraction Engines in Griptape extract structured data, such as CSV rows or JSON objects, from text. Extraction Tasks rely on these engines to do their work. Griptape currently supports two Extraction Engines: the CSV Extraction Engine and the JSON Extraction Engine.

CSV Extraction Engine

The CSV Extraction Engine is designed specifically for extracting data from CSV-formatted content.

Info

The CSV Extraction Engine requires the column_names parameter, which lists the columns to extract.

from griptape.engines import CsvExtractionEngine
from griptape.artifacts import ListArtifact, ErrorArtifact

# Initialize the CsvExtractionEngine instance
csv_engine = CsvExtractionEngine()

# Given a sample text which contains CSV content
sample_text = """
name, age, location
Alice, 28, New York
Bob, 35, California
Charlie, 40, Texas
"""

# Extract CSV rows using the engine
result = csv_engine.extract(sample_text, column_names=["name", "age", "location"])

# Check and display the result
if isinstance(result, ListArtifact):
    for row in result.value:
        print(row.value)  # This will print each row in the CSV content
elif isinstance(result, ErrorArtifact):
    print(f"Error: {result.value}")  # Print error if there's any issue in extraction

{'name': 'name, age, location'}
{'name': 'Alice, 28, New York'}
{'name': 'Bob, 35, California'}
{'name': 'Charlie, 40, Texas'}
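
For comparison, the one-dictionary-per-row shape that column_names describes can be sketched with Python's standard csv module. This is only an illustration under stated assumptions, not how CsvExtractionEngine works internally; parse_rows is a hypothetical helper that does not exist in Griptape, and dropping the header row is a choice made for this sketch.

```python
import csv
import io


def parse_rows(text: str, column_names: list[str]) -> list[dict[str, str]]:
    """Hypothetical helper (not part of Griptape): map CSV lines onto column names."""
    # skipinitialspace drops the blank that follows each comma in the sample.
    reader = csv.reader(io.StringIO(text.strip()), skipinitialspace=True)
    rows = [dict(zip(column_names, fields)) for fields in reader if fields]
    # Drop a leading header row if it simply repeats the column names.
    if rows and list(rows[0].values()) == column_names:
        rows = rows[1:]
    return rows


sample_text = """
name, age, location
Alice, 28, New York
Bob, 35, California
Charlie, 40, Texas
"""

rows = parse_rows(sample_text, ["name", "age", "location"])
for row in rows:
    print(row)
```

All values stay strings here; the csv module does no type conversion.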

JSON Extraction Engine

The JSON Extraction Engine is tailored for extracting data from JSON-formatted content.

Info

The JSON Extraction Engine requires the template_schema parameter, which describes the structure to extract.

from griptape.engines import JsonExtractionEngine
from griptape.artifacts import ListArtifact, ErrorArtifact
from schema import Schema

json_engine = JsonExtractionEngine()

# Given a sample JSON text
sample_json_text = """
{
  "users": [
    {
      "name": "Alice",
      "age": 28,
      "location": "New York"
    },
    {
      "name": "Bob",
      "age": 35,
      "location": "California"
    }
  ]
}
"""

# Define a schema for extraction
user_schema = Schema(
    {"users": [{"name": str, "age": int, "location": str}]}
).json_schema("UserSchema")

# Extract data using the engine
result = json_engine.extract(sample_json_text, template_schema=user_schema)

# Check and display the result
if isinstance(result, ListArtifact):
    for artifact in result.value:
        print(artifact.value)  # This will print each extracted artifact
elif isinstance(result, ErrorArtifact):
    print(f"Error: {result.value}")  # Print error if there's any issue in extraction

{'name': 'Alice', 'age': 28, 'location': 'New York'}
{'name': 'Bob', 'age': 35, 'location': 'California'}
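
The template schema in the example describes a list of user objects with typed fields. As a rough, standard-library-only illustration of what data conforming to that structure looks like, the sketch below checks each record by hand. It is not Griptape's implementation; extract_users and USER_FIELDS are hypothetical names introduced for this sketch.

```python
import json

# Expected field types, mirroring the structure of the template schema above.
USER_FIELDS = {"name": str, "age": int, "location": str}


def extract_users(text: str) -> list[dict]:
    """Hypothetical helper (not part of Griptape): keep records matching USER_FIELDS."""
    data = json.loads(text)
    users = []
    for record in data.get("users", []):
        # A record matches when it has exactly the expected keys and value types.
        matches = set(record) == set(USER_FIELDS) and all(
            isinstance(record[key], expected)
            for key, expected in USER_FIELDS.items()
        )
        if matches:
            users.append(record)
    return users


sample_json_text = """
{
  "users": [
    {"name": "Alice", "age": 28, "location": "New York"},
    {"name": "Bob", "age": 35, "location": "California"}
  ]
}
"""

users = extract_users(sample_json_text)
for user in users:
    print(user)
```

Records with a missing field or a wrongly typed value (for example, an age given as a string) are skipped rather than raising an error, which is a design choice of this sketch.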