Overview

This page explains how to work with Datasets using the Basalt Python SDK. Datasets let you organize, retrieve, and extend structured test data (inputs, ideal outputs, metadata) to evaluate prompts, models, and full workflows.

Initialization

Create a single Basalt client and reuse it across your application or script.

from basalt import Basalt

basalt = Basalt(api_key="your-api-key")

When you are done (for example, at the end of a CLI script or when a worker stops), call:

basalt.shutdown()

to clean up resources.
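A script can raise an exception before it reaches its final line. Wrapping the client in a try/finally, or in a small context manager, makes sure shutdown() still runs. The following is a minimal sketch that assumes only the shutdown() method shown above. The managed helper and the FakeClient stand-in are hypothetical; they exist so the example runs without credentials.

```python
from contextlib import contextmanager
from typing import Iterator, Protocol, TypeVar


class SupportsShutdown(Protocol):
    def shutdown(self) -> None: ...


T = TypeVar("T", bound=SupportsShutdown)


@contextmanager
def managed(client: T) -> Iterator[T]:
    """Yield the client, then always call shutdown(), even if the body raised."""
    try:
        yield client
    finally:
        client.shutdown()


class FakeClient:
    """Hypothetical stand-in for the Basalt client; records shutdown() calls."""

    def __init__(self) -> None:
        self.closed = False

    def shutdown(self) -> None:
        self.closed = True


fake = FakeClient()
try:
    with managed(fake):
        raise RuntimeError("simulated failure mid-script")
except RuntimeError:
    pass

print(fake.closed)  # True
```

With the real client, this would read: with managed(Basalt(api_key="your-api-key")) as basalt: followed by your script body.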

Listing datasets

Retrieve all datasets accessible to your API key.

Basic listing (sync)

from basalt import Basalt

basalt = Basalt(api_key="your-api-key")

# List all datasets
response = basalt.datasets.list_sync()

print(f"Total datasets: {response.total}")

for dataset in response.datasets:
    print(f"\nSlug: {dataset.slug}")
    print(f"  Name: {dataset.name}")
    print(f"  Description: {dataset.description}")
    print(f"  Rows: {dataset.num_rows}")
    print(f"  Columns: {len(dataset.columns)}")

basalt.shutdown()
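For a quick overview, you can reduce the listing response to plain Python data. This sketch assumes that each item in response.datasets carries the slug and num_rows attributes shown above. The largest_datasets helper is hypothetical, and SimpleNamespace objects stand in for the SDK response so the example runs offline.

```python
from types import SimpleNamespace
from typing import Any, List, Tuple


def largest_datasets(response: Any, limit: int = 3) -> List[Tuple[str, int]]:
    """Return (slug, num_rows) pairs for the largest datasets, biggest first."""
    pairs = [(d.slug, d.num_rows) for d in response.datasets]
    # Sort by row count descending; ties fall back to slug for stable output.
    pairs.sort(key=lambda p: (-p[1], p[0]))
    return pairs[:limit]


# Stand-in for the object returned by basalt.datasets.list_sync()
response = SimpleNamespace(
    total=3,
    datasets=[
        SimpleNamespace(slug="faq", num_rows=40),
        SimpleNamespace(slug="customer-support-qa", num_rows=120),
        SimpleNamespace(slug="edge-cases", num_rows=15),
    ],
)

print(largest_datasets(response, limit=2))
# [('customer-support-qa', 120), ('faq', 40)]
```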

Async listing

import asyncio
from basalt import Basalt

async def list_datasets_async():
    basalt = Basalt(api_key="your-api-key")

    # Awaitable counterpart of list_sync()
    response = await basalt.datasets.list()

    for dataset in response.datasets:
        print(f"{dataset.slug}: {dataset.num_rows} rows")

    basalt.shutdown()

asyncio.run(list_datasets_async())

Use list_sync() in scripts and simple backends. In async frameworks such as FastAPI, prefer the awaitable list() method so the call does not block the event loop.
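In async code, you can run independent requests concurrently with asyncio.gather. This is a minimal sketch of the pattern. fetch_many is a hypothetical helper that accepts any coroutine function taking a slug keyword. The SDK may provide an awaitable basalt.datasets.get(slug=...) as a counterpart to get_sync, by analogy with the list/list_sync pairing, but that is an assumption; check the API Reference. The fake_get coroutine below stands in for it.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List


async def fetch_many(
    fetch: Callable[..., Awaitable[Any]], slugs: List[str]
) -> Dict[str, Any]:
    """Fetch several datasets concurrently and map slug -> result."""
    results = await asyncio.gather(*(fetch(slug=s) for s in slugs))
    # gather preserves input order, so zip pairs each slug with its result.
    return dict(zip(slugs, results))


async def fake_get(slug: str) -> str:
    """Hypothetical stand-in for an async get; sleeps to simulate I/O."""
    await asyncio.sleep(0.01)
    return f"dataset:{slug}"


result = asyncio.run(fetch_many(fake_get, ["faq", "edge-cases"]))
print(result)  # {'faq': 'dataset:faq', 'edge-cases': 'dataset:edge-cases'}
```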

Getting datasets

Retrieve a specific dataset with all its rows and columns.

from basalt import Basalt

basalt = Basalt(api_key="your-api-key")

# Get dataset by slug
dataset = basalt.datasets.get_sync(slug="customer-support-qa")

print(f"Dataset: {dataset.name}")
print(f"Description: {dataset.description}")
print(f"Total rows: {dataset.num_rows}")
print(f"Columns: {[col.name for col in dataset.columns]}")

# Access rows
for row in dataset.rows:
    print(f"\nRow: {row.name}")
    print(f"  Values: {row.values}")
    print(f"  Ideal output: {row.ideal_output}")
    print(f"  Metadata: {row.metadata}")

basalt.shutdown()
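For evaluation, you typically pair each row's inputs with its ideal output. This sketch assumes that rows expose the name, values, and ideal_output attributes listed under Dataset objects. The to_eval_cases helper and the EvalCase type are hypothetical. Rows without an ideal output are skipped, since there is nothing to compare against.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Any, Dict, Iterable, List


@dataclass
class EvalCase:
    name: str
    inputs: Dict[str, Any]
    expected: Any


def to_eval_cases(rows: Iterable[Any]) -> List[EvalCase]:
    """Convert dataset rows into evaluation cases, skipping unlabeled rows."""
    cases = []
    for row in rows:
        if not row.ideal_output:  # None or empty: no reference answer
            continue
        cases.append(EvalCase(row.name, dict(row.values), row.ideal_output))
    return cases


# Stand-ins for dataset.rows returned by basalt.datasets.get_sync(...)
rows = [
    SimpleNamespace(name="r1", values={"question": "Reset password?"},
                    ideal_output="Use the 'Forgot password' link.", metadata={}),
    SimpleNamespace(name="r2", values={"question": "Hours?"},
                    ideal_output=None, metadata={}),
]
cases = to_eval_cases(rows)
print([c.name for c in cases])  # ['r1']
```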

Inspecting columns

from basalt import Basalt

basalt = Basalt(api_key="your-api-key")

dataset = basalt.datasets.get_sync(slug="my-dataset")

for column in dataset.columns:
    print(f"\nColumn: {column.name}")
    print(f"  Type: {column.type}")
    print(f"  Description: {column.description}")

basalt.shutdown()
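Column names are the keys of each row's values dict, so you can sanity-check a dataset before running an evaluation. This is a sketch under that assumption. The find_schema_issues helper is hypothetical, and the SimpleNamespace objects stand in for a fetched dataset.

```python
from types import SimpleNamespace
from typing import Any, List


def find_schema_issues(dataset: Any) -> List[str]:
    """Report rows whose value keys don't match the dataset's column names."""
    expected = {col.name for col in dataset.columns}
    issues = []
    for row in dataset.rows:
        keys = set(row.values)
        missing = sorted(expected - keys)
        extra = sorted(keys - expected)
        if missing:
            issues.append(f"{row.name}: missing {missing}")
        if extra:
            issues.append(f"{row.name}: unexpected {extra}")
    return issues


dataset = SimpleNamespace(
    columns=[SimpleNamespace(name="question"), SimpleNamespace(name="context")],
    rows=[
        SimpleNamespace(name="r1", values={"question": "q", "context": "c"}),
        SimpleNamespace(name="r2", values={"question": "q", "lang": "en"}),
    ],
)
print(find_schema_issues(dataset))
# ["r2: missing ['context']", "r2: unexpected ['lang']"]
```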

Dataset objects

The main objects you work with are:

Dataset
DatasetColumn
DatasetRow
FileAttachment (used for file values inside a row)

Dataset

slug: Unique identifier for the dataset
name: Human-readable name
description: Description of what the dataset is for
num_rows: Number of rows
columns: List of DatasetColumn objects
rows: List of DatasetRow objects

DatasetColumn

name: Column name
type: Data type (e.g. "string", "number")
description: Column description
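You can use the type field for a lightweight check on row values. This sketch only handles the two types named above, "string" and "number". It assumes that number columns come back as Python int or float values. Any other type string is treated as unchecked, since the full set of column types is not listed here. The value_matches helper is hypothetical.

```python
from typing import Any


def value_matches(column_type: str, value: Any) -> bool:
    """Check a value against a column type; unknown types pass unchecked."""
    if column_type == "string":
        return isinstance(value, str)
    if column_type == "number":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    return True  # type not covered by this sketch


print(value_matches("string", "hello"))  # True
print(value_matches("number", "42"))     # False
print(value_matches("number", 3.5))      # True
```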

DatasetRow

name: Row identifier
values: Dict of column_name -> value. Values can be strings, numbers, or FileAttachment objects.
ideal_output: Optional expected output for evaluation
metadata: Optional dict with additional context
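Because metadata is an optional dict, it is a convenient place to tag rows (for example, by category) and then evaluate a slice of the dataset. This sketch assumes that metadata is either None or a plain dict. The rows_with_metadata helper and the "category" key are hypothetical.

```python
from types import SimpleNamespace
from typing import Any, Iterable, List


def rows_with_metadata(rows: Iterable[Any], key: str, value: Any) -> List[Any]:
    """Return rows whose metadata has metadata[key] == value."""
    # `or {}` treats a missing (None) metadata dict as empty.
    return [r for r in rows if (r.metadata or {}).get(key) == value]


rows = [
    SimpleNamespace(name="r1", metadata={"category": "billing"}),
    SimpleNamespace(name="r2", metadata=None),
    SimpleNamespace(name="r3", metadata={"category": "auth"}),
]
print([r.name for r in rows_with_metadata(rows, "category", "billing")])  # ['r1']
```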

FileAttachment

source: Path to the file to upload
content_type: MIME type of the file
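The content_type value should be the MIME type of the file. Python's standard mimetypes module can infer it from the file extension. The attachment_kwargs helper is hypothetical. Passing its result as FileAttachment(**attachment_kwargs(path)) assumes that the constructor accepts source and content_type keyword arguments matching the documented fields.

```python
import mimetypes
from pathlib import Path
from typing import Dict, Union


def attachment_kwargs(path: Union[str, Path]) -> Dict[str, str]:
    """Build source/content_type fields for a file, guessing the MIME type."""
    content_type, _encoding = mimetypes.guess_type(str(path))
    if content_type is None:
        # Fall back to a generic binary type when the extension is unknown.
        content_type = "application/octet-stream"
    return {"source": str(path), "content_type": content_type}


print(attachment_kwargs("invoices/march.pdf"))
# {'source': 'invoices/march.pdf', 'content_type': 'application/pdf'}
```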

See the API Reference for method signatures and all available fields.
