Overview
This page explains how to work with Datasets using the Basalt Python SDK.
Datasets let you organize, retrieve, and extend structured test data (inputs, ideal outputs, metadata) to evaluate prompts, models, and full workflows.
Initialization
Create a single Basalt client and reuse it across your application or script.
from basalt import Basalt
basalt = Basalt(api_key="your-api-key")
When you are done (for example in a CLI script or worker), call basalt.shutdown() to clean up resources.
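If an exception can be raised between creating the client and shutting it down, a try/finally block is one way to make sure the cleanup call still runs. The sketch below uses a hypothetical stand-in class, FakeClient, so it runs without the SDK or an API key. The run_job helper is also illustrative and not part of the SDK. With the real SDK you would pass a Basalt(api_key=...) instance instead; the only client method assumed here is shutdown(), as used in the examples on this page.

```python
class FakeClient:
    """Hypothetical stand-in for the Basalt client; only shutdown() is mirrored."""

    def __init__(self):
        self.closed = False

    def shutdown(self):
        # The real client would release its resources here.
        self.closed = True


def run_job(client, work):
    """Run work(client) and always shut the client down afterwards."""
    try:
        return work(client)
    finally:
        client.shutdown()


# Normal completion: the result is returned and the client is closed.
client = FakeClient()
result = run_job(client, lambda c: "done")

# Failure: the error still propagates, but shutdown() has already run.
failing_client = FakeClient()
try:
    run_job(failing_client, lambda c: 1 / 0)
except ZeroDivisionError:
    pass
```

Keeping the shutdown call in a finally clause means a worker that crashes mid-task does not skip cleanup.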
Listing datasets
Retrieve all datasets accessible to your API key.
Basic listing (sync)
from basalt import Basalt
basalt = Basalt(api_key="your-api-key")
# List all datasets
response = basalt.datasets.list_sync()
print(f"Total datasets: {response.total}")
for dataset in response.datasets:
    print(f"\nSlug: {dataset.slug}")
    print(f"  Name: {dataset.name}")
    print(f"  Description: {dataset.description}")
    print(f"  Rows: {dataset.num_rows}")
    print(f"  Columns: {len(dataset.columns)}")
basalt.shutdown()
Async listing
import asyncio
from basalt import Basalt
async def list_datasets_async():
    basalt = Basalt(api_key="your-api-key")
    response = await basalt.datasets.list()
    for dataset in response.datasets:
        print(f"{dataset.slug}: {dataset.num_rows} rows")
    basalt.shutdown()
asyncio.run(list_datasets_async())
Use the sync version (list_sync) in scripts and simple backends; prefer the async list method in async frameworks such as FastAPI.
Getting datasets
Retrieve a specific dataset with all its rows and columns.
from basalt import Basalt
basalt = Basalt(api_key="your-api-key")
# Get dataset by slug
dataset = basalt.datasets.get_sync(slug="customer-support-qa")
print(f"Dataset: {dataset.name}")
print(f"Description: {dataset.description}")
print(f"Total rows: {dataset.num_rows}")
print(f"Columns: {[col.name for col in dataset.columns]}")
# Access rows
for row in dataset.rows:
    print(f"\nRow: {row.name}")
    print(f"  Values: {row.values}")
    print(f"  Ideal output: {row.ideal_output}")
    print(f"  Metadata: {row.metadata}")
basalt.shutdown()
Inspecting columns
from basalt import Basalt
basalt = Basalt(api_key="your-api-key")
dataset = basalt.datasets.get_sync(slug="my-dataset")
for column in dataset.columns:
    print(f"\nColumn: {column.name}")
    print(f"  Type: {column.type}")
    print(f"  Description: {column.description}")
basalt.shutdown()
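Column metadata can also be used to sanity-check rows before an evaluation run. The following is a minimal sketch that compares each row's values keys against the declared column names. It assumes every row is expected to carry a value for every column, which this page does not state. The find_row_problems helper and the sample data are hypothetical, and the objects are built with SimpleNamespace so the example runs without the SDK.

```python
from types import SimpleNamespace


def find_row_problems(dataset):
    """Return (row_name, missing_columns, unknown_keys) for each mismatched row."""
    expected = {col.name for col in dataset.columns}
    problems = []
    for row in dataset.rows:
        keys = set(row.values)
        missing = sorted(expected - keys)   # declared columns with no value
        unknown = sorted(keys - expected)   # values with no declared column
        if missing or unknown:
            problems.append((row.name, missing, unknown))
    return problems


# Hypothetical sample data shaped like the objects described on this page.
dataset = SimpleNamespace(
    columns=[SimpleNamespace(name="question"), SimpleNamespace(name="context")],
    rows=[
        SimpleNamespace(name="row-1", values={"question": "Hi?", "context": "faq"}),
        SimpleNamespace(name="row-2", values={"question": "Refund?", "topic": "billing"}),
    ],
)

problems = find_row_problems(dataset)
for name, missing, unknown in problems:
    print(f"{name}: missing={missing} unknown={unknown}")
```

The same function should work on a dataset returned by get_sync, since it only reads the columns, rows, name, and values attributes.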
Dataset objects
The main objects you work with are:
Dataset
DatasetColumn
DatasetRow
Dataset
slug: Unique identifier for the dataset
name: Human-readable name
description: Description of what the dataset is for
num_rows: Number of rows
columns: List of DatasetColumn objects
rows: List of DatasetRow objects
DatasetColumn
name: Column name
type: Data type (e.g. "string", "number")
description: Column description
DatasetRow
name: Row identifier
values: Dict of column_name -> value. Values can be strings, numbers, or FileAttachment objects.
ideal_output: Optional expected output for evaluation
metadata: Optional dict with additional context
FileAttachment
source: Path to the file to upload
content_type: MIME type of the file
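To illustrate how these fields fit together, the sketch below turns rows into evaluation cases, keeping only rows that have an ideal output. Row is a hypothetical dataclass that mirrors the DatasetRow fields listed above. It is not the SDK's own class, and the field types are assumptions. The to_eval_cases helper is likewise illustrative.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Row:
    """Hypothetical stand-in mirroring the DatasetRow fields listed above."""
    name: str
    values: dict
    ideal_output: Optional[Any] = None   # optional expected output
    metadata: Optional[dict] = None      # optional additional context


def to_eval_cases(rows):
    """Pair each row's inputs with its ideal output, skipping unlabeled rows."""
    return [
        {"name": r.name, "inputs": r.values, "expected": r.ideal_output}
        for r in rows
        if r.ideal_output is not None
    ]


rows = [
    Row("greeting", {"question": "Hello"}, ideal_output="Hi, how can I help?"),
    Row("unlabeled", {"question": "Where is my order?"}),
]

cases = to_eval_cases(rows)
print(cases)
```

Because to_eval_cases only reads name, values, and ideal_output, it could be applied to dataset.rows from get_sync in the same way.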
See the API Reference for method signatures and all available fields.