# Overview

This page explains how to work with **Datasets** using the Basalt **Python SDK**.

Datasets let you organize, retrieve, and extend structured test data (inputs, ideal outputs, metadata) to evaluate prompts, models, and full workflows.

## Initialization

Create a single `Basalt` client and reuse it across your application or script.

```python
from basalt import Basalt

basalt = Basalt(api_key="your-api-key")
```

When you are done (for example, at the end of a CLI script or worker), call `shutdown()` to clean up resources:

```python
basalt.shutdown()
```

## Listing datasets

Retrieve all datasets accessible to your API key.

### Basic listing (sync)

```python
from basalt import Basalt

basalt = Basalt(api_key="your-api-key")

# List all datasets
response = basalt.datasets.list_sync()

print(f"Total datasets: {response.total}")

for dataset in response.datasets:
    print(f"\nSlug: {dataset.slug}")
    print(f"  Name: {dataset.name}")
    print(f"  Description: {dataset.description}")
    print(f"  Rows: {dataset.num_rows}")
    print(f"  Columns: {len(dataset.columns)}")

basalt.shutdown()
```

### Async listing

```python
import asyncio
from basalt import Basalt

async def list_datasets_async():
    basalt = Basalt(api_key="your-api-key")

    response = await basalt.datasets.list()

    for dataset in response.datasets:
        print(f"{dataset.slug}: {dataset.num_rows} rows")

    basalt.shutdown()

asyncio.run(list_datasets_async())
```

Use `list_sync` in scripts and simple synchronous backends; prefer the awaitable `list` in async frameworks such as FastAPI.
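
Whichever method you call, the response exposes the same `datasets` collection, so post-processing can be shared. As a rough sketch, the hypothetical helper `largest_datasets` below ranks non-empty datasets by `num_rows`. The sample response is made-up data built with `SimpleNamespace` to mimic the `slug` and `num_rows` attributes used above; it is not real SDK output.

```python
from types import SimpleNamespace


def largest_datasets(response, limit=3):
    """Return (slug, num_rows) pairs for the biggest non-empty datasets."""
    non_empty = [d for d in response.datasets if d.num_rows > 0]
    ranked = sorted(non_empty, key=lambda d: d.num_rows, reverse=True)
    return [(d.slug, d.num_rows) for d in ranked[:limit]]


# Stand-in for the object returned by list_sync() / list(); made-up data.
sample_response = SimpleNamespace(
    total=3,
    datasets=[
        SimpleNamespace(slug="customer-support-qa", num_rows=120),
        SimpleNamespace(slug="empty-draft", num_rows=0),
        SimpleNamespace(slug="billing-faq", num_rows=45),
    ],
)

print(largest_datasets(sample_response))
# [('customer-support-qa', 120), ('billing-faq', 45)]
```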

## Getting datasets

Retrieve a specific dataset by its slug, including all of its rows and columns.

```python
from basalt import Basalt

basalt = Basalt(api_key="your-api-key")

# Get dataset by slug
dataset = basalt.datasets.get_sync(slug="customer-support-qa")

print(f"Dataset: {dataset.name}")
print(f"Description: {dataset.description}")
print(f"Total rows: {dataset.num_rows}")
print(f"Columns: {[col.name for col in dataset.columns]}")

# Access rows
for row in dataset.rows:
    print(f"\nRow: {row.name}")
    print(f"  Values: {row.values}")
    print(f"  Ideal output: {row.ideal_output}")
    print(f"  Metadata: {row.metadata}")

basalt.shutdown()
```
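
For evaluation runs it is often convenient to turn the rows into plain dictionaries. The following is a minimal sketch, assuming only the row attributes printed above (`name`, `values`, `ideal_output`, `metadata`). `rows_to_records` is a hypothetical helper, and the sample dataset is made-up data standing in for the result of `get_sync`.

```python
from types import SimpleNamespace


def rows_to_records(dataset):
    """Convert each dataset row into a plain dict suitable for an eval loop."""
    records = []
    for row in dataset.rows:
        records.append(
            {
                "name": row.name,
                "inputs": dict(row.values),       # copy so callers can mutate safely
                "ideal_output": row.ideal_output,  # may be None
                "metadata": row.metadata or {},    # normalize missing metadata
            }
        )
    return records


# Stand-in for a dataset returned by get_sync(); made-up data.
sample_dataset = SimpleNamespace(
    rows=[
        SimpleNamespace(
            name="refund-request",
            values={"question": "How do I get a refund?"},
            ideal_output="Explain the refund policy.",
            metadata=None,
        )
    ]
)

records = rows_to_records(sample_dataset)
print(records[0]["inputs"])    # {'question': 'How do I get a refund?'}
print(records[0]["metadata"])  # {}
```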

### Inspecting columns

```python
from basalt import Basalt

basalt = Basalt(api_key="your-api-key")

dataset = basalt.datasets.get_sync(slug="my-dataset")

for column in dataset.columns:
    print(f"\nColumn: {column.name}")
    print(f"  Type: {column.type}")
    print(f"  Description: {column.description}")

basalt.shutdown()
```
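
Column definitions can also be used to sanity-check rows before an evaluation run. The sketch below assumes that `row.values` is a dict keyed by column name, as described in the object reference on this page. `find_missing_values` is a hypothetical helper, and the sample dataset is made-up data.

```python
from types import SimpleNamespace


def find_missing_values(dataset):
    """Map each row name to the column names that have no value in that row."""
    column_names = [column.name for column in dataset.columns]
    missing = {}
    for row in dataset.rows:
        absent = [name for name in column_names if row.values.get(name) is None]
        if absent:
            missing[row.name] = absent
    return missing


# Stand-in for a dataset returned by get_sync(); made-up data.
sample_dataset = SimpleNamespace(
    columns=[
        SimpleNamespace(name="question", type="string", description=""),
        SimpleNamespace(name="language", type="string", description=""),
    ],
    rows=[
        SimpleNamespace(name="row-1", values={"question": "Hi", "language": "en"}),
        SimpleNamespace(name="row-2", values={"question": "Bonjour"}),
    ],
)

print(find_missing_values(sample_dataset))  # {'row-2': ['language']}
```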

## Dataset objects

The main objects you work with are:

* `Dataset`
* `DatasetColumn`
* `DatasetRow`

### Dataset

* `slug`: Unique identifier for the dataset
* `name`: Human-readable name
* `description`: Description of what the dataset is for
* `num_rows`: Number of rows
* `columns`: List of `DatasetColumn` objects
* `rows`: List of `DatasetRow` objects

### DatasetColumn

* `name`: Column name
* `type`: Data type (e.g. `"string"`, `"number"`)
* `description`: Column description

### DatasetRow

* `name`: Row identifier
* `values`: Dict of `column_name -> value`. Values can be strings, numbers, or `FileAttachment` objects.
* `ideal_output`: Optional expected output for evaluation
* `metadata`: Optional dict with additional context

### FileAttachment

* `source`: Path to the file to upload
* `content_type`: MIME type of the file

See the API Reference for method signatures and all available fields.
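
When writing unit tests for code that consumes datasets, lightweight stand-ins with the same field names can be handy. The dataclasses below are illustrative only: they mirror the fields listed on this page, but the class names, type annotations, and defaults are assumptions, not the SDK's actual definitions. In particular, `num_rows` is derived from `rows` here, which may differ from how the SDK populates it.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class FileAttachmentSketch:
    source: str        # path to the file to upload
    content_type: str  # MIME type of the file


@dataclass
class DatasetColumnSketch:
    name: str
    type: str  # e.g. "string" or "number"
    description: str = ""


@dataclass
class DatasetRowSketch:
    name: str
    values: dict[str, Any]  # column_name -> value
    ideal_output: Optional[str] = None
    metadata: Optional[dict[str, Any]] = None


@dataclass
class DatasetSketch:
    slug: str
    name: str
    description: str = ""
    columns: list[DatasetColumnSketch] = field(default_factory=list)
    rows: list[DatasetRowSketch] = field(default_factory=list)

    @property
    def num_rows(self) -> int:
        # Derived here for convenience; an assumption of this sketch.
        return len(self.rows)


# Made-up fixture data for a test.
dataset = DatasetSketch(
    slug="customer-support-qa",
    name="Customer support QA",
    columns=[DatasetColumnSketch(name="question", type="string")],
    rows=[
        DatasetRowSketch(
            name="row-1",
            values={
                "question": "Where is my invoice?",
                "attachment": FileAttachmentSketch("invoice.pdf", "application/pdf"),
            },
        )
    ],
)

print(dataset.num_rows)  # 1
```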
