Example Dataset#

CaliBrain workflows require local forward-model and leadfield files before paper-scale simulations can be run. This page describes the example dataset layout expected by the current codebase and how to point CaliBrain to that data.

Unlike MNE-Python’s public dataset registry, CaliBrain does not currently ship a download helper that fetches data automatically. Dataset access is local and explicit: users configure a data root with CALIBRAIN_DATA or pass paths in the workflow configuration files.

Data root#

CaliBrain resolves the default data root with calibrain.utils.get_data_path. The lookup order is:

the CALIBRAIN_DATA environment variable, if set;
the repository-level data directory, otherwise.

For a local installation, set:

export CALIBRAIN_DATA=/path/to/calibrain/data

The data root is expected to contain precomputed forward solutions and leadfield matrices. These files are intentionally kept outside version control because they are large and site-specific.

Available local example data#

The current workflow configurations assume an example source space with 1284 sources and multiple subjects. A typical local data directory contains:

Expected example dataset layout#
Path	Contents	Used by
`1284src_fwd/`	MNE forward solutions such as `CC120166-fwd.fif` and `fsaverage-fwd.fif`.	Calibration EMD and source-coordinate lookup.
`1284src_leadfield/`	Reduced leadfield NPZ files such as `CC120166_fixed_leadfield.npz` and `CC120166_free_leadfield.npz`.	Data-generation workflow and inverse solvers.
`fwd/`	Full or intermediate forward-solution files.	Leadfield extraction and reduction utilities.
`leadfield/`	Alternative fixed/free leadfield NPZ layout.	Legacy or exploratory scripts.

Subject identifiers#

The default local example dataset commonly uses:

CC120166
CC120264
CC120309
CC120313
fsaverage

The exact subject list is controlled by the workflow configs. If a config requests a subject whose forward or leadfield file is missing, data generation or calibration will fail with a file-not-found error.

Minimal check#

Use this short check before running data generation:

from calibrain.utils import get_data_path

data_root = get_data_path()
print(data_root)
print(sorted((data_root / "1284src_leadfield").glob("*_fixed_leadfield.npz")))

For a configured example dataset, the second line should print at least one fixed-orientation leadfield file.

Workflow usage#

The data-generation workflow reads leadfields from the configured leadfield_dir:

CONFIG = {
    "leadfield_dir": "/path/to/calibrain/data/1284src_leadfield",
    "manifest_path": "/path/to/results/run_manifest/fixed.csv",
}

The calibration workflow uses forward solutions to recover source coordinates when source-space EMD is requested:

CALIBRAIN_DATA/1284src_fwd/<subject>-fwd.fif

Storage policy#

Large forward solutions, leadfields, generated posterior summaries, aggregated NPZ files, and calibration results should remain outside git. The repository .gitignore excludes local data/ and results/ directories by default.

Relationship to generated artifacts#

The example dataset is an input to the workflow. It is distinct from generated outputs:

Artifact	Created by	Purpose
Forward solutions and leadfields	Prepared before running CaliBrain workflows.	Inputs for simulation, inverse estimation, and spatial metrics.
Posterior H5 summaries	`calibrain/workflows/data_generation.py`	Raw per-run solver output.
Manifest CSV	`calibrain/workflows/data_generation.py`	Auditable index of generated posterior summaries.
Aggregated NPZ datasets	`calibrain/workflows/aggregation.py`	Calibration-ready reduced uncertainty representation.
Calibration JSON records	`calibrain/workflows/calibration.py`	Pre/post calibration curves, metrics, and metadata.