Example Dataset
CaliBrain workflows require local forward-model and leadfield files before paper-scale simulations can be run. This page describes the example dataset layout expected by the current codebase and how to point CaliBrain to that data.
Unlike MNE-Python’s public dataset registry, CaliBrain does not currently ship a
download helper that fetches data automatically. Dataset access is local and
explicit: users configure a data root with CALIBRAIN_DATA or pass paths in
the workflow configuration files.
Data root
CaliBrain resolves the default data root with calibrain.utils.get_data_path.
The lookup order is:
1. the CALIBRAIN_DATA environment variable, if set;
2. the repository-level data directory, otherwise.
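The lookup order above can be sketched in a few lines. This is a minimal illustration, not the actual body of calibrain.utils.get_data_path: the function name resolve_data_root, its repo_root parameter, and the choice to treat an empty variable as unset are all assumptions made for the example.

```python
import os
from pathlib import Path


def resolve_data_root(repo_root: Path, env_var: str = "CALIBRAIN_DATA") -> Path:
    """Hypothetical stand-in that mirrors the documented lookup order."""
    override = os.environ.get(env_var)
    # Assumption: an empty string is treated the same as an unset variable.
    if override:
        return Path(override).expanduser()
    # Fallback: the repository-level data directory.
    return repo_root / "data"


# Example: resolve relative to the current working directory.
print(resolve_data_root(Path.cwd()))
```

Passing the repository root explicitly keeps the sketch independent of how the real package locates its own checkout.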
For a local installation, set:
export CALIBRAIN_DATA=/path/to/calibrain/data
The data root is expected to contain precomputed forward solutions and leadfield matrices. These files are intentionally kept outside version control because they are large and site-specific.
Available local example data
The current workflow configurations assume an example source space with 1284 sources and multiple subjects. A typical local data directory contains:
| Path | Contents | Used by |
|---|---|---|
| | MNE forward solutions such as | Calibration EMD and source-coordinate lookup. |
| | Reduced leadfield NPZ files such as | Data-generation workflow and inverse solvers. |
| | Full or intermediate forward-solution files. | Leadfield extraction and reduction utilities. |
| | Alternative fixed/free leadfield NPZ layout. | Legacy or exploratory scripts. |
Subject identifiers
The default local example dataset commonly uses:
CC120166
CC120264
CC120309
CC120313
fsaverage
The exact subject list is controlled by the workflow configs. If a config requests a subject whose forward or leadfield file is missing, data generation or calibration will fail with a file-not-found error.
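To catch missing inputs before a long run starts, a small pre-flight check can list which files are absent for each subject. The sketch below is not part of CaliBrain: find_missing_inputs is a hypothetical helper, the forward-solution pattern follows the 1284src_fwd/<subject>-fwd.fif path shown on this page, and the per-subject leadfield name <subject>_fixed_leadfield.npz is an assumption inferred from the *_fixed_leadfield.npz glob used in the minimal check.

```python
from pathlib import Path
from typing import Dict, List


def find_missing_inputs(data_root: Path, subjects: List[str]) -> Dict[str, List[Path]]:
    """Return, per subject, the expected input files that do not exist."""
    missing: Dict[str, List[Path]] = {}
    for subject in subjects:
        expected = [
            # Forward solution, following the path pattern shown on this page.
            data_root / "1284src_fwd" / f"{subject}-fwd.fif",
            # Assumed leadfield naming; adjust to match the local files.
            data_root / "1284src_leadfield" / f"{subject}_fixed_leadfield.npz",
        ]
        absent = [path for path in expected if not path.is_file()]
        if absent:
            missing[subject] = absent
    return missing


if __name__ == "__main__":
    # Report anything missing for two of the example subject identifiers.
    report = find_missing_inputs(Path("data"), ["CC120166", "fsaverage"])
    for subject, paths in report.items():
        print(subject, [str(p) for p in paths])
```

An empty result means every expected file was found for the subjects that were checked.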
Minimal check
Use this short check before running data generation:
from calibrain.utils import get_data_path
data_root = get_data_path()
print(data_root)
print(sorted((data_root / "1284src_leadfield").glob("*_fixed_leadfield.npz")))
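For a configured example dataset, the second print call should list at least one fixed-orientation leadfield file. An empty list means no matching files were found under the 1284src_leadfield directory of the resolved data root.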
Workflow usage
The data-generation workflow reads leadfields from the configured
leadfield_dir:
CONFIG = {
    "leadfield_dir": "/path/to/calibrain/data/1284src_leadfield",
    "manifest_path": "/path/to/results/run_manifest/fixed.csv",
}
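Before launching the workflow, the two paths in such a configuration can be sanity-checked. The helper below is a hypothetical sketch, not a CaliBrain API: check_config_paths is an invented name, and it assumes that the leadfield directory should already contain at least one NPZ file and that the manifest's parent directory should exist beforehand. The workflow itself may create that directory, in which case the second check can be dropped.

```python
from pathlib import Path
from typing import List


def check_config_paths(config: dict) -> List[str]:
    """Return human-readable problems found in the configured paths."""
    problems: List[str] = []

    leadfield_dir = Path(config["leadfield_dir"])
    if not leadfield_dir.is_dir():
        problems.append(f"leadfield_dir does not exist: {leadfield_dir}")
    elif not any(leadfield_dir.glob("*.npz")):
        problems.append(f"no NPZ files found in: {leadfield_dir}")

    # Assumption: the manifest's parent directory is expected to exist already.
    manifest_parent = Path(config["manifest_path"]).parent
    if not manifest_parent.is_dir():
        problems.append(f"manifest directory does not exist: {manifest_parent}")

    return problems


if __name__ == "__main__":
    example = {
        "leadfield_dir": "/path/to/calibrain/data/1284src_leadfield",
        "manifest_path": "/path/to/results/run_manifest/fixed.csv",
    }
    for problem in check_config_paths(example):
        print(problem)
```

Returning a list rather than raising on the first problem lets all path issues be reported in one pass.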
The calibration workflow uses forward solutions to recover source coordinates when source-space EMD is requested:
CALIBRAIN_DATA/1284src_fwd/<subject>-fwd.fif
Storage policy
Large forward solutions, leadfields, generated posterior summaries, aggregated
NPZ files, and calibration results should remain outside git. The repository
.gitignore excludes local data/ and results/ directories by default.
Relationship to generated artifacts
The example dataset is an input to the workflow. It is distinct from generated outputs:
| Artifact | Created by | Purpose |
|---|---|---|
| Forward solutions and leadfields | Prepared before running CaliBrain workflows. | Inputs for simulation, inverse estimation, and spatial metrics. |
| Posterior H5 summaries | | Raw per-run solver output. |
| Manifest CSV | | Auditable index of generated posterior summaries. |
| Aggregated NPZ datasets | | Calibration-ready reduced uncertainty representation. |
| Calibration JSON records | | Pre/post calibration curves, metrics, and metadata. |