.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_tutorials/09_metric_evaluation.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_tutorials_09_metric_evaluation.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_tutorials_09_metric_evaluation.py:


09. Metric Evaluation
=====================

This tutorial demonstrates ``MetricEvaluator``, the high-level API for
quantitative evaluation of uncertainty and calibration outputs.

It covers:

- calibration curves via ``MetricEvaluator.calibration_curve``;
- calibration summary metrics via ``MetricEvaluator.calibration_metrics_4``;
- combined error and uncertainty summaries via ``MetricEvaluator.evaluate_all``;
- fixed-orientation and free-orientation EEG examples.

.. GENERATED FROM PYTHON SOURCE LINES 20-37

Scientific motivation
---------------------

After source estimation and uncertainty estimation, CaliBrain needs a compact
way to answer three questions:

- how accurate are the reconstructed source signals?
- how large is the posterior uncertainty on average?
- how well do empirical coverages match nominal coverages?

``MetricEvaluator`` is the high-level class that summarizes these quantities.
It wraps ``UncertaintyEstimator`` and exposes workflow-facing evaluation
methods. Evaluation is intentionally more general than the named calibration
modes such as ``post_oracle``, ``post_pooled``, ``post_pooled_mismatch``, and
``post_fixed``. Those modes define common recalibration workflows, whereas
evaluation can be performed in several ways once predictions and uncertainty
summaries are available.

.. GENERATED FROM PYTHON SOURCE LINES 37-54

.. code-block:: Python


    import matplotlib.pyplot as plt
    import numpy as np
    from mne.io.constants import FIFF

    from calibrain import (
        MetricEvaluator,
        SensorSimulator,
        SourceEstimator,
        SourceSimulator,
        UncertaintyEstimator,
        gamma_map_sflex,
    )


    RANDOM_SEED = 71


.. GENERATED FROM PYTHON SOURCE LINES 55-64

Build a lightweight evaluation fixture
--------------------------------------

The tutorial generates two small synthetic examples:

- one fixed-orientation example;
- one free-orientation EEG example.

Both use the active ``gamma_map_sflex`` solver and share one
``UncertaintyEstimator`` configured with the same grid of nominal coverages.

.. GENERATED FROM PYTHON SOURCE LINES 64-114

.. code-block:: Python


    erp_config = {
        "tmin": -0.1,
        "tmax": 0.8,
        "stim_onset": 0.0,
        "sfreq": 100,
        "fmin": 2,
        "fmax": 8,
        "amplitude_distribution": {
            "median": 8.0,
            "sigma": 0.15,
            "clip": [2.0, 20.0],
        },
        "random_erp_timing": False,
        "erp_min_length": 20,
    }

    nominal_coverages = np.linspace(0.0, 1.0, 11)
    ue = UncertaintyEstimator(nominal_coverages=nominal_coverages)
    metric_evaluator = MetricEvaluator(ue)
    source_simulator = SourceSimulator(ERP_config=erp_config)
    sensor_simulator = SensorSimulator()
    times = np.arange(erp_config["tmin"], erp_config["tmax"], 1.0 / erp_config["sfreq"])

    rng = np.random.default_rng(RANDOM_SEED)
    n_sensors = 16
    n_sources = 32
    src_coords = rng.normal(scale=0.04, size=(n_sources, 3))

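    # Random fixed-orientation leadfield; every source column is scaled to norm 0.6.
    leadfield_fixed = rng.normal(scale=0.03, size=(n_sensors, n_sources))
    leadfield_fixed /= np.maximum(
        np.linalg.norm(leadfield_fixed, axis=0, keepdims=True),
        np.finfo(float).eps,
    )
    leadfield_fixed *= 0.6

    # Free-orientation leadfield with three components per source. The norm is
    # taken over sensors, so each (source, component) column ends up with norm 0.4.
    leadfield_free_eeg = rng.normal(scale=0.015, size=(n_sensors, n_sources, 3))
    leadfield_free_eeg /= np.maximum(
        np.linalg.norm(leadfield_free_eeg, axis=0, keepdims=True),
        np.finfo(float).eps,
    )
    leadfield_free_eeg *= 0.4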

    sensor_simulator.set_sensor_metadata(
        kind=FIFF.FIFFV_EEG_CH,
        units=FIFF.FIFF_UNIT_V,
        unitmult=FIFF.FIFF_UNITM_MU,
        coil_type=FIFF.FIFFV_COIL_EEG,
    )


.. GENERATED FROM PYTHON SOURCE LINES 115-122

Fixed-orientation evaluation
----------------------------

For fixed orientation, ``MetricEvaluator`` works with:

- ``x_true`` and ``x_hat`` of shape ``(N, T)`` (sources by time samples);
- uncertainty as either ``posterior_var`` or full ``posterior_cov``.
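
The coverage check behind a calibration curve can be sketched in plain NumPy.
This is a minimal illustration, not CaliBrain's implementation: the helper
``empirical_coverage`` is hypothetical, it assumes Gaussian posteriors with one
variance per source, and it counts every ``(source, time)`` entry separately,
whereas the aggregated mode of ``MetricEvaluator`` may pool entries differently.

.. code-block:: Python

    import math

    import numpy as np


    def empirical_coverage(x_true, x_hat, posterior_var, nominal):
        """Fraction of entries whose central Gaussian interval contains the truth.

        Hypothetical helper for illustration; not part of CaliBrain.
        """
        z = np.abs(x_true - x_hat) / np.sqrt(posterior_var)
        # Smallest central-interval level that still contains each true value:
        # P(|Z| <= z) = erf(z / sqrt(2)) for a standard normal Z.
        level = np.vectorize(math.erf)(z / math.sqrt(2.0))
        return np.array([(level <= p).mean() for p in nominal])


    rng = np.random.default_rng(0)
    n_sources, n_times = 200, 100
    posterior_var = np.full((n_sources, 1), 0.5 ** 2)  # one variance per source
    x_hat = rng.normal(size=(n_sources, n_times))
    # The truth is drawn from the stated posterior, so coverage should track nominal.
    x_true = x_hat + rng.normal(scale=0.5, size=x_hat.shape)

    nominal = np.linspace(0.0, 1.0, 11)
    empirical = empirical_coverage(x_true, x_hat, posterior_var, nominal)
    print(np.round(empirical - nominal, 3))

Because the synthetic truth is drawn from the stated posterior, the printed
deviations should stay close to zero. Posterior variances that are too large
or too small would push the empirical curve above or below the diagonal.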

.. GENERATED FROM PYTHON SOURCE LINES 122-168

.. code-block:: Python


    x_true_fixed, active_fixed = source_simulator.simulate(
        n_sources=n_sources,
        nnz=4,
        orientation_type="fixed",
        seed=RANDOM_SEED,
    )

    y_fixed_clean, y_fixed_noisy, fixed_noise, fixed_eta = sensor_simulator.simulate(
        x=x_true_fixed,
        L=leadfield_fixed,
        alpha_SNR=0.7,
        sensor_white_noise_std=0.2,
        seed=RANDOM_SEED,
    )
    fixed_noise_var = float(np.var(fixed_noise))

    fixed_estimator = SourceEstimator(
        solver=gamma_map_sflex,
        solver_params={"max_iter": 150, "tol": 1e-7, "sigma": 0.01, "src_coords": src_coords},
        noise_var=fixed_noise_var,
        n_orient=1,
    )
    fixed_estimator.fit(leadfield_fixed, y_fixed_noisy)
    fixed_result = fixed_estimator.predict()

    fixed_curve = metric_evaluator.calibration_curve(
        x_true=x_true_fixed,
        x_hat=fixed_result["posterior_mean"],
        posterior_uncert=fixed_result["posterior_cov"],
        setting="fixed",
        mode="aggregated",
    )
    fixed_summary = metric_evaluator.evaluate_all(
        x_true=x_true_fixed,
        x_hat=fixed_result["posterior_mean"],
        posterior_uncert=fixed_result["posterior_cov"],
        setting="fixed",
        mode="aggregated",
    )

    print("fixed calibration curve keys:", sorted(fixed_curve.keys()))
    print("fixed calibration metrics:", fixed_curve["metrics_4"])
    print("fixed evaluate_all keys:", sorted(fixed_summary.keys()))
    print("fixed mean posterior std:", fixed_summary["mean_posterior_std"])


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    fixed calibration curve keys: ['empirical', 'metrics_4', 'nominal']
    fixed calibration metrics: {'max_underconfidence_deviation': 0.0, 'max_overconfidence_deviation': 0.4, 'mean_absolute_deviation': 0.19318181818181815, 'mean_signed_deviation': 0.19318181818181815}
    fixed evaluate_all keys: ['calibration', 'mae', 'mean_posterior_std', 'mse', 'rmae', 'rmse']
    fixed mean posterior std: 0.019154597617136844


.. GENERATED FROM PYTHON SOURCE LINES 169-178

Free-orientation EEG evaluation
-------------------------------

For free-orientation EEG, ``MetricEvaluator`` supports two interval types:

- ``marginal``: pooled component-wise intervals;
- ``full_cov``: local 3D covariance blocks.

Both are evaluated below in aggregated mode.
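
The difference between the two interval types can be sketched on synthetic
data. The code below is an assumption about what the names mean, not
CaliBrain's implementation: ``marginal`` is read as component-wise Gaussian
intervals pooled over all ``3 * N`` components, and ``full_cov`` as one
confidence ellipsoid per source based on the squared Mahalanobis distance,
which follows a chi-square distribution with three degrees of freedom under a
Gaussian posterior. The helper ``chi2_cdf_3dof`` is hypothetical.

.. code-block:: Python

    import math

    import numpy as np


    def chi2_cdf_3dof(d2):
        """Closed-form chi-square CDF with 3 degrees of freedom (illustrative)."""
        d = math.sqrt(max(d2, 0.0))
        return math.erf(d / math.sqrt(2.0)) - math.sqrt(2.0 / math.pi) * d * math.exp(
            -d * d / 2.0
        )


    rng = np.random.default_rng(1)
    n_sources = 20000
    # One correlated 3x3 covariance block, shared by all sources for brevity.
    A = np.array([[0.5, 0.0, 0.0], [0.3, 0.4, 0.0], [-0.2, 0.1, 0.3]])
    cov = A @ A.T
    x_hat = rng.normal(size=(n_sources, 3))
    x_true = x_hat + rng.normal(size=(n_sources, 3)) @ A.T  # error ~ N(0, cov)

    err = x_true - x_hat
    nominal = np.linspace(0.0, 1.0, 11)

    # "marginal": component-wise intervals, pooled over all components.
    z = np.abs(err) / np.sqrt(np.diag(cov))
    level_marginal = np.vectorize(math.erf)(z / math.sqrt(2.0))
    coverage_marginal = np.array([(level_marginal <= p).mean() for p in nominal])

    # "full_cov": one ellipsoid per source via the squared Mahalanobis distance.
    d2 = np.einsum("ni,ij,nj->n", err, np.linalg.inv(cov), err)
    level_full = np.vectorize(chi2_cdf_3dof)(d2)
    coverage_full = np.array([(level_full <= p).mean() for p in nominal])

    print(np.round(coverage_marginal - nominal, 3))
    print(np.round(coverage_full - nominal, 3))

Both curves track the nominal grid here because the errors are drawn from the
stated covariance. The marginal variant ignores the off-diagonal terms of each
block, so the two curves need not agree on real estimates.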

.. GENERATED FROM PYTHON SOURCE LINES 178-235

.. code-block:: Python


    x_true_free, active_free = source_simulator.simulate(
        n_sources=n_sources,
        nnz=4,
        orientation_type="free",
        coil_type=FIFF.FIFFV_COIL_EEG,
        seed=RANDOM_SEED + 1,
    )

    y_free_clean, y_free_noisy, free_noise, free_eta = sensor_simulator.simulate(
        x=x_true_free,
        L=leadfield_free_eeg,
        alpha_SNR=0.7,
        sensor_white_noise_std=0.05,
        seed=RANDOM_SEED + 1,
    )
    free_noise_var = float(np.var(free_noise))

    free_estimator = SourceEstimator(
        solver=gamma_map_sflex,
        solver_params={"max_iter": 150, "tol": 1e-7, "sigma": 0.01, "src_coords": src_coords},
        noise_var=free_noise_var,
        n_orient=3,
    )
    free_estimator.fit(leadfield_free_eeg, y_free_noisy)
    free_result = free_estimator.predict()

    free_curve_marginal = metric_evaluator.calibration_curve(
        x_true=x_true_free,
        x_hat=free_result["posterior_mean_reshaped"],
        posterior_uncert=free_result["posterior_cov"],
        setting="eeg_free",
        mode="aggregated",
        free_interval_type="marginal",
    )
    free_curve_full_cov = metric_evaluator.calibration_curve(
        x_true=x_true_free,
        x_hat=free_result["posterior_mean_reshaped"],
        posterior_uncert=free_result["posterior_cov"],
        setting="eeg_free",
        mode="aggregated",
        free_interval_type="full_cov",
    )
    free_summary = metric_evaluator.evaluate_all(
        x_true=x_true_free,
        x_hat=free_result["posterior_mean_reshaped"],
        posterior_uncert=free_result["posterior_cov"],
        setting="eeg_free",
        mode="aggregated",
        free_interval_type="full_cov",
    )

    print("free marginal calibration metrics:", free_curve_marginal["metrics_4"])
    print("free full_cov calibration metrics:", free_curve_full_cov["metrics_4"])
    print("free evaluate_all mse:", free_summary["mse"])
    print("free evaluate_all mean posterior std:", free_summary["mean_posterior_std"])


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    free marginal calibration metrics: {'max_underconfidence_deviation': 0.0, 'max_overconfidence_deviation': 0.74375, 'mean_absolute_deviation': 0.34753787878787873, 'mean_signed_deviation': 0.34753787878787873}
    free full_cov calibration metrics: {'max_underconfidence_deviation': 0.0, 'max_overconfidence_deviation': 0.74375, 'mean_absolute_deviation': 0.3352272727272727, 'mean_signed_deviation': 0.3352272727272727}
    free evaluate_all mse: 0.004249153282609895
    free evaluate_all mean posterior std: 0.0140126642270974


.. GENERATED FROM PYTHON SOURCE LINES 236-242

Compare calibration summary metrics directly
--------------------------------------------

``calibration_metrics_4`` can also be called directly on nominal and empirical
coverage arrays. This is useful when calibration curves are already available
and only the summary metrics need to be recomputed.

.. GENERATED FROM PYTHON SOURCE LINES 242-255

.. code-block:: Python


    fixed_metrics_direct = metric_evaluator.calibration_metrics_4(
        fixed_curve["nominal"],
        fixed_curve["empirical"],
    )
    free_metrics_direct = metric_evaluator.calibration_metrics_4(
        free_curve_full_cov["nominal"],
        free_curve_full_cov["empirical"],
    )

    print("fixed direct metrics:", fixed_metrics_direct)
    print("free direct metrics:", free_metrics_direct)


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    fixed direct metrics: {'max_underconfidence_deviation': 0.0, 'max_overconfidence_deviation': 0.4, 'mean_absolute_deviation': 0.19318181818181815, 'mean_signed_deviation': 0.19318181818181815}
    free direct metrics: {'max_underconfidence_deviation': 0.0, 'max_overconfidence_deviation': 0.74375, 'mean_absolute_deviation': 0.3352272727272727, 'mean_signed_deviation': 0.3352272727272727}
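
The four summary values can be reproduced with plain NumPy. The sketch below
uses definitions inferred from the printed numbers rather than from the
library source, so treat them as an assumption: the deviation is
``empirical - nominal``, the means run over all grid points including the
endpoints, and a positive deviation (empirical above nominal) is reported
under ``max_overconfidence_deviation``. The arrays are the fixed-orientation
curve as printed by ``evaluate_all`` in this tutorial.

.. code-block:: Python

    import numpy as np

    nominal = np.linspace(0.0, 1.0, 11)
    empirical = np.array(
        [0.0, 0.5, 0.53125, 0.53125, 0.6875, 0.75, 0.84375, 0.875, 0.90625, 1.0, 1.0]
    )

    deviation = empirical - nominal
    metrics = {
        # Largest shortfall of empirical coverage below nominal (0 if none).
        "max_underconfidence_deviation": float(np.max(np.clip(-deviation, 0.0, None))),
        # Largest excess of empirical coverage above nominal (0 if none).
        "max_overconfidence_deviation": float(np.max(np.clip(deviation, 0.0, None))),
        "mean_absolute_deviation": float(np.mean(np.abs(deviation))),
        "mean_signed_deviation": float(np.mean(deviation)),
    }
    print(metrics)

The absolute and signed means coincide for this curve because every empirical
coverage lies on or above its nominal level.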


.. GENERATED FROM PYTHON SOURCE LINES 256-261

Plot calibration and evaluation summaries
-----------------------------------------

The first panel compares the fixed-orientation curve with the two free-EEG
curves (``full_cov`` and ``marginal``). The second panel compares the four
default calibration metrics for the fixed and free-EEG ``full_cov`` cases.

.. GENERATED FROM PYTHON SOURCE LINES 261-311

.. code-block:: Python


    metric_names = [
        "mean_absolute_deviation",
        "mean_signed_deviation",
        "max_underconfidence_deviation",
        "max_overconfidence_deviation",
    ]
    fixed_metric_values = [fixed_curve["metrics_4"][name] for name in metric_names]
    free_metric_values = [free_curve_full_cov["metrics_4"][name] for name in metric_names]

    fig, axes = plt.subplots(1, 2, figsize=(11, 4.5))
    axes[0].plot([0, 1], [0, 1], "--", color="0.5", label="perfect calibration")
    axes[0].plot(fixed_curve["nominal"], fixed_curve["empirical"], marker="o", label="fixed")
    axes[0].plot(
        free_curve_full_cov["nominal"],
        free_curve_full_cov["empirical"],
        marker="s",
        label="free EEG full_cov",
    )
    axes[0].plot(
        free_curve_marginal["nominal"],
        free_curve_marginal["empirical"],
        marker="^",
        label="free EEG marginal",
    )
    axes[0].set(
        xlabel="Nominal coverage",
        ylabel="Empirical coverage",
        title="Calibration curves",
    )
    axes[0].legend(loc="best")

    bar_positions = np.arange(len(metric_names))
    bar_width = 0.38
    axes[1].bar(bar_positions - bar_width / 2, fixed_metric_values, width=bar_width, label="fixed")
    axes[1].bar(bar_positions + bar_width / 2, free_metric_values, width=bar_width, label="free EEG full_cov")
    axes[1].set(
        xticks=bar_positions,
        xticklabels=[
            "MAD",
            "MSD",
            "Max under",
            "Max over",
        ],
        ylabel="Metric value",
        title="Calibration summary metrics",
    )
    axes[1].legend(loc="best")
    fig.tight_layout()


.. image-sg:: /auto_tutorials/images/sphx_glr_09_metric_evaluation_001.png
   :alt: Calibration curves, Calibration summary metrics
   :srcset: /auto_tutorials/images/sphx_glr_09_metric_evaluation_001.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 312-317

Inspect the combined evaluation output
--------------------------------------

``evaluate_all`` combines the error metrics, the uncertainty summary, and the
calibration summary into one dictionary.

.. GENERATED FROM PYTHON SOURCE LINES 317-321

.. code-block:: Python


    for key in ["mse", "mae", "rmse", "rmae", "mean_posterior_std", "calibration"]:
        print(f"fixed evaluate_all[{key!r}] =", fixed_summary[key])


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    fixed evaluate_all['mse'] = 0.0004689190050340884
    fixed evaluate_all['mae'] = 0.012077275043368751
    fixed evaluate_all['rmse'] = 0.02165453774695014
    fixed evaluate_all['rmae'] = 0.10989665619739643
    fixed evaluate_all['mean_posterior_std'] = 0.019154597617136844
    fixed evaluate_all['calibration'] = {'nominal': array([0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1. ]), 'empirical': array([0.     , 0.5    , 0.53125, 0.53125, 0.6875 , 0.75   , 0.84375,
           0.875  , 0.90625, 1.     , 1.     ]), 'metrics_4': {'max_underconfidence_deviation': 0.0, 'max_overconfidence_deviation': 0.4, 'mean_absolute_deviation': 0.19318181818181815, 'mean_signed_deviation': 0.19318181818181815}}
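
In the output above, ``rmse`` and ``rmae`` match the square roots of ``mse``
and ``mae``. The sketch below shows error metrics consistent with that
pattern. It is an illustration only: the helper ``error_summary`` is
hypothetical, and taking plain means over all entries is an assumption about
how the library aggregates errors.

.. code-block:: Python

    import math

    import numpy as np


    def error_summary(x_true, x_hat):
        """Toy error metrics (hypothetical helper, not CaliBrain's code)."""
        err = np.asarray(x_hat) - np.asarray(x_true)
        mse = float(np.mean(err ** 2))
        mae = float(np.mean(np.abs(err)))
        return {"mse": mse, "mae": mae, "rmse": math.sqrt(mse), "rmae": math.sqrt(mae)}


    toy = error_summary(x_true=[[0.0, 1.0], [2.0, 3.0]], x_hat=[[0.5, 1.0], [2.0, 2.0]])
    print(toy)

    # Values copied from the printed fixed-orientation summary.
    printed = {
        "mse": 0.0004689190050340884,
        "mae": 0.012077275043368751,
        "rmse": 0.02165453774695014,
        "rmae": 0.10989665619739643,
    }
    rmse_matches = math.isclose(math.sqrt(printed["mse"]), printed["rmse"], rel_tol=1e-6)
    rmae_matches = math.isclose(math.sqrt(printed["mae"]), printed["rmae"], rel_tol=1e-6)
    print(rmse_matches, rmae_matches)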


.. GENERATED FROM PYTHON SOURCE LINES 322-334

Summary
-------

``MetricEvaluator`` is the high-level evaluation class used after uncertainty
estimation and calibration.

In this tutorial it was used to:

- compute aggregated calibration curves;
- summarize them with the default four calibration metrics;
- compare ``marginal`` and ``full_cov`` free-EEG uncertainty;
- collect reconstruction and uncertainty summaries with ``evaluate_all``.


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 0.333 seconds)


.. _sphx_glr_download_auto_tutorials_09_metric_evaluation.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: 09_metric_evaluation.ipynb <09_metric_evaluation.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: 09_metric_evaluation.py <09_metric_evaluation.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: 09_metric_evaluation.zip <09_metric_evaluation.zip>`