Monte Carlo Replica Method Fits

In this section, we describe how to run a Monte Carlo (MC) replica method fit in Colibri.

The Monte Carlo replica method introduces a pseudodata distribution, \(d_p \sim N(d_0, \Sigma)\), which is an approximation to the actual distribution from which the measurement \(d_0\) was drawn.

We then define the corresponding ‘best-fit parameter’ values to be those which minimise the \(\chi^2\) evaluated on the pseudodata:

\[\chi^2 = (d_p - t(\theta))^T \, \Sigma^{-1} \, (d_p - t(\theta))\]

where:

  • \(d_p\) is the pseudodata vector, drawn from the distribution \(d_p \sim N(d_0, \Sigma)\),

  • \(t(\theta)\) is the theoretical prediction vector, dependent on the model parameters \(\theta\),

  • \(\Sigma\) is the covariance matrix encoding the uncertainties.

The idea is to repeat this procedure across multiple data replicas, so that the collection of best-fit models for each pseudodata instance approximates a posterior sample.

Runcard

Here is an example runcard to perform a fit using the Monte Carlo replica method. Note that the dependence on the model will come from the model-specific executable.

meta: 'An example fit using Colibri, reduced DIS dataset.'

#######################
# Data and theory specs
#######################

dataset_inputs:
# DIS
- {dataset: SLAC_NC_NOTFIXED_P_EM-F2, variant: legacy_dw}
- {dataset: SLAC_NC_NOTFIXED_D_EM-F2, variant: legacy_dw}
- {dataset: BCDMS_NC_NOTFIXED_P_EM-F2, variant: legacy_dw}
- {dataset: BCDMS_NC_NOTFIXED_D_EM-F2, variant: legacy_dw}
# - {dataset: CHORUS_CC_NOTFIXED_PB_NU-SIGMARED, variant: legacy_dw}
# - {dataset: CHORUS_CC_NOTFIXED_PB_NB-SIGMARED, variant: legacy_dw}
# - {dataset: NUTEV_CC_NOTFIXED_FE_NU-SIGMARED, cfac: [MAS], variant: legacy_dw}
# - {dataset: NUTEV_CC_NOTFIXED_FE_NB-SIGMARED, cfac: [MAS], variant: legacy_dw}
# - {dataset: HERA_NC_318GEV_EM-SIGMARED, variant: legacy}
# - {dataset: HERA_NC_225GEV_EP-SIGMARED, variant: legacy}
# - {dataset: HERA_NC_251GEV_EP-SIGMARED, variant: legacy}
# - {dataset: HERA_NC_300GEV_EP-SIGMARED, variant: legacy}
# - {dataset: HERA_NC_318GEV_EP-SIGMARED, variant: legacy}
# - {dataset: HERA_CC_318GEV_EM-SIGMARED, variant: legacy}
# - {dataset: HERA_CC_318GEV_EP-SIGMARED, variant: legacy}
# - {dataset: HERA_NC_318GEV_EAVG_CHARM-SIGMARED, variant: legacy}
# - {dataset: HERA_NC_318GEV_EAVG_BOTTOM-SIGMARED, variant: legacy}
# - {dataset: NMC_NC_NOTFIXED_EM-F2, variant: legacy_dw}
# - {dataset: NMC_NC_NOTFIXED_P_EM-SIGMARED, variant: legacy}

theoryid: 40000000                     # The theory from which the predictions are drawn.
use_cuts: internal                     # The kinematic cuts to be applied to the data.

#####################
# Loss function specs
#####################

positivity:                            # Positivity datasets, used in the positivity penalty.
posdatasets:
- {dataset: NNPDF_POS_2P24GEV_F2U, variant: None, maxlambda: 1e6}

# positivity_penalty_settings:
#   positivity_penalty: false
#   alpha: 1e-7
#   lambda_positivity: 0

use_fit_t0: True                       # Whether the t0 covariance is used in the chi2 loss.
t0pdfset: NNPDF40_nnlo_as_01180        # The t0 PDF used to build the t0 covariance matrix.


###################
# Methodology specs
###################

# Integrability Settings
#integrability_settings:
#  integrability: False

# Monte Carlo settings
use_gen_t0: True                       # Whether the t0 covariance is used to generated pseudodata.
max_epochs: 300                        # The max number of epochs in Monte Carlo training.
mc_validation_fraction: 0.2            # The fraction of the data used for validation in Monte Carlo training.

mc_initialiser_settings:               # The initialiser for Monte Carlo training.
    type: uniform
    bounds:
        alpha_gluon: [-0.1, 1]
        beta_gluon: [9, 13]
        alpha_up: [0.4, 0.9]
        beta_up: [3, 4.5]
        epsilon_up: [-3, 3]
        gamma_up: [1, 6]
        alpha_down: [1, 2]
        beta_down: [8, 12]
        epsilon_down: [-4.5, -3]
        gamma_down: [3.8, 5.8]
        norm_sigma: [0.1, 0.5]
        alpha_sigma: [-0.2, 0.1]
        beta_sigma: [1.2, 3]

actions_:
- run_monte_carlo_fit

If it is appropriate for a given model, you may choose to have single, global minimum and maximum values for all parameters, instead of specific bounds for each parameter. In that, case, you may replace

bounds:
    alpha_gluon: [-0.1, 1]
    beta_gluon: [9, 13]
...

with, for example:

min_val: -4.5
max_val: 13

Running the fit

To perform a Monte Carlo fit for the Les Houches model (presented in this tutorial), you would run:

pip install -e .

from the examples/les_houches_example directory.

Then, you can use the above runcard with the following command:

les_houches_exe monte_carlo_runcard.yml -rep N

Note that this command will generate one single replica, namely replica number N. For a fit with more than one replica, you should iterate the above or submit the job to a batch system.

You can then run a postfit selection on the fit folders and evolve the fit. Details on how to do this can be found in this section.