Colibri fit folders
A Colibri fit folder is the directory produced by a Colibri fit; it contains the files that describe the fit and its results. Currently, we distinguish between two types of fit folders: Bayesian fit folders and Monte Carlo replica fit folders.
Bayesian fit folders
Note
By “Bayesian fit folder” we mean a folder containing the results of a fit performed with a Bayesian sampling method (see this section for details on how to run a Bayesian fit).
Any Bayesian fit folder should contain the following files:
colibri_fit/
├── bayes_metrics.csv
├── filter.yml # YAML file: copy of the input runcard
├── full_posterior_sample.csv
├── input/ # directory of input data and runcard(s)
├── md5 # checksum file to verify integrity of the fit folder
├── pdf_model.pkl # pickled PDF model used for the fit
└── replicas/ # Folder containing replica sub-folders (one per replica) with exportgrid files.
The replicas folder contains the subfolders of the replicas that were used in the fit. Each of these folders contains an .exportgrid file, which can be interpreted as a sample from the posterior distribution of the PDF model.
The pdf_model.pkl file contains the pickled PDF model used for the fit. This file serves several purposes; for example, it can be used to resample from the posterior distribution of the PDF model when a Bayesian fit is performed (see also colibri.scripts.ns_resampler).
The input/ directory contains the input data and runcard(s), and the filter.yml file is a copy of the input runcard used for the fit.
The md5 file is a checksum file that can be used to verify the integrity of the fit folder.
The bayes_metrics.csv file contains the metrics of the fit, such as the log-likelihood and the evidence.
The full_posterior_sample.csv file contains the full posterior sample of the fit (whose size is specified in the runcard).
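The layout described above can be checked programmatically. The following is a minimal sketch, not part of Colibri: the helper name missing_entries and the constant EXPECTED_BAYESIAN_ENTRIES are hypothetical, and the list of entries is simply copied from the tree shown above.

```python
from pathlib import Path

# Entries listed above for a Bayesian fit folder (copied from the tree;
# this constant is an illustration, not something Colibri provides).
EXPECTED_BAYESIAN_ENTRIES = (
    "bayes_metrics.csv",
    "filter.yml",
    "full_posterior_sample.csv",
    "input",
    "md5",
    "pdf_model.pkl",
    "replicas",
)


def missing_entries(fit_folder, expected=EXPECTED_BAYESIAN_ENTRIES):
    """Return the expected entries that are absent from fit_folder."""
    root = Path(fit_folder)
    return [name for name in expected if not (root / name).exists()]


# Hypothetical folder name; an empty list means every expected entry exists.
print(missing_entries("colibri_fit"))
```

The same helper can be pointed at an MC replica fit folder by passing a different tuple of expected entries.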
Depending on the type of Bayesian fit, other files may be present. For example, for a Bayesian fit using UltraNest, the following files will be present if sampler_plot is set to true:
ultranest_colibri_fit/
├── ultranest_logs/
├── ns_result.csv
A fit performed with the analytic_fit module will instead contain the following extra file:
analytic_colibri_fit/
├── analytic_result.csv
Finding the \(\chi^2\) of a Bayesian Fit
The \(\chi^2\) for a Bayesian fit is stored in the bayes_metrics.csv file, which looks like this:
bayes_complexity,avg_chi2,min_chi2,logz
6.693346300122812,3633.618330629202,3.62692e+03,-1.83561e+03
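A file with this layout can be read with the Python standard library. The sketch below assumes only the header and single data row shown above; the function name read_bayes_metrics is hypothetical and is not part of Colibri.

```python
import csv
import io


def read_bayes_metrics(source):
    """Parse the single data row of a bayes_metrics.csv file into floats."""
    reader = csv.DictReader(source)
    row = next(reader)  # the file holds one row of metrics
    return {key: float(value) for key, value in row.items()}


# Example content copied from the listing above.
example = io.StringIO(
    "bayes_complexity,avg_chi2,min_chi2,logz\n"
    "6.693346300122812,3633.618330629202,3.62692e+03,-1.83561e+03\n"
)
metrics = read_bayes_metrics(example)
print(metrics["avg_chi2"], metrics["min_chi2"])
```

For a real fit, pass an open file handle instead, for example open("colibri_fit/bayes_metrics.csv", newline=""), where colibri_fit is the name of your fit folder.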
After running a Bayesian fit, you should evolve it as described in Evolution script.
MC replica fit folders
Note
By “MC replica fit folder” we mean a folder containing the results of a fit performed with a Monte Carlo replica method (see [CMMM24] for more details on this method).
An MC replica fit folder should have the following structure:
mc_replica_fit/
├── filter.yml # YAML file: copy of the input runcard
├── fit_replicas/ # Folder containing replica sub-folders (one per replica) with exportgrid files.
├── input/ # directory of input data and runcard(s)
├── md5 # checksum file to verify integrity of the fit folder
└── pdf_model.pkl # pickled PDF model used for the fit
where the fit_replicas folder contains the subfolders of the replicas that were used in the fit.
The other files/folders are analogous to the ones produced by a Bayesian fit, discussed above.
Finding the \(\chi^2\) of a Monte Carlo fit
The \(\chi^2\) for each replica of an MC fit is stored in the fit_replicas/replica_n/mc_loss.csv file, where n is the replica number. This file lists the training and validation losses every 50 epochs. For example, the first few lines would look like this:
epochs,training_loss,validation_loss
0,7.80859e+00,1.13569e+01
1,6.19384e+00,9.22697e+00
2,4.86740e+00,7.44600e+00
...
These three rows represent the losses for the first 150 epochs (i.e. 0, 1, 2 are just labels for successive 50-epoch checkpoints, not epoch numbers).
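To inspect the final losses of a replica, the last row of this file is the relevant one. The following is a minimal sketch using only the standard library; the function name final_losses is hypothetical, and the column names are those shown in the listing above.

```python
import csv
import io


def final_losses(source):
    """Return (training_loss, validation_loss) from the last row of mc_loss.csv."""
    rows = list(csv.DictReader(source))
    if not rows:
        raise ValueError("mc_loss.csv contains no data rows")
    last = rows[-1]
    return float(last["training_loss"]), float(last["validation_loss"])


# Example content copied from the listing above (the elided rows are omitted).
example = io.StringIO(
    "epochs,training_loss,validation_loss\n"
    "0,7.80859e+00,1.13569e+01\n"
    "1,6.19384e+00,9.22697e+00\n"
    "2,4.86740e+00,7.44600e+00\n"
)
training, validation = final_losses(example)
print(training, validation)
```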
Postfit selection
After running an MC fit, you should run a postfit selection of the replicas. This is done by the colibri.scripts.mc_postfit script, which reads the fit_replicas folder and creates a new replicas folder containing the replicas that pass the postfit; these are the ones used to evolve the fit.
You can run a postfit selection by running:
mc_postfit -c CHI2_THRESHOLD monte_carlo_output_directory
where the -c option is optional and CHI2_THRESHOLD is the \(\chi^2\) threshold above which an MC replica is rejected. The value compared against the threshold is taken from the last row of the training_loss column shown above. The option can also be given as --chi2_threshold instead of -c. If no value is specified, a default value of 1.5 is applied.
Other options are:
--nsigma NSIGMA: The nsigma threshold above which replicas are rejected. The default is 5.
--target_replicas TARGET_REPLICAS or -t TARGET_REPLICAS: The target number of replicas to be produced by postfit. The default is 100.
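The \(\chi^2\) threshold criterion described above can be illustrated as follows. This is only a sketch of the stated rule (reject a replica whose final training loss is above the threshold); it is not the implementation of mc_postfit, and it does not model the nsigma or target-replica options. The function name select_by_chi2 and the input dictionary are hypothetical.

```python
def select_by_chi2(final_training_losses, chi2_threshold=1.5):
    """Return the replica numbers whose final training loss is not above the threshold.

    final_training_losses maps a replica number to the last training_loss
    value recorded for that replica (hypothetical input for illustration).
    """
    return sorted(
        replica
        for replica, loss in final_training_losses.items()
        if loss <= chi2_threshold
    )


# Made-up values, for illustration only.
losses = {1: 1.2, 2: 1.7, 3: 1.5}
selected = select_by_chi2(losses)
print(selected)
```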
After running a postfit selection of an MC fit, you should evolve it as described in this section.