How to do DL2 reconstruction using ctapipe tools#
This guide explains how to use ctapipe to process files containing images (data level 1a, DL1a)
to data level 2 (DL2) for both monoscopic and stereoscopic analyses.
This includes the steps of image cleaning, image parametrization, training and applying machine learning
models to estimate the energy and particle type, and either geometric direction reconstruction (stereo)
or machine-learning-based direction reconstruction using the disp algorithm (mono or stereo).
Note
This guide assumes you have a directory containing gamma, proton, and electron files already processed to DL1a.
The provided commands assume you are working in a bash shell environment.
Setup#
As always, you can run
ctapipe-quickstart
to get some example configuration files, which you can use as a starting point to create your desired configuration. You can also get up-to-date configuration files from gitlab, but take care to select the configuration files appropriate for the version of ctapipe you are using.
To keep things organized, also define an output directory and make sure it exists, for example like this:
OUTPUT_DIR=build
mkdir -p $OUTPUT_DIR
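Before starting, it can help to confirm that the command-line tools used in this guide are available in the current environment. The following is a minimal sketch: the helper name check_tools is hypothetical and only wraps the standard command -v shell builtin.

```shell
# Hypothetical helper: print one "missing: <name>" line for every command
# that is not found on PATH, and return non-zero if any is missing.
check_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      missing=1
    fi
  done
  return $missing
}

# The ctapipe tools named in this guide.
check_tools ctapipe-quickstart ctapipe-process ctapipe-merge \
  ctapipe-train-energy-regressor ctapipe-train-particle-classifier \
  ctapipe-train-disp-reconstructor ctapipe-apply-models \
  || echo "Install or activate ctapipe before continuing."
```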
Image cleaning, image parametrization, and geometric direction reconstruction#
Image cleaning and parametrization, as well as the geometric direction reconstruction, can all be done by
ctapipe-process with the following command:
ctapipe-process --input $INPUT_FILE \
  --output $OUTPUT_DIR/$OUTPUT_FILE \
  --config $CONFIG_FILE \
  --config $SUBARRAY_FILE \
  --provenance-log $OUTPUT_DIR/$PROVENANCE_LOG
INPUT_FILE is the path to an input file and OUTPUT_FILE the name of the output file.
CONFIG_FILE is the path to a configuration file, such as dl1_to_dl2.yml found in the gitlab
for v0.23/v1 of ctapipe, and SUBARRAY_FILE is another configuration file describing
the subarray, such as prod6/subarray_north_alpha.yml found next to dl1_to_dl2.yml in the gitlab.
You can also combine all the configuration options in one file and pass only that file to the tool.
Finally, PROVENANCE_LOG is the name of the file in which the provenance log will be saved. Take care
to use a different name for each input file, otherwise tracking the provenance later becomes complicated.
Run the above command with the inputs adjusted so that each of the available input files is processed
into a separate file, and you will end up with a set of h5 files containing image parameters and
entries at the path dl2/event/subarray/geometry.
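Processing every input file into its own output file can be sketched as a loop. This is an illustration under assumptions, not part of ctapipe: the helpers run and process_all are hypothetical, the input files are assumed to end in .dl1.h5, and the output and provenance log names are derived from the input name. With DRY_RUN=1 (the default here) the commands are only printed.

```shell
# Hypothetical dry-run wrapper: print the command unless DRY_RUN is set to 0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Configuration files named in this guide; adjust the paths to your setup.
CONFIG_FILE=${CONFIG_FILE:-dl1_to_dl2.yml}
SUBARRAY_FILE=${SUBARRAY_FILE:-prod6/subarray_north_alpha.yml}

# Hypothetical helper: process every *.dl1.h5 file (assumed suffix) in
# input_dir into a separate DL2 file with its own provenance log.
process_all() {
  input_dir=$1
  output_dir=$2
  for input_file in "$input_dir"/*.dl1.h5; do
    [ -e "$input_file" ] || continue          # the glob matched nothing
    name=$(basename "$input_file" .dl1.h5)    # input name without the suffix
    run ctapipe-process --input "$input_file" \
      --output "$output_dir/$name.dl2.h5" \
      --config "$CONFIG_FILE" \
      --config "$SUBARRAY_FILE" \
      --provenance-log "$output_dir/$name.provenance.log"
  done
}

# INPUT_DIR is an assumed variable pointing at the directory with the DL1a files.
process_all "${INPUT_DIR:-.}" "${OUTPUT_DIR:-build}"
```

Once the printed commands look right, setting DRY_RUN=0 executes them instead.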
To exclude the geometric direction reconstruction from the processing (e.g. for a mono analysis),
the ShowerProcessor section in dl1_to_dl2.yml has to be changed so that it no longer includes
HillasReconstructor under reconstructor_types.
Alternatively, and even simpler, the whole ShowerProcessor section can be removed.
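Before launching a mono analysis over many files, a quick check that the edited configuration no longer mentions HillasReconstructor can save a rerun. The sketch below is a plain text search, not a YAML parser: the helper uses_hillas is hypothetical, it skips lines that start with a comment character, and it would also match any other section of the file that mentions the name.

```shell
# Hypothetical helper: succeed if a non-comment line of the given file
# contains the string "HillasReconstructor".
uses_hillas() {
  awk '!/^[ \t]*#/ && /HillasReconstructor/ { found = 1 } END { exit !found }' "$1"
}

CONFIG_FILE=${CONFIG_FILE:-dl1_to_dl2.yml}
if [ -r "$CONFIG_FILE" ] && uses_hillas "$CONFIG_FILE"; then
  echo "$CONFIG_FILE still mentions HillasReconstructor"
fi
```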
Merging#
Decide on which fraction of your files you want to use for training each ML model:
Training the energy regressor requires a gamma training file (called e.g. gamma_merged_train_en.dl2.h5).
Training the particle classifier requires another gamma and a proton training file (called e.g. gamma_merged_train_clf.dl2.h5 and proton_merged_train.dl2.h5).
Training the disp reconstructor requires a gamma training file, which can be the same file as used for the particle classifier (e.g. gamma_merged_train_clf.dl2.h5).
The remaining gamma and proton files can be used for testing (or e.g. IRF calculation). It does not make sense to use electrons for training, so all electron files can be used for testing.
Then save these sets of files into a total of six lists and run something like the following command for each of these lists:
ctapipe-merge $GAMMA_TRAIN_EN_FILES --output $OUTPUT_DIR/gamma_merged_train_en.dl2.h5
where GAMMA_TRAIN_EN_FILES is an environment variable containing the list of gamma files
you want to merge into a training set, for example generated using:
GAMMA_TRAIN_EN_FILES=$(echo $INPUT_DIR/gamma*[0-1]*.h5)
which selects all gamma files in INPUT_DIR whose names contain a “0” or a “1” somewhere after the gamma prefix.
Alternatively, you could save the file names in a list file (e.g. gamma_train_en_files.list),
one file name per line, which you then use like this:
GAMMA_TRAIN_EN_FILES=$(cat gamma_train_en_files.list)
ctapipe-merge $GAMMA_TRAIN_EN_FILES --output $OUTPUT_DIR/gamma_merged_train_en.dl2.h5
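One way to produce such list files is to split a sorted list of file names by count. The sketch below is an assumption about how you might organize this, not a ctapipe feature: the helper split_lists is hypothetical, and apart from gamma_train_en_files.list the list file names are invented for illustration.

```shell
# Hypothetical helper: read file names (one per line) from stdin and write
#   <prefix>_train_en_files.list   the first n_en names
#   <prefix>_train_clf_files.list  the next n_clf names
#   <prefix>_test_files.list       all remaining names
# A list that would be empty is not created.
split_lists() {
  n_en=$1
  n_clf=$2
  prefix=$3
  # Remove lists from an earlier run so that no stale entries survive.
  rm -f "${prefix}_train_en_files.list" "${prefix}_train_clf_files.list" \
    "${prefix}_test_files.list"
  awk -v n_en="$n_en" -v n_clf="$n_clf" -v prefix="$prefix" '
    NR <= n_en         { print > (prefix "_train_en_files.list");  next }
    NR <= n_en + n_clf { print > (prefix "_train_clf_files.list"); next }
                       { print > (prefix "_test_files.list") }
  '
}

# Example (the counts are placeholders; INPUT_DIR is assumed to hold the files):
#   printf '%s\n' "$INPUT_DIR"/gamma*.h5 | split_lists 10 10 gamma
```

Passing 0 as the first count skips the energy training list, which suits protons; passing 0 for both counts sends every file to the test list, which suits electrons.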
Using either method of specifying files, merge your gamma, proton, and electron files so that you end up with six merged files:
Gamma train energy, the file containing the gamma events to be used for training the energy regressor.
Gamma train classifier, the file containing the gamma events to be used for training the particle classifier and the disp reconstructor.
Gamma test, the file with gamma events used for “testing” the performance of the analysis.
Proton train, the file containing the proton events to be used for training the particle classifier.
Proton test, the file with proton events used for “testing” the performance of the analysis.
Electron test, the file with electron events used for “testing” the performance of the analysis.
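The six merges can be sketched as repeated calls to a small wrapper. This is an illustration under assumptions: the helpers run and merge_list are hypothetical, and the list file names and the three test file names are made up for the example (only gamma_train_en_files.list and the three training file names appear above). With DRY_RUN=1 (the default here) the commands are only printed.

```shell
# Hypothetical dry-run wrapper: print the command unless DRY_RUN is set to 0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: merge the files named in list_file (one path per line).
merge_list() {
  list_file=$1
  merged_file=$2
  if [ ! -r "$list_file" ]; then
    echo "skipping $list_file: not readable" >&2
    return 0
  fi
  # Word splitting of $(cat ...) is intended; paths must not contain spaces.
  run ctapipe-merge $(cat "$list_file") --output "$merged_file"
}

OUTPUT_DIR=${OUTPUT_DIR:-build}
merge_list gamma_train_en_files.list  "$OUTPUT_DIR/gamma_merged_train_en.dl2.h5"
merge_list gamma_train_clf_files.list "$OUTPUT_DIR/gamma_merged_train_clf.dl2.h5"
merge_list gamma_test_files.list      "$OUTPUT_DIR/gamma_merged_test.dl2.h5"
merge_list proton_train_files.list    "$OUTPUT_DIR/proton_merged_train.dl2.h5"
merge_list proton_test_files.list     "$OUTPUT_DIR/proton_merged_test.dl2.h5"
merge_list electron_test_files.list   "$OUTPUT_DIR/electron_merged_test.dl2.h5"
```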
Training the machine learning models#
The training process has the following steps:
Train an energy model on the gamma train energy file.
Apply the energy model to the gamma train classifier file and the proton train file.
Train a particle classifier on the gamma train classifier file and the proton train file.
Train a disp reconstructor on the gamma train classifier file.
First define the following environment variables:
REG_CONF_FILE, a configuration file for the energy regression training, for example train_energy_regressor.yaml
CLF_CONF_FILE, a configuration file for the particle classification training, for example train_particle_classifier.yaml
DISP_CONF_FILE, a configuration file for the disp reconstructor training, for example train_disp_reconstructor.yaml
INPUT_GAMMA_EN_FILE, the gamma train energy file created in the previous step
INPUT_GAMMA_CLF_FILE, the gamma train classifier file created in the previous step
INPUT_PROTON_FILE, the proton train file
EVAL_GAMMA_FILE, the gamma test file
EVAL_PROTON_FILE, the proton test file
EVAL_ELECTRON_FILE, the electron test file
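As an illustration, the variables could be defined as follows. The training file names follow the examples used in the merging step; the three test file names are assumptions made for this sketch, and all paths need to be adapted to your setup.

```shell
OUTPUT_DIR=${OUTPUT_DIR:-build}

# Configuration files (example names from this guide).
REG_CONF_FILE=train_energy_regressor.yaml
CLF_CONF_FILE=train_particle_classifier.yaml
DISP_CONF_FILE=train_disp_reconstructor.yaml

# Merged training files from the previous step.
INPUT_GAMMA_EN_FILE=$OUTPUT_DIR/gamma_merged_train_en.dl2.h5
INPUT_GAMMA_CLF_FILE=$OUTPUT_DIR/gamma_merged_train_clf.dl2.h5
INPUT_PROTON_FILE=$OUTPUT_DIR/proton_merged_train.dl2.h5

# Merged test files; these three names are assumptions, not fixed by ctapipe.
EVAL_GAMMA_FILE=$OUTPUT_DIR/gamma_merged_test.dl2.h5
EVAL_PROTON_FILE=$OUTPUT_DIR/proton_merged_test.dl2.h5
EVAL_ELECTRON_FILE=$OUTPUT_DIR/electron_merged_test.dl2.h5

# Warn early about any file that does not exist yet.
for f in "$REG_CONF_FILE" "$CLF_CONF_FILE" "$DISP_CONF_FILE" \
  "$INPUT_GAMMA_EN_FILE" "$INPUT_GAMMA_CLF_FILE" "$INPUT_PROTON_FILE" \
  "$EVAL_GAMMA_FILE" "$EVAL_PROTON_FILE" "$EVAL_ELECTRON_FILE"; do
  [ -e "$f" ] || echo "warning: $f not found" >&2
done
```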
Then the training of the machine learning models is done using the following commands:
ctapipe-train-energy-regressor --input $INPUT_GAMMA_EN_FILE \
  --output $OUTPUT_DIR/energy_regressor.pkl \
  --config $REG_CONF_FILE \
  --cv-output $OUTPUT_DIR/cv_energy.h5 \
  --provenance-log $OUTPUT_DIR/train_energy.provenance.log \
  --log-file $OUTPUT_DIR/train_energy.log \
  --log-level INFO
ctapipe-apply-models --input $INPUT_GAMMA_CLF_FILE \
  --output $OUTPUT_DIR/gamma_train_clf.dl2.h5 \
  --reconstructor $OUTPUT_DIR/energy_regressor.pkl \
  --provenance-log $OUTPUT_DIR/apply_gamma_train_clf.provenance.log \
  --log-file $OUTPUT_DIR/apply_gamma_train_clf.log \
  --log-level INFO
ctapipe-apply-models --input $INPUT_PROTON_FILE  \
  --output $OUTPUT_DIR/proton_train_clf.dl2.h5 \
  --reconstructor $OUTPUT_DIR/energy_regressor.pkl \
  --provenance-log $OUTPUT_DIR/apply_proton_train.provenance.log \
  --log-file $OUTPUT_DIR/apply_proton_train.log \
  --log-level INFO
ctapipe-train-particle-classifier --signal $OUTPUT_DIR/gamma_train_clf.dl2.h5 \
  --background $OUTPUT_DIR/proton_train_clf.dl2.h5 \
  --output $OUTPUT_DIR/particle_classifier.pkl \
  --config $CLF_CONF_FILE \
  --cv-output $OUTPUT_DIR/cv_particle.h5 \
  --provenance-log $OUTPUT_DIR/train_particle.provenance.log \
  --log-file $OUTPUT_DIR/train_particle.log \
  --log-level INFO
ctapipe-train-disp-reconstructor --input $OUTPUT_DIR/gamma_train_clf.dl2.h5 \
  --output $OUTPUT_DIR/disp_reconstructor.pkl \
  --config $DISP_CONF_FILE \
  --cv-output $OUTPUT_DIR/cv_disp.h5 \
  --provenance-log $OUTPUT_DIR/train_disp.provenance.log \
  --log-file $OUTPUT_DIR/train_disp.log \
  --log-level INFO
which will produce three trained models saved as $OUTPUT_DIR/energy_regressor.pkl, $OUTPUT_DIR/particle_classifier.pkl,
and $OUTPUT_DIR/disp_reconstructor.pkl.
The saved model for the disp reconstruction contains both the regressor for estimating norm(disp) and the classifier
for determining sign(disp).
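Since the next step depends on all three model files, a short check that each one exists and is not empty can catch a failed training run early. This is a minimal sketch; the helper check_models is hypothetical and only tests for the files, not for their validity.

```shell
# Hypothetical helper: report model files that are missing or empty in
# model_dir and return non-zero if any of the three is affected.
check_models() {
  model_dir=$1
  missing=0
  for model in energy_regressor.pkl particle_classifier.pkl disp_reconstructor.pkl; do
    if [ ! -s "$model_dir/$model" ]; then
      echo "missing or empty: $model_dir/$model"
      missing=1
    fi
  done
  return $missing
}

check_models "${OUTPUT_DIR:-build}" \
  || echo "Re-run the failed training step before applying the models."
```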
Applying the machine learning models on the test files#
Now we can apply these trained models on the test files, EVAL_GAMMA_FILE, EVAL_PROTON_FILE, and EVAL_ELECTRON_FILE,
to produce the final DL2 files:
ctapipe-apply-models --input $EVAL_GAMMA_FILE \
  --output $OUTPUT_DIR/gamma_final.dl2.h5 \
  --reconstructor $OUTPUT_DIR/energy_regressor.pkl \
  --reconstructor $OUTPUT_DIR/particle_classifier.pkl \
  --reconstructor $OUTPUT_DIR/disp_reconstructor.pkl \
  --provenance-log $OUTPUT_DIR/apply_gamma_final.provenance.log \
  --log-file $OUTPUT_DIR/apply_gamma_final.log \
  --log-level INFO
ctapipe-apply-models --input $EVAL_PROTON_FILE \
  --output $OUTPUT_DIR/proton_final.dl2.h5 \
  --reconstructor $OUTPUT_DIR/energy_regressor.pkl \
  --reconstructor $OUTPUT_DIR/particle_classifier.pkl \
  --reconstructor $OUTPUT_DIR/disp_reconstructor.pkl \
  --provenance-log $OUTPUT_DIR/apply_proton_final.provenance.log \
  --log-file $OUTPUT_DIR/apply_proton_final.log \
  --log-level INFO
ctapipe-apply-models --input $EVAL_ELECTRON_FILE \
  --output $OUTPUT_DIR/electron_final.dl2.h5 \
  --reconstructor $OUTPUT_DIR/energy_regressor.pkl \
  --reconstructor $OUTPUT_DIR/particle_classifier.pkl \
  --reconstructor $OUTPUT_DIR/disp_reconstructor.pkl \
  --provenance-log $OUTPUT_DIR/apply_electron_final.provenance.log \
  --log-file $OUTPUT_DIR/apply_electron_final.log \
  --log-level INFO
which will produce $OUTPUT_DIR/gamma_final.dl2.h5, $OUTPUT_DIR/proton_final.dl2.h5,
and $OUTPUT_DIR/electron_final.dl2.h5.
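Because the three commands differ only in the particle type and the input file, they can be folded into one function. The sketch below uses the same options as the commands above; the helpers run and apply_all_models are hypothetical, and the default test file names are assumptions. With DRY_RUN=1 (the default here) the commands are only printed.

```shell
# Hypothetical dry-run wrapper: print the command unless DRY_RUN is set to 0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

OUTPUT_DIR=${OUTPUT_DIR:-build}

# Hypothetical helper: apply all three trained models to one test file.
# The particle name (gamma, proton, electron) sets the output and log names.
apply_all_models() {
  particle=$1
  input_file=$2
  run ctapipe-apply-models --input "$input_file" \
    --output "$OUTPUT_DIR/${particle}_final.dl2.h5" \
    --reconstructor "$OUTPUT_DIR/energy_regressor.pkl" \
    --reconstructor "$OUTPUT_DIR/particle_classifier.pkl" \
    --reconstructor "$OUTPUT_DIR/disp_reconstructor.pkl" \
    --provenance-log "$OUTPUT_DIR/apply_${particle}_final.provenance.log" \
    --log-file "$OUTPUT_DIR/apply_${particle}_final.log" \
    --log-level INFO
}

# The fallback file names after ":-" are assumptions for this example.
apply_all_models gamma    "${EVAL_GAMMA_FILE:-$OUTPUT_DIR/gamma_merged_test.dl2.h5}"
apply_all_models proton   "${EVAL_PROTON_FILE:-$OUTPUT_DIR/proton_merged_test.dl2.h5}"
apply_all_models electron "${EVAL_ELECTRON_FILE:-$OUTPUT_DIR/electron_merged_test.dl2.h5}"
```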