Production Guide

List of Productions

Date   Type   Tag
03/24  MC     mc-v09_84_00_01-202403-cnaf-corrsce
03/24  DATA   run2-v09_84_00_01-202403-cnaf
03/24  MC     mc-v09_84_00_01-202403-cnaf
02/24  DATA   data_run2-v09_83_01202402-cnaf
02/24  MC     mc_nucosm-v09_83_01202402-cnaf
12/23  DATA   run2-v09_72_00_06-122023-variables

General Info

This page details the steps needed to submit and monitor production campaigns (hereafter campaigns) at CNAF. Two main types of campaigns are possible: real data and MC. In both cases, an initial setup is needed for each new campaign. After that, the campaign is submitted in multiple steps, with each step submitting a batch of jobs. At the end, a final check of the completion of the campaign is required.

Below are details on the initial setup, what to do while on shift, and how to check the completion of the campaign.

Initial Setup

The first step is to download and set up all the needed scripts. This must be done only once per campaign. All the batches of jobs for the current campaign will be submitted with the same scripts. Each production request has its own configuration and is associated with a (git) tag (<selected-tag>) used to download the correct version of the scripts. Once the tag has been provided, the shifter has to create a working directory in the default production folder (/storage/gpfs_data/icarus/local/prod) with the same name as the selected tag and access it:

cd /storage/gpfs_data/icarus/local/prod
mkdir <selected-tag>
cd <selected-tag>

From this folder, download the correct version of the scripts,

git clone https://baltig.infn.it/icarus/prod-scripts/ --recurse-submodules --branch <selected-tag>

and enter the prod-scripts folder, from which all the steps will be submitted:

cd prod-scripts

The initial setup is now complete. Here is a complete example:

cd /storage/gpfs_data/icarus/local/prod
mkdir run2-v09_72_00_06-122023-variables
cd run2-v09_72_00_06-122023-variables
git clone https://baltig.infn.it/icarus/prod-scripts/ --recurse-submodules --branch run2-v09_72_00_06-122023-variables
cd prod-scripts

What to do while on shift

During the campaign, the shifter is asked to check the status of the submitted jobs every 6 hours and to submit a new batch of jobs if needed. The steps, in sequential order, are described in the following subsections.

Once the submission of the production is complete, the shifter should check the completion of the campaign (see the dedicated section below).

Check the queue status

The first step is to check the number of jobs in the idle state. This can be done as follows:

  1. Open the grafana page
  2. Scroll to the bottom of the page and check the Idle number in the Job Status icarus section

If this number is smaller than 300, go to the next step. If you don't see the Idle section, also go to the next step. If there are more than 300 Idle jobs, repeat this step in 6 hours.

Configure the next job submission

If the current number of pending jobs is smaller than 300, a new batch of jobs can be submitted. First, the shifter has to configure the job submission. To do so, the shifter has to go to the prod-scripts folder inside the working area created in the initial setup section (i.e. cd /storage/gpfs_data/icarus/local/prod/<selected-tag>/prod-scripts). Example:

cd /storage/gpfs_data/icarus/local/prod/run2-v09_72_00_06-122023-variables/prod-scripts

Then, the shifter has to configure the job submission by editing the variables.sh file. This step differs depending on the production type, real or MC data.

Configure the job submission for real data production

Here, the variables.sh file must be modified with the details of the batch of jobs to be submitted. The only variable to modify is YOUR_CUSTOM_RUN_LIST: a list of numbers corresponding to the runs to submit in the batch.

From a Google Sheet document

The list of runs is provided in the first column of the sheet.

Here is what to do for each batch:

  • open the variables.sh file (with an editor, e.g. vim, nano, emacs or whatever you like) and check the YOUR_CUSTOM_RUN_LIST variable (it should be empty the first time)
  • look for runs with the Submitted column set to no. Select runs among these so that the total number of files is about 1000
  • copy the selected run numbers into the YOUR_CUSTOM_RUN_LIST variable, between quotes and separated by spaces (see the example below)
  • save variables.sh
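For illustration, the relevant line of variables.sh for a real-data batch might look like the following; the run numbers are only examples (borrowed from the get_info.sh example later in this guide), and the actual runs must come from the sheet:

YOUR_CUSTOM_RUN_LIST="11806 11812 11817 11818"   # example runs, space-separated, between quotes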

Configure the job submission for MC data production

Here, the variables.sh file must be modified with the details of the batch of jobs to be submitted. Two variables need to be modified:

  • STARTING_RUN
  • NUMBER_OF_RUNS

The values of both variables for each step are provided in the batch.info file, located in the same folder created during the setup. If no batch.info file is present, a list of batches should have been provided by other means.

Regardless of the distribution method, here is what to do for each batch:

  • open the variables.sh file (with an editor, e.g. vim, nano, emacs or whatever you like) and check the STARTING_RUN and NUMBER_OF_RUNS variables (they should both be 0 the first time)
  • open the batch.info file (or the resource provided with such info) and find the step corresponding to the current values of the STARTING_RUN and NUMBER_OF_RUNS variables
  • go to the next step in the list and copy the corresponding values into the STARTING_RUN and NUMBER_OF_RUNS variables (see the example below)
  • save variables.sh
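For illustration, the relevant lines of variables.sh for an MC batch might look like the following; the numbers are purely illustrative, and the actual values must be taken from batch.info:

STARTING_RUN=1000    # first run of the next batch (example value)
NUMBER_OF_RUNS=50    # number of runs in the batch (example value)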

Create a proxy with voms extensions

You have to create a proxy with the voms extension. This step should be done every time you are going to submit a batch of jobs.

If you don't have the certificate needed to generate a proxy, please obtain one following the instructions provided in the Personal Certificate and VO Enrollment section of this page.

Once you have the certificate, simply run the following command:

voms-proxy-init --voms icarus-exp.org --valid 72:00

You'll be asked to confirm your identity by entering the GRID pass phrase chosen when the certificate was set up. After entering it, a proxy valid for three days will be created:

voms-proxy-init --voms icarus-exp.org --valid 72:00
Enter GRID pass phrase for this identity:
Contacting vomsigi-na.unina.it:15000 [/DC=org/DC=terena/DC=tcs/C=IT/ST=Napoli/O=Universita degli Studi di Napoli FEDERICO II/CN=vomsigi-na.unina.it] "icarus-exp.org"...
Remote VOMS server contacted succesfully.

vomsigi-na.unina.it:15000: The validity of this VOMS AC in your proxy is shortened to 86400 seconds!

Created proxy in /tmp/x509up_u####.

Your proxy is valid until Fri Oct 17 19:07:56 CEST 2025

Submit the batch of jobs

Heads-up: have you created a proxy with the voms extension? If not:

voms-proxy-init --voms icarus-exp.org --valid 72:00

Then, go to the prod-scripts folder inside the working area created in the initial setup section (i.e. cd /storage/gpfs_data/icarus/local/prod/<selected-tag>/prod-scripts). Example:

cd /storage/gpfs_data/icarus/local/prod/run2-v09_72_00_06-122023-variables/prod-scripts

After updating the file variables.sh with the new batch, run the command:

module switch htc

and submit the production with:

./submit_production.sh

The script will automatically submit all the needed jobs. This could take a few minutes during which the shell will look unresponsive (it is not).
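For reference, the full sequence for submitting one batch, using the example tag from the initial setup, is (all commands are taken from the steps above):

cd /storage/gpfs_data/icarus/local/prod/run2-v09_72_00_06-122023-variables/prod-scripts
# edit variables.sh as described in the configuration step, then:
voms-proxy-init --voms icarus-exp.org --valid 72:00
module switch htc
./submit_production.sh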

If the run list was provided with a Google Sheet document, update the Submitted column to yes.

Check the completion of a batch

Once a batch of runs has been processed, the shifter should check how many files were correctly completed.

This is done by running:

./get_info.sh [RUN_NUMBERS]

where [RUN_NUMBERS] is the list of runs to be checked. Example:

./get_info.sh 11806 11812 11817 11818

The script creates a logs folder in the prod-scripts folder, with these files inside:

./logs/all_raw_files.log         # The list of all submitted files
./logs/duplicated_folders.log    # The list of folders with multiple output files
./logs/missing_files.log         # The list of folders without output files
./logs/missing_folders.log       # The list of missing folders
./logs/ok_files.log              # The list of folders of the correctly processed files
./logs/resubmit_list.log         # The list of runs with failed files
./logs/run_summary.log           # The list of runs and the number of completed/failed files

A print message will tell the shifter whether there are some missing or duplicated files, or if everything is as expected (same number of raw and ok files).
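As a quick cross-check of the printed summary, the log files listed above can also be inspected directly; for instance (assuming one entry per line in these logs):

wc -l ./logs/all_raw_files.log ./logs/ok_files.log   # compare submitted vs correctly processed files
cat ./logs/run_summary.log                           # completed/failed files per run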

If the run list was provided with a Google Sheet document, check the run_summary.log file. For each run, copy the first number into the corresponding Completed CAFs column of the sheet, and the second number into the Failed Files column.

In any case, if some files were not correctly processed, go to the resubmit step.

Check the completion of the campaign

Once the submission of the production is complete, the shifter should check the completion of the campaign.

For standard campaigns, this is done by running the same script used in the Check the completion of a batch step, without any argument:

./get_info.sh

A print message will tell the shifter whether there are some missing or duplicated files, or if everything is as expected (same number of raw and ok files).

In case some files were not correctly processed, go to the resubmit step.

For non-standard campaigns, specific instructions will be provided and added to this page each time.

Resubmit Raw Data Production

When processing Raw Data, if some jobs didn't end successfully, it is possible to resubmit them with the resubmit.sh script. To do so, look at the logs/resubmit_list.log file and run the following command for each line in the file:

./resubmit.sh <RUN-NUMBER> <STREAMS> <MEMORY>

where

  • RUN-NUMBER is the number of the run to resubmit, the first argument of each line
  • STREAMS is a list of the streams to resubmit. The list of streams is the second argument of each line of the file, quotes included
  • MEMORY is an optional argument to specify a new memory requirement in GB for the resubmitted jobs (max 15).

Examples:

./resubmit.sh 9888 "BNBMAJORITY BNBMINBIAS"
./resubmit.sh 9435 "BNBMAJORITY" 15

The script will check the logs of the run/stream in the OUT_FOLDER/RUN_NUMBER/STREAM directory, remove the jobs related to the run/stream from the queue, check which files need to be reprocessed, and resubmit the jobs with the same configuration used the first time but, if specified, with a different memory requirement.
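Since each line of logs/resubmit_list.log is expected to contain the run number followed by the quoted stream list, the resubmission of all failed runs can also be scripted in one go. The following is only a sketch under that assumption; running resubmit.sh by hand for each line works just as well:

# resubmit every entry of logs/resubmit_list.log (one line = run number + quoted streams)
xargs -L1 ./resubmit.sh < ./logs/resubmit_list.log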

FAQ

TO DO
