progetti:icarus:production-guide
| Both sides previous revisionPrevious revisionNext revision | Previous revision | ||
| progetti:icarus:production-guide [2025/10/14 20:14] – [Configure the job submission for real data production] vpia@infn.it | progetti:icarus:production-guide [2025/10/27 15:24] (current) – [Submit the batch of jobs] vpia@infn.it | ||
|---|---|---|---|
| Line 1: | Line 1: | ||
| + | ====== Production Guide ====== | ||
| + | |||
| + | ====== List of Productions ====== | ||
| + | ^ Date ^ Type ^ Tag ^ | ||
| + | | 10/25 | DATA | run3-processing-cnaf-1025-v10_06_00_04p03 | ||
| + | | 03/24 | MC | mc-v09_84_00_01-202403-cnaf-corrsce | ||
| + | | 03/24 | DATA | run2-v09_84_00_01-202403-cnaf | ||
| + | | 03/24 | MC | mc-v09_84_00_01-202403-cnaf | ||
| + | | 02/24 | DATA | data_run2-v09_83_01202402-cnaf | ||
| + | | 02/24 | MC | mc_nucosm-v09_83_01202402-cnaf | ||
| + | | 12/23 | DATA | run2-v09_72_00_06-122023-variables | ||
| + | |||
| + | ====== General Info ====== | ||
| + | This page details the steps needed to submit and monitor production campaigns (hereafter campaigns) at CNAF. Two main types of campaigns are possible: **real** and **MC** data. In both cases, an initial setup is needed for each new campaign. After that, the campaign is submitted in multiple steps, with each step submitting a batch of jobs. At the end, a final check of the completion of the campaign is required. | ||
| + | |||
| + | Below, details on the [[https:// | ||
| + | | ||
| + | |||
| + | ====== Initial Setup ====== | ||
| + | The first step is to download and set up all the needed scripts. This must be done **only once** per campaign. All the batches of jobs for the current campaign will be submitted with the **same** scripts. Each production request has its own configuration and will be associated with a (//git//) tag (''< | ||
| + | |||
| + | mkdir / | ||
| + | cd / | ||
| + | mkdir < | ||
| + | cd < | ||
| + | |||
| + | From this folder, download the correct version of the scripts, | ||
| + | | ||
| + | git clone https:// | ||
| + | |||
| + | and access the prod-scripts folder, from which all steps will be submitted. | ||
| + | |||
| + | cd prod-scripts | ||
| + | |||
| + | The initial setup is now complete. Here is a complete example: | ||
| + | |||
| + | cd / | ||
| + | mkdir run2-v09_72_00_06-122023-variables | ||
| + | cd run2-v09_72_00_06-122023-variables | ||
| + | git clone https:// | ||
| + | cd prod-scripts | ||
| + | |||
| + | ====== What to do while on shift ====== | ||
| + | During the campaign, the shifter must check the status of the submitted jobs **every 6 hours** and submit a new batch of jobs if needed. | ||
| + | The steps, in sequential order, are: | ||
| + | |||
| + | - [[https:// | ||
| + | - [[https:// | ||
| + | - [[https:// | ||
| + | - [[https:// | ||
| + | |||
| + | Once the submission of the production is complete, the shifter should [[https:// | ||
| + | |||
| + | |||
| + | ===== Check queue' | ||
| + | The first step is to check the number of jobs in the //idle// state. This can be done as follows: | ||
| + | {{ : | ||
| + | - Open the [[https:// | ||
| + | - Scroll to the bottom of the page and check the //Idle// number in the Job Status icarus section | ||
| + | |||
| + | If this number is smaller than **300**, go to the [[https:// | ||
| + | If you don't see the //Idle// section, also go to the [[https:// | ||
| + | If there are more than 300 Idle jobs, repeat this step in 6 hours. | ||
| + | |||
| + | |||
| + | ===== Configure the next job submission ===== | ||
| + | If the current number of //idle// (pending) jobs is smaller than **300**, a new batch of jobs can be submitted. First, the shifter has to configure the job submission. To do so, the shifter has to go to the '' | ||
| + | |||
| + | cd / | ||
| + | |||
| + | Then, the shifter has to configure the job submission by editing the '' | ||
| + | |||
| + | ==== Configure the job submission for real data production ==== | ||
| + | Here, the shifter has to modify the '' | ||
| + | |||
| + | === From a Google Sheet document === | ||
| + | The list of runs is provided in the first column of the sheet. | ||
| + | |||
| + | Here is what to do for each batch: | ||
| + | * open (with an editor, e.g. '' | ||
| + | * look for runs with the **Submitted column** set to **no**. Among these, select runs adding up to a total of about 1000 files | ||
| + | * copy the selected run numbers into the **YOUR_CUSTOM_RUN_LIST** variable, between quotes and separated by a space | ||
| + | * save // | ||
| + | |||
| + | {{ : | ||
| + | |||
| + | ==== Configure the job submission for MC data production ==== | ||
| + | Here, the shifter has to modify the // | ||
| + | * **STARTING_RUN** | ||
| + | * **NUMBER_OF_RUNS** | ||
| + | |||
| + | The values for both variables for each step are provided in the // | ||
| + | |||
| + | Regardless of the distribution method, here is what to do for each batch: | ||
| + | * open (with an editor, e.g. '' | ||
| + | * open (with an editor, e.g. '' | ||
| + | * go to the next step in the list and copy the corresponding values into the **STARTING_RUN** and **NUMBER_OF_RUNS** variables | ||
| + | * save // | ||
| + | |||
| + | {{ : | ||
| + | |||
| + | ===== Create a proxy with voms extensions ===== | ||
| + | |||
| + | You have to create a proxy with the VOMS extension. This step must be done **every time** you are going to submit a batch of jobs. | ||
| + | |||
| + | If you do not yet have the certificate needed to generate a proxy, please obtain one by following the instructions provided in the **Personal Certificate and VO Enrollment** section of [[https:// | ||
| + | |||
| + | Once you have the certificate, | ||
| + | |||
| + | voms-proxy-init --voms icarus-exp.org --valid 72:00 | ||
| + | |||
| + | You'll be asked to confirm your identity by entering the GRID pass phrase used during the proxy setup. After you enter it, a proxy valid for three days (72 hours) will be created: | ||
| + | |||
| + | voms-proxy-init --voms icarus-exp.org --valid 72:00 | ||
| + | Enter GRID pass phrase for this identity: | ||
| + | Contacting vomsigi-na.unina.it: | ||
| + | Remote VOMS server contacted succesfully. | ||
| + | | ||
| + | vomsigi-na.unina.it: | ||
| + | | ||
| + | Created proxy in / | ||
| + | | ||
| + | Your proxy is valid until Fri Oct 17 19:07:56 CEST 2025 | ||
| + | |||
| + | ===== Submit the batch of jobs ===== | ||
| + | **Heads-up**: | ||
| + | |||
| + | voms-proxy-init --voms icarus-exp.org --valid 72:00 | ||
| + | |||
| + | Then, go to the '' | ||
| + | |||
| + | cd / | ||
| + | |||
| + | After updating the file // | ||
| + | |||
| + | module switch htc | ||
| + | |||
| + | and submit the production with: | ||
| + | |||
| + | ./ | ||
| + | |||
| + | The script will automatically submit all the needed jobs. This can take a few minutes, during which the shell will look unresponsive (it is not). | ||
| + | |||
| + | After regaining control of the shell, you can check whether the jobs were correctly submitted either by checking the same Grafana page shown previously (it updates with a **4-5 minute delay**, so you will not see the new jobs immediately) or by running the **condor_q** command: | ||
| + | |||
| + | $ condor_q | ||
| + | | ||
| + | Schedd: sn01-htc.cr.cnaf.infn.it : < | ||
| + | OWNER BATCH_NAME | ||
| + | valerpia ID: 11228736 | ||
| + | valerpia ID: 11228746 | ||
| + | valerpia 11854_BNBMAJORITY | ||
| + | valerpia 11854_BNBMINBIAS | ||
| + | valerpia 11873_BNBMAJORITY | ||
| + | valerpia 11873_BNBMINBIAS | ||
| + | | ||
| + | Total for query: 1572 jobs; 1445 completed, 0 removed, 0 idle, 127 running, 0 held, 0 suspended | ||
| + | Total for valerpia: 1572 jobs; 1445 completed, 0 removed, 0 idle, 127 running, 0 held, 0 suspended | ||
| + | Total for all users: 89517 jobs; 60905 completed, 0 removed, 18666 idle, 9666 running, 280 held, 0 suspended | ||
| + | |||
| + | The new jobs should be at the bottom of the list, in either the RUN or the IDLE state. | ||
| + | |||
| + | If the run list was provided with a Google Sheet document, update the **Submitted column** to **yes**. | ||
| + | |||
| + | ====== Check the completion of a batch ====== | ||
| + | Once a batch of runs has been processed, the shifter should check how many files were correctly completed. | ||
| + | |||
| + | This is done by running: | ||
| + | ./ | ||
| + | |||
| + | where [RUN_NUMBERS] is the list of runs to be checked. Example: | ||
| + | |||
| + | ./ | ||
| + | | ||
| + | The script creates a '' | ||
| + | ./ | ||
| + | ./ | ||
| + | ./ | ||
| + | ./ | ||
| + | ./ | ||
| + | ./ | ||
| + | ./ | ||
| + | | ||
| + | A printed message will tell the shifter whether any files are missing or duplicated, or whether everything is as expected (same number of //raw// and //ok// files). | ||
| + | |||
| + | If the run list was provided with a Google Sheet document, check the **run_summary.log** file. For each run, copy the first number into the corresponding **Completed CAFs** column of the sheet, and the second number into the **Failed Files** column. | ||
| + | |||
| + | {{ : | ||
| + | |||
| + | In any case, if some files were not correctly processed, go to the resubmit step. | ||
| + | |||
| + | ====== Check the completion of the campaign ====== | ||
| + | Once the submission of the production is complete, the shifter should check the completion of the campaign. | ||
| + | |||
| + | For **standard campaigns**, | ||
| + | |||
| + | ./ | ||
| + | | ||
| + | A printed message will tell the shifter whether any files are missing or duplicated, or whether everything is as expected (same number of //raw// and //ok// files). | ||
| + | |||
| + | In case some files were not correctly processed, go to the resubmit step. | ||
| + | |||
| + | For **non-standard campaigns**, | ||
| + | | ||
| + | ====== Resubmit Raw Data Production ====== | ||
| + | When processing Raw Data, if some jobs didn't end successfully, | ||
| + | |||
| + | ./ | ||
| + | |||
| + | where | ||
| + | * RUN-NUMBER is the number of the run to resubmit, the first argument of each line | ||
| + | |||
| + | * STREAMS is a list of the streams to resubmit. The list of streams is the second argument of each line of the file, quotes included | ||
| + | |||
| + | * MEMORY is an optional argument to specify a new memory requirement in GB for the resubmitted jobs (max 15). | ||
| + | |||
| + | Examples: | ||
| + | |||
| + | ./resubmit 9888 " | ||
| + | ./resubmit 9435 " | ||
| + | | ||
| + | The script will check the logs of the run/stream in the OUT_FOLDER/ | ||
| + | |||
| + | ====== FAQ ====== | ||
| + | TO DO | ||

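As a complement to the Grafana check in the shift procedure above, the idle-job count can also be read from the summary line printed by ''condor_q''. The following is only a sketch and is not part of the production scripts: the function names ''idle_from_summary'' and ''should_submit'' are hypothetical, the summary line is copied from the example output in this guide, and it assumes that the idle count of the query is an acceptable stand-in for the //Idle// number shown on the dashboard. The threshold of 300 is the one quoted in this guide.

```shell
#!/bin/sh
# Hypothetical helper (not part of prod-scripts): extract the idle count
# from a condor_q summary line such as
#   "Total for query: 1572 jobs; 1445 completed, 0 removed, 0 idle, ..."
idle_from_summary() {
    printf '%s\n' "$1" | sed -n 's/.*[ ,]\([0-9][0-9]*\) idle.*/\1/p'
}

# Threshold from this guide: a new batch can be submitted below 300 idle jobs.
IDLE_THRESHOLD=300

# Hypothetical helper: succeeds only if an idle count was found in the
# summary line and it is below the threshold.
should_submit() {
    idle=$(idle_from_summary "$1")
    [ -n "$idle" ] && [ "$idle" -lt "$IDLE_THRESHOLD" ]
}

# Summary line copied from the condor_q example output in this guide.
summary="Total for query: 1572 jobs; 1445 completed, 0 removed, 0 idle, 127 running, 0 held, 0 suspended"

if should_submit "$summary"; then
    echo "Idle jobs below ${IDLE_THRESHOLD}: configure and submit the next batch"
else
    echo "Queue still full (or summary not parsed): check again in 6 hours"
fi
```

In an interactive session the summary line could be taken from the live queue instead of the hard-coded string, for example by filtering the ''condor_q'' output with ''grep 'Total for query' '' before passing it to ''should_submit''. If the summary cannot be parsed, the sketch deliberately falls back to "do not submit", so a malformed line never triggers a submission.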