
Logging in

To run the PADME DAQ system, the shifter must log on to l0padme1 as user daq. The password is written on the board in the Control Room (or ask anybody from the collaboration).

After logging in, change to the DAQ directory.

[padme@padmecr4 ~]$ ssh -Y daq@l0padme1

daq@l0padme1's password:

Last login: Mon Oct 8 09:39:46 2018 from padmecr4.lnf.infn.it

[daq@l0padme1 ~]$ cd DAQ

[daq@l0padme1 DAQ]$

Starting the RunControl server

All DAQ procedures are handled through the RunControl server. This is a daemon process running on l0padme1.

To verify if the process is running:

[daq@l0padme1 DAQ]$ ps -fu daq | grep RunControl
UID PID PPID C STIME TTY TIME CMD
…
daq 177988 1 0 10:05 ? 00:00:00 /usr/bin/python ./RunControl --server
…

If it is not running, please restart it:

[daq@l0padme1 DAQ]$ ./RunControl --server
Starting RunControlServer in background

All output from the RunControl server process is written to the log/RunControlServer.log file in the DAQ directory. Looking into this file can help in troubleshooting DAQ problems.
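The process check described above can also be scripted. The following is a minimal sketch, not part of the DAQ software: it assumes Python 3.7 or later, assumes the server option is spelled with two ASCII hyphens (--server), and the function names find_runcontrol_server and running_server_pids are hypothetical.

```python
import subprocess


def find_runcontrol_server(ps_output):
    """Return the PIDs of RunControl server processes found in `ps -fu` output."""
    pids = []
    for line in ps_output.splitlines():
        fields = line.split()
        # ps -f columns: UID PID PPID C STIME TTY TIME CMD...
        if len(fields) < 8 or not fields[1].isdigit():
            continue  # header line or something unexpected
        cmd = fields[7:]
        # Require both the script name and the server option, so that the
        # "grep RunControl" process itself is not counted.
        if any(f.endswith("RunControl") for f in cmd) and "--server" in cmd:
            pids.append(int(fields[1]))
    return pids


def running_server_pids(user="daq"):
    """Run `ps -fu <user>` and return the PIDs of RunControl servers (if any)."""
    result = subprocess.run(["ps", "-fu", user], capture_output=True, text=True)
    return find_runcontrol_server(result.stdout)
```

An empty list from running_server_pids() would mean that the server has to be restarted as described above.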

Starting the RunControl client

The RunControl client is used to issue commands to the RunControl server (start a new run, stop the run, …). To start the client:

[daq@l0padme1 DAQ]$ ./RunControl --no-gui
Connecting to RunControl server on host localhost port 10000
SEND (q or Q to Quit):

This will start the RunControl client in text mode (a GUI is foreseen but is not available yet). All commands will be given from this terminal. The help command can be used to get a list of available commands at any point of the RunControl procedure.

SEND (q or Q to Quit): help
Sending help
Available commands:
help                          Show this help
get_state                     Show current state of RunControl
get_setup                     Show current setup name
get_setup_list                Show list of available setups
get_board_list                Show list of boards in use with current setup
get_board_config_daq <b>      Show current configuration of board DAQ process <b>
get_board_config_zsup <b>     Show current configuration of board ZSUP process <b>
get_trig_config               Show current configuration of trigger process
get_run_number                Return last run number in DB
change_setup <setup>          Change run setup to <setup>
new_run                       Initialize system for a new run
shutdown                      Tell RunControl server to exit (use with extreme care!)
SEND (q or Q to Quit):

Verifying and changing the setup

Before starting a new run, verify which setup is currently loaded and change it if needed. For the time being, unless told otherwise by the Run Coordinator, the correct setup is full201809, which enables all ADC boards and acquires data from all PADME detectors. To check the current setup:

SEND (q or Q to Quit): get_setup
Sending get_setup
full201809

If asked by the Run Coordinator, you can change the setup:

SEND (q or Q to Quit): get_setup_list
Sending get_setup_list
['full201809', 'sac201807', 'single201810', 'target201809', 'test201806', 'test201809', 'veto201809']
SEND (q or Q to Quit): change_setup target201809
Sending change_setup target201809
target201809

The change_setup command is also used to reload a setup if any of its files have changed (WARNING: only the Run Coordinator is allowed to edit the setup files):

SEND (q or Q to Quit): get_setup
Sending get_setup
full201809
SEND (q or Q to Quit): change_setup full201809
Sending change_setup full201809
full201809
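The setup check can be expressed as a small decision helper. This is only a sketch under stated assumptions: it assumes that the reply to get_setup_list is always printed as a Python-style list of quoted names, as in the example above, and the names EXPECTED_SETUP, parse_setup_list and setup_action are hypothetical, not part of RunControl.

```python
import ast

# Default setup according to this page; the Run Coordinator may say otherwise.
EXPECTED_SETUP = "full201809"


def parse_setup_list(reply):
    """Parse the reply of get_setup_list (assumed to look like a Python list literal)."""
    setups = ast.literal_eval(reply.strip())
    if not isinstance(setups, list) or not all(isinstance(s, str) for s in setups):
        raise ValueError("unexpected get_setup_list reply: %r" % reply)
    return setups


def setup_action(current, available, wanted=EXPECTED_SETUP):
    """Return the client command to type, or None if the setup is already correct."""
    if wanted not in available:
        raise ValueError("setup %r is not in the list of available setups" % wanted)
    if current == wanted:
        return None
    return "change_setup " + wanted
```

For example, with target201809 loaded and the list shown above, setup_action would return the string "change_setup full201809". Reloading a setup whose files were edited still requires issuing change_setup by hand, since this helper only compares names.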

Initializing a new run

SEND (q or Q to Quit): new_run
Sending new_run
Run number (next or dummy): dummy
Sending dummy
new_run - new run will have number 0

WARNING: for the time being, only dummy is supported.

Run type: TEST
Sending TEST
new_run - new run will have type TEST

Note: supported run types are TEST, DAQ, CALIBRATION, COSMICS, RANDOM. Uppercase is mandatory.

Shift crew: Emanuele
Sending Emanuele
Start of run comment: My first test run
Sending My first test run

Both "Shift crew" and "Start of run comment" accept free-format text of (almost) unlimited length. Try to be as detailed as possible in describing the run conditions (beam status, HV status, ADC boards included, special conditions, etc.).
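The answers to the new_run prompts can be sanity-checked before they are typed in. The sketch below encodes the rules stated on this page (only dummy as run number, uppercase run types). Treating an empty shift crew or comment as a problem is an assumption of this example, and check_new_run_answers is a hypothetical name.

```python
# Run types listed on this page; uppercase is mandatory.
RUN_TYPES = ("TEST", "DAQ", "CALIBRATION", "COSMICS", "RANDOM")


def check_new_run_answers(run_number, run_type, shift_crew, comment):
    """Return a list of problems with the new_run answers (empty list if none)."""
    problems = []
    if run_number != "dummy":
        problems.append("run number: only 'dummy' is supported for the time being")
    if run_type not in RUN_TYPES:
        hint = ""
        if run_type.upper() in RUN_TYPES:
            hint = " (did you mean %s? uppercase is mandatory)" % run_type.upper()
        problems.append("run type %r is not supported%s" % (run_type, hint))
    # Assumption of this sketch: empty free-text fields are flagged.
    if not shift_crew.strip():
        problems.append("shift crew is empty")
    if not comment.strip():
        problems.append("start of run comment is empty")
    return problems
```

With the values used in the example session (dummy, TEST, Emanuele, My first test run) the function returns an empty list.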

Now the run initialization procedure can start. Expect a delay of several seconds before the first message is shown.

level1 0 ready
level1 1 ready
…
merger ready
trigger ready
adc 0 zsup_ready
adc 1 zsup_ready
…
adc 9 zsup_ready
adc 0 init
adc 1 init
…
adc 9 init
adc 0 ready
adc 1 ready
…
adc 9 ready
New run initialization completed correctly
init_ready

The initialization procedure for the full experiment (29 ADC boards) takes up to 2 minutes, so wait patiently.

If one of the boards gets stuck, it will not respond to the initialization procedure. In this case the list of adc NN ready messages will be incomplete and the system will not give back control to the operator for a long time (several minutes). If this happens, the best strategy is to follow the procedure described in Exiting from the system below, execute the usual clean-up procedure, then reset the VME crates and finally restart the whole system starting with the RunControl server. This whole procedure should be tried a couple of times before giving up. If after this the initialization keeps failing, it is time to call an expert.
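When the list of adc NN ready messages looks incomplete, it helps to know exactly which boards are missing before calling an expert. A minimal sketch, assuming that boards are numbered from 0 to n_boards - 1 and that each message sits on its own line; boards_not_ready is a hypothetical helper, not a RunControl command.

```python
import re

# Matches "adc 3 ready" but not "adc 3 zsup_ready" or "adc 3 init".
READY_RE = re.compile(r"^adc\s+(\d+)\s+ready$")


def boards_not_ready(messages, n_boards):
    """Return the sorted list of ADC board numbers with no 'ready' message."""
    ready = set()
    for line in messages:
        match = READY_RE.match(line.strip())
        if match:
            ready.add(int(match.group(1)))
    return sorted(set(range(n_boards)) - ready)
```

The messages can be copied from the client terminal into a list of strings; for the full experiment n_boards would be 29.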

Starting a new run after initialization

SEND (q or Q to Quit): start_run
Sending start_run
Run started correctly
run_started

Stopping a run

SEND (q or Q to Quit): stop_run
Sending stop_run
End of run comment: My end of run
Sending My end of run
adc 0 daq_terminate_ok
adc 0 zsup_terminate_ok
trigger terminate_ok
merger terminate_ok
level1 0 terminate_ok
Run terminated correctly
terminate_ok

Moving the DAQ client window to another terminal

Only a single RunControl client can connect to the RunControl server at any given time. If you want to move the client from one terminal to another, issue the Q command on the original client and then start the new one with the usual command: this procedure will not affect the RunControl server in any way (e.g. if a run is in progress it will keep taking data).

Please note that the client MUST NOT be stopped while the new_run procedure is in progress: this would leave the RunControl server in an undefined state, and the server would then have to be stopped and restarted.

If you are leaving after your shift and no one is coming after you, please close the client window: this way, anyone who is not in the Control Room can still manage runs.

Exiting from the system

Use this procedure only if you want to stop the main RunControl server. This should be done only if the initialization or stop_run procedures fail and/or the system gets into a pathological state (no response to the client).

SEND (q or Q to Quit): shutdown
Sending shutdown
exiting
Server's gone. I'll take my leave as well…
Closing socket

If the server is stuck and does not respond to user commands, it can be killed with the Unix kill (or kill -9 if needed) command:

[daq@l0padme1 DAQ]$ ps -fu daq | grep RunControl
UID PID PPID C STIME TTY TIME CMD
…
daq 177988 1 0 10:05 ? 00:00:00 /usr/bin/python ./RunControl --server
…
[daq@l0padme1 DAQ]$ kill 177988

Log files

All active processes created during the DAQ produce individual log files which can be very useful to verify if the DAQ is running smoothly. All log files for a given run are stored in a single directory named after the run (e.g. run_0000000_20181005_094240/log). This directory is created inside the DAQ/runs subdirectory (i.e. DAQ/runs/run_0000000_20181005_094240/log for the previous example).
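The directory layout described above can be written down as a pair of path helpers. This is a sketch: the file name pattern <run name>_<process>.log is taken from the trigger and merger examples on this page and is only assumed to hold for other processes; run_log_dir and run_log_file are hypothetical names.

```python
import posixpath  # the DAQ host is a Unix machine, so "/" separators are used


def run_log_dir(run_name, daq_dir="DAQ"):
    """Return the log directory of a run, e.g. DAQ/runs/<run name>/log."""
    return posixpath.join(daq_dir, "runs", run_name, "log")


def run_log_file(run_name, process, daq_dir="DAQ"):
    """Return the log file of one process (e.g. 'trigger' or 'merger') of a run."""
    file_name = "%s_%s.log" % (run_name, process)
    return posixpath.join(run_log_dir(run_name, daq_dir), file_name)
```

For the run used as an example, run_log_file("run_0000000_20181005_094240", "merger") gives DAQ/runs/run_0000000_20181005_094240/log/run_0000000_20181005_094240_merger.log.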

To check if the trigger board is correctly receiving the trigger from the BTF:

[daq@l0padme1 log]$ tail -f run_0000000_20181005_094240_trigger.log
… Some setup messages …
2018/10/05 09:44:54 - Starting trigger generation
- Opening output stream '/home/daq/DAQ/local/streams/run_0000000_20181005_094240/run_0000000_20181005_094240_trigger'
Current masks: trig 0x01 busy 0x00 dummy 0x00 0x00
- Trigger 0 0x3a418cf3f0c8fb 605388065019 0x1 233
- Trigger 100 0x53418d0bc69413 605787952147 0x1 333 4998.588867ms 5s
- Trigger 200 0x6c418d239c546c 606187836524 0x1 433 4998.554688ms 5s
… Trigger number keeps growing steadily …
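Checking that the trigger number keeps growing can be automated on a snapshot of the log. The sketch below assumes that every trigger report starts its own line with "- Trigger N"; the meaning of the remaining fields is not documented on this page, so they are ignored. trigger_numbers and is_growing are hypothetical names.

```python
import re

# Only the trigger counter is extracted; the other fields are left alone.
TRIGGER_RE = re.compile(r"^\s*-\s*Trigger\s+(\d+)\b")


def trigger_numbers(lines):
    """Return the trigger counters found in the given trigger log lines."""
    numbers = []
    for line in lines:
        match = TRIGGER_RE.match(line)
        if match:
            numbers.append(int(match.group(1)))
    return numbers


def is_growing(numbers):
    """True if at least two counters were seen and each is larger than the previous."""
    return len(numbers) >= 2 and all(b > a for a, b in zip(numbers, numbers[1:]))
```

A counter that stops growing between two snapshots would suggest that the trigger board is no longer receiving the BTF trigger.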

To check if the event merger is receiving data from all boards with the correct synchronization:

[daq@l0padme1 log]$ tail -f run_0000000_20181005_094240_merger.log
… Some setup messages …
Board 26 has id 7 and SN 203
Board 27 has id 8 and SN 187
Board 28 has id 9 and SN 188
- Written 100 events
- Written 200 events
… Number of written events keeps growing steadily …
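The two things to look for in the merger log, the board list and the growing event count, can be pulled out with a short parser. This is a sketch based only on the message formats shown in the example above; merger_summary is a hypothetical helper.

```python
import re

BOARD_RE = re.compile(r"Board\s+(\d+)\s+has id\s+(\d+)\s+and SN\s+(\d+)")
WRITTEN_RE = re.compile(r"Written\s+(\d+)\s+events")


def merger_summary(lines):
    """Return ({board: (id, serial number)}, last 'Written N events' value or None)."""
    boards = {}
    written = None
    for line in lines:
        match = BOARD_RE.search(line)
        if match:
            board, board_id, serial = (int(g) for g in match.groups())
            boards[board] = (board_id, serial)
            continue
        match = WRITTEN_RE.search(line)
        if match:
            written = int(match.group(1))
    return boards, written
```

Comparing the number of boards found with the number expected for the current setup, and the written count between two snapshots, gives a quick health check of the merger.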

If the event merger is not synchronized with the trigger, check the setup first. For example, when taking data at 50 Hz with the complete set of detectors, full201809 is the setup to use; a different setup at this rate could cause de-synchronization.

Any problem in the DAQ will immediately show up in the event merger log file. In this case the run should be stopped and the clean-up procedure should be applied before starting a new run.

An example of a problem caused by the trigger board losing packets looks like this:

… all good up to now …
- Written 7000 events
- Written 7100 events
* Board 0 - Board time 357818173696 less than Trigger time 394468561288: skip event and try to recover
* Board 0 - Board time 357838171793 less than Trigger time 394468561288: skip event and try to recover
*** Board 0 - Board time 357878168337 less than Trigger time 394468561288: skip event and try to recover
… problem messages keep repeating over and over …
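Messages of this kind can be spotted automatically in the merger log. The sketch below relies only on the message text shown in the example; desync_events and affected_boards are hypothetical names, and deciding how many such messages justify stopping the run is left to the shifter.

```python
import re

DESYNC_RE = re.compile(
    r"Board\s+(\d+)\s+-\s+Board time\s+(\d+)\s+less than Trigger time\s+(\d+)"
)


def desync_events(lines):
    """Return (board, board_time, trigger_time) for every de-synchronization message."""
    events = []
    for line in lines:
        # finditer also copes with several messages sitting on one line.
        for match in DESYNC_RE.finditer(line):
            events.append(tuple(int(g) for g in match.groups()))
    return events


def affected_boards(events):
    """Return the sorted list of boards that reported at least one problem."""
    return sorted({board for board, _, _ in events})
```

If the list returned by desync_events keeps growing between two looks at the log, the run should be stopped and the clean-up procedure applied, as described above.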

cn/csn1/padme/shifter_instruction/running_the_daq.txt · Last modified: 2018/12/21 09:27 by gianotti@infn.it
