==== HTCondor-CE 5 - manual setup ====

The current official documentation has been improved; refer to it for further details.

**Note:** WLCG is going to dismiss ''GSI'' authentication in favour of token-based authentication (''SCITOKENS'').
=== Prerequisites ===

The HTCondor-CE (htc-ce) must be installed on an HTCondor Submit Node (schedd), that is, a machine where a SCHEDD daemon runs:
<code>
DAEMON_LIST = MASTER, SCHEDD
</code>
The schedd should already be tested: a local user should be able to submit jobs to the HTCondor worker nodes (WNs).
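A quick sanity check (a minimal sketch; file names and the sleep job are illustrative):
<code>
[user@htc-ce ~]$ cat test.sub
executable = /bin/sleep
arguments  = 60
output     = test.out
error      = test.err
log        = test.log
queue
[user@htc-ce ~]$ condor_submit test.sub
[user@htc-ce ~]$ condor_q
</code>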

Furthermore:

  * The htc-ce must hold a valid X509 grid server certificate (IGTF)
  * The htc-ce must have a public IP and be reachable from everywhere on TCP port 9619.
  * The htc-ce relies on Argus for authorization.

**Suggestion:** …
== Common HTCondor rpm installation (valid not only for the condor-ce) ==
The repo for the latest HTCondor-CE release (5.1.3 as of writing) is the same repository as HTCondor 9.0.x.

**Note:** As of HTCondor 9 the naming convention and version numbering have changed:
''9.0.x'' is the Long Term Support (LTS) channel (formerly the "stable series"), while the ''9.x.y'' feature releases replace the old "development series".
We refer to the Long Term Support channel here.
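A sketch of the installation from the upstream repository (the release-rpm URL is the HTCondor project's one for the 9.0 LTS channel on EL7; adapt it to your OS version):
<code>
# install the repo definition, then HTCondor and the CE
[root@htc-ce ~]# yum install https://research.cs.wisc.edu/htcondor/repo/9.0/htcondor-release-current.el7.noarch.rpm
[root@htc-ce ~]# yum install condor htcondor-ce
</code>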

=== HTCondor-CE setup ===

== Certificate configuration ==
<code>
[root@htc-ce ~]# cd /etc/grid-security
[root@htc-ce ~]# ll
-rw-r--r-- 1 root root 2366 Aug 12 14:40 hostcert.pem
-rw------- 1 root root 1675 Aug 12 14:40 hostkey.pem
</code>

Install grid CA certificates and VO data into ''/etc/grid-security'';
it needs the EGI-trustanchors.repo in ''/etc/yum.repos.d/'':
<code>
[root@htc-ce ~]# yum install ca-policy-egi-core
</code>
Copy the "VO data" (the ''vomsdir'' tree) under ''/etc/grid-security'', e.g. from an already configured host:
<code>
[root@htc-ce ~]# scp -rp <gridified-host>:/etc/grid-security/vomsdir /etc/grid-security/
</code>
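For reference, each VO under ''vomsdir'' holds one ''.lsc'' file per VOMS server, listing the DN of the server and of its CA; the dteam entry below is an illustrative sketch:
<code>
[root@htc-ce ~]# cat /etc/grid-security/vomsdir/dteam/voms2.hellasgrid.gr.lsc
/C=GR/O=HellasGrid/OU=hellasgrid.gr/CN=voms2.hellasgrid.gr
/C=GR/O=HellasGrid/OU=Certification Authorities/CN=HellasGrid CA 2016
</code>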

== Rpms installation ==
Needed RPMs are ''htcondor-ce'' (which pulls in ''condor'' and the other dependencies) and ''fetch-crl'':
<code>
[root@htc-ce ~]# yum install htcondor-ce fetch-crl
</code>
Check the status of the service and enable it if needed:
<code>
[root@htc-ce ~]# systemctl status condor-ce
[root@htc-ce ~]# systemctl enable condor-ce
</code>
At the end of the configuration remember to start the condor-ce:
<code>
[root@htc-ce ~]# systemctl start condor-ce
</code>
Run the following command to update the CRLs for the first time; it will take a while (several minutes) to complete, so you can run it in the background or from another shell:
<code>
[root@htc-ce ~]# fetch-crl &
[root@htc-ce ~]# systemctl enable fetch-crl-cron
[root@htc-ce ~]# systemctl start fetch-crl-cron
</code>

== GSI and authorization ==
HTCondor-CE relies on Argus for authorization; refer to the official Argus documentation for details.
It needs the UMD-4-updates.repo in ''/etc/yum.repos.d/'':
<code>
[root@htc-ce ~]# yum install argus-gsi-pep-callout
</code>
As root create the file ''/etc/grid-security/gsi-authz.conf'' with the following single line (it must end with a newline):
<code>
globus_mapping /usr/lib64/libgsi_pep_callout.so argus_pep_callout
</code>
It needs a copy of the certificates that will be accessed with non-root credentials:
<code>
[root@htc-ce ~]# cp -a /etc/grid-security/hostcert.pem /etc/grid-security/condorcert.pem
[root@htc-ce ~]# cp -a /etc/grid-security/hostkey.pem /etc/grid-security/condorkey.pem
[root@htc-ce ~]# chown condor:condor /etc/grid-security/condor{cert,key}.pem
</code>
As root create the file ''/etc/grid-security/gsi-pep-callout.conf'' with the following content (the Argus hostname and the resource id are placeholders):
<code>
pep_ssl_server_capath /etc/grid-security/certificates/
pep_ssl_client_cert /etc/grid-security/condorcert.pem
pep_ssl_client_key /etc/grid-security/condorkey.pem
pep_url https://<argus-host>:8154/authz
pep_timeout 30 # seconds
xacml_resourceid http://<your-domain>/<ce-resource-id>
</code>

== HTCondor-CE registration in the Argus server ==
The HTCondor-CE has to be registered in the Argus server.\\
In the Argus server, as root, execute:\\
<code>
[root@argus ~]# pap-admin lp > policies.txt
[root@argus ~]# cp policies.txt policies-htcceAdd.txt
</code>

Add the new resource to the service (here an example, modify it as needed; the resource id and the VO names are illustrative):
<code>
resource "http://<your-domain>/<ce-resource-id>" {

    obligation "http://glite.org/xacml/obligation/local-environment-map" {
    }

    action ".*" {
        rule permit { vo="alice" }
        rule permit { vo="atlas" }
        rule permit { vo="cms" }
        rule permit { vo="lhcb" }
        rule permit { vo="dteam" }
        rule permit { vo="ops" }
        rule permit { vo="virgo" }
    }
}
</code>
Reset the old policies and import the new ones:
<code>
[root@argus ~]# pap-admin rap && pap-admin apf policies-htcceAdd.txt
[root@argus ~]# pap-admin lp ## list policies
</code>

You can consider installing the Argus service on the HTCondor-CE host itself; this would probably ease early setup.\\
Be aware that Argus needs read/write access to the ''/etc/grid-security/gridmapdir'' directory, which is shared with the other nodes that map grid users (see below).

== Argus client installation ==
Install an Argus pep client on your HTCondor-CE:
<code>
[root@htc-ce ~]# yum install argus-pepcli
</code>

== Verify Argus service registration ==
To verify that your Argus service is properly configured to work with your htc-ce you have to:
  * Create a valid proxy of a supported VO, i.e.: <code>voms-proxy-init --voms dteam</code>
  * Copy the proxy to the root dir of your htc-ce, e.g. as ''/root/proxy''
  * Execute a ''pepcli'' query against your Argus endpoint; adapt the sketch below to your case.

On a working setup you should see an output like:

<code>
Decision: Permit
Obligation: http://glite.org/xacml/obligation/local-environment-map (caller should resolve POSIX account mapping)
Username: cms195
Group: cms
Secondary Groups: cms
</code>

As a further check (on the Argus side) you should see that the empty file
''/etc/grid-security/gridmapdir/cms195'' now has a hard-link count of 2: the second link is a file named after the URL-encoded DN of the mapped user (the encoded name below is illustrative):

<code>
[root@argus ~]# ls -li /etc/grid-security/gridmapdir/cms195
383663 -rw-r--r-- 2 root root 0 26 giu 2019 /etc/grid-security/gridmapdir/cms195
[root@argus ~]# ls -li /etc/grid-security/gridmapdir/ | grep 383663
383663 -rw-r--r-- 2 root root 0 26 giu 2019 %2fdc%3dch%2fdc%3dcern%2f...
383663 -rw-r--r-- 2 root root 0 26 giu 2019 cms195
</code>

== HTCondor-CE mapfile ==

//The ''/etc/condor-ce/condor_mapfile'' installed by the rpm should work as provided. The only files to edit should be those under ''/etc/condor-ce/mapfiles.d/''.//

<code>
[root@ce07-htc mapfiles.d]# cat 10-gsi.conf
GSI /.*/ GSS_ASSIST_GRIDMAP
</code>

**Note1**: in the example above, ''GSS_ASSIST_GRIDMAP'' delegates the actual mapping to the Globus callout configured in ''gsi-authz.conf'', i.e. to Argus.

**Note2**: The file ''/etc/condor-ce/condor_mapfile'' can be overwritten on package updates, which is why local customizations belong under ''mapfiles.d''.

=== Notes for token authentication ===
**Note1:** There is no callout service (such as Argus for GSI) for token authentication: the token-to-user mapping is written statically in the mapfile itself.
The general mapping is:
<code>
# SCITOKENS /<TOKEN ISSUER>,<TOKEN SUBJECT>/ <USERNAME>
</code>

Leaving an empty ''<TOKEN SUBJECT>'' maps every subject from that issuer to the same ''<USERNAME>''.
**Hint**: the username can recall the CE hostname, as in the example below (issuers and usernames are illustrative):

<code>
[root@ce07-htc mapfiles.d]# cat 10-scitokens.conf
# For the subject list by VO refer to https://...
SCITOKENS /^https:\/\/atlas-auth\.web\.cern\.ch\/,/ atlasce07
SCITOKENS /^https:\/\/cms-auth\.web\.cern\.ch\/,/ cmsce07
SCITOKENS /^https:\/\/alice-auth\.web\.cern\.ch\/,/ alicece07

SCITOKENS /^https:\/\/lhcb-auth\.web\.cern\.ch\/,/ lhcbce07
SCITOKENS /^https:\/\/wlcg\.cloud\.cnaf\.infn\.it\/,/ wlcgce07

# Personal token for testing purposes
SCITOKENS /^https:\/\/wlcg\.cloud\.cnaf\.infn\.it\/,<your-subject>/ sdalpra
</code>

**Note:** the host being installed as an HTCondor-CE is also an HTCondor Submit Node; for this reason the value of UID_DOMAIN must be the same in both the condor and condor-ce configurations.
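A quick consistency check (both commands are available on the CE host and should print the same value):
<code>
[root@ce07-htc ~]# condor_config_val UID_DOMAIN
[root@ce07-htc ~]# condor_ce_config_val UID_DOMAIN
</code>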

= HTCondor-CE config files =
The default configuration path is ''/etc/condor-ce/config.d''.
The default configuration from the rpm is almost fine already and only a few local settings have to be set or redefined. Best practice is to set them in a dedicated file, e.g.
''/etc/condor-ce/config.d/99-local.conf''.

See below an example configuration; domain names are CNAF's, and the values marked as assumptions in the comments are placeholders to adapt:

<code>
[root@ce07-htc ~]# cat /etc/condor-ce/config.d/99-local.conf
UID_DOMAIN = t1htc_90
CENTRAL_MANAGER = htc-1
# 9618 is the collector port of the batch pool (assumed)
JOB_ROUTER_SCHEDD2_POOL = $(CENTRAL_MANAGER).cr.cnaf.infn.it:9618

AUTH_SSL_SERVER_CERTFILE = /etc/grid-security/condorcert.pem
AUTH_SSL_SERVER_KEYFILE = /etc/grid-security/condorkey.pem
AUTH_SSL_SERVER_CADIR = /etc/grid-security/certificates
AUTH_SSL_SERVER_CAFILE =
AUTH_SSL_CLIENT_CERTFILE = /etc/grid-security/condorcert.pem
AUTH_SSL_CLIENT_KEYFILE = /etc/grid-security/condorkey.pem
AUTH_SSL_CLIENT_CADIR = /etc/grid-security/certificates
AUTH_SSL_CLIENT_CAFILE =

# disable the GSI mapping cache (useful when troubleshooting; knob name assumed)
#GSS_ASSIST_GRIDMAP_CACHE_EXPIRATION = 0

FRIENDLY_DAEMONS = $(FRIENDLY_DAEMONS) *@$(UID_DOMAIN)

# Higher log verbosity
#ALL_DEBUG = D_ALWAYS:2 D_CAT

# the methods list below is an assumption; adapt to your site
MASTER.SEC_DEFAULT_AUTHENTICATION_METHODS = FS, GSI, SCITOKENS

# useful to override common values on a single machine
include ifexist : /etc/condor-ce/config.d/99-local-override.conf
</code>

== JobRouter ==

With HTCondor-CE 5, JobRouter rules can be written with the new syntax.
Here is an excerpt from a working configuration:

<code>
# By default the old syntax is enabled; this is to enable the new syntax
JOB_ROUTER_USE_DEPRECATED_ROUTER_ENTRIES = False

# Jobs can only start once.
JOB_ROUTER_TRANSFORM_PeriodicHold @=jrt
SET Periodic_Hold (NumJobStarts >= 1 && JobStatus == 1) || NumJobStarts > 1
@jrt


# Workaround for bug:
# https://...
JOB_ROUTER_TRANSFORM_NumCores @=jrt
REQUIREMENTS MY.xcount > 1 || MY.RequestCpus > 1 || False
SET RequestCpus max({ MY.xcount ?: 1, MY.RequestCpus ?: 1 })
@jrt

JOB_ROUTER_PRE_ROUTE_TRANSFORM_NAMES = $(JOB_ROUTER_PRE_ROUTE_TRANSFORM_NAMES) PeriodicHold NumCores

# make sure no old-syntax entries are picked up
JOB_ROUTER_ENTRIES =
JOB_ROUTER_ENTRIES_FILE =
JOB_ROUTER_ENTRIES_CMD =
# This matches x509 dteam jobs OR wlcg jobs by token subject, for testing purposes
JOB_ROUTER_ROUTE_dteam @=jrt
REQUIREMENTS (x509UserProxyVoName =?= "dteam") || (AuthTokenSubject =?= "<token-subject>")
UNIVERSE VANILLA
#SET MaxJobs 100
#SET MaxIdleJobs 4
@jrt

# This is to match the virgo jobs having: x509UserProxyFirstFQAN = "/virgo/virgo/..."
JOB_ROUTER_ROUTE_virgovirgo @=jrt
REQUIREMENTS x509UserProxyVoName == "virgo" && regexp("^/virgo/virgo", x509UserProxyFirstFQAN)
UNIVERSE VANILLA
SET MaxJobs 100
SET MaxIdleJobs 4
@jrt

# This is to match the virgo jobs having: x509UserProxyFirstFQAN = "/virgo/ligo/..."
JOB_ROUTER_ROUTE_virgoligo @=jrt
REQUIREMENTS x509UserProxyVoName == "virgo" && regexp("^/virgo/ligo", x509UserProxyFirstFQAN)
UNIVERSE VANILLA
SET MaxJobs 100
SET MaxIdleJobs 4
@jrt

JOB_ROUTER_ROUTE_ops @=jrt
REQUIREMENTS x509UserProxyVoName == "ops"
UNIVERSE VANILLA
SET Requirements (TARGET.GPFS_OK =!= False) && (TARGET.t1_allow_sam =?= true)
@jrt

# This route should match both x509 and scitokens
JOB_ROUTER_ROUTE_atlas_sam @=jrt
REQUIREMENTS ((x509UserProxyVoName =?= "atlas") || (AuthTokenIssuer =?= "https://atlas-auth.web.cern.ch/")) && <sam-job-condition>
UNIVERSE VANILLA
SET Requirements (TARGET.t1_allow_sam =?= true) && (!StringListMember("<wn-name>", <string-list>))
@jrt

JOB_ROUTER_ROUTE_atlas @=jrt
REQUIREMENTS (((x509UserProxyVoName =?= "atlas") || (AuthTokenIssuer =?= "https://atlas-auth.web.cern.ch/")) && <non-sam-condition>)
UNIVERSE VANILLA
SET Requirements (!StringListMember("<wn-name>", <string-list>))
SET MaxJobs 3500
SET MaxIdleJobs 1280
@jrt

JOB_ROUTER_ROUTE_alice @=jrt
REQUIREMENTS (x509UserProxyVoName == "alice")
UNIVERSE VANILLA
SET Requirements (t1_allow_sam =!= true)
SET MaxJobs 3500
SET MaxIdleJobs 1280
@jrt

JOB_ROUTER_ROUTE_cms @=jrt
REQUIREMENTS ((x509UserProxyVoName =?= "cms") || (AuthTokenIssuer =?= "https://cms-auth.web.cern.ch/")) && <non-sam-condition>
UNIVERSE VANILLA
SET Requirements (!StringListMember("<wn-name>", <string-list>))
SET MaxJobs 3500
SET MaxIdleJobs 1280
@jrt

JOB_ROUTER_ROUTE_cms_sam @=jrt
REQUIREMENTS ((x509UserProxyVoName =?= "cms") || (AuthTokenIssuer =?= "https://cms-auth.web.cern.ch/")) && <sam-job-condition>
UNIVERSE VANILLA
SET Requirements (!StringListMember("<wn-name>", <string-list>)) && (TARGET.t1_allow_sam =?= true)
SET MaxJobs 3500
SET MaxIdleJobs 1280
@jrt

JOB_ROUTER_ROUTE_NAMES = $(JOB_ROUTER_ROUTE_NAMES) virgovirgo virgoligo atlas_sam atlas cms_sam cms
</code>

**Note:** the angle-bracket placeholders above stand for site-specific expressions that have to be filled in.

**Note:** more //experimental// configurations can be seen [[https://...|here]].
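To check that the routes parse and to see how jobs would be routed, ''condor_ce_job_router_info'' can help:
<code>
# dump the routing table as the JobRouter sees it
[root@ce07-htc ~]# condor_ce_job_router_info -config
# show how the jobs currently queued on the CE match the routes
[root@ce07-htc ~]# condor_ce_job_router_info -match-jobs -jobads <file-with-job-ad>
</code>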

== BDII configuration in the HTCondor batch system ==

The rpm creates two configuration files and a python script (paths below are the ones installed by ''htcondor-ce-bdii''; double-check with ''rpm -ql htcondor-ce-bdii''):
<code>
/etc/condor/config.d/50-ce-bdii-defaults.conf
/etc/condor/config.d/99-ce-bdii.conf
/var/lib/bdii/gip/provider/htcondor-ce-provider
</code>

**Note:** the path is under ''/etc/condor'' (the batch-system configuration), not ''/etc/condor-ce''.

In the ''/etc/condor/config.d/99-ce-bdii.conf'' file set the values describing your site.
Here the example related to Legnaro:
<code>
HTCONDORCE_SiteName = INFN-LNL-2
HTCONDORCE_HEPSPEC_INFO = 10.63-HEP-SPEC06
HTCONDORCE_SPEC = [ specfp2000 = 2506; hep_spec06 = 10.63; specint2000 = 2656 ]
# CPU Benchmarks
HTCONDORCE_VONames = alice, cms, lhcb, dteam
HTCONDORCE_BDII_ELECTION = LEADER
HTCONDORCE_BDII_LEADER = t2-cce-02.lnl.infn.it
HTCONDORCE_CORES = 16 # cores per node
GLUE2DomainID = $(HTCONDORCE_SiteName)
</code>

To check that the configuration is formally fine just execute the provider script:
<code>
[root@htc-ce ~]# /var/lib/bdii/gip/provider/htcondor-ce-provider
</code>
A dump of the GLUE2 schema should appear on stdout.

Finally, activate the service with:
<code>
[root@htc-ce ~]# systemctl enable bdii
[root@htc-ce ~]# systemctl start bdii
</code>
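Once the bdii service is up you can verify the published entries locally (standard resource-BDII port assumed):
<code>
[root@htc-ce ~]# ldapsearch -x -h localhost -p 2170 -b o=glue | head -n 30
</code>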

== Create grid-users in the HTCondor-CE ==
You must define all the local users that Argus will map, on the CE and on all the WNs (it is assumed that you have already gridified the WNs with the same local users).
Here an example based on yaim (CE side; the site-info.def path is an assumption):
<code>
[root@htc-ce ~]# /opt/glite/yaim/bin/yaim -r -s /root/siteinfo/site-info.def -f config_users \
    2>&1 | tee /root/conf_yaim_users.log
</code>
By default this command returns "INFO: Assuming the node types: UI"; press "y", because config_users isn't supported in the UI profile.
It is assumed that your WNs are all "grid-ified" (middleware, users and directories already in place).
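For reference, yaim takes the pool accounts from ''users.conf'' (the path is set by USERS_CONF in site-info.def); the format, with illustrative values, is:
<code>
# UID:LOGIN:GID1[,GID2,...]:GROUP1[,GROUP2,...]:VO:FLAG:
60201:dteam001:6020:dteam:dteam::
60202:dteam002:6020:dteam:dteam::
</code>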

== Start HTCondor-CE process ==
<code>
[root@htc-ce ~]# systemctl start condor-ce
</code>

== Testing the HTCondor-CE ==

=== Authenticating with GSI ===

From a User Interface having the htcondor-ce-client rpm, after generating a valid proxy, the htc-ce can be tested with ''condor_ce_trace'':
<code>
[sdalpra@ui-htc ~]$ condor_ce_trace ce01t-htc.cr.cnaf.infn.it
</code>
Use the -debug option to get more details.

If the previous test fails because of authorization or authentication problems, a useful tool is ''condor_ce_ping'':

<code>
[sdalpra@ui-htc ~]$ export _condor_SEC_CLIENT_AUTHENTICATION_METHODS=GSI ; condor_ce_ping -pool ce01t-htc.cr.cnaf.infn.it:9619 -name ce01t-htc.cr.cnaf.infn.it WRITE
WRITE command using (AES, AES, and GSI) succeeded as dteam042@htc_t1test to schedd ce01t-htc.cr.cnaf.infn.it.
</code>
Add the ''-verbose'' option to get more details.
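A further end-to-end check is to run a command remotely through the CE with ''condor_ce_run'' (also shipped by htcondor-ce-client; the target CE is the one used above):
<code>
[sdalpra@ui-htc ~]$ condor_ce_run -r ce01t-htc.cr.cnaf.infn.it:9619 /bin/env
</code>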

=== Authenticating with SCITOKENS ===

First give yourself a token file:
<code>
[sdalpra@ui-htc ~]$ oidc-add sdptok
Enter decryption password for account config 'sdptok':
success
[sdalpra@ui-htc ~]$ umask 0177 && oidc-token sdptok > sdptok
</code>

Then let HTCondor-CE know where your token is. Three ways are possible (the standard bearer-token discovery convention; a sketch follows):
  - setting the token value in the ''BEARER_TOKEN'' environment variable
  - setting the token file path in the ''BEARER_TOKEN_FILE'' environment variable
  - placing the token file in the default location ''/tmp/bt_u<uid>''
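A sketch of the three options (order and default path follow the WLCG bearer-token discovery convention, assumed here):
<code>
# 1) token value in the environment
[sdalpra@ui-htc ~]$ export BEARER_TOKEN=$(cat sdptok)
# 2) token file path in the environment
[sdalpra@ui-htc ~]$ export BEARER_TOKEN_FILE=$HOME/sdptok
# 3) default location, keyed on the numeric uid
[sdalpra@ui-htc ~]$ cp sdptok /tmp/bt_u$(id -u)
</code>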

Then verify with ''condor_ce_ping'':

<code>
[sdalpra@ui-htc ~]$ export _condor_SEC_CLIENT_AUTHENTICATION_METHODS=SCITOKENS ; condor_ce_ping -pool ce01t-htc.cr.cnaf.infn.it:9619 -name ce01t-htc.cr.cnaf.infn.it WRITE
WRITE command using (AES, AES, and SCITOKENS) succeeded as dteam001@htc_t1test to schedd ce01t-htc.cr.cnaf.infn.it.
</code>

If authentication succeeds you can test further with ''condor_ce_trace''.
**Note:** of course the ''<TOKEN ISSUER>,<TOKEN SUBJECT>'' pair of your token must be mapped in the CE mapfiles, as described above.

=== Tips and tricks ===

== Generating a token for testing purposes ==

Assuming you work from a User Interface as an unprivileged user, add the following snippet to your .bashrc. It caches the ''oidc-agent'' environment across login sessions; the cache file name and the parsing of the cached file are assumptions to adapt:

<code>
# directory where the oidc-agent environment is cached (name is an assumption)
OIDC_ENV=$HOME/.oidc-env
mkdir -p $OIDC_ENV

# recover the agent variables from the cached file, if present
export OIDC_SOCK=`sed -ne 's/^OIDC_SOCK=\([^;]*\);.*/\1/p' $OIDC_ENV/agent 2>/dev/null`
export OIDCD_PID=`sed -ne 's/^OIDCD_PID=\([^;]*\);.*/\1/p' $OIDC_ENV/agent 2>/dev/null`

# echo "OIDC_SOCK=$OIDC_SOCK"
# echo "OIDCD_PID=$OIDCD_PID"

# if that agent is no longer alive, start a new one
kill -0 $OIDCD_PID 2>/dev/null || eval `oidc-agent`

# persist the (possibly new) environment for the next session
echo "OIDC_SOCK=$OIDC_SOCK; export OIDC_SOCK;" > $OIDC_ENV/agent
echo "OIDCD_PID=$OIDCD_PID; export OIDCD_PID;" >> $OIDC_ENV/agent
</code>

Then generate a token:

<code>
[sdalpra@ui-htc ~]$ oidc-gen wlcg
[1] https://...
[...]
[17] https://wlcg.cloud.cnaf.infn.it/
Issuer [https://...]: https://wlcg.cloud.cnaf.infn.it/
The following scopes are supported: openid profile email offline_access wlcg wlcg.groups storage.read:/ storage.create:/ compute.read compute.modify compute.create compute.cancel storage.modify:/
Scopes or 'max' (space separated) [openid profile offline_access]: openid profile offline_access wlcg wlcg.groups compute.read compute.modify compute.create compute.cancel
Registering Client ...
Generating account configuration ...
accepted

Using a browser on any device, visit:
https://wlcg.cloud.cnaf.infn.it/device

And enter the code: ******


Enter encryption password for account configuration 'wlcg':
Confirm encryption Password:
Everything setup correctly!
</code>

== How to change the uid and gid of an already created condor user ==
Here 993 is the old condor uid and 990 the old condor gid; both will be mapped to 601:
<code>
# check the current ids
id condor
# count the files owned by the old uid/gid (sanity check, to compare later)
find / -xdev -uid 993 -exec ls -lnd {} \; | wc -l
find / -xdev -gid 990 -exec ls -lnd {} \; | wc -l
# change the uid and gid in the account database
vim /etc/passwd
vim /etc/group
# re-own the files on disk
find / -xdev -uid 993 -exec chown 601 {} \;
find / -xdev -gid 990 -exec chgrp 601 {} \;
# verify the counts match the ones found above
find / -xdev -uid 601 -exec ls -lnd {} \; | wc -l
find / -xdev -gid 601 -exec ls -lnd {} \; | wc -l
reboot
</code>

== If condor does not start correctly (config from shared FS) ==
Sometimes, if you keep the HTCondor configuration of the nodes on a shared (nfs, gpfs, ...) file system, it can happen that the nfs client is ready even though not all mount points are completely mounted yet.\\
The solution is to add the mount point (in the example it is named "sx") to the ''After='' directive of the condor service unit, e.g. via a systemd drop-in (sketched below), listing:
<code>
nfs-client.target autofs.service sx.mount
</code>
The example refers to a mount point described in the fstab as:
<code>
<nfs-server>:/<export>  /sx  nfs  defaults  0 0
</code>
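A sketch of the drop-in approach (the drop-in path is an assumption; ''sx.mount'' is the systemd unit corresponding to the ''/sx'' mount point):
<code>
# /etc/systemd/system/condor.service.d/wait-mounts.conf
[Unit]
After=nfs-client.target autofs.service sx.mount
</code>
Run ''systemctl daemon-reload'' for the drop-in to take effect.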

== Alternative solution to Argus service (to be implemented) ==
Assuming you use LCMAPS, the configuration of the files is like the configuration of a cream-ce:
<code>
[root@htc-ce config.d]# cat /etc/grid-security/gsi-authz.conf
globus_mapping liblcas_lcmaps_gt4_mapping.so lcmaps_callout
</code>
(gsi-authz.conf must end with a new line, as reported in [[https://...|this note]]).

== /etc/grid-security files ==
If you have the Argus server installed on another host, the only files you need are the ones created in the previous sections:
<code>
-rw-r--r--  hostcert.pem
-rw-------  hostkey.pem
-rw-r--r--  condorcert.pem
-rw-------  condorkey.pem
-rw-r--r--  gsi-authz.conf
-rw-r--r--  gsi-pep-callout.conf
drwxr-xr-x  certificates/
</code>

In case you install the Argus server on the same node, or you plan to use lcas/lcmaps, you also need:
<code>
-rw-r--r--  grid-mapfile
-rw-r--r--  groupmapfile
-rw-r--r--  voms-grid-mapfile
-rw-r--r--  voms-mapfile
</code>
The gridmapdir is shared among all nodes that actually map a grid user to a local user.\\
The voms-grid-mapfile and voms-mapfile are generally copies of the grid-mapfile and groupmapfile respectively.

== Alias for HTCondor command ==
It's useful to define this alias in /root/.bashrc (''cccv'' recalls ''condor_ce_config_val''):
<code>
alias cccv='condor_ce_config_val'
</code>

== The restart of HTCondor startd (WNs) kills running jobs ==
Instead of restarting the services in order to apply a new configuration, use ''condor_reconfig'': it makes the daemons re-read their configuration without killing the running jobs.
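For example, from the central manager (hostnames illustrative):
<code>
# reconfigure a single node
[root@htc-1 ~]# condor_reconfig -name wn-01
# or every daemon in the pool
[root@htc-1 ~]# condor_reconfig -all
</code>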

== gsi-pep-callout.conf (file name details) ==
If you already have a production Argus, your defaults could be different, depending on what your farming department (which manages host installation and configuration via puppet/...) decided.\\
The name of this file comes from the definition file here (the version numbers could be different):\\
''...''\\
(installed by argus-gsi-pep-callout-1.3.1-2.el7.centos.x86_64.rpm) and can be modified in:\\
''...''

== See also ==
Other references on configuring an HTCondor-CE:

  * [[https://...]]
  * [[http://...]]