Rocky-CentOS7 Testbed

Fully integrated Resource Provider INFN-PADOVA-STACK in production since 4 February 2019.

EGI Monitoring/Accounting

Local Monitoring/Accounting

Local dashboard

Layout

  • Controller + Network node: egi-cloud.pd.infn.it
  • Compute nodes: cloud-01:07.pn.pd.infn.it
  • Storage node (images and block storage): cld-stg-01.pd.infn.it
  • OneData provider: one-data-01.pd.infn.it
  • Cloudkeeper, Cloudkeeper-OS, cASO and cloudBDII: egi-cloud-ha.pd.infn.it
  • Cloud site-BDII: egi-cloud-sbdii.pd.infn.it
  • Accounting SSM sender: cert-37.pd.infn.it
  • Network layout available here (authorized users only)
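
The compute-node entry above uses the shorthand cloud-01:07 for the seven hosts cloud-01 to cloud-07. A minimal shell sketch for expanding that range, useful when looping over all compute nodes. The function name compute_nodes is hypothetical and not part of the site's tooling:

```shell
# Hypothetical helper: expand the "cloud-01:07" shorthand into host names.
# Usage: compute_nodes FIRST LAST [DOMAIN]
compute_nodes() {
    local i="$1" last="$2" domain="${3:-pn.pd.infn.it}"
    while [ "$i" -le "$last" ]; do
        # %02d pads the index to two digits (1 -> 01)
        printf 'cloud-%02d.%s\n' "$i" "$domain"
        i=$((i + 1))
    done
}

compute_nodes 1 7
```

The output can feed a loop such as: for h in $(compute_nodes 1 7); do ssh "$h" uptime; done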

OpenStack configuration

The Controller/Network node and the Compute nodes were installed according to the official OpenStack documentation.

We created one project for each supported EGI FedCloud VO, plus a router and several networks and subnets, obtaining the following network topology:

The partitions for the glance and cinder services are mounted over NFS from 192.168.61.100 (the cinder one is not listed in the fstab file):

yum install -y nfs-utils
mkdir -p /var/lib/glance/images
cat<<EOF>>/etc/fstab
192.168.61.100:/glance-egi /var/lib/glance/images     nfs defaults      
EOF
mount -a
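
After mount -a it can be worth confirming that the share is really mounted as NFS. The following is a minimal sketch, not part of the original procedure: the function name nfs_mounted is hypothetical, and it reads /proc/mounts unless another file is given.

```shell
# Hypothetical helper: succeed if SOURCE is mounted on TARGET with an nfs* filesystem type.
# Usage: nfs_mounted SOURCE TARGET [MOUNTS_FILE]
nfs_mounted() {
    local source="$1" target="$2" mounts="${3:-/proc/mounts}"
    awk -v s="$source" -v t="$target" \
        '$1 == s && $2 == t && $3 ~ /^nfs/ { found = 1 } END { exit !found }' "$mounts"
}

# Example (on the controller):
#   nfs_mounted 192.168.61.100:/glance-egi /var/lib/glance/images \
#       || echo "glance share is not mounted"
```

The filesystem type is matched with a prefix so that both "nfs" and "nfs4" entries are accepted.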

The cinder services use a site-specific configuration, based on the documentation for cinder with NFS backend.

EGI FedCloud specific configuration

(see EGI Doc)

Install the CA certificates and the software for fetching the CRLs on the Controller (egi-cloud), on the Compute nodes (cloud-01:07) and on the egi-cloud-ha node:

systemctl stop httpd
curl -L http://repository.egi.eu/sw/production/cas/1/current/repo-files/EGI-trustanchors.repo | sudo tee /etc/yum.repos.d/EGI-trustanchors.repo
yum install -y ca-policy-egi-core fetch-crl  http://artifacts.pd.infn.it/packages/CAP/misc/CentOS7/noarch/ca_TERENA-SSL-CA-3-1.0-1.el7.centos.noarch.rpm
systemctl enable fetch-crl-cron.service
systemctl start fetch-crl-cron.service
cd /etc/pki/ca-trust/source/anchors
ln -s /etc/grid-security/certificates/*.pem .
update-ca-trust extract

On egi-cloud-ha node also install CMD-OS repo:

yum -y install http://repository.egi.eu/sw/production/cmd-os/1/centos7/x86_64/base/cmd-os-release-1.0.1-1.el7.centos.noarch.rpm

Install AAI integration and VOMS support components

Taken from official EGI doc.

To be executed on egi-cloud.pd.infn.it node:

vo=(ops dteam fedcloud.egi.eu enmr.eu)
volast=enmr.eu
EGIHOST=egi-cloud.pd.infn.it
KYPORT=443
HZPORT=8443
yum install -y gridsite mod_auth_openidc
            sed -i "s|443|8443|g" /etc/httpd/conf.d/ssl.conf
            sed -i "s|/etc/pki/tls/certs/localhost.crt|/etc/grid-security/hostcert.pem|g" /etc/httpd/conf.d/ssl.conf 
            sed -i "s|/etc/pki/tls/private/localhost.key|/etc/grid-security/hostkey.pem|g" /etc/httpd/conf.d/ssl.conf 
 
            openstack-config --set /etc/keystone/keystone.conf auth methods password,token,openid,mapped
            openstack-config --set /etc/keystone/keystone.conf openid remote_id_attribute HTTP_OIDC_ISS
            openstack-config --set /etc/keystone/keystone.conf federation trusted_dashboard https://$EGIHOST:$HZPORT/dashboard/auth/websso/
            curl -L https://raw.githubusercontent.com/openstack/keystone/master/etc/sso_callback_template.html > /etc/keystone/sso_callback_template.html
            systemctl restart httpd.service
            source admin-openrc.sh
            openstack identity provider create --remote-id https://aai-dev.egi.eu/oidc/ egi.eu
            echo [ > mapping.egi.json
            echo [ > mapping.voms.json
            for i in ${vo[@]} 
            do
             openstack group create $i
             openstack role add member --group $i --project VO:$i
             groupid=$(openstack group show $i -f value -c id)
             cat <<EOF>>mapping.egi.json
    {
        "local": [
            {
                "user": {
                   "type":"ephemeral",
                   "name":"{0}"
                },
                "group": {
                   "id": "$groupid"
                }
            }
        ],
        "remote": [
            {
                "type": "HTTP_OIDC_SUB"
            },
            {
                "type": "HTTP_OIDC_ISS",
                "any_one_of": [
                    "https://aai-dev.egi.eu/oidc/"
                ]
            },
            {
                "type": "OIDC-edu_person_entitlements",
                "regex": true,
                "any_one_of": [
                    "^urn:mace:egi.eu:group:$i:role=vm_operator#aai.egi.eu$"
                ]
            }
        ]
EOF
             [ $i = $volast ] || ( echo "}," >> mapping.egi.json )
             [ $i = $volast ] && ( echo "}" >> mapping.egi.json )
             [ $i = $volast ] && ( echo "]" >> mapping.egi.json )  
             cat <<EOF>>mapping.voms.json
    {
        "local": [
            {
                "user": {
                   "type":"ephemeral",
                   "name":"{0}"
                },
                "group": {
                   "id":"$groupid"
                }
            }
        ],
        "remote": [
            {
                "type":"GRST_CONN_AURI_0"
            },
            {
                "type":"GRST_VOMS_FQANS",
                "any_one_of":[
                    "^/$i/.*"
                ],
                "regex":true
            }
        ]
EOF
            [ $i = $volast ] || ( echo "}," >> mapping.voms.json )
            [ $i = $volast ] && ( echo "}" >> mapping.voms.json )
            [ $i = $volast ] && ( echo "]" >> mapping.voms.json )
            done
            openstack mapping create --rules mapping.egi.json egi-mapping
            openstack federation protocol create --identity-provider egi.eu --mapping egi-mapping openid
            openstack mapping create --rules mapping.voms.json voms
            openstack  federation protocol create --identity-provider egi.eu --mapping voms  mapped
 
mkdir -p /etc/grid-security/vomsdir/${vo[0]}
cat > /etc/grid-security/vomsdir/${vo[0]}/lcg-voms2.cern.ch.lsc <<EOF
/DC=ch/DC=cern/OU=computers/CN=lcg-voms2.cern.ch
/DC=ch/DC=cern/CN=CERN Grid Certification Authority
EOF
cat > /etc/grid-security/vomsdir/${vo[0]}/voms2.cern.ch.lsc <<EOF
/DC=ch/DC=cern/OU=computers/CN=voms2.cern.ch
/DC=ch/DC=cern/CN=CERN Grid Certification Authority
EOF
mkdir -p /etc/grid-security/vomsdir/${vo[1]}
cat > /etc/grid-security/vomsdir/${vo[1]}/voms2.hellasgrid.gr.lsc <<EOF
/C=GR/O=HellasGrid/OU=hellasgrid.gr/CN=voms2.hellasgrid.gr
/C=GR/O=HellasGrid/OU=Certification Authorities/CN=HellasGrid CA 2016
EOF
cat > /etc/grid-security/vomsdir/${vo[1]}/voms.hellasgrid.gr.lsc <<EOF
/C=GR/O=HellasGrid/OU=hellasgrid.gr/CN=voms.hellasgrid.gr
/C=GR/O=HellasGrid/OU=Certification Authorities/CN=HellasGrid CA 2016
EOF
mkdir -p /etc/grid-security/vomsdir/${vo[2]}
cat > /etc/grid-security/vomsdir/${vo[2]}/voms1.grid.cesnet.cz.lsc <<EOF 
/DC=cz/DC=cesnet-ca/O=CESNET/CN=voms1.grid.cesnet.cz
/DC=cz/DC=cesnet-ca/O=CESNET CA/CN=CESNET CA 3
EOF
cat > /etc/grid-security/vomsdir/${vo[2]}/voms2.grid.cesnet.cz.lsc <<EOF 
/DC=cz/DC=cesnet-ca/O=CESNET/CN=voms2.grid.cesnet.cz
/DC=cz/DC=cesnet-ca/O=CESNET CA/CN=CESNET CA 3
EOF
mkdir -p /etc/grid-security/vomsdir/${vo[3]}
cat > /etc/grid-security/vomsdir/${vo[3]}/voms2.cnaf.infn.it.lsc <<EOF
/C=IT/O=INFN/OU=Host/L=CNAF/CN=voms2.cnaf.infn.it
/C=IT/O=INFN/CN=INFN Certification Authority
EOF
cat > /etc/grid-security/vomsdir/${vo[3]}/voms-02.pd.infn.it.lsc <<EOF
/DC=org/DC=terena/DC=tcs/C=IT/L=Frascati/O=Istituto Nazionale di Fisica Nucleare/CN=voms-02.pd.infn.it
/C=NL/ST=Noord-Holland/L=Amsterdam/O=TERENA/CN=TERENA eScience SSL CA 3
EOF
#
cat <<EOF>/etc/httpd/conf.d/wsgi-keystone-oidc-voms.conf
Listen $KYPORT
 
<VirtualHost *:$KYPORT>
    OIDCSSLValidateServer Off
    OIDCProviderTokenEndpointAuth client_secret_basic
    OIDCResponseType "code"
    OIDCClaimPrefix "OIDC-"
    OIDCClaimDelimiter ;
    OIDCScope "openid profile email refeds_edu eduperson_entitlement"
    OIDCProviderMetadataURL https://aai-dev.egi.eu/oidc/.well-known/openid-configuration
    OIDCClientID <your OIDC client token>
    OIDCClientSecret <your OIDC client secret>
    OIDCCryptoPassphrase somePASSPHRASE
    OIDCRedirectURI https://$EGIHOST:$KYPORT/v3/auth/OS-FEDERATION/websso/openid/redirect
 
# OAuth for CLI access
    OIDCOAuthIntrospectionEndpoint  https://aai-dev.egi.eu/oidc/introspect
    OIDCOAuthClientID <your OIDC client token>
    OIDCOAuthClientSecret <your OIDC client secret>
#    OIDCOAuthRemoteUserClaim        sub
 
# Increase Shm cache size for supporting long entitlements
    OIDCCacheShmEntrySizeMax 33297
 
# Use the IGTF trust anchors for CAs and CRLs
    SSLCACertificatePath /etc/grid-security/certificates/
    SSLCARevocationPath /etc/grid-security/certificates/
    SSLCACertificateFile $CA_CERT 
    SSLEngine               on
    SSLCertificateFile      /etc/grid-security/hostcert.pem
    SSLCertificateKeyFile   /etc/grid-security/hostkey.pem
# Verify clients if they send their certificate
    SSLVerifyClient         optional
    SSLVerifyDepth          10
    SSLOptions              +StdEnvVars +ExportCertData
    SSLProtocol             all -SSLv2
    SSLCipherSuite          ALL:!ADH:!EXPORT:!SSLv2:RC4+RSA:+HIGH:+MEDIUM:+LOW
    WSGIDaemonProcess keystone-public processes=5 threads=1 user=keystone group=keystone display-name=%{GROUP}
    WSGIProcessGroup keystone-public
    WSGIScriptAlias / /usr/bin/keystone-wsgi-public
    WSGIApplicationGroup %{GLOBAL}
    WSGIPassAuthorization On
    LimitRequestBody 114688
    <IfVersion >= 2.4>
      ErrorLogFormat "%{cu}t %M"
    </IfVersion>
    ErrorLog /var/log/httpd/keystone.log
    CustomLog /var/log/httpd/keystone_access.log combined
    <Directory /usr/bin>
        <IfVersion >= 2.4>
            Require all granted
        </IfVersion>
        <IfVersion < 2.4>
            Order allow,deny
            Allow from all
        </IfVersion>
    </Directory>
    <Location /v3/OS-FEDERATION/identity_providers/egi.eu/protocols/mapped/auth>
      # populate ENV variables
      GridSiteEnvs on
      # turn off directory listings
      GridSiteIndexes off
      # accept GSI proxies from clients
      GridSiteGSIProxyLimit 4
      # disable GridSite method extensions
      GridSiteMethods ""
 
      Require all granted
      Options -MultiViews
    </Location>
    <Location ~ "/v3/auth/OS-FEDERATION/websso/openid">
        AuthType  openid-connect
        Require   valid-user
        #Require  claim iss:https://aai-dev.egi.eu/
        LogLevel  debug
    </Location>
 
    <Location ~ "/v3/OS-FEDERATION/identity_providers/egi.eu/protocols/openid/auth">
        Authtype oauth20
        Require   valid-user
        #Require  claim iss:https://aai-dev.egi.eu/
        LogLevel  debug
    </Location>
</VirtualHost>
Alias /identity /usr/bin/keystone-wsgi-public
<Location /identity>
    SetHandler wsgi-script
    Options +ExecCGI
 
    WSGIProcessGroup keystone-public
    WSGIApplicationGroup %{GLOBAL}
    WSGIPassAuthorization On
</Location>
EOF
            sed -i "s|http://$EGIHOST:$KYPORT|https://$EGIHOST|g" /etc/*/*.conf
            source admin-openrc.sh
            for i in public internal admin
            do
             keyendid=$(openstack endpoint list --service keystone --interface $i -f value -c ID) 
             openstack endpoint set --url https://$EGIHOST/v3 $keyendid
            done
            systemctl restart httpd.service
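
The mapping files in the script above are assembled rule by rule, with the volast variable deciding where the last comma goes. A minimal alternative sketch for the VOMS mapping that places the separators itself, so no "last VO" marker is needed. The function name voms_mapping and the "vo=group-id" argument format are hypothetical; the rule layout is copied from the script above.

```shell
# Hypothetical helper: print a Keystone mapping (JSON array) for VOMS FQANs.
# Each argument is a "<vo>=<keystone-group-id>" pair.
voms_mapping() {
    local first=1 pair vo gid
    printf '[\n'
    for pair in "$@"; do
        vo="${pair%%=*}"
        gid="${pair#*=}"
        # Close the previous rule with a comma before opening the next one
        [ "$first" -eq 1 ] || printf ',\n'
        first=0
        cat <<EOF
    {
        "local": [
            {
                "user": {"type": "ephemeral", "name": "{0}"},
                "group": {"id": "$gid"}
            }
        ],
        "remote": [
            {"type": "GRST_CONN_AURI_0"},
            {"type": "GRST_VOMS_FQANS", "any_one_of": ["^/$vo/.*"], "regex": true}
        ]
EOF
        printf '    }'
    done
    printf '\n]\n'
}
```

A possible use, with the group IDs looked up as in the script: voms_mapping "ops=$(openstack group show ops -f value -c id)" "enmr.eu=$(openstack group show enmr.eu -f value -c id)" > mapping.voms.json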
 
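The .lsc files in the script above are written with one heredoc per VOMS server, addressed through array indexes. A minimal sketch of a helper that takes the VO name explicitly, which keeps each file under the directory of the VO it belongs to. The function name write_lsc and its optional last argument are hypothetical:

```shell
# Hypothetical helper: write one VOMS .lsc file (server subject DN, then CA DN).
# Usage: write_lsc VO HOST SUBJECT_DN ISSUER_DN [VOMSDIR]
write_lsc() {
    local vo="$1" host="$2" subject="$3" issuer="$4"
    local vomsdir="${5:-/etc/grid-security/vomsdir}"
    mkdir -p "$vomsdir/$vo" || return 1
    printf '%s\n%s\n' "$subject" "$issuer" > "$vomsdir/$vo/$host.lsc"
}

# Example, with the DNs listed in the script above:
#   write_lsc fedcloud.egi.eu voms2.grid.cesnet.cz \
#       "/DC=cz/DC=cesnet-ca/O=CESNET/CN=voms2.grid.cesnet.cz" \
#       "/DC=cz/DC=cesnet-ca/O=CESNET CA/CN=CESNET CA 3"
```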

OpenStack Dashboard (Horizon) Configuration:

  • Edit /etc/openstack-dashboard/local_settings file and set:
OPENSTACK_KEYSTONE_URL = "https://%s/v3" % OPENSTACK_HOST
WEBSSO_ENABLED = True
WEBSSO_INITIAL_CHOICE = "credentials"
 
WEBSSO_CHOICES = (
    ("credentials", _("Keystone Credentials")),
    ("openid", _("EGI Check-in"))
)

To change the dashboard logo, copy the right svg file to /usr/share/openstack-dashboard/openstack_dashboard/static/dashboard/img/logo-splash.svg

To expose other OpenStack services publicly over https, remember to create the files /etc/httpd/conf.d/wsgi-{nova,neutron,glance,cinder}.conf and to set the corresponding endpoints before restarting everything.

Install FedCloud BDII

(See the EGI integration guide and the BDII configuration guide.) Install the resource BDII and the cloud-info-provider on egi-cloud-ha (with the CMD-OS repo already installed):

yum -y install bdii cloud-info-provider cloud-info-provider-openstack

Customize the configuration file /etc/cloud-info-provider/sample.openstack.yaml with the local site's information, and rename it to /etc/cloud-info-provider/openstack.yaml

Customize the file /etc/cloud-info-provider/openstack.rc with the right credentials, for example:

export OS_AUTH_URL=https://egi-cloud.pd.infn.it:443/v3
export OS_PROJECT_DOMAIN_ID=default
export OS_REGION_NAME=RegionOne
export OS_USER_DOMAIN_ID=default
export OS_PROJECT_NAME=admin
export OS_IDENTITY_API_VERSION=3
export OS_USERNAME=accounting
export OS_PASSWORD=<the user password>
export OS_AUTH_TYPE=password
export OS_CACERT=/etc/pki/tls/certs/ca-bundle.crt

Create the file /var/lib/bdii/gip/provider/cloud-info-provider that calls the provider with the correct options for your site, for example:

cat <<'EOF' >/var/lib/bdii/gip/provider/cloud-info-provider
#!/bin/sh
 
. /etc/cloud-info-provider/openstack.rc
 
for P in $(openstack project list -c Name -f value); do
    cloud-info-provider-service --yaml /etc/cloud-info-provider/openstack.yaml \
                                --os-tenant-name $P \
                                --middleware openstack
done
EOF

Run the cloud-info-provider script manually and check that its output is the complete LDIF. To do so, execute:

chmod +x /var/lib/bdii/gip/provider/cloud-info-provider
/var/lib/bdii/gip/provider/cloud-info-provider
/sbin/chkconfig bdii on
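
A quick way to judge the provider output is to count the LDIF entries it prints. A minimal sketch, assuming only that every entry starts with a "dn:" line, as in any LDIF stream. The function name ldif_entry_count is hypothetical:

```shell
# Hypothetical helper: count LDIF entries (lines starting with "dn:") read from stdin.
ldif_entry_count() {
    # grep -c exits non-zero when nothing matches; still print 0 and return success
    grep -c '^dn:' || true
}

# Example:
#   /var/lib/bdii/gip/provider/cloud-info-provider | ldif_entry_count
```

A count of 0 means the provider printed no entries at all, which usually points to a credentials or configuration problem rather than to the BDII itself.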

Now you can start the bdii service:

systemctl start bdii

Use the command below to see if the information is being published:

ldapsearch -x -h localhost -p 2170 -b o=glue

Do not forget to open port 2170:

firewall-cmd --add-port=2170/tcp
firewall-cmd --permanent --add-port=2170/tcp
systemctl restart firewalld

Information on how to set up the site-BDII in egi-cloud-sbdii.pd.infn.it is available here

Add your cloud-info-provider to your site-BDII egi-cloud-sbdii.pd.infn.it by adding new lines in the site.def like this:

BDII_REGIONS="CLOUD BDII"
BDII_CLOUD_URL="ldap://egi-cloud-ha.pn.pd.infn.it:2170/GLUE2GroupID=cloud,o=glue"
BDII_BDII_URL="ldap://egi-cloud-sbdii.pd.infn.it:2170/mds-vo-name=resource,o=grid"

Use the same APEL/SSM as the grid site

Cloud usage records are sent to APEL through the ssmsend program installed in cert-37.pd.infn.it:

[root@cert-37 ~]# cat /etc/cron.d/ssm-cloud 
# send buffered usage records to APEL
30 */24 * * * root /usr/bin/ssmsend -c /etc/apel/sender-cloud.cfg

It is therefore necessary to install and configure NFS on egi-cloud-ha:

[root@egi-cloud-ha ~]# yum -y install nfs-utils
[root@egi-cloud-ha ~]# mkdir -p /var/spool/apel/outgoing/openstack
[root@egi-cloud-ha ~]# cat<<EOF>>/etc/exports 
/var/spool/apel/outgoing/openstack cert-37.pd.infn.it(rw,sync)
EOF
[root@egi-cloud-ha ~]# systemctl start nfs-server

If the APEL Nagios probe fails, check that /var/spool/apel/outgoing/openstack is properly mounted by cert-37.
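
Besides the mount itself, a backlog of old files in the spool directory can hint that records are not being picked up. This is a minimal sketch under the assumption that ssmsend removes messages from the outgoing directory once they are sent. The function name stale_records and the default of 1500 minutes (a little more than the daily cron interval) are hypothetical choices:

```shell
# Hypothetical helper: list files under DIR that were last modified more than MINUTES minutes ago.
# Usage: stale_records DIR [MINUTES]
stale_records() {
    local dir="$1" minutes="${2:-1500}"
    find "$dir" -type f -mmin +"$minutes"
}

# Example (on egi-cloud-ha):
#   stale_records /var/spool/apel/outgoing/openstack
```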

To check whether accounting records are properly received by the APEL server, look at this site

Install the accounting system (cASO)

(see cASO installation guide )

On egi-cloud create accounting user and role, and set the proper policies:

openstack user create --domain default --password <ACCOUNTING_PASSWORD> accounting
openstack role create accounting
for i in VO:fedcloud.egi.eu VO:enmr.eu VO:ops; do openstack role add --project $i --user accounting accounting; done
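# Note: the two rules below must end up inside the top-level JSON object of policy.json,
# separated by commas; after this append, edit the file by hand so that it is valid JSON.
cat<<EOF>>/etc/keystone/policy.json
"accounting_role": "role:accounting"
"identity:list_users": "rule:admin_required or rule:accounting_role"
EOF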

Install cASO on egi-cloud-ha (with CMD-OS repo already installed):

yum -y install caso

Edit the /etc/caso/caso.conf file

openstack-config --set /etc/caso/caso.conf DEFAULT site_name INFN-PADOVA-STACK
openstack-config --set /etc/caso/caso.conf DEFAULT projects VO:ops,VO:fedcloud.egi.eu,VO:enmr.eu
openstack-config --set /etc/caso/caso.conf DEFAULT messengers caso.messenger.ssm.SSMMessengerV02
openstack-config --set /etc/caso/caso.conf DEFAULT log_dir /var/log/caso
openstack-config --set /etc/caso/caso.conf DEFAULT log_file caso.log
openstack-config --set /etc/caso/caso.conf keystone_auth auth_type password
openstack-config --set /etc/caso/caso.conf keystone_auth username accounting
openstack-config --set /etc/caso/caso.conf keystone_auth password ACCOUNTING_PASSWORD
openstack-config --set /etc/caso/caso.conf keystone_auth auth_url https://egi-cloud.pd.infn.it/v3
openstack-config --set /etc/caso/caso.conf keystone_auth cafile /etc/pki/tls/certs/ca-bundle.crt
openstack-config --set /etc/caso/caso.conf keystone_auth project_domain_id default
openstack-config --set /etc/caso/caso.conf keystone_auth project_domain_name default
openstack-config --set /etc/caso/caso.conf keystone_auth user_domain_id default
openstack-config --set /etc/caso/caso.conf keystone_auth user_domain_name default
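
Before the first test run it may help to confirm that the main options set above are present in the file. A minimal sketch in plain shell and awk. The function names ini_get and caso_conf_missing are hypothetical, and the list of keys checked is a selection from the commands above, not an official list of required cASO options:

```shell
# Hypothetical helper: print the value of KEY in [SECTION] of an INI file.
ini_get() {
    local file="$1" section="$2" key="$3"
    awk -v s="[$section]" -v k="$key" '
        /^\[/ { in_section = ($0 == s); next }
        in_section {
            line = $0
            sub(/^[ \t]+/, "", line)
            if (index(line, k) == 1) {
                rest = substr(line, length(k) + 1)
                # Accept "key=value" and "key = value", reject longer key names
                if (rest ~ /^[ \t]*=/) {
                    sub(/^[ \t]*=[ \t]*/, "", rest)
                    print rest
                    exit
                }
            }
        }
    ' "$file"
}

# Hypothetical helper: print every "section/key" below that has no value in FILE.
caso_conf_missing() {
    local file="$1" item
    for item in DEFAULT/site_name DEFAULT/projects DEFAULT/messengers \
                keystone_auth/auth_url keystone_auth/username keystone_auth/password; do
        [ -n "$(ini_get "$file" "${item%%/*}" "${item#*/}")" ] || printf '%s\n' "$item"
    done
}

# Example:
#   caso_conf_missing /etc/caso/caso.conf
```

An empty output means all the listed options have a value.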

Create the directories

mkdir -p /var/spool/caso /var/log/caso /var/spool/apel/outgoing/openstack/

Test it

caso-extract -v -d

Create the cron job

cat <<EOF>/etc/cron.d/caso 
# extract and send usage records to APEL/SSM 
10 * * * * root /usr/bin/caso-extract >> /var/log/caso/caso.log 2>&1 ; chmod go+w -R /var/spool/apel/outgoing/openstack/
EOF

Install Cloudkeeper and Cloudkeeper-OS

On egi-cloud.pd.infn.it create a cloudkeeper user in keystone:

openstack user create --domain default --password CLOUDKEEPER_PASS cloudkeeper

and, for each project, add the cloudkeeper user with the user role

for i in VO:ops VO:fedcloud.egi.eu VO:enmr.eu; do openstack role add --project $i --user cloudkeeper user; done

Install Cloudkeeper and Cloudkeeper-OS on egi-cloud-ha (with CMD-OS repo already installed):

yum -y install cloudkeeper cloudkeeper-os

Edit /etc/cloudkeeper/cloudkeeper.yml and add the list of VO image lists and the IP address where needed:

  - https://PERSONAL_ACCESS_TOKEN:x-oauth-basic@vmcaster.appdb.egi.eu/store/vo/fedcloud.egi.eu/image.list
  - https://PERSONAL_ACCESS_TOKEN:x-oauth-basic@vmcaster.appdb.egi.eu/store/vo/ops/image.list
  - https://PERSONAL_ACCESS_TOKEN:x-oauth-basic@vmcaster.appdb.egi.eu/store/vo/enmr.eu/image.list

Edit the /etc/cloudkeeper-os/cloudkeeper-os.conf file

openstack-config --set /etc/cloudkeeper-os/cloudkeeper-os.conf DEFAULT log_file cloudkeeper-os.log
openstack-config --set /etc/cloudkeeper-os/cloudkeeper-os.conf DEFAULT log_dir /var/log/cloudkeeper-os/
openstack-config --set /etc/cloudkeeper-os/cloudkeeper-os.conf keystone_authtoken auth_url https://egi-cloud.pd.infn.it/v3
openstack-config --set /etc/cloudkeeper-os/cloudkeeper-os.conf keystone_authtoken username cloudkeeper
openstack-config --set /etc/cloudkeeper-os/cloudkeeper-os.conf keystone_authtoken password CLOUDKEEPER_PASS
openstack-config --set /etc/cloudkeeper-os/cloudkeeper-os.conf keystone_authtoken cafile /etc/pki/tls/certs/ca-bundle.crt
openstack-config --set /etc/cloudkeeper-os/cloudkeeper-os.conf keystone_authtoken cacert /etc/pki/tls/certs/ca-bundle.crt
openstack-config --set /etc/cloudkeeper-os/cloudkeeper-os.conf keystone_authtoken user_domain_name  default
openstack-config --set /etc/cloudkeeper-os/cloudkeeper-os.conf keystone_authtoken project_domain_name default

Create the /etc/cloudkeeper-os/voms.json mapping file:

cat<<EOF>/etc/cloudkeeper-os/voms.json
{
    "ops": {
        "tenant": "VO:ops"
    },
    "enmr.eu": {
        "tenant": "VO:enmr.eu"
    },
    "fedcloud.egi.eu": {
        "tenant": "VO:fedcloud.egi.eu"
    }
}
EOF

Enable and start the services

systemctl enable cloudkeeper-os
systemctl start cloudkeeper-os
systemctl enable cloudkeeper.timer
systemctl start cloudkeeper.timer

Installing Squid for CVMFS (optional)

Install and configure squid on cloud-01 and cloud-02 for use from VMs (see https://cvmfs.readthedocs.io/en/stable/cpt-squid.html):

yum install -y squid
sed -i "s|/var/spool/squid|/export/data/spool/squid|g" /etc/squid/squid.conf
cat<<EOF>>/etc/squid/squid.conf
minimum_expiry_time 0
 
max_filedesc 8192
maximum_object_size 1024 MB
 
cache_mem 128 MB
maximum_object_size_in_memory 128 KB
# 50 GB disk cache
cache_dir ufs /export/data/spool/squid 50000 16 256
acl cvmfs dst cvmfs-stratum-one.cern.ch
acl cvmfs dst cernvmfs.gridpp.rl.ac.uk
acl cvmfs dst cvmfs.racf.bnl.gov
acl cvmfs dst cvmfs02.grid.sinica.edu.tw
acl cvmfs dst cvmfs.fnal.gov
acl cvmfs dst cvmfs-atlas-nightlies.cern.ch
acl cvmfs dst cvmfs-egi.gridpp.rl.ac.uk
acl cvmfs dst klei.nikhef.nl
acl cvmfs dst cvmfsrepo.lcg.triumf.ca
acl cvmfs dst cvmfsrep.grid.sinica.edu.tw
acl cvmfs dst cvmfs-s1bnl.opensciencegrid.org
acl cvmfs dst cvmfs-s1fnal.opensciencegrid.org
http_access allow cvmfs
EOF
rm -rf /var/spool/squid
mkdir -p /export/data/spool/squid
chown -R squid:squid /export/data/spool/squid
squid -k parse
squid -z
ulimit -n 8192
systemctl start squid
firewall-cmd --permanent --add-port 3128/tcp
systemctl restart firewalld

Use CVMFS_HTTP_PROXY="http://cloud-01.pn.pd.infn.it:3128|http://cloud-02.pn.pd.infn.it:3128" on the CVMFS clients.

In practice it is better to use the already existing squids: CVMFS_HTTP_PROXY="http://squid-01.pd.infn.it:3128|http://squid-02.pd.infn.it:3128"
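
In CVMFS_HTTP_PROXY, proxies separated by "|" form one load-balanced group (groups separated by ";" would be tried in order as fail-over). A minimal sketch that builds such a group from a list of hosts. The function name cvmfs_proxy_group is hypothetical:

```shell
# Hypothetical helper: join squid hosts into one load-balanced proxy group.
# Usage: cvmfs_proxy_group PORT HOST [HOST...]
cvmfs_proxy_group() {
    local port="$1" host out=""
    shift
    for host in "$@"; do
        # Prefix a "|" only when something has already been collected
        out="${out:+$out|}http://$host:$port"
    done
    printf '%s\n' "$out"
}

cvmfs_proxy_group 3128 squid-01.pd.infn.it squid-02.pd.infn.it
```

The printed string can be used as the value of CVMFS_HTTP_PROXY on the clients.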

Local Accounting

A local accounting system based on Grafana, InfluxDB and Collectd has been set up following the instructions here.

Local Monitoring

Ganglia

  • Install ganglia-gmond on all servers
  • Configure cluster and host fields in /etc/ganglia/gmond.conf to point to cld-ganglia.cloud.pd.infn.it server
  • Finally: systemctl enable gmond.service; systemctl start gmond.service

Nagios

  • Install on compute nodes nsca-client, nagios, nagios-plugins-disk, nagios-plugins-procs, nagios-plugins, nagios-common, nagios-plugins-load
  • Copy the file cld-nagios:/var/spool/nagios/.ssh/id_rsa.pub to a file named /home/nagios/.ssh/authorized_keys on the controller and on all compute nodes, and to a file named /root/.ssh/authorized_keys on the controller. Also make sure that /home/nagios is the home directory of the nagios user in the /etc/passwd file.
  • Then do in all compute nodes:
$ echo encryption_method=1 >> /etc/nagios/send_nsca.cfg
$ usermod -a -G libvirt nagios
$ sed -i 's|#password=|password=NSCA_PASSWORD|g' /etc/nagios/send_nsca.cfg
# then be sure the files below are in /usr/local/bin:
$ ls /usr/local/bin/
check_kvm  check_kvm_wrapper.sh
$ cat <<EOF > crontab.txt 
# Puppet Name: nagios_check_kvm
0 */1 * * * /usr/local/bin/check_kvm_wrapper.sh
EOF
$ crontab crontab.txt
$ crontab -l
  • On the controller node, add in /etc/nova/policy.json the line:
"os_compute_api:servers:create:forced_host": "" 

and in /etc/cinder/policy.json the line:

"volume_extension:quotas:show": ""
  • Create in the VO:dteam project a cirros VM with tiny flavour named nagios-probe and access key named dteam-key (saving the private key file dteam-key.pem in egi-cloud /root directory), and take note of its ID and private IP. Then on the cld-nagios server put its ID in the file /var/spool/nagios/egi-cloud-vm-volume_compute_node.sh and its IP in the file /etc/nagios/objects/egi_fedcloud.cfg, at the command of the service Openstack Check Metadata (e.g.: check_command check_metadata_egi!dteam-key.pem!10.0.2.20).
  • On the cld-nagios server check/modify the content of /var/spool/nagios/*egi*.sh, of the files /etc/nagios/objects/egi* and /usr/lib64/nagios/plugins/*egi*, and of the files owned by nagios user found in /var/spool/nagios when doing "su - nagios"

Security incidents and IP traceability

See here for the description of the full process. On egi-cloud install the CNRS tools; they allow tracking the usage of floating IPs, as in the example below:

[root@egi-cloud ~]# os-ip-trace 90.147.77.229
+--------------------------------------+-----------+---------------------+---------------------+
|              device id               | user name |   associating date  | disassociating date |
+--------------------------------------+-----------+---------------------+---------------------+
| 3002b1f1-bca3-4e4f-b21e-8de12c0b926e |   admin   | 2016-11-30 14:01:38 | 2016-11-30 14:03:02 |
+--------------------------------------+-----------+---------------------+---------------------+
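
For archiving or further filtering, a table like the one above can be turned into comma-separated rows. A minimal sketch that only assumes the "|"-delimited layout shown in the example. The function name table_to_csv is hypothetical and is not part of the CNRS tools:

```shell
# Hypothetical helper: convert an ASCII table read from stdin into CSV rows.
# Border lines (starting with "+") and any other text are skipped; the header row is kept.
table_to_csv() {
    awk -F'|' '
        /^\|/ {
            row = ""
            # Fields 2..NF-1 are the cells; the first and last fields are empty
            for (i = 2; i < NF; i++) {
                cell = $i
                gsub(/^ +| +$/, "", cell)
                row = row (i > 2 ? "," : "") cell
            }
            print row
        }
    '
}

# Example:
#   os-ip-trace 90.147.77.229 | table_to_csv
```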

Save and archive important log files:

  • On egi-cloud and each compute node cloud-0%, add the line "*.* @@192.168.60.31:514" in the file /etc/rsyslog.conf, and restart rsyslog service with "systemctl restart rsyslog". It logs /var/log/secure,messages files in cld-foreman:/var/mpathd/log/egi-cloud,cloud-0%.
  • In cld-foreman, check that the file /etc/cron.daily/vm-log.sh logs the /var/log/libvirt/qemu/*.log files of egi-cloud and each cloud-0% compute node (passwordless ssh must be enabled from cld-foreman to each node)
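
When the forwarding rule is added by script on several nodes, it is convenient to append it only where it is missing, so that repeated runs do not duplicate it. A minimal sketch; the function name ensure_line is hypothetical:

```shell
# Hypothetical helper: append LINE to FILE only if no identical line is already there.
# Usage: ensure_line FILE LINE
ensure_line() {
    local file="$1" line="$2"
    # -x: whole-line match, -F: fixed string (so "*" and "." are taken literally)
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Example (on egi-cloud and on each compute node):
#   ensure_line /etc/rsyslog.conf '*.* @@192.168.60.31:514' && systemctl restart rsyslog
```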

Install ulogd in the controller node

yum install -y libnetfilter_log
yum localinstall -y http://repo.iotti.biz/CentOS/7/x86_64/ulogd-2.0.5-2.el7.lux.x86_64.rpm
yum localinstall -y http://repo.iotti.biz/CentOS/7/x86_64/libnetfilter_acct-1.0.2-3.el7.lux.1.x86_64.rpm

and configure /etc/ulogd.conf starting from the one in cld-ctrl-01:/etc/ulogd.conf, setting the accept_src_filter variable appropriately (accept_src_filter=10.0.0.0/16). Then copy cld-ctrl-01:/root/ulogd/start-ulogd to egi-cloud:/root/ulogd/start-ulogd, replace the qrouter ID in it and execute /root/ulogd/start-ulogd. Add the line "/root/ulogd/start-ulogd &" to /etc/rc.d/rc.local and make rc.local executable. Finally, start the service:

systemctl enable ulogd
systemctl start ulogd

Finally, be sure that /etc/rsyslog.conf file has the lines "local6.* /var/log/ulogd.log" and "*.info;mail.none;authpriv.none;cron.none;local6.none /var/log/messages", and restart rsyslog service.

Troubleshooting

  • Passwordless ssh access to egi-cloud from cld-nagios and from egi-cloud to cloud-0* has been already configured
  • If cld-nagios does not ping egi-cloud, be sure that the rule "route add -net 192.168.60.0 netmask 255.255.255.0 gw 192.168.114.1" has been added in egi-cloud (the /etc/sysconfig/network-scripts/route-em1 file should contain the line: 192.168.60.0/24 via 192.168.114.1)
  • In case of Nagios alarms, try to restart all cloud services doing the following:
$ ssh root@egi-cloud
[root@egi-cloud ~]# ./rocky_controller.sh restart
[root@egi-cloud ~]# for i in $(seq 1 7); do ssh cloud-0$i ./rocky_compute.sh restart; done
  • Resubmit the Nagios probe and check if it works again
  • If the problem persists, check the consistency of the DB by executing the following (this also fixes the case where the quota overview in the dashboard is not consistent with the VMs actually active):
[root@egi-cloud ~]# python nova-quota-sync.py
  • In case of an EGI Nagios alarm, check that the user running the Nagios probes does not also belong to tenants other than "ops". Also check that the right image and flavour are set in the URL of the service published in the GOCDB.
  • in case of reboot of egi-cloud server:
    • check its network configuration (use IPMI if not reachable): all 4 interfaces must be up and the default gateway must be 90.147.77.254.
    • check DNS in /etc/resolv.conf and GATEWAY in /etc/sysconfig/network
    • check routing with $ route -n, if needed do: $ ip route replace default via 90.147.77.254. Also be sure to have a route for the 90.147.77.0 network.
    • check if storage mountpoints 192.168.61.100:/glance-egi and cinder-egi are properly mounted (do: $ df -h)
    • check if port 8472 is open on the local firewall (it is used by linuxbridge vxlan networks)
  • in case of reboot of a cloud-0* server:
    • check its network configuration (use IPMI if not reachable): all 3 interfaces must be up and the default route must have 192.168.114.1 as gateway
    • check if all partitions in /etc/fstab are properly mounted (do: $ df -h)
  • In case of network instabilities, check if GRO is off for all interfaces, e.g.:
[root@egi-cloud ~]# /sbin/ethtool -k em3 | grep -i generic-receive-offload
generic-receive-offload: off
  • Also check if /sbin/ifup-local is there:
[root@egi-cloud ~]# cat /sbin/ifup-local 
#!/bin/bash
case "$1" in
em1)
/sbin/ethtool -K $1 gro off
;;
em2)
/sbin/ethtool -K $1 gro off
;;
em3)
/sbin/ethtool -K $1 gro off
;;
em4)
/sbin/ethtool -K $1 gro off
;;
esac
exit 0
  • If you need to change the project quotas, check "openstack help quota set", e.g.:
[root@egi-cloud ~]# source admin-openrc.sh
[root@egi-cloud ~]# openstack quota set --cores 184 VO:enmr.eu
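
The GRO check in the troubleshooting list above is shown for a single interface. A minimal sketch for repeating it over several interfaces; the function name gro_is_off is hypothetical, and the interface names in the example are the ones handled by /sbin/ifup-local:

```shell
# Hypothetical helper: read "ethtool -k IFACE" output from stdin and
# succeed only if generic-receive-offload is reported as off.
gro_is_off() {
    grep -qi 'generic-receive-offload: off'
}

# Example (on egi-cloud):
#   for dev in em1 em2 em3 em4; do
#       /sbin/ethtool -k "$dev" | gro_is_off || echo "GRO is not off on $dev"
#   done
```
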
progetti/cloud-areapd/egi_federated_cloud/rocky-centos7_testbed.txt · Last modified: 2020/02/03 16:53 by verlato@infn.it