progetti:cloud-areapd:egi_federated_cloud:rocky-centos7_testbed (created 2019/01/09 10:52 by verlato@infn.it; current revision 2023/11/28 13:26)
+ | ====== Rocky-CentOS7 Testbed ====== | ||
+ | Fully integrated Resource Provider [[https:// | ||
+ | === EGI Monitoring/ | ||
+ | * [[https:// | ||
+ | * [[https:// | ||
+ | * [[http:// | ||
+ | * [[https:// | ||
+ | * [[https:// | ||
+ | * [[https:// | ||
+ | === Local Monitoring/ | ||
+ | * [[http:// | ||
+ | * [[http:// | ||
+ | * [[http:// | ||
+ | * [[http:// | ||
+ | * [[http:// | ||
+ | === Local dashboard === | ||
+ | * [[https:// | ||
+ | |||
+ | ===== Layout ===== | ||
+ | |||
+ | * Controller + Network node: **egi-cloud.pd.infn.it** | ||
+ | |||
+ | * Compute nodes: **cloud-01: | ||
+ | | ||
+ | * Storage node (images and block storage): **cld-stg-01.pd.infn.it** | ||
+ | |||
+ | * OneData provider: **one-data-01.pd.infn.it** | ||
+ | |||
+ | * Cloudkeeper, | ||
+ | |||
+ | * Cloud site-BDII: **egi-cloud-sbdii.pd.infn.it** (VM on cert-03 server) | ||
+ | |||
+ | * Accounting SSM sender: **cert-37.pd.infn.it** (VM on cert-03 server) | ||
+ | |||
+ | * Network layout available [[http:// | ||
+ | |||
+ | |||
+ | ===== OpenStack configuration ===== | ||
+ | |||
+ | Controller/ | ||
+ | |||
We created one project for each supported EGI FedCloud VO, plus a router and various networks and subnets, obtaining the following network topology:
+ | |||
+ | {{: | ||
+ | |||
We mount the partitions for the Glance and Cinder services (Cinder is not in the fstab file) from 192.168.61.100 using the NFS driver:
+ | <code bash> | ||
+ | yum install -y nfs-utils | ||
+ | mkdir -p / | ||
+ | cat<< | ||
+ | 192.168.61.100:/ | ||
+ | EOF | ||
+ | mount -a | ||
</code>
We apply some specific configuration to the Cinder services, following this documentation: [[http://
+ | ===== EGI FedCloud specific configuration ===== | ||
+ | |||
+ | (see [[https:// | ||
+ | |||
Install the CA certificates and the software for fetching the CRLs on both the Controller (egi-cloud) and the Compute nodes (cloud-01:
+ | <code bash> | ||
+ | systemctl stop httpd | ||
+ | curl -L http:// | ||
+ | yum install -y ca-policy-egi-core fetch-crl | ||
+ | systemctl enable fetch-crl-cron.service | ||
+ | systemctl start fetch-crl-cron.service | ||
+ | cd / | ||
+ | ln -s / | ||
+ | update-ca-trust extract | ||
</code>
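After the installation, it is worth checking that the trust anchors and CRLs are actually in place. A minimal sanity check (hedged: the expected counts depend on the ca-policy-egi-core release installed):

```shell
# List the installed CA certificates and the fetched CRLs; both listings
# should be non-empty once fetch-crl-cron has run at least once
ls /etc/grid-security/certificates/*.pem | wc -l
ls /etc/grid-security/certificates/*.r0 | wc -l
# Run a verbose fetch by hand to surface any download errors
fetch-crl -v | tail -n 5
```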
+ | On **egi-cloud-ha** node also install CMD-OS repo: | ||
+ | <code bash> | ||
+ | yum -y install http:// | ||
</code>
+ | ==== Install AAI integration and VOMS support components ==== | ||
+ | Taken from [[https:// | ||
+ | |||
+ | To be executed on **egi-cloud.pd.infn.it** node: | ||
+ | <code bash> | ||
+ | vo=(ops dteam fedcloud.egi.eu enmr.eu) | ||
+ | volast=enmr.eu | ||
+ | EGIHOST=egi-cloud.pd.infn.it | ||
+ | KYPORT=443 | ||
+ | HZPORT=8443 | ||
+ | yum install -y gridsite mod_auth_openidc | ||
+ | sed -i " | ||
+ | sed -i " | ||
+ | sed -i " | ||
+ | |||
+ | openstack-config --set / | ||
+ | openstack-config --set / | ||
+ | openstack-config --set / | ||
+ | /websso/ | ||
+ | curl -L https:// | ||
+ | callback_template.html | ||
+ | systemctl restart httpd.service | ||
+ | source admin-openrc.sh | ||
+ | openstack identity provider create --remote-id https:// | ||
+ | echo [ > mapping.egi.json | ||
+ | echo [ > mapping.voms.json | ||
+ | for i in ${vo[@]} | ||
+ | do | ||
+ | | ||
+ | | ||
+ | | ||
+ | cat << | ||
+ | { | ||
+ | " | ||
+ | { | ||
+ | " | ||
+ | " | ||
+ | " | ||
+ | }, | ||
+ | " | ||
+ | " | ||
+ | } | ||
+ | } | ||
+ | ], | ||
+ | " | ||
+ | { | ||
+ | " | ||
+ | }, | ||
+ | { | ||
+ | " | ||
+ | " | ||
+ | " | ||
+ | ] | ||
+ | }, | ||
+ | { | ||
+ | " | ||
+ | " | ||
+ | " | ||
+ | " | ||
+ | ] | ||
+ | } | ||
+ | ] | ||
+ | EOF | ||
+ | [ $i = $volast ] || ( echo " | ||
+ | [ $i = $volast ] && ( echo " | ||
+ | [ $i = $volast ] && ( echo " | ||
+ | cat << | ||
+ | { | ||
+ | " | ||
+ | { | ||
+ | " | ||
+ | " | ||
+ | " | ||
+ | }, | ||
+ | " | ||
+ | " | ||
+ | } | ||
+ | } | ||
+ | ], | ||
+ | " | ||
+ | { | ||
+ | " | ||
+ | }, | ||
+ | { | ||
+ | " | ||
+ | " | ||
+ | " | ||
+ | ], | ||
+ | " | ||
+ | } | ||
+ | ] | ||
+ | EOF | ||
+ | [ $i = $volast ] || ( echo " | ||
+ | [ $i = $volast ] && ( echo " | ||
+ | [ $i = $volast ] && ( echo " | ||
+ | done | ||
+ | openstack mapping create --rules mapping.egi.json egi-mapping | ||
+ | openstack federation protocol create --identity-provider egi.eu --mapping egi-mapping openid | ||
+ | openstack mapping create --rules mapping.voms.json voms | ||
+ | openstack | ||
+ | |||
+ | mkdir -p / | ||
+ | cat > / | ||
+ | / | ||
+ | / | ||
+ | EOF | ||
+ | cat > / | ||
+ | / | ||
+ | / | ||
+ | EOF | ||
+ | mkdir -p / | ||
+ | cat > / | ||
+ | / | ||
+ | / | ||
+ | EOF | ||
+ | cat > / | ||
+ | / | ||
+ | / | ||
+ | EOF | ||
+ | mkdir -p / | ||
+ | cat > / | ||
+ | / | ||
+ | / | ||
+ | EOF | ||
+ | cat > / | ||
+ | / | ||
+ | / | ||
+ | EOF | ||
+ | mkdir -p / | ||
+ | cat > / | ||
+ | / | ||
+ | / | ||
+ | EOF | ||
+ | cat > / | ||
+ | / | ||
+ | / | ||
+ | EOF | ||
+ | # | ||
+ | cat << | ||
+ | Listen $KYPORT | ||
+ | |||
+ | < | ||
+ | OIDCSSLValidateServer Off | ||
+ | OIDCProviderTokenEndpointAuth client_secret_basic | ||
+ | OIDCResponseType " | ||
+ | OIDCClaimPrefix " | ||
+ | OIDCClaimDelimiter ; | ||
+ | OIDCScope " | ||
+ | OIDCProviderMetadataURL https:// | ||
+ | OIDCClientID <your OIDC client token> | ||
OIDCClientSecret <your OIDC client secret>
+ | OIDCCryptoPassphrase somePASSPHRASE | ||
+ | OIDCRedirectURI https:// | ||
+ | |||
+ | # OAuth for CLI access | ||
+ | OIDCOAuthIntrospectionEndpoint | ||
OIDCOAuthClientID <your OIDC client token>
OIDCOAuthClientSecret <your OIDC client secret>
+ | # OIDCOAuthRemoteUserClaim | ||
+ | |||
+ | # Increase Shm cache size for supporting long entitlements | ||
+ | OIDCCacheShmEntrySizeMax 33297 | ||
+ | |||
+ | # Use the IGTF trust anchors for CAs and CRLs | ||
+ | SSLCACertificatePath / | ||
+ | SSLCARevocationPath / | ||
+ | SSLCACertificateFile $CA_CERT | ||
+ | SSLEngine | ||
+ | SSLCertificateFile | ||
+ | SSLCertificateKeyFile | ||
+ | # Verify clients if they send their certificate | ||
+ | SSLVerifyClient | ||
+ | SSLVerifyDepth | ||
+ | SSLOptions | ||
+ | SSLProtocol | ||
+ | SSLCipherSuite | ||
+ | WSGIDaemonProcess keystone-public processes=5 threads=1 user=keystone group=keystone display-name=%{GROUP} | ||
+ | WSGIProcessGroup keystone-public | ||
+ | WSGIScriptAlias / / | ||
+ | WSGIApplicationGroup %{GLOBAL} | ||
+ | WSGIPassAuthorization On | ||
+ | LimitRequestBody 114688 | ||
+ | < | ||
+ | ErrorLogFormat " | ||
+ | </ | ||
+ | ErrorLog / | ||
+ | CustomLog / | ||
+ | < | ||
+ | < | ||
+ | Require all granted | ||
+ | </ | ||
+ | < | ||
+ | Order allow,deny | ||
+ | Allow from all | ||
+ | </ | ||
+ | </ | ||
+ | < | ||
+ | # populate ENV variables | ||
+ | GridSiteEnvs on | ||
+ | # turn off directory listings | ||
+ | GridSiteIndexes off | ||
+ | # accept GSI proxies from clients | ||
+ | GridSiteGSIProxyLimit 4 | ||
+ | # disable GridSite method extensions | ||
+ | GridSiteMethods "" | ||
+ | |||
+ | Require all granted | ||
+ | Options -MultiViews | ||
+ | </ | ||
+ | < | ||
+ | AuthType | ||
+ | Require | ||
+ | # | ||
+ | LogLevel | ||
+ | </ | ||
+ | |||
+ | < | ||
AuthType oauth20
+ | Require | ||
+ | # | ||
+ | LogLevel | ||
+ | </ | ||
+ | </ | ||
+ | Alias /identity / | ||
+ | < | ||
+ | SetHandler wsgi-script | ||
+ | Options +ExecCGI | ||
+ | |||
+ | WSGIProcessGroup keystone-public | ||
+ | WSGIApplicationGroup %{GLOBAL} | ||
+ | WSGIPassAuthorization On | ||
+ | </ | ||
+ | EOF | ||
+ | sed -i " | ||
+ | source admin-openrc.sh | ||
+ | for i in public internal admin | ||
+ | do | ||
+ | | ||
+ | | ||
+ | done | ||
+ | systemctl restart httpd.service | ||
+ | | ||
</code>
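Before running ''openstack mapping create'', it is worth validating that the generated mapping files are well-formed JSON, since Keystone reports malformed rules with a rather opaque error. A minimal sketch (the file name and sample rule below are hypothetical stand-ins for the files built by the loop above):

```shell
# Write a tiny sample rules file and validate it with the stock JSON parser;
# the same check applies verbatim to mapping.egi.json and mapping.voms.json
cat <<EOF > /tmp/mapping.sample.json
[
  {
    "local": [ { "user": { "name": "{0}" } } ],
    "remote": [ { "type": "OIDC-sub" } ]
  }
]
EOF
python3 -m json.tool /tmp/mapping.sample.json > /dev/null && echo "mapping OK"
```

If the parser fails, fix the offending file before creating the mapping; ''openstack mapping list'' and ''openstack federation protocol list --identity-provider egi.eu'' can then confirm that the objects were registered.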
+ | OpenStack Dashboard (Horizon) Configuration: | ||
+ | * Edit / | ||
+ | <code bash> | ||
+ | OPENSTACK_KEYSTONE_URL = " | ||
+ | WEBSSO_ENABLED = True | ||
+ | WEBSSO_INITIAL_CHOICE = " | ||
+ | |||
+ | WEBSSO_CHOICES = ( | ||
+ | (" | ||
+ | ("" | ||
+ | ) | ||
</code>
To change the dashboard logo, copy the appropriate SVG file into /
+ | |||
To publicly expose some OpenStack services over HTTPS, do not forget to create the files /
+ | |||
+ | ==== Install FedCloud BDII ==== | ||
+ | |||
+ | (See [[https:// | ||
Install the resource BDII and the cloud-info-provider on **egi-cloud-ha** (with the CMD-OS repo already installed):
+ | <code bash> | ||
+ | yum -y install bdii cloud-info-provider cloud-info-provider-openstack | ||
</code>
+ | Customize the configuration file / | ||
+ | |||
+ | Customize the file / | ||
+ | <code bash> | ||
+ | export OS_AUTH_URL=https:// | ||
+ | export OS_PROJECT_DOMAIN_ID=default | ||
+ | export OS_REGION_NAME=RegionOne | ||
+ | export OS_USER_DOMAIN_ID=default | ||
+ | export OS_PROJECT_NAME=admin | ||
+ | export OS_IDENTITY_API_VERSION=3 | ||
+ | export OS_USERNAME=accounting | ||
+ | export OS_PASSWORD=< | ||
+ | export OS_AUTH_TYPE=password | ||
+ | export OS_CACERT=/ | ||
</code>
+ | Create the file / | ||
+ | <code bash> | ||
+ | cat<< | ||
+ | #!/bin/sh | ||
+ | |||
+ | . / | ||
+ | |||
+ | for P in $(openstack project list -c Name -f value); do | ||
+ | cloud-info-provider-service --yaml / | ||
+ | --os-tenant-name $P \ | ||
+ | --middleware openstack | ||
+ | done | ||
+ | EOF | ||
</code>
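Before wiring the provider into the BDII, it helps to check that the script emits GLUE objects at all. A hedged check (the script path below is a hypothetical stand-in for the file created above, whose full path is truncated on this page):

```shell
# Count the LDIF entries the cloud-info-provider emits (hypothetical path);
# a count of zero usually means wrong OpenStack credentials or a broken
# site YAML description
/var/lib/cloud-info-provider/publisher.sh | grep -c '^dn:'
```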
Run the cloud-info-provider script manually and check that the output returns the complete LDIF. To do so, execute:
+ | <code bash> | ||
+ | chmod +x / | ||
+ | / | ||
+ | / | ||
</code>
+ | Now you can start the bdii service: | ||
+ | <code bash> | ||
+ | systemctl start bdii | ||
</code>
+ | Use the command below to see if the information is being published: | ||
+ | <code bash> | ||
+ | ldapsearch -x -h localhost -p 2170 -b o=glue | ||
</code>
+ | Do not forget to open port 2170: | ||
+ | <code bash> | ||
+ | firewall-cmd --add-port=2170/ | ||
+ | firewall-cmd --permanent --add-port=2170/ | ||
+ | systemctl restart firewalld | ||
</code>
+ | Information on how to set up the site-BDII in **egi-cloud-sbdii.pd.infn.it** is available [[https:// | ||
+ | |||
Add your cloud-info-provider to the site-BDII **egi-cloud-sbdii.pd.infn.it** by adding new lines like these in its site.def:
+ | <code bash> | ||
+ | BDII_REGIONS=" | ||
+ | BDII_CLOUD_URL=" | ||
+ | BDII_BDII_URL=" | ||
</code>
+ | |||
+ | ==== Use the same APEL/SSM of grid site ==== | ||
+ | |||
+ | Cloud usage records are sent to APEL through the ssmsend program installed in **cert-37.pd.infn.it**: | ||
+ | <code bash> | ||
+ | [root@cert-37 ~]# cat / | ||
+ | # send buffered usage records to APEL | ||
+ | 30 */24 * * * root / | ||
</code>
NFS therefore needs to be installed and configured on **egi-cloud-ha**:
+ | <code bash> | ||
+ | [root@egi-cloud-ha ~]# yum -y install nfs-utils | ||
+ | [root@egi-cloud-ha ~]# mkdir -p / | ||
+ | [root@egi-cloud-ha ~]# cat<< | ||
+ | / | ||
+ | EOF | ||
+ | [root@egi-cloud-ha ~]# systemctl start nfs-server | ||
</code>
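On the sender side, cert-37 then needs to see the exported spool directory. A hedged sketch of the client-side mount (the spool path is the cASO default and is an assumption here, as the export line above is truncated):

```shell
# On cert-37: mount the spool exported by egi-cloud-ha so that ssmsend
# picks up the usage records written there by cASO
# (the /var/spool/apel/outgoing/openstack path is an assumed cASO default)
yum -y install nfs-utils
mkdir -p /var/spool/apel/outgoing/openstack
mount -t nfs egi-cloud-ha.pd.infn.it:/var/spool/apel/outgoing/openstack \
      /var/spool/apel/outgoing/openstack
```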
In case of an APEL Nagios probe failure, check whether /
+ | |||
To check whether accounting records are properly received by the APEL server, look at [[http://
+ | |||
+ | ==== Install the accounting system (cASO) ==== | ||
+ | |||
+ | (see [[https:// | ||
+ | |||
+ | On **egi-cloud** create accounting user and role, and set the proper policies: | ||
+ | <code bash> | ||
+ | openstack user create --domain default --password < | ||
+ | openstack role create accounting | ||
+ | for i in VO: | ||
+ | cat<< | ||
+ | " | ||
+ | " | ||
+ | EOF | ||
</code>
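The loop above is truncated on this page; besides the policy entries, cASO also needs the accounting user to hold its role in every VO project. A hedged sketch of the role assignment (the VO list is taken from the projects named elsewhere on this page; whether the original loop did exactly this is not recoverable here):

```shell
# Grant the accounting user its role in every VO project so that cASO can
# list servers there (project names as used in this testbed's layout)
for p in VO:ops VO:dteam VO:fedcloud.egi.eu VO:enmr.eu; do
    openstack role add --user accounting --project "$p" accounting
done
```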
+ | Install cASO on **egi-cloud-ha** (with CMD-OS repo already installed): | ||
+ | <code bash> | ||
+ | yum -y install caso | ||
</code>
+ | Edit the / | ||
+ | <code bash> | ||
+ | openstack-config --set / | ||
+ | openstack-config --set / | ||
+ | openstack-config --set / | ||
+ | openstack-config --set / | ||
+ | openstack-config --set / | ||
+ | openstack-config --set / | ||
+ | openstack-config --set / | ||
+ | openstack-config --set / | ||
+ | openstack-config --set / | ||
+ | openstack-config --set / | ||
+ | openstack-config --set / | ||
+ | openstack-config --set / | ||
+ | openstack-config --set / | ||
+ | openstack-config --set / | ||
</code>
+ | Create the directories | ||
+ | <code bash> | ||
+ | mkdir / | ||
</code>
+ | Test it | ||
+ | <code bash> | ||
+ | caso-extract -v -d | ||
</code>
+ | Create the cron job | ||
+ | <code bash> | ||
+ | cat << | ||
+ | # extract and send usage records to APEL/ | ||
+ | 10 * * * * root / | ||
+ | EOF | ||
</code>
+ | |||
+ | ==== Install Cloudkeeper and Cloudkeeper-OS ==== | ||
+ | |||
+ | On **egi-cloud.pd.infn.it** create a cloudkeeper user in keystone: | ||
+ | <code bash> | ||
+ | openstack user create --domain default --password CLOUDKEEPER_PASS cloudkeeper | ||
</code>
+ | and, for each project, add the cloudkeeper user with the user role | ||
+ | <code bash> | ||
+ | for i in VO:ops VO: | ||
</code>
+ | Install Cloudkeeper and Cloudkeeper-OS on **egi-cloud-ha** (with CMD-OS repo already installed): | ||
+ | <code bash> | ||
+ | yum -y install cloudkeeper cloudkeeper-os | ||
</code>
+ | Edit / | ||
+ | <code bash> | ||
+ | - https:// | ||
+ | - https:// | ||
+ | - https:// | ||
</code>
+ | Edit the / | ||
+ | <code bash> | ||
+ | openstack-config --set / | ||
+ | openstack-config --set / | ||
+ | openstack-config --set / | ||
+ | openstack-config --set / | ||
+ | openstack-config --set / | ||
+ | openstack-config --set / | ||
+ | openstack-config --set / | ||
+ | openstack-config --set / | ||
+ | openstack-config --set / | ||
</code>
Create the /
+ | <code bash> | ||
+ | cat<< | ||
+ | { | ||
+ | " | ||
+ | " | ||
+ | }, | ||
+ | " | ||
+ | " | ||
+ | }, | ||
+ | " | ||
+ | " | ||
+ | } | ||
+ | } | ||
+ | EOF | ||
</code>
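The voms.json above is truncated. A hedged sketch of a complete mapping file, assuming the cloudkeeper-os convention of one entry per VO pointing at the OpenStack project that images should be registered in (project names follow this testbed; treat this as a sketch, not the site's literal file):

```shell
# Map each supported VO to its OpenStack project for image replication
# (key names follow the cloudkeeper-os voms.json convention; this is a
# sketch, not the site's literal file)
cat <<EOF > /etc/cloudkeeper-os/voms.json
{
  "ops": { "tenant": "VO:ops" },
  "dteam": { "tenant": "VO:dteam" },
  "fedcloud.egi.eu": { "tenant": "VO:fedcloud.egi.eu" },
  "enmr.eu": { "tenant": "VO:enmr.eu" }
}
EOF
```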
+ | Enable and start the services | ||
+ | <code bash> | ||
+ | systemctl enable cloudkeeper-os | ||
+ | systemctl start cloudkeeper-os | ||
+ | systemctl enable cloudkeeper.timer | ||
+ | systemctl start cloudkeeper.timer | ||
</code>
+ | |||
==== Install Squid for CVMFS (optional) ====
+ | |||
+ | Install and configure squid on cloud-01 and cloud-02 for use from VMs (see https:// | ||
+ | <code bash> | ||
+ | yum install -y squid | ||
+ | sed -i " | ||
+ | cat<< | ||
+ | minimum_expiry_time 0 | ||
+ | |||
+ | max_filedesc 8192 | ||
+ | maximum_object_size 1024 MB | ||
+ | |||
+ | cache_mem 128 MB | ||
+ | maximum_object_size_in_memory 128 KB | ||
+ | # 50 GB disk cache | ||
+ | cache_dir ufs / | ||
+ | acl cvmfs dst cvmfs-stratum-one.cern.ch | ||
+ | acl cvmfs dst cernvmfs.gridpp.rl.ac.uk | ||
+ | acl cvmfs dst cvmfs.racf.bnl.gov | ||
+ | acl cvmfs dst cvmfs02.grid.sinica.edu.tw | ||
+ | acl cvmfs dst cvmfs.fnal.gov | ||
+ | acl cvmfs dst cvmfs-atlas-nightlies.cern.ch | ||
+ | acl cvmfs dst cvmfs-egi.gridpp.rl.ac.uk | ||
+ | acl cvmfs dst klei.nikhef.nl | ||
+ | acl cvmfs dst cvmfsrepo.lcg.triumf.ca | ||
+ | acl cvmfs dst cvmfsrep.grid.sinica.edu.tw | ||
+ | acl cvmfs dst cvmfs-s1bnl.opensciencegrid.org | ||
+ | acl cvmfs dst cvmfs-s1fnal.opensciencegrid.org | ||
+ | http_access allow cvmfs | ||
+ | EOF | ||
+ | rm -rf / | ||
+ | mkdir -p / | ||
+ | chown -R squid.squid / | ||
+ | squid -k parse | ||
+ | squid -z | ||
+ | ulimit -n 8192 | ||
+ | systemctl start squid | ||
+ | firewall-cmd --permanent --add-port 3128/tcp | ||
+ | systemctl restart firewalld | ||
</code>
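A quick way to verify the proxy from a VM is to pull a small CVMFS file through it (hedged: the stratum-1 URL is one public example, and cloud-01 stands for whichever squid the VM is configured to use):

```shell
# Fetch the publication manifest of a repository through the new squid;
# a 200 response with data means the proxy and the cvmfs ACL work
curl -s -x http://cloud-01.pd.infn.it:3128 \
     http://cvmfs-stratum-one.cern.ch/cvmfs/atlas.cern.ch/.cvmfspublished | head -c 200
```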
+ | Use CVMFS_HTTP_PROXY=" | ||
+ | |||
In practice, it is better to use already existing squids:
+ | CVMFS_HTTP_PROXY=" | ||
+ | |||
+ | ==== Local Accounting ==== | ||
+ | A local accounting system based on Grafana, InfluxDB and Collectd has been set up following the instructions [[https:// | ||
+ | |||
+ | ==== Local Monitoring ==== | ||
+ | === Ganglia === | ||
+ | * Install ganglia-gmond on all servers | ||
+ | * Configure cluster and host fields in **/ | ||
+ | * Finally: systemctl enable gmond.service; | ||
+ | === Nagios === | ||
+ | * Install on compute nodes nsca-client, | ||
+ | |||
+ | * Copy the file **cld-nagios:/ | ||
+ | |||
+ | * Then do in all compute nodes: | ||
+ | <code bash> | ||
+ | $ echo encryption_method=1 >> / | ||
+ | $ usermod -a -G libvirt nagios | ||
+ | $ sed -i ' | ||
+ | # then be sure the files below are in / | ||
+ | $ ls / | ||
+ | check_kvm | ||
+ | $ cat <<EOF > crontab.txt | ||
+ | # Puppet Name: nagios_check_kvm | ||
+ | 0 */1 * * * / | ||
+ | EOF | ||
+ | $ crontab crontab.txt | ||
+ | $ crontab -l | ||
</code>
* On the controller node, add in /
+ | <code bash> | ||
+ | " | ||
</code>
+ | and in / | ||
+ | <code bash> | ||
+ | " | ||
</code>
* In the VO:dteam project, create a CirrOS VM with the tiny flavour, named nagios-probe, with an access key named dteam-key (saving the private key file dteam-key.pem in the /root directory of egi-cloud), and take note of its ID and private IP. Then, on the cld-nagios server, put its ID in the file **/
+ | |||
+ | * On the cld-nagios server check/ | ||
+ | |||
==== Security incidents and IP traceability ====
+ | See [[https:// | ||
On egi-cloud, install the [[https://
+ | <code bash> | ||
+ | [root@egi-cloud ~]# os-ip-trace 90.147.77.229 | ||
+ | +--------------------------------------+-----------+---------------------+---------------------+ | ||
+ | | device id | user name | | ||
+ | +--------------------------------------+-----------+---------------------+---------------------+ | ||
+ | | 3002b1f1-bca3-4e4f-b21e-8de12c0b926e | | ||
+ | +--------------------------------------+-----------+---------------------+---------------------+ | ||
</code>
+ | Save and archive important log files: | ||
+ | * On egi-cloud and each compute node cloud-0%, add the line "*.* @@192.168.60.31: | ||
+ | * In cld-foreman, | ||
Install ulogd on the controller node
+ | <code bash> | ||
+ | yum install -y libnetfilter_log | ||
+ | yum localinstall -y http:// | ||
+ | yum localinstall -y http:// | ||
</code>
+ | and configure / | ||
+ | Start the service | ||
+ | <code bash> | ||
+ | systemctl enable ulogd | ||
+ | systemctl start ulogd | ||
</code>
+ | Finally, be sure that / | ||
+ | |||
+ | ==== Troubleshooting ==== | ||
+ | |||
* Passwordless ssh access to egi-cloud from cld-nagios, and from egi-cloud to cloud-0*, has already been configured
+ | * If cld-nagios does not ping egi-cloud, be sure that the rule "route add -net 192.168.60.0 netmask 255.255.255.0 gw 192.168.114.1" | ||
+ | * In case of Nagios alarms, try to restart all cloud services doing the following: | ||
+ | <code bash> | ||
+ | $ ssh root@egi-cloud | ||
+ | [root@egi-cloud ~]# ./ | ||
+ | [root@egi-cloud ~]# for i in $(seq 1 7); do ssh cloud-0$i ./ | ||
</code>
+ | * Resubmit the Nagios probe and check if it works again | ||
* If the problem persists, check the consistency of the DB by executing the following (this also fixes the issue where the quota overview in the dashboard is not consistent with the VMs actually active):
+ | <code bash> | ||
+ | [root@egi-cloud ~]# python nova-quota-sync.py | ||
</code>
  * In case of an EGI Nagios alarm, check that the user running the Nagios probes does not also belong to tenants other than "
+ | |||
+ | * in case of reboot of egi-cloud server: | ||
+ | * check its network configuration (use IPMI if not reachable): all 4 interfaces must be up and the default gateway must be 90.147.77.254. | ||
+ | * check DNS in / | ||
+ | * check routing with $route -n, if needed do: $ip route replace default via 90.147.77.254. Also be sure to have a route for 90.147.77.0 network. | ||
+ | * check if storage mountpoints 192.168.61.100:/ | ||
+ | * check if port 8472 is open on the local firewall (it is used by linuxbridge vxlan networks) | ||
+ | |||
+ | * in case of reboot of cloud-0* server (use IPMI if not reachable): all 3 interfaces must be up and the default destination must have 192.168.114.1 as gateway | ||
+ | * check its network configuration | ||
+ | * check if all partitions in /etc/fstab are properly mounted (do: $ df -h) | ||
+ | |||
+ | * In case of network instabilities, | ||
+ | <code bash> | ||
+ | [root@egi-cloud ~]# / | ||
+ | generic-receive-offload: | ||
</code>
+ | |||
+ | * Also check if / | ||
+ | <code bash> | ||
+ | [root@egi-cloud ~]# cat / | ||
+ | #!/bin/bash | ||
+ | case " | ||
+ | em1) | ||
+ | / | ||
+ | ;; | ||
+ | em2) | ||
+ | / | ||
+ | ;; | ||
+ | em3) | ||
+ | / | ||
+ | ;; | ||
+ | em4) | ||
+ | / | ||
+ | ;; | ||
+ | esac | ||
+ | exit 0 | ||
</code>
+ | |||
+ | * If you need to change the project quotas, check " | ||
+ | <code bash> | ||
+ | [root@egi-cloud ~]# source admin-openrc.sh | ||
+ | [root@egi-cloud ~]# openstack quota set --cores 184 VO:enmr.eu | ||
</code>