progetti:cloud-areapd:egi_federated_cloud:newton-centos7_testbed

Page revisions: 2017/12/06 11:44 [Layout] – 2018/12/20 13:16 (current) [Local Monitoring], by verlato@infn.it
====== Newton-CentOS7 Testbed ======
Fully integrated Resource Provider [[https://...]]
=== EGI Monitoring/... ===
  * [[https://...]]
  * [[https://...]]
  * [[http://...]]
  * [[https://...]]
  * [[https://...]]
=== Local Monitoring/... ===
  * [[http://...]]
  * [[http://...]]
  * [[http://...]]
  * [[http://...]]
  * [[https://...]]
=== Local dashboard ===
  * [[http://...]]
===== Layout =====

  * Controller + Network node + Storage node + Telemetry service: **egi-cloud.pd.infn.it**
  * Compute nodes: **cloud-01:...**
  * NoSQL database: **egi-cloud-ha.pn.pd.infn.it**
  * OneData provider: **one-data-01.pd.infn.it**
  * Cloudkeeper and Cloudkeeper-OS: ...
  * Network layout available [[http://...]]

===== OpenStack configuration =====
  * Controller/...
  * We created one tenant for each supported EGI FedCloud VO, plus a router and various nets and subnets, obtaining the following network topology:
{{:...}}
  * We mount the partitions for the glance and cinder services from 192.168.61.100 with the NFS driver:
<code bash>
yum install -y nfs-utils
mkdir -p /...
cat <<EOF >> /etc/fstab
192.168.61.100:/...
EOF
mount -a
</code>
  * We use some specific configurations for the cinder services, following this documentation: [[http://...]]
  * The telemetry service uses a NoSQL database, so we install [[http://...]]
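After ''mount -a'' it is worth checking that the NFS entries actually took effect. A minimal sketch on a sample fstab (the real export paths are truncated in this page, so ''/export/glance'' and ''/export/cinder'' below are assumptions):

```shell
# Sanity check sketch: list the mount points of the NFS entries in fstab.
# The export paths are placeholders, not the real ones on 192.168.61.100.
cat > /tmp/fstab.sample <<'EOF'
192.168.61.100:/export/glance /var/lib/glance nfs defaults 0 0
192.168.61.100:/export/cinder /var/lib/cinder nfs defaults 0 0
EOF
# print the mount point of every nfs-type entry
awk '$3 == "nfs" {print $2}' /tmp/fstab.sample
```

On the real controller, compare this list against ''df -h'' output to confirm both filesystems are mounted.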
===== EGI FedCloud specific configuration =====

(see [[https://...]])

  * Install the CA certificates and the software for fetching the CRLs on both the Controller (egi-cloud) and the Compute nodes (cloud-01:...):
<code bash>
systemctl stop httpd
curl -L http://...
yum install -y ca-policy-egi-core fetch-crl
systemctl enable fetch-crl-cron.service
systemctl start fetch-crl-cron.service
</code>
==== Install OpenStack Keystone-VOMS module ====
(see [[https://...]])
  * Prepare to run keystone as a WSGI app in SSL:
<code bash>
yum install -y voms mod_ssl

APACHE_LOG_DIR=/...

cat <<EOF > /...
Listen 5000
WSGIDaemonProcess keystone user=keystone group=keystone processes=8 threads=1
<VirtualHost _default_:5000>
    LogLevel   ...
    ErrorLog   ...
    CustomLog  ...

    SSLEngine              on
    SSLCertificateFile     ...
    SSLCertificateKeyFile  ...
    SSLCACertificatePath   ...
    SSLCARevocationPath    ...
    SSLVerifyClient        ...
    SSLVerifyDepth         ...
    SSLProtocol            ...
    SSLCipherSuite         ...
    SSLOptions             ...

    WSGIScriptAlias / /...
    WSGIProcessGroup keystone
</VirtualHost>

Listen 35357
WSGIDaemonProcess ...
<VirtualHost _default_:35357>
    LogLevel   ...
    ErrorLog   ...
    CustomLog  ...

    SSLEngine              on
    SSLCertificateFile     ...
    SSLCertificateKeyFile  ...
    SSLCACertificatePath   ...
    SSLCARevocationPath    ...
    SSLVerifyClient        ...
    SSLVerifyDepth         ...
    SSLProtocol            ...
    SSLCipherSuite         ...
    SSLOptions             ...

    WSGIScriptAlias  ...
    WSGIProcessGroup ...
</VirtualHost>
EOF
</code>
  * Check, and if needed install, the host certificate for your server in /...:
<code bash>
[root@egi-cloud]# ls -l /...
-rw-r--r--. ...
-rw-------. ...
</code>
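A quick way to confirm that hostcert and hostkey actually belong together is to compare their public-key moduli. This is a sketch on a throwaway self-signed pair in ''/tmp''; on the testbed you would point the two ''openssl'' checks at the real pair under the grid-security directory (the exact paths are site-specific):

```shell
# Generate a disposable cert/key pair just to illustrate the check
key=/tmp/demo-hostkey.pem; crt=/tmp/demo-hostcert.pem
openssl req -x509 -newkey rsa:2048 -nodes -keyout "$key" -out "$crt" \
        -subj "/CN=egi-cloud.pd.infn.it" -days 1 2>/dev/null
# A matching pair has identical moduli; hash them for a compact comparison
a=$(openssl x509 -noout -modulus -in "$crt" | openssl md5)
b=$(openssl rsa  -noout -modulus -in "$key" | openssl md5)
[ "$a" = "$b" ] && echo "cert/key match"
```

A mismatch here (no output) usually means the certificate was renewed without updating the key, or vice versa.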
  * Take the file [[http://...]]
  * Copy it to /...:
<code bash>
echo "..."
mkdir -p /...
curl http://...
ln /...
ln /...
chown -R keystone:...
</code>
  * Install the Keystone-VOMS module:
<code bash>
yum localinstall -y http://...
</code>
  * Enable the Keystone-VOMS module:
<code bash>
cat <<EOF >> /...

[filter:...]
paste.filter_factory = keystone_voms.core:...
EOF

sed -i '...' /...
</code>
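To illustrate what the truncated ''sed'' above does: it splices the VOMS filter into keystone's public paste pipeline. The sample below is an assumption modelled on a stock ''keystone-paste.ini'' — the section name, the pipeline contents, and the filter name ''voms'' are not taken from this page:

```shell
# Stub of a paste pipeline (assumed layout, not the real file)
cat > /tmp/keystone-paste.sample.ini <<'EOF'
[pipeline:public_api]
pipeline = sizelimit url_normalize request_id build_auth_context token_auth json_body ec2_extension public_service
EOF
# Splice the voms filter in right after json_body, as the sed step above does
sed -i 's/json_body/json_body voms/' /tmp/keystone-paste.sample.ini
grep '^pipeline' /tmp/keystone-paste.sample.ini
```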
  * Configure the Keystone-VOMS module:
<code bash>
cat <<EOF >> /...

[voms]
vomsdir_path = /...
ca_path = /...
voms_policy = /...
vomsapi_lib = libvomsapi.so.1
autocreate_users = True
add_roles = False
user_roles = _member_
enable_pusp = False
EOF
</code>
<code bash>
mkdir -p /...
cat > /... <<EOF
/...
/...
EOF
cat > /... <<EOF
/...
/...
EOF
mkdir -p /...
cat > /... <<EOF
/...
/...
EOF
cat > /... <<EOF
/...
/...
EOF
mkdir -p /...
cat > /... <<EOF
/...
/...
EOF
cat > /... <<EOF
/...
/...
EOF
mkdir -p /...
cat > /... <<EOF
/...
/...
EOF
mkdir -p /...
cat > /... <<EOF
/...
/...
EOF
cat > /... <<EOF
/...
/...
EOF
for i in ops atlas lhcb cms
do
mkdir -p /...
cat > /... <<EOF
/...
/...
EOF
cat > /... <<EOF
/...
/...
EOF
done
</code>
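Each ''cat'' block above writes a VOMS trust file of the same shape: two DN lines, the VOMS server certificate DN followed by the DN of its issuing CA. A generic sketch with a purely illustrative VO, host, and DNs (the real values are truncated in this page):

```shell
# Illustrative only: the vomsdir layout is <vomsdir>/<vo>/<voms-server>.lsc,
# and each file carries exactly two DN lines (server DN, then CA DN).
vomsdir=/tmp/vomsdir-demo
vo=ops; server=voms.example.org
mkdir -p "$vomsdir/$vo"
cat > "$vomsdir/$vo/$server.lsc" <<EOF
/DC=org/DC=example/CN=$server
/DC=org/DC=example/CN=Example CA
EOF
cat "$vomsdir/$vo/$server.lsc"
```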
<code bash>
cat <<EOF > /...
{
    "...": {
        "tenant": "..."
    },
    "...": {
        "tenant": "..."
    },
    "...": {
        "tenant": "..."
    },
    "...": {
        "tenant": "..."
    },
    "...": {
        "tenant": "..."
    },
    "...": {
        "tenant": "..."
    },
    "...": {
        "tenant": "..."
    },
    "...": {
        "tenant": "..."
    },
    "...": {
        "tenant": "..."
    }
}
EOF
</code>
  * Manually adjust the keystone catalog so that the identity backend points to the correct URLs:
    * public URL: https://...
    * admin URL: https://...
    * internal URL: https://...
<code bash>
mysql> use keystone;
mysql> update endpoint set url="..." ...;
mysql> update endpoint set url="..." ...;
mysql> select id,url from endpoint;
</code>
The last query should show lines with the above URLs.
  * Replace http with https in auth_[protocol,...] ...
  * Replace http with https in auth_[protocol,...] ...
  * Also check whether "..." ...
==== Install the OOI API ====
  * Install ooi (see [[https://...]]):
<code bash>
yum localinstall -y http://...
</code>
  * Edit the /... file:
<code bash>
cat <<EOF >> /...

#######
# OOI #
#######

[composite:...]
use = call:...
/occi1.2: occi_api_12
/occi1.1: occi_api_12

[filter:occi]
paste.filter_factory = ooi.wsgi:...
openstack_version = /v2.1

[composite:occi_api_12]
use = call:...
noauth2 = cors http_proxy_to_wsgi compute_req_id faultwrap sizelimit noauth2 occi osapi_compute_app_v21
keystone = cors http_proxy_to_wsgi compute_req_id faultwrap sizelimit authtoken keystonecontext occi osapi_compute_app_v21
EOF
</code>
  * Make sure the occiapi API is enabled in the /... file:
<code bash>
openstack-config --set /...
openstack-config --set /...
openstack-config --set /...
openstack-config --set /...
</code>
  * Restart the nova services:
<code bash>
systemctl restart openstack-nova-api openstack-nova-consoleauth openstack-nova-scheduler openstack-nova-conductor openstack-nova-novncproxy
</code>
  * Register the service in Keystone:
<code bash>
openstack service create --name occi --description "OCCI Interface" ...
openstack endpoint create --region RegionOne occi public https://...
openstack endpoint create --region RegionOne occi internal https://...
openstack endpoint create --region RegionOne occi admin https://...
</code>
  * Enable SSL connections on port 8787 by creating the file /...:
<code bash>
cat <<EOF > /...
#LoadModule proxy_http_module modules/...
#
# Proxy Server directives. Uncomment the following lines to
# enable the proxy server:
#LoadModule proxy_module /...
#LoadModule proxy_http_module /...
#LoadModule substitute_module /...


Listen 8787
<VirtualHost _default_:8787>
    ...
    ...
    ...

    <IfModule ...>
        # Do not enable proxying with ProxyRequests until you have secured
        # your server.
        # Open proxy servers are dangerous both to your network and to the
        # Internet at large.
        ProxyRequests Off

        <Proxy ...>
            Order deny,allow
            Deny from all
        </Proxy>

        ...
        ...
        <Location ...>
            ...
            ...
            Order allow,deny
            Allow from all
        </Location>

    </IfModule>
</VirtualHost>
EOF
</code>
  * Restart the httpd service:
<code bash>
systemctl restart httpd
</code>
==== Install rOCCI Client ====

For a complete guide to the rOCCI Client see [[https://...]]
==== Install FedCloud BDII ====
(See [[https://...]])
  * Install the resource BDII and the cloud-info-provider:
<code bash>
yum install bdii -y
yum -y localinstall http://...
</code>
  * Customize the configuration file with the local site's information:
<code bash>
cp /...
sed -i '...' /...
sed -i '...' /...
sed -i '...' /...
sed -i '...' /...
sed -i '...' /...
sed -i '...' /...
sed -i '...' /...
sed -i '...' /...
sed -i '...' /...
sed -i '...' /...
sed -i '...' /...
sed -i '...' /...
sed -i '...' /...
sed -i '...' /...
sed -i '...' /...
</code>
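The ''sed'' lines above only substitute site-specific values into the copied sample file. An illustrative stub (the file layout, key names, and the site name ''INFN-PADOVA-STACK'' are assumptions, not the real cloud-info-provider template):

```shell
# Stub of a cloud-info-provider yaml with a template placeholder
cat > /tmp/bdii.sample.yaml <<'EOF'
site:
    name: SITE_NAME
    production_level: production
EOF
# Each customization step is a substitution of this form
sed -i 's/SITE_NAME/INFN-PADOVA-STACK/' /tmp/bdii.sample.yaml
grep 'name:' /tmp/bdii.sample.yaml
```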
  * Be sure that keystone contains the OOI endpoints, otherwise they will not be published by the BDII.
  * Create the file /...:
<code bash>
cat <<EOF > /...
#!/bin/sh
cloud-info-provider-service --yaml /... \
  --middleware openstack \
  --os-username admin --os-password ADMIN_PASS \
  --os-tenant-name admin --os-auth-url https://... \
  --os-cacert /...
EOF
</code>
  * Run the cloud-info-provider script manually and check that the output returns the complete LDIF. To do so, execute:
<code bash>
chmod +x /...
/...
/...
</code>
  * Now you can start the bdii service:
<code bash>
systemctl start bdii
</code>
  * Use the command below to see whether the information is being published:
<code bash>
ldapsearch -x -h localhost -p 2170 -b o=glue
</code>
  * Do not forget to open port 2170:
<code bash>
firewall-cmd --add-port=2170/tcp
firewall-cmd --permanent --add-port=2170/tcp
systemctl restart firewalld
</code>
  * Information on how to set up the site-BDII on egi-cloud-sbdii.pd.infn.it is available [[https://...]]
  * Add your cloud-info-provider to your site-BDII egi-cloud-sbdii.pd.infn.it by adding new lines in site.def like these:
<code bash>
BDII_REGIONS="..."
BDII_CLOUD_URL="..."
BDII_BDII_URL="..."
</code>
==== Use the same APEL/SSM of the grid site ====
  * Cloud usage records are sent to APEL through the ssmsend program installed on cert-37.pd.infn.it:
<code bash>
[root@cert-37 ~]# cat /...
# send buffered usage records to APEL
30 */24 * * * root /...
</code>
  * It is therefore necessary to install and configure NFS on egi-cloud:
<code bash>
[root@egi-cloud ~]# mkdir -p /...
[root@egi-cloud ~]# cat <<EOF >> /etc/exports
/...
EOF
[root@egi-cloud ~]# systemctl status nfs-server
</code>
  * In case of APEL nagios probe failure, check if /...
  * To check whether accounting records are properly received by the APEL server, look at [[http://...]]
==== Install the new accounting system (cASO) ====

(see [[https://...]])

<code bash>
yum -y install libffi-devel openssl-devel gcc
yum -y localinstall http://...
</code>
  * Create the role and user:
<code bash>
openstack user create --domain default --password ACCOUNTING_PASS accounting
openstack role create accounting
</code>
  * For each of the tenants, add the user with the accounting role:
<code bash>
for i in fctf wenmr atlas ops dteam lhcb cms indigo emsodev
do
  openstack role add --project $i --user accounting accounting
done
</code>
  * Edit the /... file:
<code bash>
openstack-config --set /...
openstack-config --set /...
openstack-config --set /...
openstack-config --set /...
openstack-config --set /...
openstack-config --set /...
openstack-config --set /...
openstack-config --set /...
openstack-config --set /...
openstack-config --set /...
openstack-config --set /...
openstack-config --set /...
openstack-config --set /...
openstack-config --set /...
openstack-config --set /...
</code>
  * Edit the /... file:
<code bash>
sed -i '...' /...
</code>
<code bash>
mkdir /...
</code>
  * Test it:
<code bash>
caso-extract -v -d
</code>
  * Create the cron job:
<code bash>
cat <<EOF > /...
# extract and send usage records to APEL/SSM
10 * * * * root /...
EOF
</code>
==== Install Cloudkeeper and Cloudkeeper-OS ====
Cloudkeeper and Cloudkeeper-OS are installed on a dedicated server (egi-cloud-ha.pn.pd.infn.it).

Install Cloudkeeper:
<code bash>
yum localinstall -y http://...
</code>
Edit /...:
<code bash>
- https://...
- https://...
- https://...
- https://...
- https://...
- https://...
- https://...
- https://...
- https://...

ip-address: CONTROLLER_IP # IP address NGINX can listen on
</code>

Enable and start the service:

<code bash>
systemctl enable cloudkeeper-cron
systemctl start cloudkeeper-cron
</code>

Install Cloudkeeper-OS:
<code bash>
cd /...
wget http://...
cd
yum update
yum -y install cloudkeeper-os
</code>

Create a cloudkeeper user in keystone:
<code bash>
openstack user create --domain default --password CLOUDKEEPER_PASS cloudkeeper
</code>
and, for each of the tenants, add the cloudkeeper user with the user role:
<code bash>
for i in fctf wenmr atlas ops dteam lhcb cms indigo emsodev biomed
do
  openstack role add --project $i --user cloudkeeper user
done
</code>
Edit the /etc/... file:
<code bash>
openstack-config --set /...
openstack-config --set /...
openstack-config --set /...
</code>
Edit the /...
Enable and start the service:
<code bash>
systemctl enable cloudkeeper-os
systemctl start cloudkeeper-os
</code>

==== Install Indigo IAM ====

([[https://...]])

First you need to register your site on [[https://...]]

  * Install mod_auth_openidc:
<code bash>
https://...
</code>

  * Configure mod_auth_openidc

Edit /...:
<code bash>
(...)
<VirtualHost ...>

(...)

OIDCClaimPrefix                ...
OIDCResponseType               ...
OIDCScope                      ...
OIDCProviderMetadataURL        ...
OIDCClientID                   <CLIENT ID>
OIDCClientSecret               <CLIENT SECRET>
OIDCProviderTokenEndpointAuth  ...
OIDCCryptoPassphrase           ...
OIDCRedirectURI                ...

# The JWKs URL on which the Authorization Server publishes the keys used to sign its JWT access tokens.
# When not defined, local validation of JWTs can still be done using statically configured keys,
# by setting OIDCOAuthVerifyCertFiles and/or OIDCOAuthVerifySharedKeys.
OIDCOAuthVerifyJwksUri "..."

<Location ...>
    AuthType  ...
    Require   ...
    LogLevel  ...
</Location>

<Location ...>
    AuthType  ...
    Require   ...
    LogLevel  ...
</Location>

(...)

</VirtualHost>
</code>
Substitute the following values:
<code bash>
<CLIENT ID>: Client ID as obtained from the IAM.
<CLIENT SECRET>: Client Secret as obtained from the IAM.
<...>: ...
<...>: ...
</code>
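The placeholders can also be filled in non-interactively with ''sed''; a sketch on a two-line stub with dummy values (the real ID and secret come from the IAM client registration):

```shell
# Stub with the same placeholders used in the config above
conf=/tmp/keystone-oidc.sample.conf
cat > "$conf" <<'EOF'
OIDCClientID     <CLIENT ID>
OIDCClientSecret <CLIENT SECRET>
EOF
# Replace both placeholders in one pass (dummy values for illustration)
sed -i -e 's/<CLIENT ID>/dummy-client-id/' \
       -e 's/<CLIENT SECRET>/dummy-client-secret/' "$conf"
cat "$conf"
```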
  * Edit /...:

<code bash>
[auth]
methods = external,...
oidc = keystone.auth.plugins.mapped.Mapped

[oidc]
remote_id_attribute = HTTP_OIDC_ISS

[federation]
remote_id_attribute = HTTP_OIDC_ISS
trusted_dashboard = https://<...>
sso_callback_template = /...
</code>
  * Ensure that /...

  * Keystone Groups, Projects and Mapping setup:
<code bash>
openstack group create indigo_group --description "..."
openstack project create indigo --description "..."
openstack role add user --group indigo_group --project indigo
openstack role add user --group indigo_group --domain default
</code>

Now the federation plugin needs to be set up.

  * Load the mapping as follows:
<code bash>
openstack identity provider create indigo-dc --remote-id https://...
openstack federation protocol create oidc --identity-provider indigo-dc --mapping indigo_mapping
openstack mapping set --rules indigo_mapping.json indigo_mapping
</code>
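The ''indigo_mapping.json'' loaded above is not shown on this page; a minimal example in the standard keystone OS-FEDERATION mapping format is sketched below. The remote attribute ''HTTP_OIDC_SUB'' and the rule details are assumptions, not the testbed's actual rules — only the group name ''indigo_group'' comes from the commands above:

```shell
# Illustrative mapping: put any user identified by the OIDC subject claim
# into the indigo_group group in the Default domain.
cat > /tmp/indigo_mapping.json <<'EOF'
[
    {
        "local": [
            {"user": {"name": "{0}"}},
            {"group": {"name": "indigo_group", "domain": {"name": "Default"}}}
        ],
        "remote": [
            {"type": "HTTP_OIDC_SUB"}
        ]
    }
]
EOF
# Validate the JSON before handing it to 'openstack mapping set'
python3 -m json.tool < /tmp/indigo_mapping.json > /dev/null && echo "valid JSON"
```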

  * OpenStack Dashboard (Horizon) configuration

Edit /...:

<code bash>
WEBSSO_ENABLED = True
WEBSSO_INITIAL_CHOICE = "..."

WEBSSO_CHOICES = (
    ("...", ...),
    ("...", ...),
)
</code>
==== Local Monitoring ====
=== Ganglia ===
  * Install ganglia-gmond on all servers
  * Configure the cluster and host fields in /...
  * Finally: systemctl enable gmond.service; ...
=== Nagios ===
  * Install ncsa-client, ... on the compute nodes
  * Copy the file cld-nagios:/...
  * Then, on all compute nodes:
<code bash>
$ echo encryption_method=1 > /...
$ usermod -a -G libvirtd nagios
$ sed -i '...' /...
# then be sure the files below are in /...
$ ls /...
check_kvm ...
$ cat <<EOF > crontab.txt
# Puppet Name: nagios_check_kvm
0 */1 * * * /...
EOF
$ crontab crontab.txt
$ crontab -l
</code>
  * On the controller node check if $ sed -i '...
  * On the cld-nagios server check/...

==== Security incidents and IP traceability ====
See [[https://...]]
On egi-cloud, install the [[https://...|os-ip-trace]] tool:
<code bash>
[root@egi-cloud ~]# os-ip-trace 90.147.77.229
+--------------------------------------+-----------+---------------------+---------------------+
| device id                            | user name |         ...         |         ...         |
+--------------------------------------+-----------+---------------------+---------------------+
| 3002b1f1-bca3-4e4f-b21e-8de12c0b926e |    ...    |         ...         |         ...         |
+--------------------------------------+-----------+---------------------+---------------------+
</code>
Save and archive important log files:
  * On egi-cloud and each compute node cloud-0*, add the line "*.* @@192.168.60.31:..." ...
  * On cld-foreman, ...
Install ulogd on the controller node:
<code bash>
yum install -y libnetfilter_log
yum localinstall -y http://...
yum localinstall -y http://...
</code>
and configure /...
Start the service:
<code bash>
systemctl enable ulogd
systemctl start ulogd
</code>
Finally, be sure that /...

==== Troubleshooting ====

  * Passwordless ssh access to egi-cloud from cld-nagios, and from egi-cloud to cloud-0*, has already been configured
  * If cld-nagios does not ping egi-cloud, be sure that the rule "route add -net 192.168.60.0 netmask 255.255.255.0 gw 192.168.114.1" ...
  * In case of Nagios alarms, try to restart all cloud services as follows:
<code bash>
$ ssh root@egi-cloud
[root@egi-cloud ~]# ./...
[root@egi-cloud ~]# for i in $(seq 1 6); do ssh cloud-0$i.pn.pd.infn.it ./... ; done
</code>
  * Resubmit the Nagios probe and check if it works again
  * In case the problem persists, check the consistency of the DB by executing the following (this also fixes the issue where the quota overview in the dashboard is not consistent with the VMs actually active):
<code bash>
[root@egi-cloud ~]# python nova-quota-sync.py
</code>
  * In case of an EGI Nagios alarm, check that the user running the Nagios probes does not also belong to tenants other than "ops" ...

  * In case of reboot of the egi-cloud server:
    * check its network configuration (use IPMI if it is not reachable): all 4 interfaces must be up and the default gateway must be 90.147.77.254.
    * check DNS in /...
    * check routing with $ route -n; if needed do: $ ip route replace default via 90.147.77.254. Also be sure to have a route for the 90.147.77.0 network.
    * check if the storage mountpoints 192.168.61.100:/...

  * In case of reboot of a cloud-0* server (use IPMI if it is not reachable): all 3 interfaces must be up and the default destination must have both 192.168.114.1 and 192.168.115.1 as gateways
    * check its network configuration
    * check if all partitions in /etc/fstab are properly mounted (do: $ df -h)

  * In case of network instabilities, ...:
<code bash>
[root@egi-cloud ~]# /...
generic-receive-offload: ...
</code>

  * Also check if /...:
<code bash>
[root@egi-cloud ~]# cat /...
#!/bin/bash
case "$1" in
em1)
    /...
    ;;
em2)
    /...
    ;;
em3)
    /...
    ;;
em4)
    /...
    ;;
esac
exit 0
</code>

  * If you need to change the project quotas, do not forget to apply the change to both the tenantId and the tenantName, due to a known bug, e.g.:
<code bash>
[root@egi-cloud ~]# source admin-openrc.sh
[root@egi-cloud ~]# tenantId=$(openstack project list | grep fctf | awk '{print $2}')
[root@egi-cloud ~]# nova quota-update --instances 40 --cores 40 --ram 81840 $tenantId
[root@egi-cloud ~]# nova quota-update --instances 40 --cores 40 --ram 81840 fctf
[root@egi-cloud ~]# neutron quota-update --floatingip 1 --tenant-id $tenantId
[root@egi-cloud ~]# neutron quota-update --floatingip 1 --tenant-id fctf
</code>
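A small helper can keep the two quota updates in sync. This is a hedged sketch: the ''-f value'' output format of ''openstack project list'' is assumed, and the function simply wraps the same ''nova''/''neutron'' calls shown above:

```shell
# Apply identical quotas to both the tenant id and the tenant name,
# working around the bug noted above. Requires the admin credentials
# to be sourced (e.g. admin-openrc.sh).
set_quota_both() {
    local name=$1 instances=$2 cores=$3 ram=$4 fips=$5 id t
    # resolve the tenant id from its name (machine-readable list format assumed)
    id=$(openstack project list -f value -c ID -c Name | awk -v n="$name" '$2 == n {print $1}')
    for t in "$id" "$name"; do
        nova quota-update --instances "$instances" --cores "$cores" --ram "$ram" "$t"
        neutron quota-update --floatingip "$fips" --tenant-id "$t"
    done
}
# usage: set_quota_both fctf 40 40 81840 1
```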