progetti:cloud-areapd:egi_federated_cloud:newton-centos7_testbed
| progetti:cloud-areapd:egi_federated_cloud:newton-centos7_testbed [2017/09/04 13:53] – [Security incindents and IP traceability] segatta@infn.it | progetti:cloud-areapd:egi_federated_cloud:newton-centos7_testbed [2018/12/20 13:16] (current) – [Local Monitoring] verlato@infn.it | ||
|---|---|---|---|
| Line 1: | Line 1: | ||
| + | ====== Newton-CentOS7 Testbed ====== | ||
| + | Fully integrated Resource Provider [[https:// | ||
| + | === EGI Monitoring/ | ||
| + | * [[https:// | ||
| + | * [[https:// | ||
| + | * [[http:// | ||
| + | * [[https:// | ||
| + | * [[https:// | ||
| + | === Local Monitoring/ | ||
| + | * [[http:// | ||
| + | * [[http:// | ||
| + | * [[http:// | ||
| + | * [[http:// | ||
| + | * [[https:// | ||
| + | === Local dashboard === | ||
| + | * [[http:// | ||
| + | ===== Layout ===== | ||
| + | |||
| + | * Controller + Network node + Storage node + Telemetry service: **egi-cloud.pd.infn.it** | ||
| + | |||
| + | * Compute nodes: **cloud-01: | ||
| + | | ||
| + | * NoSQL database: **egi-cloud-ha.pn.pd.infn.it** | ||
| + | |||
| + | * OneData provider: **one-data-01.pd.infn.it** | ||
| + | |||
| + | * Cloudkeeper and Cloudkeeper-OS: | ||
| + | |||
| + | * Network layout available [[http:// | ||
| + | |||
| + | |||
| + | ===== OpenStack configuration ===== | ||
| + | * Controller/ | ||
| + | * We created one tenant for each supported EGI FedCloud VO, plus a router and various nets and subnets, obtaining the following network topology: | ||
| + | {{: | ||
| + | *We mount the partitions for the glance and cinder services from 192.168.61.100 with the nfs driver | ||
| + | <code bash> | ||
| + | yum install -y nfs-utils | ||
| + | mkdir -p / | ||
| + | cat<< | ||
| + | 192.168.61.100:/ | ||
| + | EOF | ||
| + | mount -a | ||
| + | </ | ||
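Because the step above appends to the fstab with a here-document, running it twice would duplicate the entry. A minimal sketch of an idempotent variant follows; the function name `ensure_fstab_entry` is hypothetical, and the export and mount point stay as placeholders because they are elided in the source.

```shell
# Append a line to an fstab-style file only if an identical line is not
# already present. Both the file and the entry are passed in as arguments.
ensure_fstab_entry() {
    fstab_file=$1
    entry=$2
    # -x matches the whole line, -F treats the entry as a fixed string
    if ! grep -qxF -- "$entry" "$fstab_file" 2>/dev/null; then
        printf '%s\n' "$entry" >> "$fstab_file"
    fi
}

# Hypothetical usage (export, mount point and options are placeholders):
# ensure_fstab_entry /etc/fstab "192.168.61.100:/<export> /<mountpoint> nfs <options> 0 0"
```

After the entry is in place, `mount -a` can be run as in the original step.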
| + | *We use some specific configurations for the cinder services, following the documentation at [[http:// | ||
| + | | ||
| + | * The telemetry service uses a NoSQL database, so we install [[http:// | ||
| + | ===== EGI FedCloud specific configuration ===== | ||
| + | |||
| + | (see [[https:// | ||
| + | |||
| + | * Install the CA certificates and the software for fetching the CRLs on both the Controller (egi-cloud) and the Compute (cloud-01: | ||
| + | <code bash> | ||
| + | systemctl stop httpd | ||
| + | curl -L http:// | ||
| + | yum install -y ca-policy-egi-core fetch-crl | ||
| + | systemctl enable fetch-crl-cron.service | ||
| + | systemctl start fetch-crl-cron.service | ||
| + | </ | ||
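Once fetch-crl is running, it can be useful to confirm that the CRLs are actually being refreshed. The sketch below lists CRL files older than a given number of days. It assumes that CRLs are stored as `*.r0` files and that you pass the directory yourself; the helper name `stale_crls` is hypothetical. It relies on the `-maxdepth` option of GNU find.

```shell
# List CRL files (*.r0) directly inside a directory whose modification
# time is more than MAX_AGE_DAYS whole days in the past.
stale_crls() {
    crl_dir=$1
    max_age_days=$2
    find "$crl_dir" -maxdepth 1 -name '*.r0' -mtime +"$max_age_days" -print
}

# Hypothetical usage: pass the directory your SSLCARevocationPath points at.
# stale_crls /path/to/certificates 7
```

An empty result suggests that every CRL was updated within the chosen window.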
| + | ==== Install OpenStack Keystone-VOMS module ==== | ||
| + | (see [[https:// | ||
| + | * Prepare to run keystone as WSGI app in SSL | ||
| + | <code bash> | ||
| + | yum install -y voms mod_ssl | ||
| + | |||
| + | APACHE_LOG_DIR=/ | ||
| + | |||
| + | cat << | ||
| + | Listen 5000 | ||
| + | WSGIDaemonProcess keystone user=keystone group=keystone processes=8 threads=1 | ||
| + | < | ||
| + | LogLevel | ||
| + | ErrorLog | ||
| + | CustomLog | ||
| + | |||
| + | SSLEngine | ||
| + | SSLCertificateFile | ||
| + | SSLCertificateKeyFile | ||
| + | SSLCACertificatePath | ||
| + | SSLCARevocationPath | ||
| + | SSLVerifyClient | ||
| + | SSLVerifyDepth | ||
| + | SSLProtocol | ||
| + | SSLCipherSuite | ||
| + | SSLOptions | ||
| + | |||
| + | WSGIScriptAlias / / | ||
| + | WSGIProcessGroup keystone | ||
| + | </ | ||
| + | |||
| + | Listen 35357 | ||
| + | WSGIDaemonProcess | ||
| + | < | ||
| + | LogLevel | ||
| + | ErrorLog | ||
| + | CustomLog | ||
| + | |||
| + | SSLEngine | ||
| + | SSLCertificateFile | ||
| + | SSLCertificateKeyFile | ||
| + | SSLCACertificatePath | ||
| + | SSLCARevocationPath | ||
| + | SSLVerifyClient | ||
| + | SSLVerifyDepth | ||
| + | SSLProtocol | ||
| + | SSLCipherSuite | ||
| + | SSLOptions | ||
| + | |||
| + | WSGIScriptAlias | ||
| + | WSGIProcessGroup | ||
| + | </ | ||
| + | EOF | ||
| + | </ | ||
| + | * Check for, and if needed install, the host certificate for your server in / | ||
| + | <code bash> | ||
| + | [root@egi-cloud]# | ||
| + | -rw-r--r--. | ||
| + | -rw-------. | ||
| + | </ | ||
| + | * take the file [[http:// | ||
| + | * copy it to / | ||
| + | <code bash> | ||
| + | echo " | ||
| + | mkdir -p / | ||
| + | curl http:// | ||
| + | ln / | ||
| + | ln / | ||
| + | chown -R keystone: | ||
| + | </ | ||
| + | * Installing the Keystone-VOMS module: | ||
| + | <code bash> | ||
| + | yum localinstall -y http:// | ||
| + | </ | ||
| + | * Enable the Keystone VOMS module | ||
| + | <code bash> | ||
| + | cat<< | ||
| + | |||
| + | [filter: | ||
| + | paste.filter_factory = keystone_voms.core: | ||
| + | EOF | ||
| + | |||
| + | sed -i ' | ||
| + | </ | ||
| + | * Configuring the Keystone VOMS module | ||
| + | <code bash> | ||
| + | cat<< | ||
| + | |||
| + | [voms] | ||
| + | vomsdir_path = / | ||
| + | ca_path = / | ||
| + | voms_policy = / | ||
| + | vomsapi_lib = libvomsapi.so.1 | ||
| + | autocreate_users = True | ||
| + | add_roles = False | ||
| + | user_roles = _member_ | ||
| + | enable_pusp = False | ||
| + | EOF | ||
| + | </ | ||
| + | <code bash> | ||
| + | mkdir -p / | ||
| + | cat > / | ||
| + | / | ||
| + | / | ||
| + | EOF | ||
| + | cat > / | ||
| + | / | ||
| + | / | ||
| + | EOF | ||
| + | mkdir -p / | ||
| + | cat > / | ||
| + | / | ||
| + | / | ||
| + | EOF | ||
| + | cat > / | ||
| + | / | ||
| + | / | ||
| + | EOF | ||
| + | mkdir -p / | ||
| + | cat > / | ||
| + | / | ||
| + | / | ||
| + | EOF | ||
| + | cat > / | ||
| + | / | ||
| + | / | ||
| + | EOF | ||
| + | mkdir -p / | ||
| + | cat > / | ||
| + | / | ||
| + | / | ||
| + | EOF | ||
| + | mkdir -p / | ||
| + | cat > / | ||
| + | / | ||
| + | / | ||
| + | EOF | ||
| + | cat > / | ||
| + | / | ||
| + | / | ||
| + | EOF | ||
| + | for i in ops atlas lhcb cms | ||
| + | do | ||
| + | mkdir -p / | ||
| + | cat > / | ||
| + | / | ||
| + | / | ||
| + | EOF | ||
| + | cat > / | ||
| + | / | ||
| + | / | ||
| + | EOF | ||
| + | done | ||
| + | </ | ||
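The block above repeats the same pattern for every VO: create a directory, then write a two-line `.lsc` file per VOMS server. A minimal sketch of that pattern as a reusable function is shown here. The function name and all argument values are hypothetical; the real DNs must come from each VO's own documentation.

```shell
# Write a VOMS .lsc file. The first line is the subject DN of the VOMS
# server certificate, the second line is the DN of the issuing CA.
write_lsc() {
    vomsdir=$1
    vo=$2
    voms_host=$3
    subject_dn=$4
    issuer_dn=$5
    mkdir -p "$vomsdir/$vo"
    printf '%s\n%s\n' "$subject_dn" "$issuer_dn" > "$vomsdir/$vo/$voms_host.lsc"
}

# Hypothetical usage with placeholder values:
# write_lsc /path/to/vomsdir myvo voms.example.org \
#     "/DC=org/DC=example/CN=voms.example.org" "/DC=org/DC=example/CN=Example CA"
```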
| + | <code bash> | ||
| + | cat << | ||
| + | { | ||
| + | " | ||
| + | " | ||
| + | }, | ||
| + | " | ||
| + | " | ||
| + | }, | ||
| + | " | ||
| + | " | ||
| + | }, | ||
| + | " | ||
| + | " | ||
| + | }, | ||
| + | " | ||
| + | " | ||
| + | }, | ||
| + | " | ||
| + | " | ||
| + | }, | ||
| + | " | ||
| + | " | ||
| + | }, | ||
| + | " | ||
| + | " | ||
| + | }, | ||
| + | " | ||
| + | " | ||
| + | } | ||
| + | } | ||
| + | EOF | ||
| + | </ | ||
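The policy file above has one entry per VO. As an illustration only, the sketch below generates a file of that shape from `vo=tenant` pairs, assuming the Keystone-VOMS policy format in which each VO name maps to an object with a single `"tenant"` key. The function name and the example pairs are hypothetical, so check the result against the Keystone-VOMS documentation before using it.

```shell
# Print a JSON object mapping each VO name to {"tenant": "<tenant>"}.
# Arguments are given as vo=tenant pairs.
voms_policy_json() {
    printf '{\n'
    first=1
    for pair in "$@"; do
        vo=${pair%%=*}       # text before the first '='
        tenant=${pair#*=}    # text after the first '='
        [ "$first" -eq 1 ] || printf ',\n'
        first=0
        printf '    "%s": {\n        "tenant": "%s"\n    }' "$vo" "$tenant"
    done
    printf '\n}\n'
}

# Hypothetical usage:
# voms_policy_json ops=ops dteam=dteam > /path/to/voms.json
```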
| + | * Manually adjust the keystone catalog so that the identity backend points to the correct URLs: | ||
| + | * public URL: https:// | ||
| + | * admin URL: https:// | ||
| + | * internal URL: https:// | ||
| + | <code bash> | ||
| + | mysql> use keystone; | ||
| + | mysql> update endpoint set url=" | ||
| + | mysql> update endpoint set url=" | ||
| + | mysql> select id,url from endpoint; | ||
| + | should show lines with the above URLs. | ||
| + | </ | ||
| + | * Replace http with https in auth_[protocol, | ||
| + | * Replace http with https in auth_[protocol, | ||
| + | * Also check if " | ||
| + | ==== Install the OOI API ==== | ||
| + | * Install ooi (see [[https:// | ||
| + | <code bash> | ||
| + | yum localinstall -y http:// | ||
| + | </ | ||
| + | * Edit the / | ||
| + | <code bash> | ||
| + | cat <<EOF >>/ | ||
| + | |||
| + | ####### | ||
| + | # OOI # | ||
| + | ####### | ||
| + | |||
| + | [composite: | ||
| + | use = call: | ||
| + | /occi1.2: occi_api_12 | ||
| + | /occi1.1: occi_api_12 | ||
| + | |||
| + | [filter: | ||
| + | paste.filter_factory = ooi.wsgi: | ||
| + | openstack_version = /v2.1 | ||
| + | |||
| + | [composite: | ||
| + | use = call: | ||
| + | noauth2 = cors http_proxy_to_wsgi compute_req_id faultwrap sizelimit noauth2 occi osapi_compute_app_v21 | ||
| + | keystone = cors http_proxy_to_wsgi compute_req_id faultwrap sizelimit authtoken keystonecontext occi osapi_compute_app_v21 | ||
| + | EOF | ||
| + | </ | ||
| + | * Make sure the API occiapi is enabled in the / | ||
| + | <code bash> | ||
| + | openstack-config --set / | ||
| + | openstack-config --set / | ||
| + | openstack-config --set / | ||
| + | openstack-config --set / | ||
| + | </ | ||
| + | * Restart the nova services: | ||
| + | <code bash> | ||
| + | systemctl restart openstack-nova-api openstack-nova-consoleauth openstack-nova-scheduler openstack-nova-conductor openstack-nova-novncproxy | ||
| + | </ | ||
| + | * Register service in Keystone: | ||
| + | <code bash> | ||
| + | openstack service create --name occi --description "OCCI Interface" | ||
| + | openstack endpoint create --region RegionOne occi public https:// | ||
| + | openstack endpoint create --region RegionOne occi internal https:// | ||
| + | openstack endpoint create --region RegionOne occi admin https:// | ||
| + | </ | ||
| + | * Enable SSL connection on port 8787, by creating the file / | ||
| + | <code bash> | ||
| + | cat <<EOF > / | ||
| + | #LoadModule proxy_http_module modules/ | ||
| + | # | ||
| + | # Proxy Server directives. Uncomment the following lines to | ||
| + | # enable the proxy server: | ||
| + | #LoadModule proxy_module / | ||
| + | #LoadModule proxy_http_module / | ||
| + | #LoadModule substitute_module / | ||
| + | |||
| + | |||
| + | Listen 8787 | ||
| + | < | ||
| + | | ||
| + | | ||
| + | | ||
| + | |||
| + | | ||
| + | | ||
| + | | ||
| + | | ||
| + | | ||
| + | | ||
| + | | ||
| + | | ||
| + | | ||
| + | | ||
| + | < | ||
| + | # Do not enable proxying with ProxyRequests until you have secured | ||
| + | # your server. | ||
| + | # Open proxy servers are dangerous both to your network and to the | ||
| + | # Internet at large. | ||
| + | | ||
| + | |||
| + | < | ||
| + | Order deny, | ||
| + | Deny from all | ||
| + | </ | ||
| + | |||
| + | | ||
| + | | ||
| + | < | ||
| + | | ||
| + | | ||
| + | Order allow,deny | ||
| + | Allow from all | ||
| + | </ | ||
| + | |||
| + | </ | ||
| + | </ | ||
| + | EOF | ||
| + | </ | ||
| + | * Restart http service | ||
| + | <code bash> | ||
| + | systemctl restart httpd | ||
| + | </ | ||
| + | ==== Install rOCCI Client ==== | ||
| + | |||
| + | For a complete guide to the rOCCI Client, see [[https:// | ||
| + | ==== Install FedCloud BDII ==== | ||
| + | (See [[https:// | ||
| + | * Installing the resource bdii and the cloud-info-provider: | ||
| + | <code bash> | ||
| + | yum install bdii -y | ||
| + | yum -y localinstall http:// | ||
| + | </ | ||
| + | * Customize the configuration file with the local site's information | ||
| + | <code bash> | ||
| + | cp / | ||
| + | sed -i ' | ||
| + | sed -i ' | ||
| + | sed -i ' | ||
| + | sed -i ' | ||
| + | sed -i ' | ||
| + | sed -i ' | ||
| + | sed -i ' | ||
| + | sed -i ' | ||
| + | sed -i ' | ||
| + | sed -i ' | ||
| + | sed -i ' | ||
| + | sed -i ' | ||
| + | sed -i ' | ||
| + | sed -i ' | ||
| + | sed -i ' | ||
| + | </ | ||
| + | * Be sure that keystone contains the OOI endpoints, otherwise they will not be published by the BDII. | ||
| + | * Create the file / | ||
| + | <code bash> | ||
| + | cat<< | ||
| + | #!/bin/sh | ||
| + | cloud-info-provider-service --yaml / | ||
| + | --middleware openstack \ | ||
| + | --os-username admin --os-password ADMIN_PASS \ | ||
| + | --os-tenant-name admin --os-auth-url https:// | ||
| + | --os-cacert / | ||
| + | EOF | ||
| + | </ | ||
| + | * Manually run the cloud-info-provider script and check that the output returns the complete LDIF. To do so, execute: | ||
| + | <code bash> | ||
| + | chmod +x / | ||
| + | / | ||
| + | / | ||
| + | </ | ||
| + | * Now you can start the bdii service: | ||
| + | <code bash> | ||
| + | systemctl start bdii | ||
| + | </ | ||
| + | * Use the command below to see if the information is being published: | ||
| + | <code bash> | ||
| + | ldapsearch -x -h localhost -p 2170 -b o=glue | ||
| + | </ | ||
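The `ldapsearch` output can be long, so a quick sanity check is to count the published entries. The sketch below counts LDIF entries on standard input; the helper name `count_ldif_entries` is hypothetical.

```shell
# Count LDIF entries read from standard input. Every entry starts with a
# line beginning with "dn:", so counting those lines counts the entries.
count_ldif_entries() {
    # grep -c prints 0 and exits non-zero when nothing matches;
    # "|| true" keeps the function usable under "set -e".
    grep -c '^dn:' || true
}

# Usage with the command from the step above:
# ldapsearch -x -h localhost -p 2170 -b o=glue | count_ldif_entries
```

A count of 0 means that nothing is being published under `o=glue`.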
| + | * Do not forget to open port 2170: | ||
| + | <code bash> | ||
| + | firewall-cmd --add-port=2170/ | ||
| + | firewall-cmd --permanent --add-port=2170/ | ||
| + | systemctl restart firewalld | ||
| + | </ | ||
| + | * Information on how to set up the site-BDII in egi-cloud-sbdii.pd.infn.it is available [[https:// | ||
| + | * Add your cloud-info-provider to your site-BDII egi-cloud-sbdii.pd.infn.it by adding new lines in the site.def like this: | ||
| + | <code bash> | ||
| + | BDII_REGIONS=" | ||
| + | BDII_CLOUD_URL=" | ||
| + | BDII_BDII_URL=" | ||
| + | </ | ||
| + | ==== Use the same APEL/SSM of grid site ==== | ||
| + | * Cloud usage records are sent to APEL through the ssmsend program installed in cert-37.pd.infn.it: | ||
| + | <code bash> | ||
| + | [root@cert-37 ~]# cat / | ||
| + | # send buffered usage records to APEL | ||
| + | 30 */24 * * * root / | ||
| + | </ | ||
| + | * It is therefore necessary to install and configure NFS on egi-cloud: | ||
| + | <code bash> | ||
| + | [root@egi-cloud ~]# mkdir -p / | ||
| + | [root@egi-cloud ~]# cat<< | ||
| + | / | ||
| + | EOF | ||
| + | [root@egi-cloud ~]$ systemctl status nfs-server | ||
| + | </ | ||
| + | * In case of APEL nagios probe failure, check if / | ||
| + | * To check whether accounting records are properly received by the APEL server, look at [[http:// | ||
| + | ==== Install the new accounting system (CASO) ==== | ||
| + | |||
| + | (see [[https:// | ||
| + | |||
| + | <code bash> | ||
| + | yum -y install libffi-devel openssl-devel gcc | ||
| + | yum -y localinstall http:// | ||
| + | </ | ||
| + | * Create role and user | ||
| + | <code bash> | ||
| + | openstack user create --domain default --password ACCOUNTING_PASS accounting | ||
| + | openstack role create accounting | ||
| + | </ | ||
| + | *For each of the tenants, add the user with the accounting role | ||
| + | <code bash> | ||
| + | for i in fctf wenmr atlas ops dteam lhcb cms indigo emsodev | ||
| + | do | ||
| + | openstack role add --project $i --user accounting accounting | ||
| + | done | ||
| + | </ | ||
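Before running a loop like the one above against a production Keystone, it can help to print the commands first and review them. This is a minimal dry-run sketch; the function name `role_add_commands` is hypothetical, and the printed command line is the one used in the loop above.

```shell
# Print, without executing, one "openstack role add" command per project.
# Arguments: USER ROLE PROJECT...
role_add_commands() {
    user=$1
    role=$2
    shift 2
    for project in "$@"; do
        printf 'openstack role add --project %s --user %s %s\n' "$project" "$user" "$role"
    done
}

# Review the commands first, then pipe them to sh to execute them:
# role_add_commands accounting accounting fctf wenmr atlas ops
# role_add_commands accounting accounting fctf wenmr atlas ops | sh
```

The same helper would cover the similar loop used later for the cloudkeeper user, whose project list also includes biomed.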
| + | * Edit the / | ||
| + | <code bash> | ||
| + | openstack-config --set / | ||
| + | openstack-config --set / | ||
| + | openstack-config --set / | ||
| + | openstack-config --set / | ||
| + | openstack-config --set / | ||
| + | openstack-config --set / | ||
| + | openstack-config --set / | ||
| + | openstack-config --set / | ||
| + | openstack-config --set / | ||
| + | openstack-config --set / | ||
| + | openstack-config --set / | ||
| + | openstack-config --set / | ||
| + | openstack-config --set / | ||
| + | openstack-config --set / | ||
| + | openstack-config --set / | ||
| + | </ | ||
| + | *Edit the / | ||
| + | <code bash> | ||
| + | sed -i ' | ||
| + | </ | ||
| + | <code bash> | ||
| + | mkdir / | ||
| + | </ | ||
| + | *Test it | ||
| + | <code bash> | ||
| + | caso-extract -v -d | ||
| + | </ | ||
| + | * Create the cron job | ||
| + | <code bash> | ||
| + | cat << | ||
| + | # extract and send usage records to APEL/ | ||
| + | 10 * * * * root / | ||
| + | EOF | ||
| + | </ | ||
| + | ==== Install Cloudkeeper and Cloudkeeper-OS ==== | ||
| + | Cloudkeeper and Cloudkeeper-OS are installed on a dedicated server (egi-cloud-ha.pn.pd.infn.it). | ||
| + | Install Cloudkeeper | ||
| + | <code bash> | ||
| + | yum localinstall -y http:// | ||
| + | </ | ||
| + | Edit / | ||
| + | <code bash> | ||
| + | - https:// | ||
| + | - https:// | ||
| + | - https:// | ||
| + | - https:// | ||
| + | - https:// | ||
| + | - https:// | ||
| + | - https:// | ||
| + | - https:// | ||
| + | - https:// | ||
| + | |||
| + | ip-address: CONTROLLER_IP # IP address NGINX can listen on | ||
| + | </ | ||
| + | |||
| + | Enable and start the service | ||
| + | |||
| + | <code bash> | ||
| + | systemctl enable cloudkeeper-cron | ||
| + | systemctl start cloudkeeper-cron | ||
| + | </ | ||
| + | |||
| + | Install Cloudkeeper-OS | ||
| + | <code bash> | ||
| + | cd / | ||
| + | wget http:// | ||
| + | cd | ||
| + | yum update | ||
| + | yum -y install cloudkeeper-os | ||
| + | </ | ||
| + | |||
| + | Create a cloudkeeper user in keystone | ||
| + | <code bash> | ||
| + | openstack user create --domain default --password CLOUDKEEPER_PASS cloudkeeper | ||
| + | </ | ||
| + | and, for each of the tenants, add the cloudkeeper user with the user role | ||
| + | <code bash> | ||
| + | for i in fctf wenmr atlas ops dteam lhcb cms indigo emsodev biomed | ||
| + | do | ||
| + | openstack role add --project $i --user cloudkeeper user | ||
| + | done | ||
| + | </ | ||
| + | Edit the /etc/ | ||
| + | <code bash> | ||
| + | openstack-config --set / | ||
| + | openstack-config --set / | ||
| + | openstack-config --set / | ||
| + | </ | ||
| + | Edit the / | ||
| + | Enable and start the service | ||
| + | <code bash> | ||
| + | systemctl enable cloudkeeper-os | ||
| + | systemctl start cloudkeeper-os | ||
| + | </ | ||
| + | |||
| + | ==== Install Indigo IAM ==== | ||
| + | |||
| + | ([[https:// | ||
| + | |||
| + | First you need to register your site on [[https:// | ||
| + | |||
| + | * install mod_auth_openidc | ||
| + | <code bash> | ||
| + | https:// | ||
| + | </ | ||
| + | |||
| + | * configure mod_auth_openidc | ||
| + | |||
| + | Edit / | ||
| + | <code bash> | ||
| + | (...) | ||
| + | < | ||
| + | |||
| + | (...) | ||
| + | |||
| + | OIDCClaimPrefix | ||
| + | OIDCResponseType | ||
| + | OIDCScope | ||
| + | OIDCProviderMetadataURL | ||
| + | OIDCClientID | ||
| + | OIDCClientSecret | ||
| + | OIDCProviderTokenEndpointAuth | ||
| + | OIDCCryptoPassphrase | ||
| + | OIDCRedirectURI | ||
| + | |||
| + | # The JWKs URL on which the Authorization Server publishes the keys used to sign its JWT access tokens. | ||
| + | # When not defined local validation of JWTs can still be done using statically configured keys, | ||
| + | # by setting OIDCOAuthVerifyCertFiles and/or OIDCOAuthVerifySharedKeys. | ||
| + | OIDCOAuthVerifyJwksUri " | ||
| + | |||
| + | < | ||
| + | AuthType | ||
| + | Require | ||
| + | LogLevel | ||
| + | </ | ||
| + | |||
| + | < | ||
| + | AuthType | ||
| + | Require | ||
| + | LogLevel | ||
| + | </ | ||
| + | |||
| + | (...) | ||
| + | |||
| + | </ | ||
| + | </ | ||
| + | Substitute the following values: | ||
| + | <code bash> | ||
| + | <CLIENT ID>: Client ID as obtained from the IAM. | ||
| + | <CLIENT SECRET>: Client Secret as obtained from the IAM. | ||
| + | < | ||
| + | < | ||
| + | </ | ||
| + | * Edit / | ||
| + | |||
| + | <code bash> | ||
| + | [auth] | ||
| + | methods = external, | ||
| + | oidc = keystone.auth.plugins.mapped.Mapped | ||
| + | |||
| + | [oidc] | ||
| + | remote_id_attribute = HTTP_OIDC_ISS | ||
| + | |||
| + | [federation] | ||
| + | remote_id_attribute = HTTP_OIDC_ISS | ||
| + | trusted_dashboard = https://< | ||
| + | sso_callback_template = / | ||
| + | </ | ||
| + | * Ensure that / | ||
| + | |||
| + | * Keystone Groups, Projects and Mapping setup | ||
| + | <code bash> | ||
| + | openstack group create indigo_group --description " | ||
| + | openstack project create indigo --description " | ||
| + | openstack role add user --group indigo_group --project indigo | ||
| + | openstack role add user --group indigo_group --domain default | ||
| + | </ | ||
| + | |||
| + | Now the federation plugin needs to be set up | ||
| + | |||
| + | * Load the mapping as follows | ||
| + | <code bash> | ||
| + | openstack identity provider create indigo-dc --remote-id https:// | ||
| + | openstack federation protocol create oidc --identity-provider indigo-dc --mapping indigo_mapping | ||
| + | openstack mapping set --rules indigo_mapping.json indigo_mapping | ||
| + | |||
| + | </ | ||
| + | |||
| + | * OpenStack Dashboard (Horizon) Configuration | ||
| + | Edit / | ||
| + | |||
| + | <code bash> | ||
| + | WEBSSO_ENABLED = True | ||
| + | WEBSSO_INITIAL_CHOICE = " | ||
| + | |||
| + | WEBSSO_CHOICES = ( | ||
| + | (" | ||
| + | (" | ||
| + | ) | ||
| + | </ | ||
| + | ==== Local Monitoring ==== | ||
| + | === Ganglia === | ||
| + | * Install ganglia-gmond on all servers | ||
| + | * Configure cluster and host fields in / | ||
| + | * Finally: systemctl enable gmond.service; | ||
| + | === Nagios === | ||
| + | * On the compute nodes, install nsca-client, | ||
| + | * Copy the file cld-nagios:/ | ||
| + | * Then do in all compute nodes: | ||
| + | <code bash> | ||
| + | $ echo encryption_method=1 > / | ||
| + | $ usermod -a -G libvirtd nagios | ||
| + | $ sed -i ' | ||
| + | # then be sure the files below are in / | ||
| + | $ ls / | ||
| + | check_kvm | ||
| + | $ cat <<EOF > crontab.txt | ||
| + | # Puppet Name: nagios_check_kvm | ||
| + | 0 */1 * * * / | ||
| + | EOF | ||
| + | $ crontab crontab.txt | ||
| + | $ crontab -l | ||
| + | </ | ||
| + | * On the controller node check if $ sed -i ' | ||
| + | * On the cld-nagios server check/ | ||
| + | |||
| + | ==== Security incidents and IP traceability ==== | ||
| + | See [[https:// | ||
| + | On egi-cloud, install the [[https:// | ||
| + | <code bash> | ||
| + | [root@egi-cloud ~]# os-ip-trace 90.147.77.229 | ||
| + | +--------------------------------------+-----------+---------------------+---------------------+ | ||
| + | | device id | user name | | ||
| + | +--------------------------------------+-----------+---------------------+---------------------+ | ||
| + | | 3002b1f1-bca3-4e4f-b21e-8de12c0b926e | | ||
| + | +--------------------------------------+-----------+---------------------+---------------------+ | ||
| + | </ | ||
| + | Save and archive important log files: | ||
| + | * On egi-cloud and each compute node cloud-0*, add the line "*.* @@192.168.60.31: | ||
| + | * In cld-foreman, | ||
| + | Install ulogd in the controller node | ||
| + | <code bash> | ||
| + | yum install -y libnetfilter_log | ||
| + | yum localinstall -y http:// | ||
| + | yum localinstall -y http:// | ||
| + | </ | ||
| + | and configure / | ||
| + | Start the service | ||
| + | <code bash> | ||
| + | systemctl enable ulogd | ||
| + | systemctl start ulogd | ||
| + | </ | ||
| + | Finally, be sure that / | ||
| + | |||
| + | ==== Troubleshooting ==== | ||
| + | |||
| + | * Passwordless ssh access to egi-cloud from cld-nagios and from egi-cloud to cloud-0* has already been configured | ||
| + | * If cld-nagios does not ping egi-cloud, be sure that the rule "route add -net 192.168.60.0 netmask 255.255.255.0 gw 192.168.114.1" | ||
| + | * In case of Nagios alarms, try to restart all cloud services doing the following: | ||
| + | <code bash> | ||
| + | $ ssh root@egi-cloud | ||
| + | [root@egi-cloud ~]# ./ | ||
| + | [root@egi-cloud ~]# for i in $(seq 1 6); do ssh cloud-0$i.pn.pd.infn.it ./ | ||
| + | </ | ||
| + | * Resubmit the Nagios probe and check if it works again | ||
| + | * If the problem persists, check the consistency of the DB by executing the following (this also fixes the case where the quota overview in the dashboard is inconsistent with the VMs that are actually active): | ||
| + | <code bash> | ||
| + | [root@egi-cloud ~]# python nova-quota-sync.py | ||
| + | </ | ||
| + | * In case of an EGI Nagios alarm, check that the user running the Nagios probes does not also belong to tenants other than " | ||
| + | |||
| + | * in case of reboot of egi-cloud server: | ||
| + | * check its network configuration (use IPMI if not reachable): all 4 interfaces must be up and the default gateway must be 90.147.77.254. | ||
| + | * check DNS in / | ||
| + | * check routing with $route -n, if needed do: $ip route replace default via 90.147.77.254. Also be sure to have a route for 90.147.77.0 network. | ||
| + | * check if storage mountpoints 192.168.61.100:/ | ||
| + | |||
| + | * in case of reboot of cloud-0* server: | ||
| + | * check its network configuration (use IPMI if not reachable): all 3 interfaces must be up and the default route must have both 192.168.114.1 and 192.168.115.1 as gateways | ||
| + | * check if all partitions in /etc/fstab are properly mounted (do: $ df -h) | ||
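The gateway checks in the reboot procedures above can be scripted. The sketch below extracts the gateway(s) of the default route from `ip route show` output, including the multipath form in which each gateway appears on an indented `nexthop` line. The function name `default_gateways` is hypothetical.

```shell
# Read "ip route show" output on standard input and print every gateway
# of the default route, one per line.
default_gateways() {
    awk '
        /^default/               { in_default = 1 }  # start of the default route
        /^[^ \t]/ && !/^default/ { in_default = 0 }  # any other route ends it
        in_default {
            # print the word that follows each "via" keyword
            for (i = 1; i < NF; i++) if ($i == "via") print $(i + 1)
        }
    '
}

# Usage:
# ip route show | default_gateways
```

On egi-cloud the expected output is 90.147.77.254. On the cloud-0* nodes it should list both 192.168.114.1 and 192.168.115.1.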
| + | |||
| + | * In case of network instabilities, | ||
| + | <code bash> | ||
| + | [root@egi-cloud ~]# / | ||
| + | generic-receive-offload: | ||
| + | </ | ||
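To check the offload setting on several interfaces without reading the full feature list each time, the relevant line can be filtered out. This sketch reads `ethtool -k` output on standard input and prints only the generic-receive-offload value; the function name `gro_state` is hypothetical. The value this site expects is truncated in the source, so none is asserted here.

```shell
# Print the value of the generic-receive-offload feature from
# "ethtool -k <interface>" output read on standard input.
gro_state() {
    awk -F': *' '$1 == "generic-receive-offload" { print $2 }'
}

# Usage for the four interfaces named in the script below:
# for nic in em1 em2 em3 em4; do
#     printf '%s: %s\n' "$nic" "$(ethtool -k "$nic" | gro_state)"
# done
```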
| + | |||
| + | * Also check if / | ||
| + | <code bash> | ||
| + | [root@egi-cloud ~]# cat / | ||
| + | #!/bin/bash | ||
| + | case " | ||
| + | em1) | ||
| + | / | ||
| + | ;; | ||
| + | em2) | ||
| + | / | ||
| + | ;; | ||
| + | em3) | ||
| + | / | ||
| + | ;; | ||
| + | em4) | ||
| + | / | ||
| + | ;; | ||
| + | esac | ||
| + | exit 0 | ||
| + | </ | ||
| + | |||
| + | * If you need to change the project quotas, do not forget to apply the change to both tenantId and tenantName, due to a known bug, e.g.: | ||
| + | <code bash> | ||
| + | [root@egi-cloud ~]# source admin-openrc.sh | ||
| + | [root@egi-cloud ~]# tenantId=$(openstack project list | grep fctf | awk ' | ||
| + | [root@egi-cloud ~]# nova quota-update --instances 40 --cores 40 --ram 81840 $tenantId | ||
| + | [root@egi-cloud ~]# nova quota-update --instances 40 --cores 40 --ram 81840 fctf | ||
| + | [root@egi-cloud ~]# neutron quota-update --floatingip 1 --tenant-id $tenantId | ||
| + | [root@egi-cloud ~]# neutron quota-update --floatingip 1 --tenant-id fctf | ||
| + | </ | ||
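Looking up the project ID with `grep` matches any row that contains the project name, so a project with a similar name could also match. The sketch below matches the Name column exactly. It assumes the default table output of `openstack project list`, with ID and Name as the first two columns, and the function name `project_id_from_table` is hypothetical.

```shell
# Read an "openstack project list" table on standard input and print the
# ID of the project whose Name column is exactly equal to the argument.
project_id_from_table() {
    wanted=$1
    awk -F'|' -v name="$wanted" '
        NF >= 3 {
            id = $2
            pname = $3
            gsub(/^ +| +$/, "", id)      # trim padding around the ID
            gsub(/^ +| +$/, "", pname)   # trim padding around the name
            if (pname == name) print id
        }
    '
}

# Usage:
# tenantId=$(openstack project list | project_id_from_table fctf)
```

An alternative that should avoid parsing the table at all is `openstack project show -f value -c id fctf`.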
