progetti:cloud-areapd:egi_federated_cloud:liberty-centos7_testbed
(last modified 2017/10/03 12:34 by segatta@infn.it)
+ | ====== Liberty-CentOS7 Testbed ====== | ||
+ | Fully integrated Resource Provider [[https:// | ||
+ | === EGI Monitoring/ | ||
+ | * [[https:// | ||
+ | * [[https:// | ||
+ | * [[http:// | ||
+ | * [[https:// | ||
+ | * [[https:// | ||
+ | === Local Monitoring/ | ||
+ | * [[http:// | ||
+ | * [[http:// | ||
+ | * [[http:// | ||
+ | * [[http:// | ||
+ | * [[https:// | ||
+ | === Local dashboard === | ||
+ | * [[http:// | ||
+ | ===== Layout ===== | ||
+ | |||
+ | * Controller + Network node + Storage node + Telemetry service + Orchestration service: **egi-cloud.pd.infn.it** | ||
+ | |||
+ | * Compute nodes: **cloud-01: | ||
+ | | ||
+ | * NoSQL database: **cld-mongo-egi.cloud.pd.infn.it** | ||
+ | |||
+ | * OneData provider: **one-data-01.pd.infn.it** | ||
+ | |||
+ | * Network layout available [[http:// | ||
+ | |||
+ | |||
+ | ===== OpenStack configuration ===== | ||
+ | * Controller/ | ||
+ | * We created one tenant for each supported EGI FedCloud VO, plus a router and various nets and subnets, obtaining the following network topology: | ||
+ | {{: | ||
+ | *We mount the partitions for the glance and cinder services from 192.168.61.100 with the NFS driver | ||
+ | <code bash> | ||
+ | yum install -y nfs-utils | ||
+ | mkdir -p / | ||
+ | mkdir -p / | ||
+ | cat<< | ||
+ | 192.168.61.100:/ | ||
+ | 192.168.61.100:/ | ||
+ | EOF | ||
+ | mount -a | ||
+ | </ | ||
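The /etc/fstab entries above are truncated in this export; the sketch below shows the general field layout of an NFS entry and a quick field-count sanity check. The export paths and mount options here are illustrative examples, not the production values.

```bash
# Illustrative sketch of NFS /etc/fstab entries.
# Field layout: <server>:<export>  <mountpoint>  nfs  <options>  <dump> <pass>
# Paths and options below are examples, not the production configuration.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
192.168.61.100:/glance  /var/lib/glance  nfs  defaults,_netdev  0 0
192.168.61.100:/cinder  /var/lib/cinder  nfs  defaults,_netdev  0 0
EOF
# every entry should have exactly 6 fields
bad=$(awk 'NF != 6' "$tmp" | wc -l)
echo "malformed entries: $bad"
rm -f "$tmp"
```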
+ | *We apply some specific configurations for the cinder and neutron services, following this documentation: | ||
+ | * [[http:// | ||
+ | * [[http:// | ||
+ | | ||
+ | * The telemetry service uses a NoSQL database, so we install [[http:// | ||
+ | ===== EGI FedCloud specific configuration ===== | ||
+ | |||
+ | (see [[https:// | ||
+ | |||
+ | * Install CAs Certificates and the software for fetching the CRLs in both Controller (egi-cloud) and Compute (cloud-01: | ||
+ | <code bash> | ||
+ | systemctl stop httpd | ||
+ | curl -L http:// | ||
+ | yum install -y ca-policy-egi-core fetch-crl | ||
+ | systemctl enable fetch-crl-cron.service | ||
+ | systemctl start fetch-crl-cron.service | ||
+ | </ | ||
+ | |||
+ | ==== Install OpenStack Keystone-VOMS module ==== | ||
+ | (see [[https:// | ||
+ | * Prepare to run keystone as WSGI app in SSL | ||
+ | <code bash> | ||
+ | yum install -y voms mod_ssl | ||
+ | |||
+ | APACHE_LOG_DIR=/ | ||
+ | |||
+ | cat << | ||
+ | Listen 5000 | ||
+ | WSGIDaemonProcess keystone user=keystone group=keystone processes=8 threads=1 | ||
+ | < | ||
+ | LogLevel | ||
+ | ErrorLog | ||
+ | CustomLog | ||
+ | |||
+ | SSLEngine | ||
+ | SSLCertificateFile | ||
+ | SSLCertificateKeyFile | ||
+ | SSLCACertificatePath | ||
+ | SSLCARevocationPath | ||
+ | SSLVerifyClient | ||
+ | SSLVerifyDepth | ||
+ | SSLProtocol | ||
+ | SSLCipherSuite | ||
+ | SSLOptions | ||
+ | |||
+ | WSGIScriptAlias / / | ||
+ | WSGIProcessGroup keystone | ||
+ | </ | ||
+ | |||
+ | Listen 35357 | ||
+ | WSGIDaemonProcess | ||
+ | < | ||
+ | LogLevel | ||
+ | ErrorLog | ||
+ | CustomLog | ||
+ | |||
+ | SSLEngine | ||
+ | SSLCertificateFile | ||
+ | SSLCertificateKeyFile | ||
+ | SSLCACertificatePath | ||
+ | SSLCARevocationPath | ||
+ | SSLVerifyClient | ||
+ | SSLVerifyDepth | ||
+ | SSLProtocol | ||
+ | SSLCipherSuite | ||
+ | SSLOptions | ||
+ | |||
+ | WSGIScriptAlias | ||
+ | WSGIProcessGroup | ||
+ | </ | ||
+ | EOF | ||
+ | </ | ||
+ | * Check, and if missing install, the host certificate for your server in / | ||
+ | <code bash> | ||
+ | [root@egi-cloud]# | ||
+ | -rw-r--r--. | ||
+ | -rw-------. | ||
+ | </ | ||
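Besides checking permissions, it is worth confirming that hostcert.pem and hostkey.pem actually belong together by comparing their RSA moduli. The sketch below generates a throwaway self-signed pair just to demonstrate the two openssl commands; point the same commands at the real files under /etc/grid-security:

```bash
# Sketch: verify that a certificate and its key match by comparing moduli.
# A throwaway self-signed pair is generated for demonstration; substitute
# the real /etc/grid-security/hostcert.pem and hostkey.pem in production.
tmp=$(mktemp -d)
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
    -subj "/CN=egi-cloud.pd.infn.it" \
    -keyout "$tmp/hostkey.pem" -out "$tmp/hostcert.pem" 2>/dev/null
cert_mod=$(openssl x509 -noout -modulus -in "$tmp/hostcert.pem")
key_mod=$(openssl rsa -noout -modulus -in "$tmp/hostkey.pem")
if [ "$cert_mod" = "$key_mod" ]; then match=yes; else match=no; fi
echo "certificate/key match: $match"
rm -rf "$tmp"
```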
+ | * take the file [[http:// | ||
+ | * copy it to / | ||
+ | <code bash> | ||
+ | echo " | ||
+ | rm -Rf / | ||
+ | mkdir -p / | ||
+ | curl http:// | ||
+ | ln / | ||
+ | ln / | ||
+ | chown -R keystone: | ||
+ | systemctl start httpd | ||
+ | </ | ||
+ | * Install the Keystone-VOMS module: | ||
+ | <code bash> | ||
+ | git clone git:// | ||
+ | cd keystone-voms | ||
+ | pip install . | ||
+ | </ | ||
+ | * Enable the Keystone VOMS module | ||
+ | <code bash> | ||
+ | sed -i ' | ||
+ | echo " | ||
+ | openstack-config --set / | ||
+ | sed -i ' | ||
+ | </ | ||
+ | * Configure the Keystone VOMS module | ||
+ | <code bash> | ||
+ | echo " | ||
+ | openstack-config --set / | ||
+ | openstack-config --set / | ||
+ | openstack-config --set / | ||
+ | openstack-config --set / | ||
+ | openstack-config --set / | ||
+ | openstack-config --set / | ||
+ | openstack-config --set / | ||
+ | </ | ||
+ | <code bash> | ||
+ | mkdir -p / | ||
+ | cat > / | ||
+ | / | ||
+ | / | ||
+ | EOF | ||
+ | cat > / | ||
+ | / | ||
+ | / | ||
+ | EOF | ||
+ | mkdir -p / | ||
+ | cat > / | ||
+ | / | ||
+ | / | ||
+ | EOF | ||
+ | cat > / | ||
+ | / | ||
+ | / | ||
+ | EOF | ||
+ | mkdir -p / | ||
+ | cat > / | ||
+ | / | ||
+ | / | ||
+ | EOF | ||
+ | cat > / | ||
+ | / | ||
+ | / | ||
+ | EOF | ||
+ | mkdir -p / | ||
+ | cat > / | ||
+ | / | ||
+ | / | ||
+ | EOF | ||
+ | for i in ops atlas lhcb cms | ||
+ | do | ||
+ | mkdir -p / | ||
+ | cat > / | ||
+ | / | ||
+ | / | ||
+ | EOF | ||
+ | cat > / | ||
+ | / | ||
+ | / | ||
+ | EOF | ||
+ | done | ||
+ | </ | ||
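Each .lsc file written by the loops above consists of exactly two DN lines: the VOMS server host certificate DN, then the DN of the CA that issued it. A minimal self-contained sketch of the format, with hypothetical DNs (the real ones are in the heredocs above):

```bash
# Sketch of the .lsc format: line 1 = VOMS server certificate DN,
# line 2 = issuing CA DN. The VO name and DNs are hypothetical placeholders.
tmp=$(mktemp -d)
mkdir -p "$tmp/vomsdir/myvo"
cat > "$tmp/vomsdir/myvo/voms.example.org.lsc" <<'EOF'
/DC=org/DC=example/CN=voms.example.org
/DC=org/DC=example/CN=Example Certification Authority
EOF
nlines=$(wc -l < "$tmp/vomsdir/myvo/voms.example.org.lsc")
echo "lsc lines: $nlines"
rm -rf "$tmp"
```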
+ | <code bash> | ||
+ | cat << | ||
+ | { | ||
+ | " | ||
+ | " | ||
+ | }, | ||
+ | " | ||
+ | " | ||
+ | }, | ||
+ | " | ||
+ | " | ||
+ | }, | ||
+ | " | ||
+ | " | ||
+ | }, | ||
+ | " | ||
+ | " | ||
+ | }, | ||
+ | " | ||
+ | " | ||
+ | }, | ||
+ | " | ||
+ | " | ||
+ | }, | ||
+ | " | ||
+ | " | ||
+ | } | ||
+ | } | ||
+ | EOF | ||
+ | </ | ||
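A syntax error in voms.json makes the VOMS filter refuse all requests, so it is worth validating the file before restarting Apache. The sketch below checks a sample VO-to-tenant mapping (VO names here are placeholders); run the same json.tool check against the real /etc/keystone/voms.json:

```bash
# Sketch: syntax-check a voms.json-style VO->tenant mapping with Python's
# json.tool (python3 here; plain python on a CentOS 7 node).
# The sample mapping is illustrative, not the production file.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
{
    "dteam": {
        "tenant": "dteam"
    },
    "ops": {
        "tenant": "ops"
    }
}
EOF
if python3 -m json.tool "$tmp" > /dev/null 2>&1; then
    status=OK
else
    status=MALFORMED
fi
echo "voms.json check: $status"
rm -f "$tmp"
```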
+ | * Manually adjust the keystone catalog so that the identity backend points to the correct URLs: | ||
+ | * public URL: https:// | ||
+ | * admin URL: https:// | ||
+ | * internal URL: https:// | ||
+ | <code bash> | ||
+ | mysql> use keystone; | ||
+ | mysql> update endpoint set url=" | ||
+ | mysql> update endpoint set url=" | ||
+ | mysql> select id,url from endpoint; | ||
+ | should show lines with the above URLs. | ||
+ | </ | ||
+ | * Replace http with https in auth_[protocol, | ||
+ | * Replace http with https in auth_[protocol, | ||
+ | * Also check if " | ||
+ | |||
+ | |||
+ | ==== Install the OOI API ==== | ||
+ | |||
+ | |||
+ | (see [[https:// | ||
+ | |||
+ | (only on Controller node) | ||
+ | |||
+ | Install INDIGO - DataCloud repositories | ||
+ | <code bash> | ||
+ | rpm --import http:// | ||
+ | yum localinstall -y indigodc-release-1.0.0-1.el7.centos.noarch.rpm | ||
+ | </ | ||
+ | |||
+ | and configuration file, / | ||
+ | |||
+ | <code bash> | ||
+ | [ main ] | ||
+ | enabled = 1 | ||
+ | check_obsoletes = 1 | ||
+ | </ | ||
+ | |||
+ | Install ooi | ||
+ | <code bash> | ||
+ | yum -y install python-ooi | ||
+ | </ | ||
+ | |||
+ | and edit the / | ||
+ | <code bash> | ||
+ | cat <<EOF >>/ | ||
+ | |||
+ | ######## | ||
+ | # OOI # | ||
+ | ######## | ||
+ | |||
+ | [composite: | ||
+ | use = call: | ||
+ | /occi1.1: occi_api_11 | ||
+ | |||
+ | [filter: | ||
+ | paste.filter_factory = ooi.wsgi: | ||
+ | openstack_version = /v2.1 | ||
+ | |||
+ | [composite: | ||
+ | use = call: | ||
+ | noauth2 = compute_req_id faultwrap sizelimit noauth2 occi osapi_compute_app_v21 | ||
+ | keystone = compute_req_id faultwrap sizelimit authtoken keystonecontext occi osapi_compute_app_v21 | ||
+ | EOF | ||
+ | </ | ||
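Before restarting nova, the modified api-paste.ini can be sanity-checked for INI syntax (paste.deploy performs the real validation at service start). The section and factory names below are illustrative stand-ins for the truncated ones above:

```bash
# Sketch: parse an ooi-style paste fragment with Python's configparser to
# catch INI syntax errors early. Section/factory names are illustrative.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
[composite:ooi]
use = call:nova.api.openstack.urlmap:urlmap_factory
/occi1.1: occi_api_11

[filter:occi]
paste.filter_factory = ooi.wsgi:OCCIMiddleware.factory
openstack_version = /v2.1
EOF
sections=$(python3 - "$tmp" <<'PY'
import configparser, sys
cp = configparser.ConfigParser()
cp.read(sys.argv[1])
print(" ".join(sorted(cp.sections())))
PY
)
echo "sections: $sections"
rm -f "$tmp"
```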
+ | * Make sure the occiapi API is enabled in the / | ||
+ | <code bash> | ||
+ | openstack-config --set / | ||
+ | openstack-config --set / | ||
+ | openstack-config --set / | ||
+ | </ | ||
+ | * Configure nova to use the / | ||
+ | <code bash> | ||
+ | sed -i ' | ||
+ | </ | ||
+ | * Add this line in / | ||
+ | <code bash> | ||
+ | openstack-config --set / | ||
+ | </ | ||
+ | * modify the / | ||
+ | <code bash> | ||
+ | sed -i ' | ||
+ | sed -i ' | ||
+ | sed -i ' | ||
+ | </ | ||
+ | * and restart the nova-* services: | ||
+ | <code bash> | ||
+ | for i in nova-api nova-cert nova-consoleauth nova-scheduler nova-conductor nova-novncproxy; | ||
+ | </ | ||
+ | * Register service in Keystone: | ||
+ | <code bash> | ||
+ | openstack service create --name occi --description "OCCI Interface" | ||
+ | openstack endpoint create --region RegionOne occi public https:// | ||
+ | openstack endpoint create --region RegionOne occi internal https:// | ||
+ | openstack endpoint create --region RegionOne occi admin https:// | ||
+ | </ | ||
+ | * Enable SSL connection on port 8787, by creating the file / | ||
+ | <code bash> | ||
+ | cat <<EOF > / | ||
+ | #LoadModule proxy_http_module modules/ | ||
+ | # | ||
+ | # Proxy Server directives. Uncomment the following lines to | ||
+ | # enable the proxy server: | ||
+ | #LoadModule proxy_module / | ||
+ | #LoadModule proxy_http_module / | ||
+ | #LoadModule substitute_module / | ||
+ | |||
+ | Listen 8787 | ||
+ | < | ||
+ | | ||
+ | | ||
+ | | ||
+ | |||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | < | ||
+ | # Do not enable proxying with ProxyRequests until you have secured | ||
+ | # your server. | ||
+ | # Open proxy servers are dangerous both to your network and to the | ||
+ | # Internet at large. | ||
+ | | ||
+ | |||
+ | < | ||
+ | Order deny, | ||
+ | Deny from all | ||
+ | </ | ||
+ | |||
+ | | ||
+ | | ||
+ | < | ||
+ | | ||
+ | | ||
+ | Order allow,deny | ||
+ | Allow from all | ||
+ | </ | ||
+ | |||
+ | </ | ||
+ | </ | ||
+ | EOF | ||
+ | </ | ||
+ | *Now restart the httpd service | ||
+ | <code bash> | ||
+ | systemctl restart httpd | ||
+ | </ | ||
+ | |||
+ | ==== Install rOCCI Client ==== | ||
+ | |||
+ | For complete guide about the rOCCI Client see [[https:// | ||
+ | |||
+ | ==== Install FedCloud BDII ==== | ||
+ | (See [[https:// | ||
+ | |||
+ | * Install the resource BDII and the cloud-info-provider: | ||
+ | <code bash> | ||
+ | yum install bdii -y | ||
+ | git clone https:// | ||
+ | cd BDIIscripts | ||
+ | pip install . | ||
+ | </ | ||
+ | * Customize the configuration file with the local site's information | ||
+ | <code bash> | ||
+ | cp / | ||
+ | sed -i ' | ||
+ | sed -i ' | ||
+ | sed -i ' | ||
+ | sed -i ' | ||
+ | sed -i ' | ||
+ | sed -i ' | ||
+ | sed -i ' | ||
+ | sed -i ' | ||
+ | sed -i ' | ||
+ | sed -i ' | ||
+ | sed -i ' | ||
+ | sed -i ' | ||
+ | sed -i ' | ||
+ | sed -i ' | ||
+ | sed -i ' | ||
+ | </ | ||
+ | * Be sure that keystone contains the OOI endpoints, otherwise they will not be published by the BDII. | ||
+ | * By default, the provider script filters out images that have no marketplace URI defined in the marketplace or in the vmcatcher_event_ad_mpuri property. If you want to list all the image templates (including local snapshots), set the variable ' | ||
+ | * Create the file / | ||
+ | <code bash> | ||
+ | cat<< | ||
+ | #!/bin/sh | ||
+ | cloud-info-provider-service --yaml / | ||
+ | --middleware openstack \ | ||
+ | --os-username admin --os-password ADMIN_PASS \ | ||
+ | --os-tenant-name admin --os-auth-url https:// | ||
+ | EOF | ||
+ | </ | ||
+ | * Run the cloud-info-provider script manually and check that the output returns the complete LDIF. To do so, execute: | ||
+ | <code bash> | ||
+ | chmod +x / | ||
+ | / | ||
+ | </ | ||
+ | * Now you can start the bdii service: | ||
+ | <code bash> | ||
+ | systemctl start bdii | ||
+ | </ | ||
+ | * Use the command below to see if the information is being published: | ||
+ | <code bash> | ||
+ | ldapsearch -x -h localhost -p 2170 -b o=glue | ||
+ | </ | ||
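If the BDII answers, each published GLUE object appears as one dn: block in the returned LDIF, so counting dn: lines gives a quick idea of how much is being published. A self-contained sketch (the two sample entries are illustrative, not real site data):

```bash
# Sketch: count published objects in an LDIF dump; every object starts
# with a "dn:" line. The sample entries below are illustrative only.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
dn: o=glue
objectClass: organization

dn: GLUE2DomainID=MY-SITE,o=glue
objectClass: GLUE2Domain
EOF
entries=$(grep -c '^dn:' "$tmp")
echo "published entries: $entries"
rm -f "$tmp"
```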
+ | * Do not forget to open port 2170: | ||
+ | <code bash> | ||
+ | firewall-cmd --add-port=2170/ | ||
+ | firewall-cmd --permanent --add-port=2170/ | ||
+ | systemctl restart firewalld | ||
+ | </ | ||
+ | * Information on how to set up the site-BDII in egi-cloud-sbdii.pd.infn.it is available [[https:// | ||
+ | * Add your cloud-info-provider to your site-BDII egi-cloud-sbdii.pd.infn.it by adding new lines in the site.def like this: | ||
+ | <code bash> | ||
+ | BDII_REGIONS=" | ||
+ | BDII_CLOUD_URL=" | ||
+ | BDII_BDII_URL=" | ||
+ | </ | ||
+ | |||
+ | ==== Install vmcatcher/ | ||
+ | |||
+ | (see [[https:// | ||
+ | |||
+ | * VMcatcher allows users to subscribe to Virtual Machine image lists, cache the images referenced in the lists, validate the image lists with X.509-based public key cryptography, | ||
+ | |||
+ | <code bash> | ||
+ | useradd -m -b /opt stack | ||
+ | STACKHOME=/ | ||
+ | yum install -y m2crypto python2-setuptools | ||
+ | pip install nose | ||
+ | git clone https:// | ||
+ | git clone https:// | ||
+ | git clone https:// | ||
+ | wget http:// | ||
+ | wget http:// | ||
+ | tar -zxvf python-glancepush-0.0.6.tar.gz -C $STACKHOME/ | ||
+ | tar -zxvf gpvcmupdate-0.0.7.tar.gz -C $STACKHOME/ | ||
+ | for i in hepixvmitrust smimeX509validation vmcatcher $STACKHOME/ | ||
+ | do | ||
+ | cd $i | ||
+ | python setup.py install | ||
+ | echo exit code=$? | ||
+ | cd | ||
+ | done | ||
+ | </ | ||
+ | <code bash> | ||
+ | mkdir -p / | ||
+ | mkdir -p $STACKHOME/ | ||
+ | mkdir -p / | ||
+ | ln / | ||
+ | sed -i ' | ||
+ | </ | ||
+ | * Now for each VO/tenant you have in voms.json write a file like this: | ||
+ | <code bash> | ||
+ | [root@egi-cloud ~]# su - stack | ||
+ | [stack@egi-cloud ~]# cat << EOF > / | ||
+ | [general] | ||
+ | # Tenant for this VO. Must match the tenant defined in voms.json file | ||
+ | testing_tenant=dteam | ||
+ | # Identity service endpoint (Keystone) | ||
+ | endpoint_url=https:// | ||
+ | # User Password | ||
+ | password=ADMIN_PASS | ||
+ | # User | ||
+ | username=admin | ||
+ | # Set this to true if you're NOT using self-signed certificates | ||
+ | is_secure=True | ||
+ | # SSH private key that will be used to perform policy checks (to be done) | ||
+ | ssh_key=/ | ||
+ | # WARNING: Only define the next variable if you're going to need it. Otherwise you may encounter problems | ||
+ | # | ||
+ | EOF | ||
+ | </ | ||
+ | * and for images not belonging to any VO use the admin tenant | ||
+ | <code bash> | ||
+ | [stack@egi-cloud ~]# cat << EOF > / | ||
+ | [general] | ||
+ | # Tenant for this VO. Must match the tenant defined in voms.json file | ||
+ | testing_tenant=admin | ||
+ | # Identity service endpoint (Keystone) | ||
+ | endpoint_url=https:// | ||
+ | # User Password | ||
+ | password=ADMIN_PASS | ||
+ | # User | ||
+ | username=admin | ||
+ | # Set this to true if you're NOT using self-signed certificates | ||
+ | is_secure=True | ||
+ | # SSH private key that will be used to perform policy checks (to be done) | ||
+ | ssh_key=/ | ||
+ | # WARNING: Only define the next variable if you're going to need it. Otherwise you may encounter problems | ||
+ | # | ||
+ | EOF | ||
+ | </ | ||
+ | <code bash> | ||
+ | chown stack:stack -R / | ||
+ | </ | ||
+ | * Check that vmcatcher is running properly by listing and subscribing to an image list | ||
+ | <code bash> | ||
+ | cat << | ||
+ | export VMCATCHER_RDBMS=" | ||
+ | export VMCATCHER_CACHE_DIR_CACHE=" | ||
+ | export VMCATCHER_CACHE_DIR_DOWNLOAD=" | ||
+ | export VMCATCHER_CACHE_DIR_EXPIRE=" | ||
+ | EOF | ||
+ | [stack@egi-cloud ~]# export VMCATCHER_RDBMS=" | ||
+ | [stack@egi-cloud ~]# vmcatcher_subscribe -l | ||
+ | [stack@egi-cloud ~]# vmcatcher_subscribe -e -s https://< | ||
+ | [stack@ocp-ctrl ~]$ vmcatcher_subscribe -l | ||
+ | 76fdee70-8119-5d33-9f40-3c57e1c60df1 | ||
+ | </ | ||
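The VMCATCHER_RDBMS value is truncated in the export above; vmcatcher accepts an SQLAlchemy database URI here. For an SQLite backend the shape is as follows (the path is an illustrative example, not the production one):

```bash
# SQLAlchemy URI for an SQLite file: note the triple slash followed by an
# absolute path. The path below is an illustrative example.
export VMCATCHER_RDBMS="sqlite:////opt/stack/vmcatcher/vmcatcher.db"
echo "$VMCATCHER_RDBMS"
```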
+ | * Create a CRON wrapper for vmcatcher, named $STACKHOME/ | ||
+ | <code bash> | ||
+ | cat<< | ||
+ | #!/bin/bash | ||
+ | #Cron handler for VMCatcher image synchronization script for OpenStack | ||
+ | |||
+ | |||
+ | #Vmcatcher configuration variables | ||
+ | export VMCATCHER_RDBMS=" | ||
+ | export VMCATCHER_CACHE_DIR_CACHE=" | ||
+ | export VMCATCHER_CACHE_DIR_DOWNLOAD=" | ||
+ | export VMCATCHER_CACHE_DIR_EXPIRE=" | ||
+ | export VMCATCHER_CACHE_EVENT=" | ||
+ | |||
+ | |||
+ | #Update vmcatcher image lists | ||
+ | / | ||
+ | |||
+ | |||
+ | #Add all the new images to the cache | ||
+ | for a in \$(/ | ||
+ | / | ||
+ | done | ||
+ | |||
+ | |||
+ | #Update the cache | ||
+ | / | ||
+ | |||
+ | |||
+ | #Run glancepush | ||
+ | python / | ||
+ | EOF | ||
+ | </ | ||
+ | * Add the admin user to the tenants and set the right ownership on the directories | ||
+ | <code bash> | ||
+ | for vo in atlas cms lhcb dteam ops wenmr fctf indigo | ||
+ | do | ||
+ | openstack role add --project $vo --user admin _member_ | ||
+ | done | ||
+ | |||
+ | chown -R stack:stack $STACKHOME | ||
+ | </ | ||
+ | * Test that the vmcatcher handler is working correctly by running: | ||
+ | <code bash> | ||
+ | chmod +x $STACKHOME/ | ||
+ | chown -R stack:stack $STACKHOME | ||
+ | </ | ||
+ | * Add the following line to the stack user crontab: | ||
+ | <code bash> | ||
+ | 50 */6 * * * $STACKHOME/ | ||
+ | </ | ||
+ | * Useful links for getting VO-wide image lists that need authentication to AppDB: [[https:// | ||
+ | |||
+ | ==== Use the same APEL/SSM as the grid site ==== | ||
+ | * Cloud usage records are sent to APEL through the ssmsend program installed in cert-37.pd.infn.it: | ||
+ | <code bash> | ||
+ | [root@cert-37 ~]# cat / | ||
+ | # send buffered usage records to APEL | ||
+ | 30 */24 * * * root / | ||
+ | </ | ||
+ | * It is therefore necessary to install and configure NFS on egi-cloud: | ||
+ | <code bash> | ||
+ | [root@egi-cloud ~]# mkdir -p / | ||
+ | [root@egi-cloud ~]# cat<< | ||
+ | / | ||
+ | EOF | ||
+ | [root@egi-cloud ~]$ systemctl status nfs-server | ||
+ | </ | ||
+ | * In case of APEL nagios probe failure, check if / | ||
+ | * To check if accounting records are properly received by the APEL server, look at [[http:// | ||
+ | |||
+ | ==== Install the new accounting system (CASO) ==== | ||
+ | |||
+ | (see [[https:// | ||
+ | | ||
+ | <code bash> | ||
+ | yum -y install libffi-devel openssl-devel gcc | ||
+ | pip install caso | ||
+ | </ | ||
+ | * Create role and user | ||
+ | <code bash> | ||
+ | openstack user create --domain default --password ACCOUNTING_PASS accounting | ||
+ | openstack role create accounting | ||
+ | </ | ||
+ | *For each of the tenants, add the user with the accounting role | ||
+ | <code bash> | ||
+ | for i in fctf wenmr atlas ops dteam lhcb cms indigo | ||
+ | do | ||
+ | openstack role add --project $i --user accounting accounting | ||
+ | done | ||
+ | </ | ||
+ | * Edit the / | ||
+ | <code bash> | ||
+ | cp / | ||
+ | openstack-config --set / | ||
+ | openstack-config --set / | ||
+ | openstack-config --set / | ||
+ | openstack-config --set / | ||
+ | openstack-config --set / | ||
+ | openstack-config --set / | ||
+ | openstack-config --set / | ||
+ | openstack-config --set / | ||
+ | openstack-config --set / | ||
+ | openstack-config --set / | ||
+ | openstack-config --set / | ||
+ | </ | ||
+ | *Edit the / | ||
+ | <code bash> | ||
+ | sed -i ' | ||
+ | </ | ||
+ | <code bash> | ||
+ | mkdir / | ||
+ | </ | ||
+ | *Test it | ||
+ | <code bash> | ||
+ | caso-extract -v -d | ||
+ | </ | ||
+ | * Create the cron job | ||
+ | <code bash> | ||
+ | cat << | ||
+ | # extract and send usage records to APEL/ | ||
+ | 10 * * * * root / | ||
+ | EOF | ||
+ | </ | ||
+ | ==== Local Monitoring ==== | ||
+ | === Ganglia === | ||
+ | * Install ganglia-gmond on all servers | ||
+ | * Configure cluster and host fields in / | ||
+ | * Finally: systemctl enable gmond.service; | ||
+ | === Nagios === | ||
+ | * Install on compute nodes ncsa-client, | ||
+ | * Copy the file cld-nagios:/ | ||
+ | * Then do in all compute nodes: | ||
+ | <code bash> | ||
+ | $ echo encryption_method=1 > / | ||
+ | $ usermod -a -G libvirtd nagios | ||
+ | $ sed -i ' | ||
+ | # then be sure the files below are in / | ||
+ | $ ls / | ||
+ | check_kvm | ||
+ | $ cat <<EOF > crontab.txt | ||
+ | # Puppet Name: nagios_check_kvm | ||
+ | 0 */1 * * * / | ||
+ | # Puppet Name: nagios_check_ovs | ||
+ | */10 * * * * / | ||
+ | EOF | ||
+ | $ crontab crontab.txt | ||
+ | $ crontab -l | ||
+ | </ | ||
+ | * On the controller node check if $ sed -i ' | ||
+ | * On the cld-nagios server check/ | ||
+ | |||
+ | ==== Security incidents and IP traceability ==== | ||
+ | * See [[https:// | ||
+ | * On egi-cloud install the [[https:// | ||
+ | <code bash> | ||
+ | [root@egi-cloud ~]# os-ip-trace 90.147.77.229 | ||
+ | +--------------------------------------+-----------+---------------------+---------------------+ | ||
+ | | device id | user name | | ||
+ | +--------------------------------------+-----------+---------------------+---------------------+ | ||
+ | | 3002b1f1-bca3-4e4f-b21e-8de12c0b926e | | ||
+ | +--------------------------------------+-----------+---------------------+---------------------+ | ||
+ | </ | ||
+ | * Save and archive important log files: | ||
+ | * On egi-cloud and each compute node cloud-0*, add the line "*.* @@192.168.60.31: | ||
+ | * In cld-foreman, | ||
+ | * Install ulogd in the controller node: | ||
+ | * In egi-cloud run yum install libnetfilter_log and yum localinstall libnetfilter_acct-1.0.2-3.el7.lux.1.x86_64.rpm ulogd-2.0.4-3.el7.lux.1.x86_64.rpm (these files are in cld-ctrl-01:/ | ||
+ | * Configure / | ||
+ | * Then copy cld-ctrl-01:/ | ||
+ | * Finally, be sure that / | ||
+ | |||
+ | ==== Troubleshooting ==== | ||
+ | |||
+ | * Passwordless ssh access to egi-cloud from cld-nagios, and from egi-cloud to cloud-0*, has already been configured | ||
+ | * If cld-nagios does not ping egi-cloud, be sure that the rule "route add -net 192.168.60.0 netmask 255.255.255.0 gw 192.168.114.1" | ||
+ | * In case of Nagios alarms, try to restart all cloud services doing the following: | ||
+ | <code bash> | ||
+ | $ ssh root@egi-cloud | ||
+ | [root@egi-cloud ~]# ./ | ||
+ | [root@egi-cloud ~]# for i in $(seq 1 6); do ssh cloud-0$i.pn.pd.infn.it ./ | ||
+ | </ | ||
+ | * Resubmit the Nagios probe and check if it works again | ||
+ | * If the problem persists, check the consistency of the DB by executing the following (this also fixes the issue where the quota overview in the dashboard is not consistent with the VMs actually active): | ||
+ | <code bash> | ||
+ | [root@egi-cloud ~]# python nova-quota-sync.py | ||
+ | </ | ||
+ | * In case of an EGI Nagios alarm, check that the user running the Nagios probes does not also belong to tenants other than " | ||
+ | |||
+ | * in case of reboot of egi-cloud server: | ||
+ | * check its network configuration (use IPMI if not reachable): all 4 interfaces must be up and the default gateway must be 90.147.77.254. | ||
+ | * check DNS in / | ||
+ | * check routing with $route -n, if needed do: $ip route replace default via 90.147.77.254. Also be sure to have a route for 90.147.77.0 network. | ||
+ | * check if storage mountpoints 192.168.61.100:/ | ||
+ | |||
+ | * in case of reboot of cloud-0* server (use IPMI if not reachable): all 3 interfaces must be up and the default destination must have both 192.168.114.1 and 192.168.115.1 gateways | ||
+ | * check its network configuration | ||
+ | * check if all partitions in /etc/fstab are properly mounted (do: $ df -h) | ||
+ | |||
+ | * if the VM doesn' | ||
+ | <code bash> | ||
+ | [root@egi-cloud ~]# cat / | ||
+ | DEVICE=br-ex | ||
+ | DEVICETYPE=ovs | ||
+ | TYPE=OVSBridge | ||
+ | BOOTPROTO=static | ||
+ | IPADDR=90.147.77.223 | ||
+ | NETMASK=255.255.255.0 | ||
+ | GATEWAY=90.147.77.254 | ||
+ | ONBOOT=yes | ||
+ | |||
+ | [root@egi-cloud ~]# cat / | ||
+ | DEVICE=em3 | ||
+ | ONBOOT=yes | ||
+ | VLAN=yes | ||
+ | BOOTPROTO=none | ||
+ | OVS_BRIDGE=br-ex | ||
+ | TYPE=OVSPort | ||
+ | DEVICETYPE=ovs | ||
+ | </ | ||
+ | * In case of network instabilities, | ||
+ | <code bash> | ||
+ | [root@egi-cloud ~]# / | ||
+ | generic-receive-offload: | ||
+ | </ | ||
+ | * Also check if / | ||
+ | <code bash> | ||
+ | [root@egi-cloud ~]# cat / | ||
+ | #!/bin/bash | ||
+ | case " | ||
+ | em1) | ||
+ | / | ||
+ | ;; | ||
+ | em2) | ||
+ | / | ||
+ | ;; | ||
+ | em3) | ||
+ | / | ||
+ | ;; | ||
+ | em4) | ||
+ | / | ||
+ | ;; | ||
+ | esac | ||
+ | exit 0 | ||
+ | </ | ||
+ | |||
+ | * If you need to change the project quotas, do not forget to apply the change to both tenantId and tenantName, due to a known bug, e.g.: | ||
+ | <code bash> | ||
+ | [root@egi-cloud ~]# source admin-openrc.sh | ||
+ | [root@egi-cloud ~]# tenantId=$(openstack project list | grep fctf | awk ' | ||
+ | [root@egi-cloud ~]# nova quota-update --instances 40 --cores 40 --ram 81840 $tenantId | ||
+ | [root@egi-cloud ~]# nova quota-update --instances 40 --cores 40 --ram 81840 fctf | ||
+ | [root@egi-cloud ~]# neutron quota-update --floatingip 1 --tenant-id $tenantId | ||
+ | [root@egi-cloud ~]# neutron quota-update --floatingip 1 --tenant-id fctf | ||
+ | </ |