====== Liberty-CentOS7 Testbed ======
Fully integrated Resource Provider [[https://wiki.egi.eu/wiki/Fedcloud-tf:ResourceProviders#Fully_integrated_Resource_Providers|INFN-PADOVA-STACK]], in production from 26 September 2016 to 20 August 2017.
=== EGI Monitoring/Accounting ===
* [[https://goc.egi.eu/portal/index.php?Page_Type=Site&id=1024|GOCDB static info]]
* [[https://wiki.egi.eu/wiki/Federated_Cloud_infrastructure_status|Overall EGI FedCloud static info]]
* [[http://argo.egi.eu/lavoisier/site_reports?ngi=NGI_IT&report=Critical&accept=html|ARGO availability]]
* [[https://argo-mon.egi.eu/nagios/cgi-bin/status.cgi?hostgroup=site-INFN-PADOVA-STACK&style=detail|EGI Nagios]]
* [[https://accounting.egi.eu/cloud/site/INFN-PADOVA-STACK/sum_elap/VO/6M/|EGI Accounting]]
=== Local Monitoring/Accounting ===
* [[http://cld-ganglia.cloud.pd.infn.it/ganglia/?m=load_one&r=hour&s=descending&c=Cloud+Padovana&h=egi-cloud&sh=1&hc=4&z=small|Local Ganglia]]
* [[http://cld-ganglia.cloud.pd.infn.it/ganglia/graph_all_periods.php?title=INFN-PADOVA-STACK+load_one&vl=load&x=&n=&hreg%5B%5D=egi-cloud%7Ccloud-0&mreg%5B%5D=load_one&gtype=line&glegend=show&aggregate=1|Local Ganglia Load Aggregated]]
* [[http://cld-ganglia.cloud.pd.infn.it/ganglia/graph_all_periods.php?title=INFN-PADOVA-STACK+bytes&vl=bytes&x=&n=&hreg%5B%5D=egi-cloud%7Ccloud-0&mreg%5B%5D=bytes_(in%7Cout)&gtype=line&glegend=show&aggregate=1|Local Ganglia Network Aggregated]]
* [[http://cld-nagios.cloud.pd.infn.it/nagios/cgi-bin//status.cgi?hostgroup=egi-fedcloud&style=detail|Local Nagios]]
* [[https://cld-caos.cloud.pd.infn.it/EgiCloud|Local Accounting]]
=== Local dashboard ===
* [[http://egi-cloud.pd.infn.it/dashboard/auth/login/|Local Dashboard]]
===== Layout =====
* Controller + Network node + Storage node + Telemetry service + Orchestration service: **egi-cloud.pd.infn.it**
* Compute nodes: **cloud-01.pn.pd.infn.it** to **cloud-06.pn.pd.infn.it**
* NoSQL database: **cld-mongo-egi.cloud.pd.infn.it**
* OneData provider: **one-data-01.pd.infn.it**
* Network layout available [[http://wiki.infn.it/progetti/cloud-areapd/networking/egi_fedcloud_networks| here]] (authorized users only)
===== OpenStack configuration =====
* Controller/Network node and Compute nodes were installed according to [[http://docs.openstack.org/liberty/install-guide-rdo/index.html|OpenStack official documentation]]
* We created one tenant for each supported EGI FedCloud VO, plus a router and several networks and subnets, obtaining the following network topology:
{{:progetti:cloud-areapd:egi_federated_cloud:network.png|}}
* The partitions used by the Glance and Cinder services are mounted from 192.168.61.100 via NFS:
yum install -y nfs-utils
mkdir -p /var/lib/cinder
mkdir -p /var/lib/glance/images
cat <<EOF >>/etc/fstab
192.168.61.100:/cinder-egi /var/lib/cinder nfs defaults
192.168.61.100:/glance-egi /var/lib/glance/images nfs defaults
EOF
mount -a
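Running the heredoc above a second time appends duplicate entries to /etc/fstab. A minimal sketch of a hypothetical bash helper, `append_once` (not part of any package), that adds each entry only if an identical line is not already present:

```shell
#!/bin/bash
# Hypothetical helper: append LINE to FILE only if an identical line is not already there,
# so that the fstab step can be re-run safely.
append_once() {
    local file="$1" line="$2"
    # -q: quiet, -x: whole-line match, -F: fixed string (no regex);
    # a missing file is treated as "line not found" and gets created
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Example (as root):
# append_once /etc/fstab "192.168.61.100:/cinder-egi /var/lib/cinder nfs defaults"
# append_once /etc/fstab "192.168.61.100:/glance-egi /var/lib/glance/images nfs defaults"
# mount -a
```

The same helper can be used for the /etc/exports entry in the APEL/SSM section.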
* Cinder and Neutron use some specific configurations, taken from the following documentation:
* [[http://docs.openstack.org/liberty/networking-guide/scenario-classic-ovs.html|neutron with OVS agent]]
* [[http://docs.openstack.org/admin-guide/blockstorage-nfs-backend.html|cinder with NFS backend]]
* The Telemetry service needs a NoSQL database, so we installed [[http://docs.openstack.org/liberty/install-guide-rdo/environment-nosql-database.html|mongodb]] on cld-mongo-egi.cloud.pd.infn.it
===== EGI FedCloud specific configuration =====
(see [[https://wiki.egi.eu/wiki/MAN10#OpenStack|EGI Doc]])
* Install the CA certificates and the software for fetching the CRLs on both the Controller (egi-cloud) and the Compute (cloud-01 to cloud-06) nodes:
systemctl stop httpd
curl -L http://repository.egi.eu/sw/production/cas/1/current/repo-files/EGI-trustanchors.repo | sudo tee /etc/yum.repos.d/EGI-trustanchors.repo
yum install -y ca-policy-egi-core fetch-crl
systemctl enable fetch-crl-cron.service
systemctl start fetch-crl-cron.service
==== Install OpenStack Keystone-VOMS module ====
(see [[https://keystone-voms.readthedocs.io/en/stable-liberty/|Keystone-voms doc]])
* Prepare to run Keystone as a WSGI application over SSL:
yum install -y voms mod_ssl
APACHE_LOG_DIR=/var/log/httpd
cat <<EOF >/etc/httpd/conf.d/wsgi-keystone.conf
Listen 5000
WSGIDaemonProcess keystone user=keystone group=keystone processes=8 threads=1
<VirtualHost _default_:5000>
LogLevel warn
ErrorLog $APACHE_LOG_DIR/error.log
CustomLog $APACHE_LOG_DIR/ssl_access.log combined
SSLEngine on
SSLCertificateFile /etc/grid-security/hostcert.pem
SSLCertificateKeyFile /etc/grid-security/hostkey.pem
SSLCACertificatePath /etc/grid-security/certificates
SSLCARevocationPath /etc/grid-security/certificates
SSLVerifyClient optional
SSLVerifyDepth 10
SSLProtocol all -SSLv2
SSLCipherSuite ALL:!ADH:!EXPORT:!SSLv2:RC4+RSA:+HIGH:+MEDIUM:+LOW
SSLOptions +StdEnvVars +ExportCertData
WSGIScriptAlias / /var/www/cgi-bin/keystone/main
WSGIProcessGroup keystone
</VirtualHost>
Listen 35357
WSGIDaemonProcess keystoneapi user=keystone group=keystone processes=8 threads=1
<VirtualHost _default_:35357>
LogLevel warn
ErrorLog $APACHE_LOG_DIR/error.log
CustomLog $APACHE_LOG_DIR/ssl_access.log combined
SSLEngine on
SSLCertificateFile /etc/grid-security/hostcert.pem
SSLCertificateKeyFile /etc/grid-security/hostkey.pem
SSLCACertificatePath /etc/grid-security/certificates
SSLCARevocationPath /etc/grid-security/certificates
SSLVerifyClient optional
SSLVerifyDepth 10
SSLProtocol all -SSLv2
SSLCipherSuite ALL:!ADH:!EXPORT:!SSLv2:RC4+RSA:+HIGH:+MEDIUM:+LOW
SSLOptions +StdEnvVars +ExportCertData
WSGIScriptAlias / /var/www/cgi-bin/keystone/admin
WSGIProcessGroup keystoneapi
</VirtualHost>
EOF
* Check that the host certificate of your server is present in the /etc/grid-security/ directory, and install it if needed:
[root@egi-cloud]# ls -l /etc/grid-security/host*
-rw-r--r--. 1 root root 2021 Sep 8 18:35 hostcert.pem
-rw-------. 1 root root 1675 Sep 8 18:35 hostkey.pem
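* Take the file [[http://git.openstack.org/cgit/openstack/keystone/plain/httpd/keystone.py?h=stable/liberty|keystone.py]], copy it to /var/www/cgi-bin/keystone/keystone.py and create the main and admin links to it (the commands below also allow proxy certificates in httpd and remove the old /usr/lib/cgi-bin/keystone directory):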
echo "OPENSSL_ALLOW_PROXY_CERTS=1" >> /etc/sysconfig/httpd
rm -Rf /usr/lib/cgi-bin/keystone
mkdir -p /var/www/cgi-bin/keystone
curl http://git.openstack.org/cgit/openstack/keystone/plain/httpd/keystone.py?h=stable/liberty | tee /var/www/cgi-bin/keystone/keystone.py
ln /var/www/cgi-bin/keystone/keystone.py /var/www/cgi-bin/keystone/main
ln /var/www/cgi-bin/keystone/keystone.py /var/www/cgi-bin/keystone/admin
chown -R keystone:keystone /var/www/cgi-bin/keystone
systemctl start httpd
* Install the Keystone-VOMS module:
git clone https://github.com/IFCA/keystone-voms.git -b stable/liberty
cd keystone-voms
pip install .
* Enable the Keystone-VOMS module:
sed -i 's|#config_file = keystone-paste.ini|config_file = /usr/share/keystone/keystone-dist-paste.ini|g' /etc/keystone/keystone.conf
echo "[filter:voms]" >> /usr/share/keystone/keystone-dist-paste.ini
openstack-config --set /usr/share/keystone/keystone-dist-paste.ini filter:voms paste.filter_factory keystone_voms.core:VomsAuthNMiddleware.factory
sed -i 's|ec2_extension user_crud_extension|voms ec2_extension user_crud_extension|g' /usr/share/keystone/keystone-dist-paste.ini
* Configure the Keystone-VOMS module:
echo "[voms]" >> /etc/keystone/keystone.conf
openstack-config --set /etc/keystone/keystone.conf voms vomsdir_path /etc/grid-security/vomsdir
openstack-config --set /etc/keystone/keystone.conf voms ca_path /etc/grid-security/certificates
openstack-config --set /etc/keystone/keystone.conf voms voms_policy /etc/keystone/voms.json
openstack-config --set /etc/keystone/keystone.conf voms vomsapi_lib libvomsapi.so.1
openstack-config --set /etc/keystone/keystone.conf voms autocreate_users True
openstack-config --set /etc/keystone/keystone.conf voms add_roles False
openstack-config --set /etc/keystone/keystone.conf voms user_roles _member_
mkdir -p /etc/grid-security/vomsdir/fedcloud.egi.eu
cat > /etc/grid-security/vomsdir/fedcloud.egi.eu/voms1.egee.cesnet.cz.lsc << EOF
/DC=org/DC=terena/DC=tcs/OU=Domain Control Validated/CN=voms1.egee.cesnet.cz
/C=NL/O=TERENA/CN=TERENA eScience SSL CA
EOF
cat > /etc/grid-security/vomsdir/fedcloud.egi.eu/voms2.grid.cesnet.cz.lsc << EOF
/DC=org/DC=terena/DC=tcs/OU=Domain Control Validated/CN=voms2.grid.cesnet.cz
/C=NL/ST=Noord-Holland/L=Amsterdam/O=TERENA/CN=TERENA eScience SSL CA 2
EOF
mkdir -p /etc/grid-security/vomsdir/dteam
cat > /etc/grid-security/vomsdir/dteam/voms.hellasgrid.gr.lsc << EOF
/C=GR/O=HellasGrid/OU=hellasgrid.gr/CN=voms.hellasgrid.gr
/C=GR/O=HellasGrid/OU=Certification Authorities/CN=HellasGrid CA 2006
EOF
cat > /etc/grid-security/vomsdir/dteam/voms2.hellasgrid.gr.lsc << EOF
/C=GR/O=HellasGrid/OU=hellasgrid.gr/CN=voms2.hellasgrid.gr
/C=GR/O=HellasGrid/OU=Certification Authorities/CN=HellasGrid CA 2006
EOF
mkdir -p /etc/grid-security/vomsdir/enmr.eu
cat > /etc/grid-security/vomsdir/enmr.eu/voms2.cnaf.infn.it.lsc < /etc/grid-security/vomsdir/enmr.eu/voms-02.pd.infn.it.lsc < /etc/grid-security/vomsdir/vo.indigo-datacloud.eu/voms01.ncg.ingrid.pt.lsc < /etc/grid-security/vomsdir/$i/lcg-voms2.cern.ch.lsc << EOF
/DC=ch/DC=cern/OU=computers/CN=lcg-voms2.cern.ch
/DC=ch/DC=cern/CN=CERN Grid Certification Authority
EOF
cat > /etc/grid-security/vomsdir/$i/voms2.cern.ch.lsc << EOF
/DC=ch/DC=cern/OU=computers/CN=voms2.cern.ch
/DC=ch/DC=cern/CN=CERN Grid Certification Authority
EOF
done
cat <<EOF >/etc/keystone/voms.json
{
"vo.indigo-datacloud.eu": {
"tenant": "indigo"
},
"fedcloud.egi.eu": {
"tenant": "fctf"
},
"ops": {
"tenant": "ops"
},
"enmr.eu": {
"tenant": "wenmr"
},
"dteam": {
"tenant": "dteam"
},
"atlas": {
"tenant": "atlas"
},
"lhcb": {
"tenant": "lhcb"
},
"cms": {
"tenant": "cms"
}
}
EOF
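The .lsc files and voms.json written above follow simple formats: each .lsc file holds the DN of the VOMS server certificate followed by the DN of the CA that issued it, and voms.json maps each VO to a tenant whose name must match the tenant lists used later (the admin role loop and the CASO tenants option). A minimal sketch of two hypothetical bash helpers, `write_lsc` and `voms_tenants`, assuming each `"tenant": "<name>"` pair sits on a single line as in the file above:

```shell
#!/bin/bash
# Hypothetical helper: write VOMSDIR/VO/HOST.lsc containing the VOMS server
# certificate DN followed by the DN of the CA that issued it.
write_lsc() {
    local vomsdir="$1" vo="$2" host="$3" host_dn="$4" ca_dn="$5"
    mkdir -p "$vomsdir/$vo"
    printf '%s\n%s\n' "$host_dn" "$ca_dn" > "$vomsdir/$vo/$host.lsc"
}

# Hypothetical helper: list the tenant names found in a voms.json file, one per line.
# Assumes every "tenant": "<name>" pair is written on one line.
voms_tenants() {
    grep -o '"tenant": *"[^"]*"' "$1" | sed 's/.*"\([^"]*\)"$/\1/'
}

# Examples:
# write_lsc /etc/grid-security/vomsdir fedcloud.egi.eu voms1.egee.cesnet.cz \
#     "/DC=org/DC=terena/DC=tcs/OU=Domain Control Validated/CN=voms1.egee.cesnet.cz" \
#     "/C=NL/O=TERENA/CN=TERENA eScience SSL CA"
# Comma-separated tenant list, e.g. for the CASO "tenants" option:
# voms_tenants /etc/keystone/voms.json | paste -sd, -
```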
* Manually adjust the Keystone catalog so that the identity endpoints point to the correct HTTPS URLs:
* public URL: https://egi-cloud.pd.infn.it:5000/v2.0
* admin URL: https://egi-cloud.pd.infn.it:35357/v2.0
* internal URL: https://egi-cloud.pd.infn.it:5000/v2.0
mysql> use keystone;
mysql> update endpoint set url="https://egi-cloud.pd.infn.it:5000/v2.0" where url="http://egi-cloud.pd.infn.it:5000/v2.0";
mysql> update endpoint set url="https://egi-cloud.pd.infn.it:35357/v2.0" where url="http://egi-cloud.pd.infn.it:35357/v2.0";
mysql> select id,url from endpoint;
The last query should list the URLs above.
* On the Controller node, replace http with https in the auth_[protocol,uri,url] variables, and the IP address with egi-cloud.pd.infn.it in the auth_[host,uri,url] variables, in /etc/nova/nova.conf, /etc/nova/api-paste.ini, /etc/neutron/neutron.conf, /etc/neutron/api-paste.ini, /etc/neutron/metadata_agent.ini, /etc/cinder/cinder.conf, /etc/cinder/api-paste.ini, /etc/glance/glance-api.conf, /etc/glance/glance-registry.conf, /etc/glance/glance-cache.conf and in the configuration of any other service that needs to validate Keystone tokens; then restart the services of the Controller node.
* On the Compute nodes, make the same changes in /etc/nova/nova.conf and /etc/neutron/neutron.conf, then restart the openstack-nova-compute and neutron-openvswitch-agent services.
* Also check that the "cafile" variable points to INFN-CA-2015.pem in all service configuration files and in the admin-openrc.sh file.
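The http-to-https edits described above can be scripted per file. A minimal sketch of a hypothetical helper, `fix_auth_urls`, assuming GNU sed and that the auth_* keys start at the beginning of the line (as in the stock RDO configuration files); it keeps a FILE.bak copy, so the result can be reviewed with diff before restarting the services:

```shell
#!/bin/bash
# Hypothetical helper: switch the Keystone auth settings in one config file to HTTPS and the FQDN.
# - auth_uri / auth_url: "http://<IP or host>:" becomes "https://<FQDN>:" (port and path are kept)
# - auth_host: the value is replaced by the FQDN
# - auth_protocol: "http" becomes "https" (lines already set to https are left alone)
# A copy of the original file is kept as FILE.bak (GNU sed -i.bak).
fix_auth_urls() {
    local file="$1" fqdn="$2"
    sed -i.bak -E \
        -e "s#^(auth_(uri|url)[[:space:]]*=[[:space:]]*)http://[^:/]+:#\1https://${fqdn}:#" \
        -e "s#^(auth_host[[:space:]]*=[[:space:]]*).*#\1${fqdn}#" \
        -e "s#^(auth_protocol[[:space:]]*=[[:space:]]*)http[[:space:]]*\$#\1https#" \
        "$file"
}

# Example on the Controller node (extend the list with the other files above):
# for f in /etc/nova/nova.conf /etc/neutron/neutron.conf /etc/cinder/cinder.conf /etc/glance/glance-api.conf; do
#     fix_auth_urls "$f" egi-cloud.pd.infn.it
#     diff "$f.bak" "$f"
# done
```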
==== Install the OOI API ====
(see [[https://github.com/indigo-dc/ooi|Ooi installation guide]] and [[https://ooi.readthedocs.io/en/stable/|Ooi configuration guide]])
(only on Controller node)
Install the INDIGO-DataCloud repositories:
rpm --import http://repo.indigo-datacloud.eu/repository/RPM-GPG-KEY-indigodc
yum localinstall -y indigodc-release-1.0.0-1.el7.centos.noarch.rpm
and configure the file /etc/yum/pluginconf.d/priorities.conf as follows:
[main]
enabled = 1
check_obsoletes = 1
Install ooi:
yum -y install python-ooi
and append the OOI configuration to the /etc/nova/api-paste.ini file:
cat <<EOF >>/etc/nova/api-paste.ini
########
# OOI #
########
[composite:ooi]
use = call:nova.api.openstack.urlmap:urlmap_factory
/occi1.1: occi_api_11
[filter:occi]
paste.filter_factory = ooi.wsgi:OCCIMiddleware.factory
openstack_version = /v2.1
[composite:occi_api_11]
use = call:nova.api.auth:pipeline_factory_v21
noauth2 = compute_req_id faultwrap sizelimit noauth2 occi osapi_compute_app_v21
keystone = compute_req_id faultwrap sizelimit authtoken keystonecontext occi osapi_compute_app_v21
EOF
* Make sure the ooi API is enabled in the /etc/nova/nova.conf configuration file. ooi listens on port 9000, behind the Apache SSL proxy that serves port 8787:
openstack-config --set /etc/nova/nova.conf DEFAULT enabled_apis osapi_compute,metadata,ooi
openstack-config --set /etc/nova/nova.conf DEFAULT ooi_listen 0.0.0.0
openstack-config --set /etc/nova/nova.conf DEFAULT ooi_listen_port 9000
* Configure nova to use the /etc/nova/api-paste.ini file
sed -i 's|#api_paste_config=api-paste.ini|api_paste_config=/etc/nova/api-paste.ini|g' /etc/nova/nova.conf
* Set the default floating IP pool in /etc/nova/nova.conf (needed to associate floating IPs via the OCCI client):
openstack-config --set /etc/nova/nova.conf DEFAULT default_floating_pool ext-net
* Modify the /etc/nova/policy.json file so that any user can get details about the VMs of her/his project that she/he does not own, while she/he cannot execute any other action (stop/suspend/pause/terminate/…) on them (see slide 7 [[https://agenda.infn.it/getFile.py/access?contribId=14&sessionId=3&resId=0&materialId=slides&confId=7915|here]]):
sed -i 's|"admin_or_owner": "is_admin:True or project_id:%(project_id)s",|"admin_or_owner": "is_admin:True or project_id:%(project_id)s",\n "admin_or_user": "is_admin:True or user_id:%(user_id)s",|g' /etc/nova/policy.json
sed -i 's|"default": "rule:admin_or_owner",|"default": "rule:admin_or_user",|g' /etc/nova/policy.json
sed -i 's|"compute:get": "",|"compute:get": "rule:admin_or_owner",|g' /etc/nova/policy.json
* Then restart the nova-* services:
for i in nova-api nova-cert nova-consoleauth nova-scheduler nova-conductor nova-novncproxy; do systemctl restart openstack-$i ; done
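Before editing the live /etc/nova/policy.json, the three sed substitutions above can be tried on a temporary copy and reviewed as a diff. A minimal sketch of a hypothetical helper, `preview_policy_change`, assuming GNU sed (needed for `\n` in the replacement):

```shell
#!/bin/bash
# Hypothetical helper: apply the three policy.json substitutions above to a temporary
# copy of SRC and print the diff; SRC itself is left untouched.
preview_policy_change() {
    local src="$1" tmp
    tmp=$(mktemp)
    cp "$src" "$tmp"
    # new user-level rule, inserted right after admin_or_owner
    sed -i 's|"admin_or_owner": "is_admin:True or project_id:%(project_id)s",|"admin_or_owner": "is_admin:True or project_id:%(project_id)s",\n "admin_or_user": "is_admin:True or user_id:%(user_id)s",|g' "$tmp"
    # default actions restricted to the owner user
    sed -i 's|"default": "rule:admin_or_owner",|"default": "rule:admin_or_user",|g' "$tmp"
    # viewing VM details stays allowed at project level
    sed -i 's|"compute:get": "",|"compute:get": "rule:admin_or_owner",|g' "$tmp"
    diff "$src" "$tmp"
    rm -f "$tmp"
}

# Example:
# preview_policy_change /etc/nova/policy.json
```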
* Register the service in Keystone:
openstack service create --name occi --description "OCCI Interface" occi
openstack endpoint create --region RegionOne occi public https://egi-cloud.pd.infn.it:8787/occi1.1
openstack endpoint create --region RegionOne occi internal https://egi-cloud.pd.infn.it:8787/occi1.1
openstack endpoint create --region RegionOne occi admin https://egi-cloud.pd.infn.it:8787/occi1.1
* Enable SSL on port 8787 by creating the file /etc/httpd/conf.d/ooi.conf, which makes Apache proxy the HTTPS requests to ooi on port 9000:
cat <<EOF > /etc/httpd/conf.d/ooi.conf
#LoadModule proxy_http_module modules/mod_proxy_http.so
#
# Proxy Server directives. Uncomment the following lines to
# enable the proxy server:
#LoadModule proxy_module /usr/lib64/httpd/modules/mod_proxy.so
#LoadModule proxy_http_module /usr/lib64/httpd/modules/mod_proxy_http.so
#LoadModule substitute_module /usr/lib64/httpd/modules/mod_substitute.so
Listen 8787
<VirtualHost _default_:8787>
LogLevel debug
ErrorLog /var/log/httpd/ooi-error.log
CustomLog /var/log/httpd/ooi-ssl_access.log combined
SSLEngine on
SSLCertificateFile /etc/grid-security/hostcert.pem
SSLCertificateKeyFile /etc/grid-security/hostkey.pem
SSLCACertificatePath /etc/grid-security/certificates
SSLCARevocationPath /etc/grid-security/certificates
SSLVerifyClient optional
SSLVerifyDepth 10
SSLProtocol all -SSLv2
SSLCipherSuite ALL:!ADH:!EXPORT:!SSLv2:RC4+RSA:+HIGH:+MEDIUM:+LOW
SSLOptions +StdEnvVars +ExportCertData
# Do not enable proxying with ProxyRequests until you have secured
# your server.
# Open proxy servers are dangerous both to your network and to the
# Internet at large.
ProxyRequests Off
<Proxy *>
Order deny,allow
Deny from all
</Proxy>
ProxyPass / http://egi-cloud.pd.infn.it:9000/
ProxyPassReverse / http://egi-cloud.pd.infn.it:9000/
<Location />
AddOutputFilterByType SUBSTITUTE text/plain text text/uri-list
Substitute s|http://egi-cloud.pd.infn.it:9000/|https://egi-cloud.pd.infn.it:8787/|n
Order allow,deny
Allow from all
</Location>
</VirtualHost>
EOF
* Now restart the httpd service:
systemctl restart httpd
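The Substitute directive in ooi.conf rewrites the backend URLs (port 9000) that appear in the text/plain and text/uri-list responses, so that OCCI clients only see the public HTTPS endpoint on port 8787; the `n` flag makes mod_substitute treat the pattern as a fixed string. The same rewrite can be reproduced offline with sed, which may help when debugging client-visible URLs. A minimal sketch; the compute location used in the example is made up:

```shell
#!/bin/bash
# Offline reproduction of the ooi.conf Substitute rule: backend URLs on port 9000
# become the public HTTPS URL on port 8787. Dots are escaped so that sed matches
# them literally, like the fixed-string "n" flag of mod_substitute.
rewrite_backend_urls() {
    sed 's|http://egi-cloud\.pd\.infn\.it:9000/|https://egi-cloud.pd.infn.it:8787/|g'
}

# Example with a made-up compute location:
# printf 'http://egi-cloud.pd.infn.it:9000/occi1.1/compute/1234\n' | rewrite_backend_urls
# prints https://egi-cloud.pd.infn.it:8787/occi1.1/compute/1234
```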
==== Install rOCCI Client ====
For complete guide about the rOCCI Client see [[https://wiki.egi.eu/wiki/HOWTO11_How_to_use_the_rOCCI_Client|How to use the rOCCI Client]].
==== Install FedCloud BDII ====
(See [[https://wiki.egi.eu/wiki/HOWTO15|EGI guide]] and [[https://github.com/EGI-FCTF/cloud-bdii-provider|BDII configuration guide]])
* Install the resource BDII and the cloud-info-provider:
yum install bdii -y
git clone https://github.com/EGI-FCTF/BDIIscripts
cd BDIIscripts
pip install .
* Customize the configuration file with the local site information:
cp /etc/cloud-info-provider/sample.openstack.yaml /etc/cloud-info-provider/bdii.yaml
sed -i 's|#name: SITE_NAME|name: INFN-PADOVA-STACK|g' /etc/cloud-info-provider/bdii.yaml
sed -i 's|#production_level: production|production_level: production|g' /etc/cloud-info-provider/bdii.yaml
sed -i 's|#url: http://site.url.example.org/|url: http://www.pd.infn.it|g' /etc/cloud-info-provider/bdii.yaml
sed -i 's|#country: ES|country: IT|g' /etc/cloud-info-provider/bdii.yaml
sed -i 's|#ngi: NGI_FOO|ngi: NGI_IT|g' /etc/cloud-info-provider/bdii.yaml
sed -i 's|#latitude: 0.0|latitude: 45.41|g' /etc/cloud-info-provider/bdii.yaml
sed -i 's|#longitude: 0.0|longitude: 11.89|g' /etc/cloud-info-provider/bdii.yaml
sed -i 's|#general_contact: general-support@example.org|general_contact: cloud-prod@lists.pd.infn.it|g' /etc/cloud-info-provider/bdii.yaml
sed -i 's|#security_contact: security-support@example.org|security_contact: grid-sec@pd.infn.it|g' /etc/cloud-info-provider/bdii.yaml
sed -i 's|#user_support_contact: user-support@example.org|user_support_contact: cloud-prod@lists.pd.infn.it|g' /etc/cloud-info-provider/bdii.yaml
sed -i 's|total_cores: 0|total_cores: 144|g' /etc/cloud-info-provider/bdii.yaml
sed -i 's|total_ram: 0|total_ram: 285|g' /etc/cloud-info-provider/bdii.yaml
sed -i 's|hypervisor: Foo Hypervisor|hypervisor: KVM Hypervisor|g' /etc/cloud-info-provider/bdii.yaml
sed -i 's|hypervisor_version: 0.0.0|hypervisor_version: 2.0.0|g' /etc/cloud-info-provider/bdii.yaml
sed -i 's|middleware_version: havana|middleware_version: Liberty|g' /etc/cloud-info-provider/bdii.yaml
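Each sed call above uncomments or overwrites one `key: value` line of bdii.yaml. A minimal sketch of a hypothetical generic helper, `set_yaml_key`, assuming GNU sed, that the key appears on one line only, and that the value contains no `|` or `&` characters (both are special in this sed expression); the original indentation of the line is kept:

```shell
#!/bin/bash
# Hypothetical helper: set KEY to VALUE in a YAML file, uncommenting "#KEY:" if needed
# and keeping the line's original indentation. Assumes KEY appears on one line only
# and VALUE contains neither "|" nor "&".
set_yaml_key() {
    local file="$1" key="$2" value="$3"
    sed -i -E "s|^([[:space:]]*)#?[[:space:]]*${key}:.*\$|\1${key}: ${value}|" "$file"
}

# Examples equivalent to some of the sed calls above:
# set_yaml_key /etc/cloud-info-provider/bdii.yaml name INFN-PADOVA-STACK
# set_yaml_key /etc/cloud-info-provider/bdii.yaml total_cores 144
# set_yaml_key /etc/cloud-info-provider/bdii.yaml hypervisor "KVM Hypervisor"
```

Because the key must be followed directly by a colon, `hypervisor` does not touch the `hypervisor_version` line.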
* Make sure that Keystone contains the OOI endpoints; otherwise they will not be published by the BDII.
* By default, the provider script filters out images that have no marketplace URI defined in the marketplace or vmcatcher_event_ad_mpuri property. To list all image templates (including local snapshots), set the variable 'require_marketplace_id: false' under 'compute' -> 'images' -> 'defaults' in the YAML configuration file.
* Create the file /var/lib/bdii/gip/provider/cloud-info-provider that calls the provider with the correct options for your site, for example:
cat <<EOF >/var/lib/bdii/gip/provider/cloud-info-provider
#!/bin/sh
cloud-info-provider-service --yaml /etc/cloud-info-provider/bdii.yaml \
--middleware openstack \
--os-username admin --os-password ADMIN_PASS \
--os-tenant-name admin --os-auth-url https://egi-cloud.pd.infn.it:35357/v2.0
EOF
* Make the cloud-info-provider script executable, run it manually and check that it returns the complete LDIF:
chmod +x /var/lib/bdii/gip/provider/cloud-info-provider
/var/lib/bdii/gip/provider/cloud-info-provider
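Two rough checks can be run on the saved provider output: counting the LDIF entries (each entry starts with a `dn:` line) and verifying that the OCCI endpoint URL is published. A minimal sketch of two hypothetical helpers; note that they do not unfold LDIF lines wrapped across several lines, so a very long URL split by the provider would not be found:

```shell
#!/bin/bash
# Rough sanity checks on the LDIF printed by the provider script (saved to a file).
# ldif_entry_count: number of entries, i.e. lines starting with "dn:".
ldif_entry_count() {
    grep -c '^dn:' "$1"
}
# ldif_mentions: succeeds if the fixed string (e.g. the OCCI endpoint URL) appears in the file.
ldif_mentions() {
    grep -qF -- "$2" "$1"
}

# Example:
# /var/lib/bdii/gip/provider/cloud-info-provider > /tmp/cloud-info.ldif
# ldif_entry_count /tmp/cloud-info.ldif
# ldif_mentions /tmp/cloud-info.ldif https://egi-cloud.pd.infn.it:8787/occi1.1 && echo "OCCI endpoint found"
```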
* Now you can start the bdii service:
systemctl start bdii
* Use the command below to see if the information is being published:
ldapsearch -x -h localhost -p 2170 -b o=glue
* Do not forget to open port 2170:
firewall-cmd --add-port=2170/tcp
firewall-cmd --permanent --add-port=2170/tcp
systemctl restart firewalld
* Information on how to set up the site-BDII on egi-cloud-sbdii.pd.infn.it is available [[https://wiki.egi.eu/wiki/MAN01_How_to_publish_Site_Information|here]]
* Add your cloud-info-provider to the site-BDII egi-cloud-sbdii.pd.infn.it by adding lines like the following to site.def:
BDII_REGIONS="CLOUD BDII"
BDII_CLOUD_URL="ldap://egi-cloud.pd.infn.it:2170/GLUE2GroupID=cloud,o=glue"
BDII_BDII_URL="ldap://egi-cloud-sbdii.pd.infn.it:2170/mds-vo-name=resource,o=grid"
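In the site.def fragment above, every name listed in BDII_REGIONS needs a matching BDII_<REGION>_URL variable: CLOUD needs BDII_CLOUD_URL and BDII needs BDII_BDII_URL. A minimal bash sketch of a hypothetical check, `check_bdii_regions`, assuming site.def has been sourced into the current shell:

```shell
#!/bin/bash
# Hypothetical check: for each region in BDII_REGIONS, print the BDII_<REGION>_URL
# value, or report it on stderr if it is missing. Returns 1 if any URL is missing.
check_bdii_regions() {
    local region var rc=0
    for region in $BDII_REGIONS; do
        var="BDII_${region}_URL"
        if [ -n "${!var}" ]; then
            echo "$var=${!var}"
        else
            echo "$var is not set" >&2
            rc=1
        fi
    done
    return $rc
}

# Example (path to site.def depends on the site-BDII installation):
# . ./site.def && check_bdii_regions
```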
==== Install vmcatcher/glancepush ====
(see [[https://wiki.egi.eu/wiki/Fedcloud-tf:WorkGroups:Scenario8:Configuration#VMcatcher|EGI guide]])
* vmcatcher allows users to subscribe to Virtual Machine Image Lists, cache the images they reference, validate the image lists with X.509-based public key cryptography, validate the images against the SHA-512 hashes in the lists, and raise events so that other applications can process image updates or expiries without having to validate the images again.
useradd -m -b /opt stack
STACKHOME=/opt/stack
yum install -y m2crypto python2-setuptools
pip install nose
git clone https://github.com/hepix-virtualisation/hepixvmitrust.git -b hepixvmitrust-0.0.18
git clone https://github.com/hepix-virtualisation/smimeX509validation.git -b smimeX509validation-0.0.17
git clone https://github.com/hepix-virtualisation/vmcatcher.git -b vmcatcher-0.6.1
wget http://repository.egi.eu/community/software/python.glancepush/0.0.X/releases/generic/0.0.6/python-glancepush-0.0.6.tar.gz
wget http://repository.egi.eu/community/software/openstack.handler.for.vmcatcher/0.0.X/releases/generic/0.0.7/gpvcmupdate-0.0.7.tar.gz
tar -zxvf python-glancepush-0.0.6.tar.gz -C $STACKHOME/
tar -zxvf gpvcmupdate-0.0.7.tar.gz -C $STACKHOME/
for i in hepixvmitrust smimeX509validation vmcatcher $STACKHOME/python-glancepush-0.0.6 $STACKHOME/gpvcmupdate-0.0.7
do
cd $i
python setup.py install
echo exit code=$?
cd
done
mkdir -p /var/lib/swift/vmcatcher
mkdir -p $STACKHOME/vmcatcher/cache $STACKHOME/vmcatcher/cache/partial $STACKHOME/vmcatcher/cache/expired $STACKHOME/vmcatcher/tmp
mkdir -p /var/spool/glancepush /var/log/glancepush/ /etc/glancepush /etc/glancepush/transform /etc/glancepush/meta /etc/glancepush/test /etc/glancepush/clouds
ln /etc/keystone/voms.json /etc/glancepush/voms.json
sed -i "s|temp_dir = \"/tmp/\"|temp_dir = \"$STACKHOME/vmcatcher/tmp/\"|g" /usr/bin/gpvcmupdate.py
* Now, for each VO/tenant in voms.json, write a file like this:
[root@egi-cloud ~]# su - stack
[stack@egi-cloud ~]# cat << EOF > /etc/glancepush/clouds/dteam
[general]
# Tenant for this VO. Must match the tenant defined in voms.json file
testing_tenant=dteam
# Identity service endpoint (Keystone)
endpoint_url=https://egi-cloud.pd.infn.it:35357/v2.0
# User Password
password=ADMIN_PASS
# User
username=admin
# Set this to true if you're NOT using self-signed certificates
is_secure=True
# SSH private key that will be used to perform policy checks (to be done)
ssh_key=/opt/stack/.ssh/id_rsa
# WARNING: Only define the next variable if you're going to need it. Otherwise you may encounter problems
#cacert=path_to_your_cert
EOF
* For images not belonging to any VO, use the admin tenant:
[stack@egi-cloud ~]# cat << EOF > /etc/glancepush/clouds/openstack
[general]
# Tenant for this VO. Must match the tenant defined in voms.json file
testing_tenant=admin
# Identity service endpoint (Keystone)
endpoint_url=https://egi-cloud.pd.infn.it:35357/v2.0
# User Password
password=ADMIN_PASS
# User
username=admin
# Set this to true if you're NOT using self-signed certificates
is_secure=True
# SSH private key that will be used to perform policy checks (to be done)
ssh_key=/opt/stack/.ssh/id_rsa
# WARNING: Only define the next variable if you're going to need it. Otherwise you may encounter problems
#cacert=path_to_your_cert
EOF
chown stack:stack -R /var/spool/glancepush /etc/glancepush /var/log/glancepush/
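Writing one file per VO by hand is repetitive. A minimal sketch of a hypothetical helper, `write_glancepush_cloud`, that writes a file with the same settings as the two examples above (without the comments); the file name and the tenant are passed separately because, as in the "openstack" example, they need not coincide. ADMIN_PASS is the same placeholder used throughout this page:

```shell
#!/bin/bash
# Hypothetical helper: write glancepush cloud file DIR/NAME for TENANT,
# with the same settings as the examples above.
# ADMIN_PASS is a placeholder for the real admin password.
write_glancepush_cloud() {
    local dir="$1" name="$2" tenant="$3"
    cat > "$dir/$name" <<EOF
[general]
testing_tenant=$tenant
endpoint_url=https://egi-cloud.pd.infn.it:35357/v2.0
password=ADMIN_PASS
username=admin
is_secure=True
ssh_key=/opt/stack/.ssh/id_rsa
EOF
}

# Examples (then re-run the chown above so that the stack user owns the files):
# write_glancepush_cloud /etc/glancepush/clouds dteam dteam
# write_glancepush_cloud /etc/glancepush/clouds openstack admin
```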
* Check that vmcatcher works properly by listing and subscribing to an image list:
cat <<EOF >>$STACKHOME/.bashrc
export VMCATCHER_RDBMS="sqlite:///$STACKHOME/vmcatcher/vmcatcher.db"
export VMCATCHER_CACHE_DIR_CACHE="$STACKHOME/vmcatcher/cache"
export VMCATCHER_CACHE_DIR_DOWNLOAD="$STACKHOME/vmcatcher/cache/partial"
export VMCATCHER_CACHE_DIR_EXPIRE="$STACKHOME/vmcatcher/cache/expired"
EOF
[stack@egi-cloud ~]# export VMCATCHER_RDBMS="sqlite:////opt/stack/vmcatcher/vmcatcher.db"
[stack@egi-cloud ~]# vmcatcher_subscribe -l
[stack@egi-cloud ~]# vmcatcher_subscribe -e -s https://:x-oauth-basic@vmcaster.appdb.egi.eu/store/vo/fedcloud.egi.eu/image.list
[stack@ocp-ctrl ~]$ vmcatcher_subscribe -l
76fdee70-8119-5d33-9f40-3c57e1c60df1 True None https://vmcaster.appdb.egi.eu/store/vo/fedcloud.egi.eu/image.list
* Create a CRON wrapper for vmcatcher, named $STACKHOME/gpvcmupdate-0.0.7/vmcatcher_eventHndl_OS_cron.sh, using the following code:
cat <<EOF >$STACKHOME/gpvcmupdate-0.0.7/vmcatcher_eventHndl_OS_cron.sh
#!/bin/bash
#Cron handler for VMCatcher image synchronization script for OpenStack
#Vmcatcher configuration variables
export VMCATCHER_RDBMS="sqlite:///$STACKHOME/vmcatcher/vmcatcher.db"
export VMCATCHER_CACHE_DIR_CACHE="$STACKHOME/vmcatcher/cache"
export VMCATCHER_CACHE_DIR_DOWNLOAD="$STACKHOME/vmcatcher/cache/partial"
export VMCATCHER_CACHE_DIR_EXPIRE="$STACKHOME/vmcatcher/cache/expired"
export VMCATCHER_CACHE_EVENT="python $STACKHOME/gpvcmupdate-0.0.7/gpvcmupdate.py -D"
#Update vmcatcher image lists
/usr/bin/vmcatcher_subscribe -U
#Add all the new images to the cache
for a in \$(/usr/bin/vmcatcher_image -l | awk '{if (\$2==2) print \$1}'); do
/usr/bin/vmcatcher_image -a -u \$a
done
#Update the cache
/usr/bin/vmcatcher_cache -v -v
#Run glancepush
python /usr/bin/glancepush.py
EOF
* Add the admin user to the tenants and set the correct ownership of the directories:
for vo in atlas cms lhcb dteam ops wenmr fctf indigo
do
openstack role add --project $vo --user admin _member_
done
chown -R stack:stack $STACKHOME
* Test that the vmcatcher handler works correctly by running it as the stack user:
chmod +x $STACKHOME/gpvcmupdate-0.0.7/vmcatcher_eventHndl_OS_cron.sh
chown -R stack:stack $STACKHOME
su - stack -c $STACKHOME/gpvcmupdate-0.0.7/vmcatcher_eventHndl_OS_cron.sh
* Add the following line to the stack user's crontab (with the absolute path, because $STACKHOME is not defined in the cron environment):
50 */6 * * * /opt/stack/gpvcmupdate-0.0.7/vmcatcher_eventHndl_OS_cron.sh >> /var/log/glancepush/vmcatcher.log 2>&1
* Useful links for getting VO-wide image lists that need authentication to AppDB: [[https://wiki.appdb.egi.eu/main:guides:vmcatcher_site_setup|Vmcatcher setup]], [[https://wiki.appdb.egi.eu/main:faq:how_can_i_create_a_personal_access_token|Obtaining an access token]], [[https://vmcaster.appdb.egi.eu/store/#vos|Image list store]].
==== Use the same APEL/SSM of grid site ====
* Cloud usage records are sent to APEL through the ssmsend program installed on cert-37.pd.infn.it:
[root@cert-37 ~]# cat /etc/cron.d/ssm-cloud
# send buffered usage records to APEL
30 */24 * * * root /usr/bin/ssmsend -c /etc/apel/sender-cloud.cfg
* NFS must therefore be installed and configured on egi-cloud, so that cert-37 can mount the outgoing spool directory:
[root@egi-cloud ~]# mkdir -p /var/spool/apel/outgoing/openstack
[root@egi-cloud ~]# cat <<EOF >>/etc/exports
/var/spool/apel/outgoing/openstack cert-37.pd.infn.it(rw,sync)
EOF
[root@egi-cloud ~]# systemctl status nfs-server
* In case of APEL nagios probe failure, check if /var/spool/apel/outgoing/openstack is properly mounted by cert-37
* To check if accounting records are properly received by APEL server look at [[http://goc-accounting.grid-support.ac.uk/cloudtest/cloudsites2.html|this site]]
==== Install the new accounting system (CASO) ====
(see [[https://caso.readthedocs.org/en/latest/|CASO installation guide]] )
yum -y install libffi-devel openssl-devel gcc
pip install caso
* Create the accounting user and role:
openstack user create --domain default --password ACCOUNTING_PASS accounting
openstack role create accounting
* For each tenant, add the accounting user with the accounting role:
for i in fctf wenmr atlas ops dteam lhcb cms indigo
do
openstack role add --project $i --user accounting accounting
done
* Edit the /etc/caso/caso.conf file
cp /etc/caso/caso.conf.sample /etc/caso/caso.conf
openstack-config --set /etc/caso/caso.conf DEFAULT extractor nova
openstack-config --set /etc/caso/caso.conf DEFAULT site_name INFN-PADOVA-STACK
openstack-config --set /etc/caso/caso.conf DEFAULT tenants fctf,wenmr,atlas,ops,dteam,lhcb,cms,indigo
openstack-config --set /etc/caso/caso.conf DEFAULT messengers caso.messenger.ssm.SsmMessager
openstack-config --set /etc/caso/caso.conf extractor user accounting
openstack-config --set /etc/caso/caso.conf extractor password ACCOUNTING_PASS
openstack-config --set /etc/caso/caso.conf extractor endpoint https://egi-cloud.pd.infn.it:35357/v2.0
openstack-config --set /etc/caso/caso.conf extractor mapping_file /etc/keystone/voms.json
openstack-config --set /etc/caso/caso.conf ssm output_path /var/spool/apel/outgoing/openstack
openstack-config --set /etc/caso/caso.conf logstash host egi-cloud.pd.infn.it
openstack-config --set /etc/caso/caso.conf logstash port 5000
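* Edit the /etc/keystone/policy.json file so that the accounting role satisfies the admin_required rule, and create the CASO spool and log directories: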
sed -i 's|\"admin_required\": \"role:admin or is_admin:1\",|\"admin_required\": \"role:admin or is_admin:1 or role:accounting\",|g' /etc/keystone/policy.json
mkdir /var/spool/caso /var/log/caso
* Test it:
caso-extract -v -d
* Create the cron job
cat <<EOF >/etc/cron.d/caso
# extract and send usage records to APEL/SSM
10 * * * * root /usr/bin/caso-extract >> /var/log/caso/caso.log 2>&1 ; chmod go+w -R /var/spool/apel/outgoing/openstack/
EOF
==== Local Monitoring ====
=== Ganglia ===
* Install ganglia-gmond on all servers
* Configure cluster and host fields in /etc/ganglia/gmond.conf to point to cld-ganglia.cloud.pd.infn.it server
* Finally: systemctl enable gmond.service; systemctl start gmond.service
=== Nagios ===
* On the compute nodes, install nsca-client, nagios, nagios-plugins-disk, nagios-plugins-procs, nagios-plugins, nagios-common and nagios-plugins-load
* Copy the file cld-nagios:/var/spool/nagios/.ssh/id_rsa.pub to /home/nagios/.ssh/authorized_keys on the controller and on all compute nodes, and to /root/.ssh/authorized_keys on the controller. Also make sure that /home/nagios is the home directory of the nagios user in the /etc/passwd file.
* Then do in all compute nodes:
$ echo encryption_method=1 >> /etc/nagios/send_nsca.cfg
$ usermod -a -G libvirtd nagios
$ sed -i 's|#password=|password=NSCA_PASSWORD|g' /etc/nagios/send_nsca.cfg
# then be sure the files below are in /usr/local/bin:
$ ls /usr/local/bin/
check_kvm check_kvm_wrapper.sh nagios_check_ovs.sh
$ cat <<EOF > crontab.txt
# Puppet Name: nagios_check_kvm
0 */1 * * * /usr/local/bin/check_kvm_wrapper.sh
# Puppet Name: nagios_check_ovs
*/10 * * * * /usr/local/bin/nagios_check_ovs.sh
EOF
$ crontab crontab.txt
$ crontab -l
* On the controller node, check whether the following change is needed: sed -i 's|"compute:create:forced_host": "is_admin:True"|"compute:create:forced_host": ""|g' /etc/nova/policy.json
* On the cld-nagios server check/modify the content of /var/spool/nagios/egi-cloud-dteam--openrc.sh, and of the files /etc/nagios/objects/egi* and /usr/lib64/nagios/plugins/*egi*
==== Security incidents and IP traceability ====
* See [[https://wiki.infn.it/progetti/cloud-areapd/operations/production_cloud/gestione_security_incidents| here]] for the description of the full process
* On egi-cloud, install the [[https://github.com/Pansanel/openstack-user-tools | CNRS tools]], which allow tracking the usage of floating IPs, as in the example below:
[root@egi-cloud ~]# os-ip-trace 90.147.77.229
+--------------------------------------+-----------+---------------------+---------------------+
| device id | user name | associating date | disassociating date |
+--------------------------------------+-----------+---------------------+---------------------+
| 3002b1f1-bca3-4e4f-b21e-8de12c0b926e | admin | 2016-11-30 14:01:38 | 2016-11-30 14:03:02 |
+--------------------------------------+-----------+---------------------+---------------------+
* Save and archive important log files:
* On egi-cloud and on each compute node cloud-0%, add the line "*.* @@192.168.60.31:514" to the file /etc/rsyslog.conf and restart the rsyslog service with "systemctl restart rsyslog". The /var/log/secure and /var/log/messages entries are then forwarded (over TCP, as set by "@@") to cld-foreman, where they are stored under /var/mpathd/log/egi-cloud and /var/mpathd/log/cloud-0%.
* In cld-foreman, check that the file /etc/cron.daily/vm-log.sh logs the /var/log/libvirt/qemu/*.log files of egi-cloud and each cloud-0% compute node (passwordless ssh must be enabled from cld-foreman to each node)
* Install ulogd on the controller node:
* On egi-cloud, run "yum install libnetfilter_log" and "yum localinstall libnetfilter_acct-1.0.2-3.el7.lux.1.x86_64.rpm ulogd-2.0.4-3.el7.lux.1.x86_64.rpm" (these files are in cld-ctrl-01:/root/ulogd/)
* Configure /etc/ulogd.conf starting from cld-ctrl-01:/etc/ulogd.conf, setting the accept_src_filter variable appropriately (accept_src_filter=10.0.0.0/16).
* Then copy cld-ctrl-01:/root/ulogd/start-ulogd to egi-cloud:/root/ulogd/start-ulogd, replace the qrouter ID and execute /root/ulogd/start-ulogd. Then add to /etc/rc.d/rc.local the line /root/ulogd/start-ulogd &, and make rc.local executable.
* Finally, be sure that /etc/rsyslog.conf file has the lines "local6.* /var/log/ulogd.log" and "*.info;mail.none;authpriv.none;cron.none;local6.none /var/log/messages", and restart rsyslog service.
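A few hypothetical helpers can sketch the ulogd configuration steps; their names are illustrative only. They rely on three assumptions:
* accept_src_filter appears as a single "accept_src_filter=..." line.
* start-ulogd refers to the router namespace as "qrouter-<uuid>".
* rsyslog.conf lines may be aligned with spaces or tabs.

```shell
#!/bin/bash
# Hypothetical helpers for the ulogd setup on egi-cloud.

# Set accept_src_filter, assuming an "accept_src_filter=..." line exists
# (possibly indented); returns 1 if the variable is not found.
set_ulogd_src_filter() {
    local conf="$1" cidr="$2"
    grep -q '^[[:space:]]*accept_src_filter=' "$conf" || return 1
    sed -i "s|^\([[:space:]]*\)accept_src_filter=.*|\1accept_src_filter=${cidr}|" "$conf"
}

# Replace every "qrouter-<uuid>" in the start-ulogd script with the local
# router ID (assumes the script names the namespace that way).
replace_qrouter_id() {
    local script="$1" new_id="$2"
    sed -i -E "s/qrouter-[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}/qrouter-${new_id}/g" "$script"
}

# Print the required rsyslog lines missing from the config (whitespace
# runs are normalized before comparing); return 1 if any is missing.
missing_ulogd_rsyslog_lines() {
    local conf="$1" line rc=0
    for line in \
        'local6.* /var/log/ulogd.log' \
        '*.info;mail.none;authpriv.none;cron.none;local6.none /var/log/messages'; do
        if ! sed 's/[[:blank:]]\{1,\}/ /g' "$conf" | grep -qxF "$line"; then
            printf '%s\n' "$line"
            rc=1
        fi
    done
    return "$rc"
}

# Example on egi-cloud:
#   set_ulogd_src_filter /etc/ulogd.conf 10.0.0.0/16
#   ip netns list | grep qrouter-        # pick the egi-cloud router ID
#   replace_qrouter_id /root/ulogd/start-ulogd <router-id>
#   missing_ulogd_rsyslog_lines /etc/rsyslog.conf || echo "edit rsyslog.conf"
```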
==== Troubleshooting ====
* Passwordless ssh access to egi-cloud from cld-nagios, and from egi-cloud to cloud-0*, has already been configured.
* If cld-nagios cannot ping egi-cloud, make sure that the route "route add -net 192.168.60.0 netmask 255.255.255.0 gw 192.168.114.1" has been added on egi-cloud (the file /etc/sysconfig/network-scripts/route-em1 should contain the line: 192.168.60.0/24 via 192.168.114.1).
* In case of Nagios alarms, try restarting all cloud services as follows:
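You can test the running routing table with the sketch below; has_route_to_192_168_60 is a hypothetical name. The iproute2 equivalent of the route command above is "ip route add". Like the route command, it does not survive a reboot, which is why the route-em1 file is needed:

```shell
#!/bin/bash
# Hypothetical check: read "ip route show" output on stdin and succeed only
# if the route to 192.168.60.0/24 via 192.168.114.1 is present.
has_route_to_192_168_60() {
    grep -Eq '^192\.168\.60\.0/24 via 192\.168\.114\.1( |$)'
}

# Example on egi-cloud:
#   ip route show | has_route_to_192_168_60 || ip route add 192.168.60.0/24 via 192.168.114.1
```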
$ ssh root@egi-cloud
[root@egi-cloud ~]# ./Liberty_CentOS_controller.sh restart
[root@egi-cloud ~]# for i in $(seq 1 6); do ssh cloud-0$i.pn.pd.infn.it ./Liberty_CentOS_compute.sh restart; done
* Resubmit the Nagios probe and check if it works again
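The compute-node loop above does not report which nodes failed. The hypothetical variant below runs the same restart script non-interactively on each node and prints the nodes where it failed. With BatchMode=yes, ssh fails instead of prompting if passwordless access is broken:

```shell
#!/bin/bash
# Hypothetical variant of the restart loop: restart compute services on
# cloud-01..06 and print each node where ssh or the script failed.
# The restart script's own output is discarded to keep the summary short.
restart_compute_nodes() {
    local i host rc=0
    for i in $(seq 1 6); do
        host="cloud-0${i}.pn.pd.infn.it"
        if ! ssh -o BatchMode=yes "$host" ./Liberty_CentOS_compute.sh restart >/dev/null 2>&1; then
            echo "restart failed on $host"
            rc=1
        fi
    done
    return "$rc"
}

# Example (as root on egi-cloud):
#   restart_compute_nodes || echo "some compute nodes need attention"
```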
* If the problem persists, check the consistency of the DB by running the following script (this also fixes the quota overview in the dashboard when it is not consistent with the VMs actually active):
[root@egi-cloud ~]# python nova-quota-sync.py
* In case of EGI Nagios alarms, check that the user running the Nagios probes does not also belong to tenants other than "ops". Also check that the right image and flavour are set in the URL of the service published in the [[https://goc.egi.eu/portal/index.php?Page_Type=Service&id=5691 | GOCDB]].
* In case of a reboot of the egi-cloud server:
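The sketch below checks the project membership of the probe user. It assumes an openstackclient version that supports "project list --user", and both extra_projects and <probe-user> are placeholders:

```shell
#!/bin/bash
# Hypothetical check: read project names (one per line) on stdin and print
# every project other than "ops"; return 1 if any is found.
extra_projects() {
    local name rc=0
    while IFS= read -r name; do
        if [ -z "$name" ]; then
            continue                  # ignore blank lines
        fi
        if [ "$name" != "ops" ]; then
            printf '%s\n' "$name"
            rc=1
        fi
    done
    return "$rc"
}

# Example (after "source admin-openrc.sh"):
#   openstack project list --user <probe-user> -f value -c Name | extra_projects
```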
* check its network configuration (use IPMI if not reachable): all 4 interfaces must be up and the default gateway must be 90.147.77.254.
* check DNS in /etc/resolv.conf and GATEWAY in /etc/sysconfig/network
* check routing with "$ route -n" and, if needed, run "$ ip route replace default via 90.147.77.254". Also make sure there is a route for the 90.147.77.0/24 network.
* check that the storage exports 192.168.61.100:/glance-egi and cinder-egi are properly mounted (run: $ df -h)
* In case of a reboot of a cloud-0* server (use IPMI if it is not reachable): all 3 interfaces must be up, and the default route must have both 192.168.114.1 and 192.168.115.1 as gateways.
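The egi-cloud routing checks above can be scripted. In the sketch below, egi_cloud_routes_missing is a hypothetical helper. It reads "ip route show" output and reports which of the expected entries are missing:

```shell
#!/bin/bash
# Hypothetical check: read "ip route show" on stdin, print the expected
# entries that are missing, and return 1 if any is missing.
egi_cloud_routes_missing() {
    awk '
        /^default via 90\.147\.77\.254( |$)/ { gw = 1 }
        /^90\.147\.77\.0\/24 /               { net = 1 }
        END {
            if (!gw)  print "default via 90.147.77.254"
            if (!net) print "90.147.77.0/24"
            exit ((gw && net) ? 0 : 1)
        }'
}

# Example on egi-cloud:
#   ip route show | egi_cloud_routes_missing || ip route replace default via 90.147.77.254
```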
* check its network configuration
* check if all partitions in /etc/fstab are properly mounted (do: $ df -h)
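The sketch below spots fstab entries that did not come back after a reboot by comparing /etc/fstab with /proc/mounts. It is a hypothetical helper. It skips comments, swap, "none" and noauto entries, and compares mount points literally:

```shell
#!/bin/bash
# Hypothetical check: print fstab mount points that are not in the mounts
# table (/proc/mounts format); return 1 if any is missing.
unmounted_fstab_entries() {
    local fstab="${1:-/etc/fstab}" mounts="${2:-/proc/mounts}"
    awk '
        NR == FNR { mounted[$2] = 1; next }       # first file: current mounts
        /^[ \t]*#/ || /^[ \t]*$/ { next }         # comments and blank lines
        $3 == "swap" || $2 == "none" { next }     # nothing to mount
        $4 ~ /(^|,)noauto(,|$)/ { next }          # not mounted at boot
        !($2 in mounted) { print $2; missing = 1 }
        END { exit (missing ? 1 : 0) }
    ' "$mounts" "$fstab"
}

# Example on a cloud-0* node:
#   unmounted_fstab_entries || echo "some fstab entries are not mounted"
```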
* If the VMs cannot reach the Internet, check on egi-cloud that the files /etc/sysconfig/network-scripts/ifcfg-br-ex and /etc/sysconfig/network-scripts/ifcfg-em3 have the right content, as below:
[root@egi-cloud ~]# cat /etc/sysconfig/network-scripts/ifcfg-br-ex
DEVICE=br-ex
DEVICETYPE=ovs
TYPE=OVSBridge
BOOTPROTO=static
IPADDR=90.147.77.223
NETMASK=255.255.255.0
GATEWAY=90.147.77.254
ONBOOT=yes
[root@egi-cloud ~]# cat /etc/sysconfig/network-scripts/ifcfg-em3
DEVICE=em3
ONBOOT=yes
VLAN=yes
BOOTPROTO=none
OVS_BRIDGE=br-ex
TYPE=OVSPort
DEVICETYPE=ovs
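You can also check the two files key by key. This is a minimal sketch. It assumes unquoted KEY=VALUE lines as in the listings above, and check_ifcfg is a hypothetical name:

```shell
#!/bin/bash
# Hypothetical check: verify that an ifcfg file contains each expected
# KEY=VALUE line; print the missing ones and return 1 on any mismatch.
check_ifcfg() {
    local file="$1" pair rc=0
    shift
    for pair in "$@"; do
        if ! grep -qxF "$pair" "$file"; then
            echo "$file: expected $pair"
            rc=1
        fi
    done
    return "$rc"
}

# Example on egi-cloud:
#   check_ifcfg /etc/sysconfig/network-scripts/ifcfg-br-ex DEVICE=br-ex TYPE=OVSBridge \
#       IPADDR=90.147.77.223 NETMASK=255.255.255.0 GATEWAY=90.147.77.254 ONBOOT=yes
#   check_ifcfg /etc/sysconfig/network-scripts/ifcfg-em3 DEVICE=em3 OVS_BRIDGE=br-ex TYPE=OVSPort
```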
* In case of network instabilities, check that GRO is off for all interfaces, e.g.:
[root@egi-cloud ~]# /sbin/ethtool -k em3 | grep -i generic-receive-offload
generic-receive-offload: off
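To check every interface at once, a hypothetical helper can parse the ethtool output. It also accepts the optional "[fixed]" suffix that ethtool may print after a feature:

```shell
#!/bin/bash
# Hypothetical helper: read "ethtool -k IFACE" output on stdin and succeed
# only if generic-receive-offload is reported as off.
gro_is_off() {
    grep -Eq '^generic-receive-offload: off( |$)'
}

# Example on egi-cloud, for the four interfaces em1..em4:
#   for dev in em1 em2 em3 em4; do
#       /sbin/ethtool -k "$dev" | gro_is_off || echo "GRO still on for $dev"
#   done
```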
* Also check that /sbin/ifup-local exists and is executable (otherwise the network scripts do not run it):
[root@egi-cloud ~]# cat /sbin/ifup-local
#!/bin/bash
case "$1" in
em1)
/sbin/ethtool -K $1 gro off
;;
em2)
/sbin/ethtool -K $1 gro off
;;
em3)
/sbin/ethtool -K $1 gro off
;;
em4)
/sbin/ethtool -K $1 gro off
;;
esac
exit 0
* If you need to change the project quotas, do not forget to apply the change to both the project ID and the project name, due to a known bug, e.g.:
[root@egi-cloud ~]# source admin-openrc.sh
[root@egi-cloud ~]# tenantId=$(openstack project list | grep fctf | awk '{print $2}')
[root@egi-cloud ~]# nova quota-update --instances 40 --cores 40 --ram 81840 $tenantId
[root@egi-cloud ~]# nova quota-update --instances 40 --cores 40 --ram 81840 fctf
[root@egi-cloud ~]# neutron quota-update --floatingip 1 --tenant-id $tenantId
[root@egi-cloud ~]# neutron quota-update --floatingip 1 --tenant-id fctf
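You can wrap the four quota commands so that both targets are always updated together. The sketch below uses "openstack project show NAME -f value -c id" to get the project ID. This is an exact-name lookup, unlike "grep fctf", which can also match other projects. update_project_quotas is a hypothetical name:

```shell
#!/bin/bash
# Hypothetical wrapper: apply the same nova and neutron quotas to both the
# project ID and the project name (see the known bug mentioned above).
# Usage: update_project_quotas NAME INSTANCES CORES RAM_MB FLOATING_IPS
update_project_quotas() {
    local name="$1" instances="$2" cores="$3" ram="$4" fips="$5" id target
    # Exact-name lookup of the project ID.
    id=$(openstack project show "$name" -f value -c id) || return 1
    for target in "$id" "$name"; do
        nova quota-update --instances "$instances" --cores "$cores" --ram "$ram" "$target" || return 1
        neutron quota-update --floatingip "$fips" --tenant-id "$target" || return 1
    done
}

# Example, equivalent to the commands above (after "source admin-openrc.sh"):
#   update_project_quotas fctf 40 40 81840 1
```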