====== Monitoring ======

===== Ganglia =====
[[http://cld-ganglia.cloud.pd.infn.it/ganglia|Ganglia Monitoring Page]]

===== Nagios =====

[[http://cld-nagios.cloud.pd.infn.it/nagios|Nagios Monitoring Page]]

==== Active sensors ====

  * ''Current load'': checks the 1m, 5m and 15m load averages on the monitored host. It reports a warning or an error if one of these values exceeds a threshold (thresholds are defined in the Nagios .cfg files)
  * ''Dell Server XYZ'': OpenManage sensors installed on the Dell boxes. They check the various hardware components.
  * ''HTTP'': checks whether the http service is running
  * ''HTTPS'': checks the status of the http and https services
  * ''HTTPS Certificate'': checks the certificate used by an HTTPS server. It reports a warning if the certificate expires in less than 30 days
  * ''KVM'': this sensor is used on the Compute Nodes. It checks the status of the KVM service and of the hosted VMs. It reports a warning if not all VMs are active. For example it may warn when a VM is in shutoff state, which is not necessarily a problem if the instance was shut off by a user action
  * ''MySQL XYZ database'': this sensor is used on the MySQL nodes. It checks the status of the considered DB.
  * ''Neutron Agents'': this sensor is used on the Network Node. It runs the ''neutron agent-list'' command and checks that the status of every agent is :-), i.e. ok
  * ''Neutron Server'': this sensor is used on the Network Node. It checks whether a neutron-server process is running. If it reports a problem, first verify that the process is really not running. Before restarting it, a check of the log files (in particular in '/var/log/neutron') is recommended
  * ''Nova Compute'': this sensor is used on the Compute Nodes. It checks whether a nova-compute process is running. If it reports a problem, first verify that the process is really not running. Before restarting it, a check of the log files (in particular in '/var/log/nova') is recommended
  * ''Openstack Check EC2 Instances'': checks, using the EC2 API, the instances on the Cloud. It reports a problem if the command does not work or if the number of VMs in error exceeds a certain threshold, usually 5 (defined in the Nagios .cfg files)
  * ''Openstack Check Nova Instances'': checks, using the Nova API, the instances on the Cloud. It reports a problem if the command does not work or if the number of VMs in error exceeds a certain threshold, usually 5 (defined in the Nagios .cfg files)
  * ''OpenStack Glance API'': checks, using the Glance API, the available images registered in Glance. In case of problems please check the log files, in particular in '/var/log/glance'.
  * ''OpenStack Glance Upload'': uploads (and then deletes) a small image to the Glance service, to test it. In case of problems please check the log files, in particular in '/var/log/glance'.
  * ''OpenStack Keystone API'': this sensor is used on the Controller node. It checks the functionality of the Keystone service by trying to get a token from it. If it reports a problem, please check the log files, in particular the ones in '/var/log/keystone'
  * ''OpenStack Nova API'': this sensor tests the functionality of the Nova API by asking for the list of the available flavors.
  * ''OpenvSwitch'': this sensor is used on the Compute Nodes. First it checks that the openvswitch service is running. It then parses the output of the 'ovs-ofctl dump-flows br-tun' command: a row containing the strings 'table=0' and 'actions=resubmit' should be present. If this is not the case, a restart of the 'openvswitch' and 'neutron-openvswitch-agent' services may help
  * ''PerconaStatus'': this sensor monitors the members of the Percona cluster. It checks that each node is synced and that the number of members of the database cluster is 3 (i.e. that all the members are working)
  * ''PING'': checks whether the considered host is pingable
  * ''SSH'': checks whether the SSH service on the host is active
  * ''XYZ Filesystem total size'': checks the total size of a file system and compares it with the expected value. This sensor is used for Gluster file systems, to be notified when a brick "is lost"
  * ''XYZ Gluster'': checks a Gluster volume on the servers providing it
  * ''XYZ Partition'': checks whether the considered file system is available and checks its size. It triggers a warning/error if the free space is less than a certain threshold defined in the Nagios .cfg files (usually 20% for Warning, 10% for Error)
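To make the mechanics of these checks concrete, the sketch below mimics the ''Neutron Agents'' sensor with a simulated ''neutron agent-list'' output. Both the sample table and the script are illustrative only, not the actual plugin deployed in Nagios:

<code bash>
#!/bin/sh
# Illustrative sketch of the "Neutron Agents" sensor: every agent listed by
# `neutron agent-list` must show the smiley :-) in its "alive" column.
# The command output is simulated here so the sketch is self-contained.
check_neutron_agents() {
    output="$1"
    total=$(printf '%s\n' "$output" | grep -c 'agent')
    alive=$(printf '%s\n' "$output" | grep -c ':-)')
    if [ "$alive" -eq "$total" ]; then
        echo "OK - all $total neutron agents alive"
        return 0    # Nagios exit code for OK
    else
        echo "CRITICAL - only $alive of $total neutron agents alive"
        return 2    # Nagios exit code for CRITICAL
    fi
}

# Simulated `neutron agent-list` output: the Open vSwitch agent is down (xxx).
SAMPLE='| id-1 | L3 agent           | network-01 | :-) | True | neutron-l3-agent |
| id-2 | DHCP agent         | network-01 | :-) | True | neutron-dhcp-agent |
| id-3 | Open vSwitch agent | compute-01 | xxx | True | neutron-openvswitch-agent |'

check_neutron_agents "$SAMPLE"
RESULT=$?
echo "plugin exit code: $RESULT"
</code>

A real deployment would replace the here-string with the live command output and let Nagios interpret the exit code (0 = OK, 1 = WARNING, 2 = CRITICAL).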

===== Ceilometer =====
Installation instructions for Icehouse:
  * http://docs.openstack.org/icehouse/install-guide/install/yum/content/ch_ceilometer.html
  * https://openstack.redhat.com/CeilometerQuickStart
Graphic interface available in Horizon:
  * https://cloud-areapd.pd.infn.it:8443/dashboard/admin/metering/
==== Installation on Controller ====
<code bash>
#!/bin/sh
export CEILOHOST=controller-01.pd.infn.it
export CEILOHOSTPV=controller-01.cloud.pd.infn.it
export DBHOST=cld-nagios
export KEYHOST=cloud-areapd.pd.infn.it
export RABBITHOSTS="192.168.60.100:5672,192.168.60.101:5672"
#
yum install -y openstack-ceilometer-api openstack-ceilometer-collector openstack-ceilometer-notification openstack-ceilometer-central openstack-ceilometer-alarm python-ceilometerclient

## the lines below are to be executed on $DBHOST
#cat <<EOF >/etc/yum.repos.d/mongodb.repo
#[mongodb]
#name=MongoDB Repository
#baseurl=http://downloads-distro.mongodb.org/repo/redhat/os/x86_64/
#gpgcheck=0
#enabled=1
#EOF
#
#yum install -y mongodb-org
#service mongod start
#chkconfig mongod on
#mongo --host $DBHOST --eval 'db = db.getSiblingDB("ceiloprod"); db.createUser({user: "ceiloprod",pwd: "CEILOMETER_DBPASS",roles: [ "readWrite", "dbAdmin" ]})'
## end of lines to be executed on $DBHOST
#
## if mongodb is already in place at $DBHOST and the ceilometer DB has not yet been created, you can create it from the controller as below:
yum install -y mongodb-org-shell
mongo --host $DBHOST --eval 'db = db.getSiblingDB("ceiloprod"); db.createUser({user: "ceiloprod",pwd: "CEILOMETER_DBPASS",roles: [ "readWrite", "dbAdmin" ]})'
## end of $DBHOST stuff
#
openstack-config --set /etc/ceilometer/ceilometer.conf database connection mongodb://ceiloprod:CEILOMETER_DBPASS@$DBHOST:27017/ceiloprod
#
CEILOMETER_TOKEN=$(openssl rand -hex 10)
echo $CEILOMETER_TOKEN
openstack-config --set /etc/ceilometer/ceilometer.conf publisher metering_secret $CEILOMETER_TOKEN
openstack-config --set /etc/ceilometer/ceilometer.conf DEFAULT rpc_backend ceilometer.openstack.common.rpc.impl_kombu
openstack-config --set /etc/ceilometer/ceilometer.conf DEFAULT rabbit_ha_queues true
openstack-config --set /etc/ceilometer/ceilometer.conf DEFAULT rabbit_hosts $RABBITHOSTS
openstack-config --set /etc/ceilometer/ceilometer.conf DEFAULT auth_strategy noauth
#
source keystone_admin.sh
keystone user-create --name=ceilometer --pass=CEILOMETER_PASS --email=ceilometer@example.com
keystone user-role-add --user=ceilometer --tenant=services --role=admin
keystone service-create --name=ceilometer --type=metering --description="Ceilometer Telemetry Service"
keystone endpoint-create --region=regionOne --service-id=$(keystone service-list | awk '/ metering / {print $2}') --publicurl=https://$CEILOHOST:8777 --internalurl=https://$CEILOHOSTPV:8777 --adminurl=https://$CEILOHOSTPV:8777
#
openstack-config --set /etc/ceilometer/ceilometer.conf keystone_authtoken auth_host $KEYHOST
openstack-config --set /etc/ceilometer/ceilometer.conf keystone_authtoken admin_user ceilometer
openstack-config --set /etc/ceilometer/ceilometer.conf keystone_authtoken admin_tenant_name services
openstack-config --set /etc/ceilometer/ceilometer.conf keystone_authtoken auth_protocol https
openstack-config --set /etc/ceilometer/ceilometer.conf keystone_authtoken auth_uri https://$KEYHOST:35357
openstack-config --set /etc/ceilometer/ceilometer.conf keystone_authtoken admin_password CEILOMETER_PASS
openstack-config --set /etc/ceilometer/ceilometer.conf keystone_authtoken cafile /etc/grid-security/certificates/INFN-CA-2006.pem
openstack-config --set /etc/ceilometer/ceilometer.conf service_credentials os_username ceilometer
openstack-config --set /etc/ceilometer/ceilometer.conf service_credentials os_tenant_name services
openstack-config --set /etc/ceilometer/ceilometer.conf service_credentials os_password CEILOMETER_PASS
openstack-config --set /etc/ceilometer/ceilometer.conf service_credentials os_cacert /etc/grid-security/certificates/INFN-CA-2006.pem
#
openstack-config --set /etc/cinder/cinder.conf DEFAULT control_exchange cinder
openstack-config --set /etc/cinder/cinder.conf DEFAULT notification_driver cinder.openstack.common.notifier.rpc_notifier
service openstack-cinder-api restart
service openstack-cinder-volume restart
#
openstack-config --set /etc/glance/glance-api.conf DEFAULT notification_driver messaging
openstack-config --set /etc/glance/glance-api.conf DEFAULT rpc_backend rabbit
service openstack-glance-api restart
service openstack-glance-registry restart
#
for i in /etc/rc.d/init.d/openstack-ceilometer-* ; do $i start; done
</code>
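The ''CEILOMETER_TOKEN'' printed by the script above is the shared metering secret and must be reused verbatim on every Compute Node. A quick, self-contained sanity check of its shape (an illustration only, not part of the installation):

<code bash>
# The shared metering secret: 10 random bytes rendered as 20 hex characters.
CEILOMETER_TOKEN=$(openssl rand -hex 10)
# Check its length before copying it into the Compute Node script:
if [ ${#CEILOMETER_TOKEN} -eq 20 ]; then
    echo "metering secret looks valid: $CEILOMETER_TOKEN"
fi
</code>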
==== Installation on Compute Nodes ====
<code bash>
#!/bin/sh
export KEYHOST=cloud-areapd.pd.infn.it
export RABBITMGIP=192.168.60.100
export RABBITHOSTS="192.168.60.100:5672,192.168.60.101:5672"
export CEILOMETER_TOKEN=5bfc5aa7fb1162dad8f1
#
# 1) do not forget to add in nova.conf:
#[DEFAULT]
#...
#notification_driver = nova.openstack.common.notifier.rpc_notifier
#notification_driver = ceilometer.compute.nova_notifier
sed -i 's|#notification_driver=|notification_driver = nova.openstack.common.notifier.rpc_notifier\nnotification_driver = ceilometer.compute.nova_notifier|g' /etc/nova/nova.conf
#
# 2) use the same $CEILOMETER_TOKEN created on the Controller Node
#
yum install -y openstack-ceilometer-compute
openstack-config --set /etc/nova/nova.conf DEFAULT instance_usage_audit True
openstack-config --set /etc/nova/nova.conf DEFAULT instance_usage_audit_period hour
openstack-config --set /etc/nova/nova.conf DEFAULT notify_on_state_change vm_and_task_state
openstack-config --set /etc/ceilometer/ceilometer.conf DEFAULT rpc_backend ceilometer.openstack.common.rpc.impl_kombu
openstack-config --set /etc/ceilometer/ceilometer.conf DEFAULT auth_strategy noauth
openstack-config --set /etc/ceilometer/ceilometer.conf DEFAULT rabbit_ha_queues true
openstack-config --set /etc/ceilometer/ceilometer.conf DEFAULT rabbit_hosts $RABBITHOSTS
#openstack-config --set /etc/ceilometer/ceilometer.conf DEFAULT rabbit_host $RABBITMGIP
openstack-config --set /etc/ceilometer/ceilometer.conf publisher metering_secret $CEILOMETER_TOKEN
openstack-config --set /etc/ceilometer/ceilometer.conf keystone_authtoken auth_host $KEYHOST
openstack-config --set /etc/ceilometer/ceilometer.conf keystone_authtoken admin_user ceilometer
openstack-config --set /etc/ceilometer/ceilometer.conf keystone_authtoken admin_tenant_name services
openstack-config --set /etc/ceilometer/ceilometer.conf keystone_authtoken auth_protocol https
openstack-config --set /etc/ceilometer/ceilometer.conf keystone_authtoken admin_password CEILOMETER_PASS
openstack-config --set /etc/ceilometer/ceilometer.conf service_credentials os_username ceilometer
openstack-config --set /etc/ceilometer/ceilometer.conf service_credentials os_tenant_name services
openstack-config --set /etc/ceilometer/ceilometer.conf service_credentials os_password CEILOMETER_PASS
openstack-config --set /etc/ceilometer/ceilometer.conf service_credentials os_auth_url https://$KEYHOST:35357/v2.0
openstack-config --set /etc/ceilometer/ceilometer.conf service_credentials os_cacert /etc/grid-security/certificates/INFN-CA-2006.pem
sed -i 's|- cpu_sink|- cpu_sink\n          - cpu_sink.hrs|g' /etc/ceilometer/pipeline.yaml
cat <<EOF >> /etc/ceilometer/pipeline.yaml
    - name: cpu_sink.hrs
      transformers:
          - name: "unit_conversion"
            parameters:
                target:
                    name: "cpu.hours"
                    scale: "volume / (10**9 * 3600.0)"
                    unit: "hours"
                    type: "cumulative"
      publishers:
          - rpc://
EOF
service openstack-nova-compute restart
service openstack-ceilometer-compute start
chkconfig openstack-ceilometer-compute on
service openstack-ceilometer-compute status
ls -l /var/log/ceilometer/
</code>
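The ''sed'' one-liner in step 1 assumes that ''/etc/nova/nova.conf'' still contains the commented ''#notification_driver='' line shipped with the package. It can be tried safely on a throw-away sample file first (a sketch; the temp file merely stands in for the real nova.conf):

<code bash>
# Dry-run of the notification_driver substitution on a throw-away copy
# (the temp file stands in for /etc/nova/nova.conf).
TMPCONF=$(mktemp /tmp/nova.conf.XXXXXX)
cat <<'EOF' > "$TMPCONF"
[DEFAULT]
#notification_driver=
EOF
sed -i 's|#notification_driver=|notification_driver = nova.openstack.common.notifier.rpc_notifier\nnotification_driver = ceilometer.compute.nova_notifier|g' "$TMPCONF"
# Both driver lines should now be present and uncommented:
DRIVERS=$(grep -c '^notification_driver' "$TMPCONF")
echo "notification_driver lines after sed: $DRIVERS"
rm -f "$TMPCONF"
</code>

If the commented line is absent (e.g. on an already-edited nova.conf), the sed is a no-op and the two driver lines must be added by hand.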
==== Testing from Controller ====
<code bash>
source ceilometerrc.sh
# show meters and resources
ceilometer meter-list
ceilometer resource-list

# show the entire sample list for a given meter for all resources
ceilometer sample-list -m cpu

# show the aggregate statistics (average, maximum, minimum, sum, count) for a given meter for all resources
ceilometer statistics -m cpu
ceilometer statistics -m cpu.hours
ceilometer statistics -m cpu_util

# cpu for an instance is defined as the total cpu_time (in nanoseconds) output by the command "virsh cpu-stats instance-id" on the hypervisor
# cpu_util is (cpu_t2 - cpu_t1)*100/(Dt*n_vcpus*10**9), where Dt=t2-t1 is the sampling time (600 s by default) and n_vcpus is the number of VCPUs of the instance

# show the samples for a given resource after a given start time
START=2015-02-09T12:00:00
INSTANCE=67fccb06-b80e-42ab-9fca-f5e57adaba7
ceilometer sample-list -m cpu.hours -q "timestamp>$START;resource_id=$INSTANCE"

# show the aggregate statistics for a given resource after a given start time for a given period (here 3600 seconds)
ceilometer statistics -m cpu.hours -q "timestamp>$START;resource_id=$INSTANCE" -p 3600
</code>
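The two formulas quoted in the comments above can be checked numerically. In the sketch below the cumulative cpu_time samples are made up, assuming the default 600 s sampling period and a 2-VCPU instance:

<code bash>
# Made-up cumulative cpu_time samples (ns), 600 s apart, for a 2-VCPU instance
CPU_T1=3600000000000
CPU_T2=4200000000000
DT=600        # default sampling period in seconds
VCPUS=2

# cpu_util = (cpu_t2 - cpu_t1) * 100 / (Dt * n_vcpus * 10**9)
CPU_UTIL=$(awk -v t1=$CPU_T1 -v t2=$CPU_T2 -v dt=$DT -v n=$VCPUS \
    'BEGIN { printf "%.1f", (t2 - t1) * 100 / (dt * n * 1e9) }')
echo "cpu_util  = $CPU_UTIL %"

# cpu.hours = volume / (10**9 * 3600), with volume = cumulative cpu time in ns
CPU_HOURS=$(awk -v v=$CPU_T2 'BEGIN { printf "%.2f", v / (1e9 * 3600) }')
echo "cpu.hours = $CPU_HOURS"
</code>

With these sample values the instance consumed 600 s of CPU time over a 600 s window on 2 VCPUs, i.e. cpu_util = 50.0%.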
==== Set up a monthly backup of the Ceilometer DB ====
  * Inspired by the instructions [[http://blog.zhaw.ch/icclab/managing-ceilometer-data-in-openstack/|here]]
  * From the controller node execute the script below:
<code bash>
[root@cld-ctrl-01 ceilometer]# cat ceilometer-backup.sh
#!/bin/sh
export DBHOST=cld-nagios
export DBHOSTIP=192.168.60.32
cat <<EOF >/etc/yum.repos.d/mongodb.repo
[mongodb]
name=MongoDB Repository
baseurl=http://downloads-distro.mongodb.org/repo/redhat/os/x86_64/
gpgcheck=0
enabled=1
EOF
yum -y install mongodb-org-shell mongodb-org-tools
# set time_to_live to 32 days
openstack-config --set /etc/ceilometer/ceilometer.conf DEFAULT time_to_live 2764800
for i in /etc/rc.d/init.d/openstack-ceilometer-* ; do $i restart; done
# copy the mongo_backup.py script
git clone https://github.com/icclab/arcus-energy-monitoring-tool/
useradd ceilobackup
cp arcus-energy-monitoring-tool/Tools/mongo_backup.py /home/ceilobackup/
cd /home/ceilobackup/
# set the DB host_ip (the one of cld-nagios) and the ceilometer username/password properly
sed -i 's|username = "ceilometer"|username = "ceiloprod"|g' mongo_backup.py
sed -i 's|password = "password"|password = "CEILOMETER_DBPASS"|g' mongo_backup.py
sed -i 's|host_ip = "192.168.0.2"|host_ip = "192.168.60.32"|g' mongo_backup.py
sed -i 's|zipath = "/path/to/zip/"|zipath = "/home/ceilobackup/zip/"|g' mongo_backup.py
sed -i 's|path = "/path/to/tmp/"|path = "/home/ceilobackup/tmp/"|g' mongo_backup.py
sed -i 's|ceilometer|ceiloprod|g' mongo_backup.py
#
chown ceilobackup.ceilobackup /home/ceilobackup/mongo_backup.py
su - ceilobackup -c "python mongo_backup.py; ls -lh zip/"
# purge the DB monthly:
cat <<EOF >/etc/cron.d/ceilometer-backup
0 0 1 * * ceilobackup python mongo_backup.py; mongo --host cld-nagios --eval 'db.getSiblingDB("ceiloprod").dropDatabase()'; mongo --host cld-nagios --eval 'db = db.getSiblingDB("ceiloprod"); db.createUser({user: "ceiloprod",pwd: "CEILOMETER_DBPASS",roles: [ "readWrite", "dbAdmin" ]})'
EOF
</code>
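The ''time_to_live'' value set in the script (2764800 s) is just 32 days expressed in seconds, so that samples are kept slightly longer than one month between the cron-scheduled purges:

<code bash>
# 32 days expressed in seconds, matching the value passed to openstack-config
TTL=$((32 * 24 * 3600))
echo "time_to_live = $TTL seconds"
</code>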
