
Monitoring

Ganglia

Nagios

Active sensors

  • Current load: checks the 1-, 5-, and 15-minute load averages on the monitored host. It reports a warning or error if one of these values exceeds its threshold (thresholds are defined in the Nagios .cfg files)
  • Dell Server XYZ: OpenManage sensors installed on the Dell machines. They check the various hardware components.
  • HTTP: checks whether the HTTP service is running
  • HTTPS: checks the status of the HTTP and HTTPS services
  • HTTPS Certificate: checks the certificate used by an HTTPS server. It reports a warning if the certificate expires in less than 30 days
  • KVM: this sensor is used on the Compute Nodes. It checks the status of the KVM service and of the hosted VMs. It reports a warning if not all VMs are active; e.g. it could report a warning if a VM is in the shutoff state, which may not be a problem if the instance was shut off deliberately by its user
  • MySQL XYZ database: this sensor is used on the MySQL nodes. It checks the status of the relevant database.
  • Neutron Agents: this sensor is used on the Network Node. It runs the 'neutron agent-list' command and checks that the status of every agent is ':-)', i.e. OK
  • Neutron Server: this sensor is used on the Network Node. It checks whether a neutron-server process is being executed. If it reports a problem, verify that the process is indeed not running; before restarting it, checking the log files (in particular those in '/var/log/neutron') is recommended
  • Nova Compute: this sensor is used on the Compute Nodes. It checks whether a nova-compute process is being executed. If it reports a problem, verify that the process is indeed not running; before restarting it, checking the log files (in particular those in '/var/log/nova') is recommended
  • Openstack Check EC2 Instances: checks the instances on the Cloud using the EC2 API. It reports a problem if the command fails or if the number of VMs in the error state exceeds a threshold, usually 5 (defined in the Nagios .cfg files)
  • Openstack Check Nova Instances: checks the instances on the Cloud using the Nova API. It reports a problem if the command fails or if the number of VMs in the error state exceeds a threshold, usually 5 (defined in the Nagios .cfg files)
  • OpenStack Glance API: checks the available images registered in Glance, using the Glance API. In case of problems, check the log files, in particular those in '/var/log/glance'.
  • OpenStack Glance Upload: uploads (and then deletes) a small image to the Glance service, to test it. In case of problems, check the log files, in particular those in '/var/log/glance'.
  • OpenStack Keystone API: this sensor is used on the Controller node. It checks the functionality of the Keystone service by trying to obtain a token from it. If it reports a problem, check the log files, in particular those in '/var/log/keystone'
  • OpenStack Nova API: this sensor tests the functionality of the Nova API by requesting the list of available flavors.
  • OpenvSwitch: this sensor is used on the Compute Nodes. First it checks whether the openvswitch service is running; it then parses the output of the 'ovs-ofctl dump-flows br-tun' command: a row containing the strings 'table=0' and 'actions=resubmit' should be present. If it is not, restarting the 'openvswitch' and 'neutron-openvswitch-agent' services could help
  • PerconaStatus: this sensor monitors the members of the Percona cluster. It checks that the node is synced and that the number of members of the database cluster is 3 (i.e. that all members are working)
  • PING: checks whether the considered host is pingable
  • SSH: checks whether the SSH service on the host is active
  • XYZ Filesystem total size: checks the total size of a filesystem and compares it with the expected value. This sensor is used for the Gluster file systems, to be notified when a brick "is lost"
  • XYZ Gluster: checks a Gluster volume on the servers providing it
  • XYZ Partition: checks whether the considered file system is available and checks its size. It triggers a warning/error if the free space is below a threshold defined in the Nagios .cfg files (usually 20% for warning, 10% for error)
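
A minimal sketch of what the "Current load" sensor does, assuming the standard Nagios exit-code convention (0=OK, 1=WARNING, 2=CRITICAL). The WARN/CRIT values here are illustrative only; the real ones live in the Nagios .cfg files.

```shell
#!/bin/sh
# Illustrative load check: compare the 1m/5m/15m load averages against
# warning/critical thresholds (example values, not the production ones).
WARN=5
CRIT=10
read LOAD1 LOAD5 LOAD15 REST < /proc/loadavg
STATUS=0
for L in $LOAD1 $LOAD5 $LOAD15; do
    if awk -v l="$L" -v t="$CRIT" 'BEGIN{exit !(l>=t)}'; then
        STATUS=2
    elif awk -v l="$L" -v t="$WARN" 'BEGIN{exit !(l>=t)}' && [ "$STATUS" -lt 2 ]; then
        STATUS=1
    fi
done
echo "load average: $LOAD1, $LOAD5, $LOAD15 (status $STATUS)"
```

The awk calls are only used for floating-point comparison, which plain `test` cannot do.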

Ceilometer

Installation on Controller

#!/bin/sh
export CEILOHOST=controller-01.pd.infn.it
export CEILOHOSTPV=controller-01.cloud.pd.infn.it
export DBHOST=cld-nagios
export KEYHOST=cloud-areapd.pd.infn.it
export RABBITHOSTS="192.168.60.100:5672,192.168.60.101:5672"
#
yum install -y openstack-ceilometer-api openstack-ceilometer-collector openstack-ceilometer-notification openstack-ceilometer-central openstack-ceilometer-alarm python-ceilometerclient
 
## the lines below to be executed on $DBHOST
#cat <<EOF >/etc/yum.repos.d/mongodb.repo 
#[mongodb]
#name=MongoDB Repository
#baseurl=http://downloads-distro.mongodb.org/repo/redhat/os/x86_64/
#gpgcheck=0
#enabled=1
#EOF
#
#yum install -y mongodb-org
#service mongod start
#chkconfig mongod on
#mongo --host $DBHOST --eval 'db = db.getSiblingDB("ceiloprod"); db.createUser({user: "ceiloprod",pwd: "CEILOMETER_DBPASS",roles: [ "readWrite", "dbAdmin" ]})'
## end of lines to be executed on $DBHOST
#
## if MongoDB is already in place on $DBHOST and the ceilometer DB has not yet been created, you can create it from the controller as below:
yum install -y mongodb-org-shell
mongo --host $DBHOST --eval 'db = db.getSiblingDB("ceiloprod"); db.createUser({user: "ceiloprod",pwd: "CEILOMETER_DBPASS",roles: [ "readWrite", "dbAdmin" ]})'
## end of $DBHOST stuff
#
openstack-config --set /etc/ceilometer/ceilometer.conf database connection mongodb://ceiloprod:CEILOMETER_DBPASS@$DBHOST:27017/ceiloprod
#
CEILOMETER_TOKEN=$(openssl rand -hex 10)
echo $CEILOMETER_TOKEN
openstack-config --set /etc/ceilometer/ceilometer.conf publisher metering_secret $CEILOMETER_TOKEN
openstack-config --set /etc/ceilometer/ceilometer.conf DEFAULT rpc_backend ceilometer.openstack.common.rpc.impl_kombu
openstack-config --set /etc/ceilometer/ceilometer.conf DEFAULT rabbit_ha_queues true
openstack-config --set /etc/ceilometer/ceilometer.conf DEFAULT rabbit_hosts $RABBITHOSTS
openstack-config --set /etc/ceilometer/ceilometer.conf DEFAULT auth_strategy noauth
#
source keystone_admin.sh
keystone user-create --name=ceilometer --pass=CEILOMETER_PASS --email=ceilometer@example.com
keystone user-role-add --user=ceilometer --tenant=services --role=admin
keystone service-create --name=ceilometer --type=metering   --description="Ceilometer Telemetry Service"
keystone endpoint-create --region=regionOne --service-id=$(keystone service-list | awk '/ metering / {print $2}') --publicurl=https://$CEILOHOST:8777 --internalurl=https://$CEILOHOSTPV:8777 --adminurl=https://$CEILOHOSTPV:8777
#
openstack-config --set /etc/ceilometer/ceilometer.conf keystone_authtoken auth_host $KEYHOST
openstack-config --set /etc/ceilometer/ceilometer.conf keystone_authtoken admin_user ceilometer
openstack-config --set /etc/ceilometer/ceilometer.conf keystone_authtoken admin_tenant_name services
openstack-config --set /etc/ceilometer/ceilometer.conf keystone_authtoken auth_protocol https
openstack-config --set /etc/ceilometer/ceilometer.conf keystone_authtoken auth_uri https://$KEYHOST:35357
openstack-config --set /etc/ceilometer/ceilometer.conf keystone_authtoken admin_password CEILOMETER_PASS
openstack-config --set /etc/ceilometer/ceilometer.conf keystone_authtoken cafile /etc/grid-security/certificates/INFN-CA-2006.pem
openstack-config --set /etc/ceilometer/ceilometer.conf service_credentials os_username ceilometer
openstack-config --set /etc/ceilometer/ceilometer.conf service_credentials os_tenant_name services
openstack-config --set /etc/ceilometer/ceilometer.conf service_credentials os_password CEILOMETER_PASS
openstack-config --set /etc/ceilometer/ceilometer.conf service_credentials os_cacert /etc/grid-security/certificates/INFN-CA-2006.pem
#
openstack-config --set /etc/cinder/cinder.conf DEFAULT control_exchange cinder
openstack-config --set /etc/cinder/cinder.conf DEFAULT notification_driver cinder.openstack.common.notifier.rpc_notifier
service openstack-cinder-api restart
service openstack-cinder-volume restart
#
openstack-config --set /etc/glance/glance-api.conf DEFAULT notification_driver messaging
openstack-config --set /etc/glance/glance-api.conf DEFAULT rpc_backend rabbit
service openstack-glance-api restart
service openstack-glance-registry restart
#
for i in `ls /etc/rc.d/init.d/openstack-ceilometer-*` ; do $i start; done
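
The metering secret generated above must be reused on every compute node. It can be read back from the controller's /etc/ceilometer/ceilometer.conf at any time; the sketch below simulates that file with a temporary stand-in (and the token value from the compute-node section) so it is self-contained.

```shell
# Recover the metering secret from ceilometer.conf with plain awk.
# A temporary file stands in for /etc/ceilometer/ceilometer.conf here;
# on the real controller, point CONF at that path instead.
CONF=$(mktemp)
cat > "$CONF" <<'EOF'
[publisher]
metering_secret = 5bfc5aa7fb1162dad8f1
EOF
SECRET=$(awk -F' *= *' '$1=="metering_secret"{print $2}' "$CONF")
echo "$SECRET"
rm -f "$CONF"
```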

Installation on Compute Nodes

#!/bin/sh
export KEYHOST=cloud-areapd.pd.infn.it
export RABBITMGIP=192.168.60.100
export RABBITHOSTS="192.168.60.100:5672,192.168.60.101:5672"
export CEILOMETER_TOKEN=5bfc5aa7fb1162dad8f1
#
# 1) do not forget to add in nova.conf:
#[DEFAULT]
#...
#notification_driver = nova.openstack.common.notifier.rpc_notifier
#notification_driver = ceilometer.compute.nova_notifier
sed -i 's|#notification_driver=|notification_driver = nova.openstack.common.notifier.rpc_notifier\nnotification_driver = ceilometer.compute.nova_notifier|g' /etc/nova/nova.conf
#
# 2) use the same $CEILOMETER_TOKEN created on the Controller Node
#
yum install -y openstack-ceilometer-compute
openstack-config --set /etc/nova/nova.conf DEFAULT instance_usage_audit True
openstack-config --set /etc/nova/nova.conf DEFAULT instance_usage_audit_period hour
openstack-config --set /etc/nova/nova.conf DEFAULT notify_on_state_change vm_and_task_state
openstack-config --set /etc/ceilometer/ceilometer.conf DEFAULT rpc_backend ceilometer.openstack.common.rpc.impl_kombu
openstack-config --set /etc/ceilometer/ceilometer.conf DEFAULT auth_strategy noauth
openstack-config --set /etc/ceilometer/ceilometer.conf DEFAULT rabbit_ha_queues true
openstack-config --set /etc/ceilometer/ceilometer.conf DEFAULT rabbit_hosts $RABBITHOSTS
#openstack-config --set /etc/ceilometer/ceilometer.conf DEFAULT rabbit_host $RABBITMGIP
openstack-config --set /etc/ceilometer/ceilometer.conf publisher metering_secret $CEILOMETER_TOKEN
openstack-config --set /etc/ceilometer/ceilometer.conf keystone_authtoken auth_host $KEYHOST
openstack-config --set /etc/ceilometer/ceilometer.conf keystone_authtoken admin_user ceilometer
openstack-config --set /etc/ceilometer/ceilometer.conf keystone_authtoken admin_tenant_name services
openstack-config --set /etc/ceilometer/ceilometer.conf keystone_authtoken auth_protocol https
openstack-config --set /etc/ceilometer/ceilometer.conf keystone_authtoken admin_password CEILOMETER_PASS
openstack-config --set /etc/ceilometer/ceilometer.conf service_credentials os_username ceilometer
openstack-config --set /etc/ceilometer/ceilometer.conf service_credentials os_tenant_name services
openstack-config --set /etc/ceilometer/ceilometer.conf service_credentials os_password CEILOMETER_PASS
openstack-config --set /etc/ceilometer/ceilometer.conf service_credentials os_auth_url https://$KEYHOST:35357/v2.0
openstack-config --set /etc/ceilometer/ceilometer.conf service_credentials os_cacert /etc/grid-security/certificates/INFN-CA-2006.pem
sed -i 's|- cpu_sink|- cpu_sink\n          - cpu_sink.hrs|g' /etc/ceilometer/pipeline.yaml
cat <<EOF >> /etc/ceilometer/pipeline.yaml
    - name: cpu_sink.hrs
      transformers:
          - name: "unit_conversion"
            parameters:
                target:
                    name: "cpu.hours"
                    scale: "volume / (10**9 * 3600.0)"
                    unit: "hours"
                    type: "cumulative"
      publishers:
          - rpc://
EOF
service openstack-nova-compute restart
service openstack-ceilometer-compute start
chkconfig openstack-ceilometer-compute on
service openstack-ceilometer-compute status
ls -l /var/log/ceilometer/
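
The pipeline.yaml sed one-liner used above can be tried on a scratch copy first. The sketch below uses a temporary file as a stand-in for /etc/ceilometer/pipeline.yaml: after the edit, the sinks list should reference both cpu_sink and cpu_sink.hrs.

```shell
# Dry-run of the sed edit on a stand-in for pipeline.yaml.
TMP=$(mktemp)
cat > "$TMP" <<'EOF'
      sinks:
          - cpu_sink
EOF
sed -i 's|- cpu_sink|- cpu_sink\n          - cpu_sink.hrs|g' "$TMP"
COUNT=$(grep -c 'cpu_sink' "$TMP")
cat "$TMP"
rm -f "$TMP"
echo "cpu_sink lines: $COUNT"
```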

Testing from Controller

source ceilometerrc.sh 
# show meters and resources
ceilometer meter-list
ceilometer resource-list
 
# show the entire sample for a given meter for all resources
ceilometer sample-list -m cpu
 
# show the aggregate statistics (average, maxima, minima, sum, count) for a given meter for all resources
ceilometer statistics -m cpu
ceilometer statistics -m cpu.hours
ceilometer statistics -m cpu_util
 
# "cpu" for an instance is defined as the total cpu_time (in nanoseconds) reported by the command "virsh cpu-stats <instance-id>" on the hypervisor
# cpu_util is (cpu_t2 - cpu_t1)*100/(Dt*n_vcpus*10**9), where Dt = t2 - t1 is the sampling period (600 s by default) and n_vcpus is the number of VCPUs of the instance
 
# show the sample for a given resource after a given start time
START=2015-02-09T12:00:00
INSTANCE=67fccb06-b80e-42ab-9fca-f5e57adaba7
ceilometer sample-list -m cpu.hours -q "timestamp>$START;resource_id=$INSTANCE"
 
# show the aggregate statistics (average, maxima, minima, sum, count) for a given resource after a given start time for a given period (here 3600 seconds)
ceilometer statistics -m cpu.hours -q "timestamp>$START;resource_id=$INSTANCE" -p 3600
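
A worked example of the cpu_util formula above, with made-up sample values: two cumulative cpu readings taken 600 s apart on a 2-VCPU instance.

```shell
# Illustrative numbers only: cumulative cpu time (ns) at two samples.
CPU_T1=1000000000000
CPU_T2=1720000000000
DT=600     # sampling period in seconds
NVCPU=2    # number of VCPUs of the instance
awk -v c1=$CPU_T1 -v c2=$CPU_T2 -v dt=$DT -v n=$NVCPU \
    'BEGIN { printf "cpu_util = %.1f %%\n", (c2-c1)*100/(dt*n*1e9) }'
```

Here the instance burned 720 s of cpu time over a 600 s window on 2 VCPUs, i.e. a utilization of 60%.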

Set up a monthly backup of Ceilometer DB

  • Inspired by the instructions here
  • From the controller node execute the script below:
[root@cld-ctrl-01 ceilometer]# cat ceilometer-backup.sh
#!/bin/sh
export DBHOST=cld-nagios
export DBHOSTIP=192.168.60.32
cat <<EOF >/etc/yum.repos.d/mongodb.repo
[mongodb]
name=MongoDB Repository
baseurl=http://downloads-distro.mongodb.org/repo/redhat/os/x86_64/
gpgcheck=0
enabled=1
EOF
yum -y install mongodb-org-shell mongodb-org-tools
# set time_to_live to 32 days
openstack-config --set /etc/ceilometer/ceilometer.conf DEFAULT time_to_live 2764800
for i in `ls /etc/rc.d/init.d/openstack-ceilometer-*` ; do $i restart; done
# copying the mongo_backup.py script
git clone https://github.com/icclab/arcus-energy-monitoring-tool/
useradd ceilobackup
cp arcus-energy-monitoring-tool/Tools/mongo_backup.py /home/ceilobackup/
cd /home/ceilobackup/
# set the DB host_ip (the address of cld-nagios) and the ceilometer DB username/password properly
sed -i 's|username = "ceilometer"|username = "ceiloprod"|g' mongo_backup.py
sed -i 's|password = "password"|password = "CEILOMETER_DBPASS"|g' mongo_backup.py
sed -i 's|host_ip = "192.168.0.2"|host_ip = "192.168.60.32"|g' mongo_backup.py
sed -i 's|zipath = "/path/to/zip/"|zipath = "/home/ceilobackup/zip/"|g' mongo_backup.py
sed -i 's|path = "/path/to/tmp/"|path = "/home/ceilobackup/tmp/"|g' mongo_backup.py
sed -i 's|ceilometer|ceiloprod|g' mongo_backup.py
#
chown ceilobackup.ceilobackup /home/ceilobackup/mongo_backup.py
su - ceilobackup -c "python mongo_backup.py; ls -lh zip/"
# purge monthly the DB:
cat <<EOF >/etc/cron.d/ceilometer-backup
0 0 1 * * ceilobackup python mongo_backup.py; mongo --host cld-nagios --eval 'db.getSiblingDB("ceiloprod").dropDatabase()'; mongo --host cld-nagios --eval 'db = db.getSiblingDB("ceiloprod"); db.createUser({user: "ceiloprod",pwd: "CEILOMETER_DBPASS",roles: [ "readWrite", "dbAdmin" ]})'
EOF
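
As a quick sanity check on the time_to_live value set earlier in the script (32 days expressed in seconds):

```shell
# 32 days in seconds, matching the time_to_live value used above.
TTL=$((32 * 24 * 3600))
echo "time_to_live = $TTL"
```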
progetti/cloud-areapd/operations/monitoring.txt · Last modified: 2016/03/07 10:57 by verlato@infn.it