Nagios plugins for swift monitoring
Two kinds of checks have been implemented:
- Process checks, i.e. checks that the required processes are up.
- Functional checks, i.e. checks that Swift works correctly.
For the first kind, the check_procs plugin installed with nagios-plugins-procs is used. For the second kind, I took some rather dated plugins distributed on Nagios Exchange and modified them, where necessary, to fit our needs. In all cases the remote commands are run through the check_nrpe plugin.
For completeness, the code is included here and the changes are described.
check_swift
Description: tries to upload, download and delete a file in a Swift container to check that it works correctly.
- Original code here.
- Code modified to make it work here. I made a minimal change to get it running: the tenant and the tenant-id were added as input parameters and the '--insecure' option was added (hardcoded) to the commands.
- Code modified to read the parameters from a configuration file instead of the command line here. Configuration file: /etc/swift/swift-check.conf. I also changed the check of the delete operation because (even when run by hand) the delete returned a failure with the message "object not found", although it had actually succeeded: a subsequent listing showed that the object was gone. This is probably a synchronization issue. The check is therefore performed as a delete followed by a list, and it fails only if the object is still present in the listing after the delete.
Reading the parameters from the configuration file avoids scattering the access credentials around; in some error cases they were even printed (as part of the command) in the Nagios web interface output.
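The delete-then-list rule described above can be sketched as follows. This is only an illustration in Python, not the modified plugin itself: the function name `evaluate_delete`, its arguments and the message strings are hypothetical. Only the decision rule (fail only if the object is still listed after the delete) comes from the text.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate_delete(object_name, delete_exit_code, listing_after_delete):
    """Decide the check result from a delete followed by a listing.

    Hypothetical helper: delete_exit_code is the exit status of the delete
    command, listing_after_delete is the list of object names returned by a
    container listing performed right after the delete.
    """
    if object_name in listing_after_delete:
        # The object survived the delete: this is the only failure case.
        return CRITICAL, "delete failed: %s is still listed" % object_name
    if delete_exit_code != 0:
        # The delete reported an error (for example "object not found"),
        # but the object is gone, so the operation is treated as successful.
        return OK, "delete reported an error but %s is no longer listed" % object_name
    return OK, "delete succeeded"
```

With this rule, a delete that exits with an error while the object no longer appears in the listing is reported as OK, which matches the behaviour observed by hand.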
check_swift_dispersion
Description: "uses swift-dispersion tools to report dispersion analysis and checks that all copies of objects are OK"
- Original code here. Configuration file: /etc/swift/dispersion.conf.
- Modified code here. Configuration file: /etc/swift/dispersion.conf.
The original plugin relies on the swift_dispersion script, whose output changed in Icehouse (others have complained about it here). According to the documentation, the JSON output should look like:
<code_bash>
{"object":{"retries:": 0, "missing_two": 0, "copies_found": 7863, "missing_one": 0,"copies_expected": 7863, "pct_found": 100.0, "overlapping": 0, "missing_all": 0}, "container":{"retries:": 0, "missing_two": 0, "copies_found": 12534, "missing_one": 0, "copies_expected":12534, "pct_found": 100.0, "overlapping": 15, "missing_all": 0}}
</code>
but the actual output is:
<code_bash>
{"object":{"retries": 0, "missing_0": 2621, "copies_expected": 7863, "pct_found": 100.0, "overlapping": 0, "copies_found": 7863}, "container":{"retries": 0, "copies_expected": 7866, "pct_found": 100.0, "overlapping": 0, "copies_found": 7866}}
</code>
that is, the missing_one, missing_two and missing_all fields that the plugin expects are no longer there.
I modified the plugin so that it checks only 'pct_found'. This does lose the information on whether one, two or all copies are missing, but the swift_dispersion script no longer provides it.
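A check based only on 'pct_found' could look like the sketch below. It is an illustration in Python under stated assumptions, not the modified plugin: the function name `check_dispersion` and the threshold values (99.0 and 95.0) are hypothetical, and the sketch only assumes that the report is a JSON document with "object" and "container" sections, as in the output shown above.

```python
import json

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_dispersion(report_json, warn_pct=99.0, crit_pct=95.0):
    """Return (status, message) looking only at 'pct_found'.

    warn_pct and crit_pct are hypothetical thresholds: the result is WARNING
    below warn_pct and CRITICAL below crit_pct.
    """
    try:
        report = json.loads(report_json)
    except ValueError:
        return UNKNOWN, "cannot parse dispersion report"
    if not isinstance(report, dict):
        return UNKNOWN, "unexpected dispersion report format"

    status = OK
    parts = []
    for kind in ("container", "object"):
        section = report.get(kind)
        if not isinstance(section, dict) or "pct_found" not in section:
            return UNKNOWN, "no pct_found for %s" % kind
        pct = float(section["pct_found"])
        parts.append("%s pct_found=%.2f" % (kind, pct))
        if pct < crit_pct:
            status = max(status, CRITICAL)
        elif pct < warn_pct:
            status = max(status, WARNING)
    return status, ", ".join(parts)
```

Because the fields missing_one, missing_two and missing_all are never read, the same function accepts both the documented and the actual output format.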
check_swift_object_servers
Description: "check_swift_object_servers uses swift-recon to query all clusters servers and ensure they all have the same copy of the object ring."
- Original code here.
- Code modified to make it work here. I only changed one parameter in the call to the swift-recon script.
- Old command:
<code_bash>
swift-recon --objmd5
</code>
- New command:
<code_bash>
swift-recon --md5
</code>
Nagios server host configuration
Command configuration
commands.cfg [SL: /etc/nagios/objects/commands.cfg]
- Configuration of the check_nrpe plugin command.
Many arguments are configured because the parametric version of check_swift, for example, needs 9 of them, including the command. I left some margin.
<code_bash>
define command{
    command_name    check_nrpe
    command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -t 480 -c $ARG1$ -a $ARG2$ $ARG3$ $ARG4$ $ARG5$ $ARG6$ $ARG7$ $ARG8$ $ARG9$ $ARG10$ $ARG11$ $ARG12$
}
</code>
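To see why unused arguments are harmless, it helps to recall how Nagios builds the final command line: the check_command value of a service is split on '!', the first field selects the command definition, and the remaining fields become $ARG1$, $ARG2$ and so on, while arguments that are not supplied expand to nothing. The Python sketch below is a simplified model of that expansion, not Nagios code: the function name `expand_check_command`, the sample address and the value used for $USER1$ are assumptions made for the example, and escaped '!' characters are not handled.

```python
import re


def expand_check_command(check_command, command_line, macros):
    """Simplified model of how Nagios builds a command line.

    check_command: value from a service definition, e.g. "check_nrpe!foo!bar".
    command_line:  command_line of the matching command definition.
    macros:        other macro values, e.g. {"HOSTADDRESS": "192.0.2.10"}.
    """
    name, *args = check_command.split("!")
    values = dict(macros)
    for index, value in enumerate(args, start=1):
        values["ARG%d" % index] = value

    def replace(match):
        key = match.group(1)
        if key in values:
            return values[key]
        # Arguments that were not supplied expand to an empty string;
        # any other unknown macro is left untouched in this sketch.
        return "" if key.startswith("ARG") else match.group(0)

    def substitute(text):
        return re.sub(r"\$([A-Z0-9_]+)\$", replace, text)

    # Two passes, so that a macro passed as an argument is resolved as well.
    expanded = substitute(substitute(command_line))
    return name, " ".join(expanded.split())


command_line = ("$USER1$/check_nrpe -H $HOSTADDRESS$ -t 480 -c $ARG1$ -a $ARG2$ "
                "$ARG3$ $ARG4$ $ARG5$ $ARG6$ $ARG7$ $ARG8$ $ARG9$ $ARG10$ $ARG11$ $ARG12$")
name, line = expand_check_command(
    "check_nrpe!check_swift_1!$HOSTADDRESS$",
    command_line,
    {"USER1": "/usr/lib64/nagios/plugins", "HOSTADDRESS": "192.0.2.10"},
)
print(name)
print(line)
```

The command lines that Nagios really builds can be inspected in its debug file by enabling macro debugging on the Nagios server.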
- Commands to monitor the Swift processes with check_procs, run on the monitored host through check_nrpe
<code_bash>
### Check swift processes
define command{
    command_name    check_swift-proxy-server
    command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c check_swift-proxy-server
}
define command{
    command_name    check_swift-object-server
    command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c check_swift-object-server
}
define command{
    command_name    check_swift-object-auditor
    command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c check_swift-object-auditor
}
define command{
    command_name    check_swift-object-replicator
    command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c check_swift-object-replicator
}
define command{
    command_name    check_swift-object-updater
    command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c check_swift-object-updater
}
define command{
    command_name    check_swift-account-server
    command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c check_swift-account-server
}
define command{
    command_name    check_swift-account-auditor
    command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c check_swift-account-auditor
}
define command{
    command_name    check_swift-account-replicator
    command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c check_swift-account-replicator
}
define command{
    command_name    check_swift-account-reaper
    command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c check_swift-account-reaper
}
define command{
    command_name    check_swift-container-server
    command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c check_swift-container-server
}
define command{
    command_name    check_swift-container-auditor
    command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c check_swift-container-auditor
}
define command{
    command_name    check_swift-container-replicator
    command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c check_swift-container-replicator
}
define command{
    command_name    check_swift-container-updater
    command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c check_swift-container-updater
}
define command{
    command_name    check_swift-container-sync
    command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c check_swift-container-sync
}
</code>
- Commands to monitor Swift functionality with the plugins described above, run on the monitored host through check_nrpe
<code_bash>
### Check swift functionalities
define command{
    command_name    check_swift
    command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -t 480 -c check_swift -A $ARG2$ -U $ARG3$ -T $ARG4$ -I $ARG5$ -K $ARG6$ -V $ARG7$ -c $ARG8$
}
define command{
    command_name    check_swift_1
    command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -t 480 -c check_swift_1
}
define command{
    command_name    check_swift_dispersion
    command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -t 480 -c check_swift_dispersion
}
define command{
    command_name    check_swift_object_servers
    command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -t 480 -c check_swift_object_servers
}
define command{
    command_name    check_rsync
    command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c check_rsync
}
</code>
Service configuration for Swift monitoring
swift_nodes.cfg [SL: /etc/nagios/objects/swift_nodes.cfg]
<code_bash>
# Define service base template
define service{
    name                    swift-service
    use                     server-ssh-service
    hostgroup_name          swift-nodes ;,%Cluster2 ;append new cluster(host_group) here
    register                0
}

# Define Swift Clusters
# A list of nodes in a cluster
define hostgroup {
    hostgroup_name          swift-nodes ;Fixed me
    alias                   SwiftStack pd nodes ;Fixed me
    members                 storage-node-01, storage-node-02 ;Fixed me
}
define hostgroup {
    hostgroup_name          swift-proxies ;Fixed me
    alias                   SwiftStack pd nodes ;Fixed me
    members                 proxy-node ;Fixed me
}

# Define Swift Services
define service {
    service_description     try to upload download and delete a file in a Swift container to check that it works correctly. Read input parameters from configuration file
    check_command           check_nrpe!check_swift_1!$HOSTADDRESS$
    use                     swift-proxy-service
    notification_interval   0 ; set > 0 if you want to be renotified
}
define service {
    service_description     uses swift-dispersion tools to report dispersion analysis and checks that all copies of objects are OK
    check_command           check_nrpe!check_swift_dispersion!$HOSTADDRESS$
    use                     swift-proxy-service
    notification_interval   0 ; set > 0 if you want to be renotified
}
define service {
    service_description     uses swift-recon to query all clusters servers and ensure they all have the same copy of the object ring.
    check_command           check_nrpe!check_swift_object_servers!$HOSTADDRESS$
    use                     swift-proxy-service
    notification_interval   0 ; set > 0 if you want to be renotified
}
define service {
    service_description     check if process swift-proxy-server is alive
    check_command           check_nrpe!check_swift-proxy-server!$HOSTADDRESS$
    use                     swift-proxy-service
    notification_interval   0 ; set > 0 if you want to be renotified
}
define service {
    service_description     check if swift process object-server is alive
    check_command           check_nrpe!check_swift-object-server!$HOSTADDRESS$
    use                     swift-service
    notification_interval   0 ; set > 0 if you want to be renotified
}
define service {
    service_description     check if swift process object-auditor is alive
    check_command           check_nrpe!check_swift-object-auditor!$HOSTADDRESS$
    use                     swift-service
    notification_interval   0 ; set > 0 if you want to be renotified
}
define service {
    service_description     check if swift process object-replicator is alive
    check_command           check_nrpe!check_swift-object-replicator!$HOSTADDRESS$
    use                     swift-service
    notification_interval   0 ; set > 0 if you want to be renotified
}
define service {
    service_description     check if swift process object-updater is alive
    check_command           check_nrpe!check_swift-object-updater!$HOSTADDRESS$
    use                     swift-service
    notification_interval   0 ; set > 0 if you want to be renotified
}
define service {
    service_description     check if swift process account-server is alive
    check_command           check_nrpe!check_swift-account-server!$HOSTADDRESS$
    use                     swift-service
    notification_interval   0 ; set > 0 if you want to be renotified
}
define service {
    service_description     check if swift process account-auditor is alive
    check_command           check_nrpe!check_swift-account-auditor!$HOSTADDRESS$
    use                     swift-service
    notification_interval   0 ; set > 0 if you want to be renotified
}
define service {
    service_description     check if swift process account-replicator is alive
    check_command           check_nrpe!check_swift-account-replicator!$HOSTADDRESS$
    use                     swift-service
    notification_interval   0 ; set > 0 if you want to be renotified
}
define service {
    service_description     check if swift process account-reaper is alive
    check_command           check_nrpe!check_swift-account-reaper!$HOSTADDRESS$
    use                     swift-service
    notification_interval   0 ; set > 0 if you want to be renotified
}
define service {
    service_description     check if swift process container-server is alive
    check_command           check_nrpe!check_swift-container-server!$HOSTADDRESS$
    use                     swift-service
    notification_interval   0 ; set > 0 if you want to be renotified
}
define service {
    service_description     check if swift process container-auditor is alive
    check_command           check_nrpe!check_swift-container-auditor!$HOSTADDRESS$
    use                     swift-service
    notification_interval   0 ; set > 0 if you want to be renotified
}
define service {
    service_description     check if swift process container-replicator is alive
    check_command           check_nrpe!check_swift-container-replicator!$HOSTADDRESS$
    use                     swift-service
    notification_interval   0 ; set > 0 if you want to be renotified
}
define service {
    service_description     check if swift process container-updater is alive
    check_command           check_nrpe!check_swift-container-updater!$HOSTADDRESS$
    use                     swift-service
    notification_interval   0 ; set > 0 if you want to be renotified
}
define service {
    service_description     check if swift process container-sync is alive
    check_command           check_nrpe!check_swift-container-sync!$HOSTADDRESS$
    use                     swift-service
    notification_interval   0 ; set > 0 if you want to be renotified
}
define service {
    service_description     check if process rsync is alive
    check_command           check_nrpe!check_rsync!$HOSTADDRESS$
    use                     swift-service
    notification_interval   0 ; set > 0 if you want to be renotified
}
</code>
Nagios server configuration
nagios.cfg [SL: /etc/nagios/nagios.cfg]
- If there are timeout problems, adjust:
<code_bash>
# TIMEOUT VALUES
# These options control how much time Nagios will allow various
# types of commands to execute before killing them off. Options
# are available for controlling maximum time allotted for
# service checks, host checks, event handlers, notifications, the
# ocsp command, and performance data commands. All values are in
# seconds.

service_check_timeout=480
host_check_timeout=300
event_handler_timeout=300
notification_timeout=300
ocsp_timeout=150
perfdata_timeout=150
</code>
- For debugging, debug_level=2048 helps: the file /var/log/nagios/nagios.debug then shows how Nagios builds the commands
<code_bash>
# DEBUG LEVEL
# This option determines how much (if any) debugging information will
# be written to the debug file. OR values together to log multiple
# types of information.
# Values:
#          -1 = Everything
#          0 = Nothing
#          1 = Functions
#          2 = Configuration
#          4 = Process information
#          8 = Scheduled events
#          16 = Host/service checks
#          32 = Notifications
#          64 = Event broker
#          128 = External commands
#          256 = Commands
#          512 = Scheduled downtime
#          1024 = Comments
#          2048 = Macros

debug_level=2048
</code>
Finally, do not forget that the Nagios server must be restarted to apply the configuration changes; on SL:
<code_bash>
service nagios restart
</code>
Installing and configuring the Nagios NRPE plugin on the monitored host
Installation
- On SL:
<code_bash>
yum install nagios-plugins-nrpe
</code>
- On Ubuntu:
<code_bash>
sudo apt-get install nagios-nrpe-server nagios-plugins
</code>
Configuration
/etc/nagios/nrpe.cfg
- Command definitions
<code_bash>
#########
# Swift #
#########
# Swift processes checks
command[check_swift-proxy-server]=/usr/lib/nagios/plugins/check_procs -w 2:10 -c 1:100 -a "swift-proxy-server"
command[check_swift-object-server]=/usr/lib/nagios/plugins/check_procs -w 2:10 -c 1:100 -a "object-server"
command[check_swift-object-auditor]=/usr/lib/nagios/plugins/check_procs -w 2:10 -c 1:100 -a "object-auditor"
command[check_swift-object-replicator]=/usr/lib/nagios/plugins/check_procs -w 2:10 -c 1:100 -a "object-replicator"
command[check_swift-object-updater]=/usr/lib/nagios/plugins/check_procs -w 2:10 -c 1:100 -a "object-updater"
command[check_swift-account-server]=/usr/lib/nagios/plugins/check_procs -w 2:10 -c 1:100 -a "account-server"
command[check_swift-account-auditor]=/usr/lib/nagios/plugins/check_procs -w 2:10 -c 1:100 -a "account-auditor"
command[check_swift-account-replicator]=/usr/lib/nagios/plugins/check_procs -w 2:10 -c 1:100 -a "account-replicator"
command[check_swift-account-reaper]=/usr/lib/nagios/plugins/check_procs -w 2:10 -c 1:100 -a "account-reaper"
command[check_swift-container-server]=/usr/lib/nagios/plugins/check_procs -w 2:10 -c 1:100 -a "container-server"
command[check_swift-container-auditor]=/usr/lib/nagios/plugins/check_procs -w 2:10 -c 1:100 -a "container-auditor"
command[check_swift-container-replicator]=/usr/lib/nagios/plugins/check_procs -w 2:10 -c 1:100 -a "container-replicator"
command[check_swift-container-updater]=/usr/lib/nagios/plugins/check_procs -w 2:10 -c 1:100 -a "container-updater"
command[check_swift-container-sync]=/usr/lib/nagios/plugins/check_procs -w 2:10 -c 1:100 -a "container-sync"
command[check_rsync]=/usr/lib/nagios/plugins/check_procs -w 2:10 -c 1:100 -a "rsync"

# Swift functionalities checks
#command[check_swift]=/usr/lib/nagios/plugins/check_swift -w 5 -c 10
command[check_swift]=/usr/lib/nagios/plugins/check_swift -A $ARG2$ -U $ARG3$ -T $ARG4$ -I $ARG5$ -K $ARG6$ -V $ARG7$ -c $ARG8$
command[check_swift_1]=/usr/lib/nagios/plugins/check_swift_sabe
command[check_swift_dispersion]=/usr/lib/nagios/plugins/check_swift_dispersion -w 5 -c 1
command[check_swift_object_servers]=/usr/lib/nagios/plugins/check_swift_object_servers -w 5 -c 1
</code>
- To make check_nrpe accept command arguments:
<code_bash>
# COMMAND ARGUMENT PROCESSING
# This option determines whether or not the NRPE daemon will allow clients
# to specify arguments to commands that are executed. This option only works
# if the daemon was configured with the --enable-command-args configure script
# option.
#
# *** ENABLING THIS OPTION IS A SECURITY RISK! ***
# Read the SECURITY file for information on some of the security implications
# of enabling this variable.
#
# Values: 0=do not allow arguments, 1=allow command arguments

dont_blame_nrpe=1
</code>
- For debugging (output goes to syslog):
<code_bash>
# DEBUGGING OPTION
# This option determines whether or not debugging messages are logged to the
# syslog facility.
# Values: 0=debugging off, 1=debugging on

debug=1
</code>
- If the plugin times out, adjust:
<code_bash>
# COMMAND TIMEOUT
# This specifies the maximum number of seconds that the NRPE daemon will
# allow plugins to finish executing before killing them off.

command_timeout=300
</code>
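The process checks above all use the thresholds -w 2:10 -c 1:100. In the Nagios plugin range syntax, "min:max" raises an alert when the value falls outside the inclusive range, so these settings mean: WARNING with fewer than 2 or more than 10 matching processes, CRITICAL with none or more than 100. The Python sketch below only illustrates that rule for the plain "min:max" form used here; the function names are hypothetical and the other range forms accepted by the plugins are not handled.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def outside(value, range_spec):
    """True if value is outside the inclusive range given as "min:max"."""
    low, high = (int(part) for part in range_spec.split(":"))
    return value < low or value > high


def procs_status(count, warn="2:10", crit="1:100"):
    """Status for a process count; the critical range is evaluated first."""
    if outside(count, crit):
        return CRITICAL
    if outside(count, warn):
        return WARNING
    return OK


for count in (0, 1, 2, 10, 11, 101):
    print(count, procs_status(count))
```

Note that a single running process is reported as WARNING with these thresholds, because the warning range starts at 2.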
Finally, do not forget that the NRPE server must be restarted to apply the configuration changes:
<code_bash>
/etc/init.d/nagios-nrpe-server restart
</code>