Nagios plugins for swift monitoring

Two kinds of checks have been implemented:

  • Process checks, which verify that the required processes are up.
  • Functional checks, which verify that Swift works correctly.

The process checks use the check_procs plugin, installed with nagios-plugins-procs. For the functional checks I took some rather dated plugins distributed on [[http://exchange.nagios.org/directory/Plugins/Clustering-and-High-2DAvailability/check_swift/details|nagios-exchange]] and modified them, where necessary, to fit our needs. In all cases the remote commands are launched through the check_nrpe plugin.

For completeness, the code is included here and the changes are described.

check_swift

Description: "check_swift_object_servers uses swift-recon to query all clusters servers and ensure they all have the same copy of the object ring."

  • Original code here.
  • Code modified to get it working {{:progetti:cloud-areapd:check_swift_sabe_1.txt| here}}. This is a minimal change to see it work: the tenant and the tenant-id were added as input parameters, and the '--insecure' option is hardcoded in the commands.
  • Code modified to read the parameters from a configuration file instead of the command line here. Configuration file: /etc/swift/swift-check.conf. I changed the check of the delete operation because (even when run by hand) the delete reported a failure with the message "object not found", although it had actually succeeded: a subsequent listing showed that the object was gone. This is probably a synchronization problem. The check is therefore the result of a delete followed by a list, and it fails only if the object is still present in the listing after the delete. Reading the parameters from a file also avoids scattering the access credentials around; in some error cases they were even printed (as part of the command) in the Nagios web output.

check_swift_dispersion

Description: "uses swift-dispersion tools to report dispersion analysis and checks that all copies of objects are OK"

  • Original code {{:progetti:cloud-areapd:check_swift_dispersion.txt| here}}. Configuration file {{:progetti:cloud-areapd:dispersion.conf.txt|/etc/swift/dispersion.conf}}.
  • Modified code {{:progetti:cloud-areapd:check_swift_dispersion_sabe.txt| here}}. Configuration file {{:progetti:cloud-areapd:dispersion.conf.txt|/etc/swift/dispersion.conf}}. The original plugin relies on the swift_dispersion script, whose output changed in Icehouse (others have complained about it here). According to the documentation, the JSON output should look like:

<code_bash>
{"object":{"retries:": 0, "missing_two": 0, "copies_found": 7863, "missing_one": 0,"copies_expected": 7863, "pct_found": 100.0, "overlapping": 0, "missing_all": 0}, "container":{"retries:": 0, "missing_two": 0, "copies_found": 12534, "missing_one": 0, "copies_expected":12534, "pct_found": 100.0, "overlapping": 15, "missing_all": 0}}
</code>

whereas in practice it is:

<code_bash>
{"object":{"retries": 0, "missing_0": 2621, "copies_expected": 7863, "pct_found": 100.0, "overlapping": 0, "copies_found": 7863}, "container":{"retries": 0, "copies_expected": 7866, "pct_found": 100.0, "overlapping": 0, "copies_found": 7866}}
</code>

That is, the missing_one, missing_two and missing_all keys that the plugin expects are absent. I modified the plugin to check only 'pct_found'. It is true that this loses the information on whether one, two or all copies are missing, but the swift_dispersion script no longer provides it.
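A check that looks only at 'pct_found' can be sketched as follows. This is not the modified plugin itself, just a minimal illustration of the idea; the function name `check_pct_found` and the percentage thresholds `warn_pct` and `crit_pct` are assumptions made for the example, and the numeric statuses follow the usual Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
import json

# Usual Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_pct_found(report_json, warn_pct=99.0, crit_pct=95.0):
    """Evaluate a dispersion report using only the pct_found values.

    warn_pct and crit_pct are hypothetical thresholds chosen for this sketch.
    Returns a (status, message) tuple.
    """
    try:
        report = json.loads(report_json)
    except ValueError:
        return UNKNOWN, "cannot parse dispersion report"

    status = OK
    parts = []
    for section in ("object", "container"):
        pct = report.get(section, {}).get("pct_found")
        if pct is None:
            # Neither the old nor the new format omits pct_found,
            # so a missing value means the report is unusable.
            return UNKNOWN, "no pct_found for %s" % section
        parts.append("%s pct_found=%.1f" % (section, pct))
        if pct < crit_pct:
            status = max(status, CRITICAL)
        elif pct < warn_pct:
            status = max(status, WARNING)
    return status, ", ".join(parts)
```

Because only 'pct_found' is read, the same function accepts both the documented format and the one actually produced in Icehouse.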

check_swift_object_servers

Description: "check_swift_object_servers uses swift-recon to query all clusters servers and ensure they all have the same copy of the object ring."

  • Original code here.
  • Code modified to get it working {{:progetti:cloud-areapd:check_swift_object_servers.txt| here}}. I only changed one parameter in the call to the swift-recon script.
  • Old command:

<code_bash> swift-recon --objmd5 </code>

  • New command:

<code_bash> swift-recon --md5 </code>

Configuration of the Nagios server host

Configuration of the commands

commands.cfg [SL: /etc/nagios/objects/commands.cfg]

  • Configuration of the command for the check_nrpe plugin. I configured many parameters because, for example, the parametric version of check_swift needs 9 of them, including the command. I left some margin.

<code_bash>
define command{
  command_name            check_nrpe
  command_line            $USER1$/check_nrpe  -H $HOSTADDRESS$ -t 480 -c $ARG1$ -a $ARG2$ $ARG3$ $ARG4$ $ARG5$ $ARG6$ $ARG7$ $ARG8$ $ARG9$ $ARG10$ $ARG11$ $ARG12$
}
</code>
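To see how the many $ARGn$ placeholders get filled, the following sketch mimics the way Nagios turns a service's check_command into a command line: the value is split on "!", the first field selects the command, and the remaining fields become $ARG1$, $ARG2$ and so on, with unused macros left empty. It is a simplified model and not Nagios code; the function names and the default `user1` path are assumptions made for the example.

```python
import re

# Matches macros such as $USER1$, $HOSTADDRESS$ or $ARG3$.
MACRO = re.compile(r"\$([A-Z0-9_]+)\$")


def expand(text, macros):
    """Replace $NAME$ macros; macros without a value become empty strings."""
    return MACRO.sub(lambda m: macros.get(m.group(1), ""), text)


def build_command_line(check_command, command_line, host_address,
                       user1="/usr/lib64/nagios/plugins"):
    """Simplified model of macro expansion for one service check.

    user1 is a hypothetical plugin directory used only for illustration.
    """
    base = {"USER1": user1, "HOSTADDRESS": host_address}
    _name, *args = check_command.split("!")
    macros = dict(base)
    for index, arg in enumerate(args, start=1):
        # Arguments may themselves contain macros such as $HOSTADDRESS$.
        macros["ARG%d" % index] = expand(arg, base)
    # Collapse the blanks left behind by empty $ARGn$ macros.
    return " ".join(expand(command_line, macros).split())


line = ("$USER1$/check_nrpe  -H $HOSTADDRESS$ -t 480 -c $ARG1$ "
        "-a $ARG2$ $ARG3$ $ARG4$")
print(build_command_line("check_nrpe!check_swift_1!$HOSTADDRESS$",
                         line, "192.0.2.10"))
```

Setting debug_level=2048 in nagios.cfg is the way to see the real expansion performed by Nagios.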

  • Commands to monitor the Swift processes with check_procs, run on the monitored host through check_nrpe

<code_bash>
### Check swift processes
define command{
      command_name        check_swift-proxy-server
      command_line        $USER1$/check_nrpe  -H $HOSTADDRESS$ -c check_swift-proxy-server
}

define command{
      command_name        check_swift-object-server
      command_line        $USER1$/check_nrpe  -H $HOSTADDRESS$ -c check_swift-object-server
}

define command{
      command_name        check_swift-object-auditor
      command_line        $USER1$/check_nrpe  -H $HOSTADDRESS$ -c check_swift-object-auditor
}

define command{
      command_name        check_swift-object-replicator
      command_line        $USER1$/check_nrpe  -H $HOSTADDRESS$ -c check_swift-object-replicator
}

define command{
      command_name        check_swift-object-updater
      command_line        $USER1$/check_nrpe  -H $HOSTADDRESS$ -c check_swift-object-updater
}

define command{
      command_name        check_swift-account-server
      command_line        $USER1$/check_nrpe  -H $HOSTADDRESS$ -c check_swift-account-server
}

define command{
      command_name        check_swift-account-auditor
      command_line        $USER1$/check_nrpe  -H $HOSTADDRESS$ -c check_swift-account-auditor
}

define command{
      command_name        check_swift-account-replicator
      command_line        $USER1$/check_nrpe  -H $HOSTADDRESS$ -c check_swift-account-replicator
}

define command{
      command_name        check_swift-account-reaper
      command_line        $USER1$/check_nrpe  -H $HOSTADDRESS$ -c check_swift-account-reaper
}

define command{
      command_name        check_swift-container-server
      command_line        $USER1$/check_nrpe  -H $HOSTADDRESS$ -c check_swift-container-server
}

define command{
      command_name        check_swift-container-auditor
      command_line        $USER1$/check_nrpe  -H $HOSTADDRESS$ -c check_swift-container-auditor
}

define command{
      command_name        check_swift-container-replicator
      command_line        $USER1$/check_nrpe  -H $HOSTADDRESS$ -c check_swift-container-replicator
}

define command{
      command_name        check_swift-container-updater
      command_line        $USER1$/check_nrpe  -H $HOSTADDRESS$ -c check_swift-container-updater
}

define command{
      command_name        check_swift-container-sync
      command_line        $USER1$/check_nrpe  -H $HOSTADDRESS$ -c check_swift-container-sync
}
</code>
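Since the process-check commands differ only in the process name, they could also be generated instead of typed by hand. The sketch below is one possible way to do that; the helper name `render_commands` and the idea of generating the file are assumptions made for the example, not part of the original setup.

```python
# Process names taken from the command definitions above.
SWIFT_PROCESSES = [
    "proxy-server",
    "object-server", "object-auditor", "object-replicator", "object-updater",
    "account-server", "account-auditor", "account-replicator", "account-reaper",
    "container-server", "container-auditor", "container-replicator",
    "container-updater", "container-sync",
]

TEMPLATE = """define command{
      command_name        check_swift-%(name)s
      command_line        $USER1$/check_nrpe  -H $HOSTADDRESS$ -c check_swift-%(name)s
}
"""


def render_commands(processes=SWIFT_PROCESSES):
    """Return one 'define command' block per Swift process."""
    return "\n".join(TEMPLATE % {"name": name} for name in processes)


if __name__ == "__main__":
    # The output can be pasted into commands.cfg after review.
    print(render_commands())
```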

  • Commands to monitor Swift functionality, with the plugins run on the monitored host through check_nrpe

<code_bash>
### Check swift functionalities
define command{
  command_name            check_swift
  command_line            $USER1$/check_nrpe -H $HOSTADDRESS$ -t 480 -c check_swift -A $ARG2$ -U $ARG3$ -T $ARG4$ -I $ARG5$ -K $ARG6$ -V $ARG7$ -c $ARG8$
}

define command{
  command_name            check_swift_1
  command_line            $USER1$/check_nrpe -H $HOSTADDRESS$ -t 480 -c check_swift_1
}

define command{
  command_name            check_swift_dispersion
  command_line            $USER1$/check_nrpe -H $HOSTADDRESS$ -t 480 -c check_swift_dispersion
}

define command{
  command_name            check_swift_object_servers
  command_line            $USER1$/check_nrpe -H $HOSTADDRESS$ -t 480 -c check_swift_object_servers
}

define command{
      command_name        check_rsync
      command_line        $USER1$/check_nrpe  -H $HOSTADDRESS$ -c check_rsync
}
</code>

Configuration of the services to monitor Swift

swift_nodes.cfg [SL: /etc/nagios/objects/swift_nodes.cfg]

<code_bash>
#
# Define service base template
#
define service{
      name                            swift-service
      use                             server-ssh-service
      hostgroup_name                  swift-nodes ;,%Cluster2     ;append new cluster(host_group) here
      register                        0
      }

#
# Define Swift Clusters
#
# A list of nodes in a cluster
define hostgroup {
      hostgroup_name  swift-nodes              ;Fixed me
      alias           SwiftStack pd nodes      ;Fixed me
      members         storage-node-01, storage-node-02      ;Fixed me
      }

define hostgroup {
      hostgroup_name  swift-proxies            ;Fixed me
      alias           SwiftStack pd nodes      ;Fixed me
      members         proxy-node               ;Fixed me
      }

#
# Define Swift Services
#
define service {
      service_description             try to upload download and delete a file in a Swift container to check that it works correctly. Read input parameters from configuration file
      check_command                   check_nrpe!check_swift_1!$HOSTADDRESS$
      use                             swift-proxy-service
      notification_interval           0 ; set > 0 if you want to be renotified
}

define service {
      service_description             uses swift-dispersion tools to report dispersion analysis and checks that all copies of objects are OK
      check_command                   check_nrpe!check_swift_dispersion!$HOSTADDRESS$
      use                             swift-proxy-service
      notification_interval           0 ; set > 0 if you want to be renotified
}

define service {
      service_description             uses swift-recon to query all clusters servers and ensure they all have the same copy of the object ring.
      check_command                   check_nrpe!check_swift_object_servers!$HOSTADDRESS$
      use                             swift-proxy-service
      notification_interval           0 ; set > 0 if you want to be renotified
}

define service {
      service_description             check if process swift-proxy-server is alive
      check_command                   check_nrpe!check_swift-proxy-server!$HOSTADDRESS$
      use                             swift-proxy-service
      notification_interval           0 ; set > 0 if you want to be renotified
}

define service {
      service_description             check if swift process object-server is alive
      check_command                   check_nrpe!check_swift-object-server!$HOSTADDRESS$
      use                             swift-service
      notification_interval           0 ; set > 0 if you want to be renotified
}

define service {
      service_description             check if swift process object-auditor is alive
      check_command                   check_nrpe!check_swift-object-auditor!$HOSTADDRESS$
      use                             swift-service
      notification_interval           0 ; set > 0 if you want to be renotified
}

define service {
      service_description             check if swift process object-replicator is alive
      check_command                   check_nrpe!check_swift-object-replicator!$HOSTADDRESS$
      use                             swift-service
      notification_interval           0 ; set > 0 if you want to be renotified
}

define service {
      service_description             check if swift process object-updater is alive
      check_command                   check_nrpe!check_swift-object-updater!$HOSTADDRESS$
      use                             swift-service
      notification_interval           0 ; set > 0 if you want to be renotified
}

define service {
      service_description             check if swift process account-server is alive
      check_command                   check_nrpe!check_swift-account-server!$HOSTADDRESS$
      use                             swift-service
      notification_interval           0 ; set > 0 if you want to be renotified
}

define service {
      service_description             check if swift process account-auditor is alive
      check_command                   check_nrpe!check_swift-account-auditor!$HOSTADDRESS$
      use                             swift-service
      notification_interval           0 ; set > 0 if you want to be renotified
}

define service {
      service_description             check if swift process account-replicator is alive
      check_command                   check_nrpe!check_swift-account-replicator!$HOSTADDRESS$
      use                             swift-service
      notification_interval           0 ; set > 0 if you want to be renotified
}

define service {
      service_description             check if swift process account-reaper is alive
      check_command                   check_nrpe!check_swift-account-reaper!$HOSTADDRESS$
      use                             swift-service
      notification_interval           0 ; set > 0 if you want to be renotified
}

define service {
      service_description             check if swift process container-server is alive
      check_command                   check_nrpe!check_swift-container-server!$HOSTADDRESS$
      use                             swift-service
      notification_interval           0 ; set > 0 if you want to be renotified
}

define service {
      service_description             check if swift process container-auditor is alive
      check_command                   check_nrpe!check_swift-container-auditor!$HOSTADDRESS$
      use                             swift-service
      notification_interval           0 ; set > 0 if you want to be renotified
}

define service {
      service_description             check if swift process container-replicator is alive
      check_command                   check_nrpe!check_swift-container-replicator!$HOSTADDRESS$
      use                             swift-service
      notification_interval           0 ; set > 0 if you want to be renotified
}

define service {
      service_description             check if swift process container-updater is alive
      check_command                   check_nrpe!check_swift-container-updater!$HOSTADDRESS$
      use                             swift-service
      notification_interval           0 ; set > 0 if you want to be renotified
}

define service {
      service_description             check if swift process container-sync is alive
      check_command                   check_nrpe!check_swift-container-sync!$HOSTADDRESS$
      use                             swift-service
      notification_interval           0 ; set > 0 if you want to be renotified
}

define service {
      service_description             check if process rsync is alive
      check_command                   check_nrpe!check_rsync!$HOSTADDRESS$
      use                             swift-service
      notification_interval           0 ; set > 0 if you want to be renotified
}
</code>

Configuration of the Nagios server

nagios.cfg [SL: /etc/nagios/nagios.cfg]

  • If there are timeout problems, adjust:

<code_bash>
# TIMEOUT VALUES
# These options control how much time Nagios will allow various
# types of commands to execute before killing them off.  Options
# are available for controlling maximum time allotted for
# service checks, host checks, event handlers, notifications, the
# ocsp command, and performance data commands.  All values are in
# seconds.

service_check_timeout=480
host_check_timeout=300
event_handler_timeout=300
notification_timeout=300
ocsp_timeout=150
perfdata_timeout=150
</code>

  • For debugging, debug_level=2048 helps: the file /var/log/nagios/nagios.debug then shows how Nagios builds the commands

<code_bash>
# DEBUG LEVEL
# This option determines how much (if any) debugging information will
# be written to the debug file.  OR values together to log multiple
# types of information.
# Values:
#          -1 = Everything
#          0 = Nothing
#          1 = Functions
#          2 = Configuration
#          4 = Process information
#          8 = Scheduled events
#          16 = Host/service checks
#          32 = Notifications
#          64 = Event broker
#          128 = External commands
#          256 = Commands
#          512 = Scheduled downtime
#          1024 = Comments
#          2048 = Macros

debug_level=2048
</code>
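The debug levels are bit flags, so several kinds of information can be logged at once by OR-ing the values listed in the comment. As a small illustration, the sketch below computes the value for logging host/service checks, commands and macros together; the `DebugLevel` class and its member names are invented for the example, while the numeric values are the ones from nagios.cfg.

```python
from enum import IntFlag


class DebugLevel(IntFlag):
    """Nagios debug_level bits; member names are illustrative only."""
    FUNCTIONS = 1
    CONFIGURATION = 2
    PROCESS_INFO = 4
    SCHEDULED_EVENTS = 8
    CHECKS = 16
    NOTIFICATIONS = 32
    EVENT_BROKER = 64
    EXTERNAL_COMMANDS = 128
    COMMANDS = 256
    SCHEDULED_DOWNTIME = 512
    COMMENTS = 1024
    MACROS = 2048


# Log checks, commands and macro expansion in the same debug file.
level = DebugLevel.CHECKS | DebugLevel.COMMANDS | DebugLevel.MACROS
print("debug_level=%d" % int(level))  # prints "debug_level=2320"
```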

Finally, remember that the configuration changes take effect only after restarting the Nagios server; on SL: <code_bash> service nagios restart </code>

Installation and configuration of the Nagios NRPE plugin on the monitored host

Installation

  • On SL:

<code_bash>
yum install nagios-plugins-nrpe
</code>

  • On Ubuntu:

<code_bash>
sudo apt-get install nagios-nrpe-server nagios-plugins
</code>

Configuration

/etc/nagios/nrpe.cfg

  • Command definitions

<code_bash>

      #########
      # Swift #
      #########
      # Swift processes checks
      command[check_swift-proxy-server]=/usr/lib/nagios/plugins/check_procs -w 2:10 -c 1:100 -a "swift-proxy-server" 
      command[check_swift-object-server]=/usr/lib/nagios/plugins/check_procs -w 2:10 -c 1:100 -a "object-server"
      command[check_swift-object-auditor]=/usr/lib/nagios/plugins/check_procs -w 2:10 -c 1:100 -a "object-auditor" 
      command[check_swift-object-replicator]=/usr/lib/nagios/plugins/check_procs -w 2:10 -c 1:100 -a "object-replicator" 
      command[check_swift-object-updater]=/usr/lib/nagios/plugins/check_procs -w 2:10 -c 1:100 -a "object-updater" 
      command[check_swift-account-server]=/usr/lib/nagios/plugins/check_procs -w 2:10 -c 1:100 -a "account-server" 
      command[check_swift-account-auditor]=/usr/lib/nagios/plugins/check_procs -w 2:10 -c 1:100 -a "account-auditor" 
      command[check_swift-account-replicator]=/usr/lib/nagios/plugins/check_procs -w 2:10 -c 1:100 -a "account-replicator" 
      command[check_swift-account-reaper]=/usr/lib/nagios/plugins/check_procs -w 2:10 -c 1:100 -a "account-reaper" 
      command[check_swift-container-server]=/usr/lib/nagios/plugins/check_procs -w 2:10 -c 1:100 -a "container-server" 
      command[check_swift-container-auditor]=/usr/lib/nagios/plugins/check_procs -w 2:10 -c 1:100 -a "container-auditor" 
      command[check_swift-container-replicator]=/usr/lib/nagios/plugins/check_procs -w 2:10 -c 1:100 -a "container-replicator" 
      command[check_swift-container-updater]=/usr/lib/nagios/plugins/check_procs -w 2:10 -c 1:100 -a "container-updater" 
      command[check_swift-container-sync]=/usr/lib/nagios/plugins/check_procs -w 2:10 -c 1:100 -a "container-sync" 
      command[check_rsync]=/usr/lib/nagios/plugins/check_procs -w 2:10 -c 1:100 -a "rsync"
      # Swift functionalities checks
      #command[check_swift]=/usr/lib/nagios/plugins/check_swift  -w 5 -c 10   
      command[check_swift]=/usr/lib/nagios/plugins/check_swift -A $ARG2$ -U $ARG3$ -T $ARG4$ -I $ARG5$ -K $ARG6$ -V $ARG7$ -c $ARG8$
      command[check_swift_1]=/usr/lib/nagios/plugins/check_swift_sabe 
      command[check_swift_dispersion]=/usr/lib/nagios/plugins/check_swift_dispersion  -w 5 -c 1
      command[check_swift_object_servers]=/usr/lib/nagios/plugins/check_swift_object_servers -w 5 -c 1
      </code>
  • To let check_nrpe accept command arguments
      <code_bash>
      # COMMAND ARGUMENT PROCESSING
      # This option determines whether or not the NRPE daemon will allow clients
      # to specify arguments to commands that are executed.  This option only works
      # if the daemon was configured with the --enable-command-args configure script
      # option.  
      #
      # *** ENABLING THIS OPTION IS A SECURITY RISK! *** 
      # Read the SECURITY file for information on some of the security implications
      # of enabling this variable.
      #
      # Values: 0=do not allow arguments, 1=allow command arguments
      dont_blame_nrpe=1
      </code>
  • For debugging (output goes to syslog)
      <code_bash>
      # DEBUGGING OPTION
      # This option determines whether or not debugging messages are logged to the
      # syslog facility.
      # Values: 0=debugging off, 1=debugging on
      debug=1
      </code>
  • If the plugin times out, adjust
      <code_bash>
      # COMMAND TIMEOUT
      # This specifies the maximum number of seconds that the NRPE daemon will
      # allow plugins to finish executing before killing them off.
      command_timeout=300
      </code>
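The process checks defined above call check_procs with -w 2:10 -c 1:100. With this kind of range threshold an alert is raised when the number of matching processes falls outside the range, so zero processes is CRITICAL and a single process is only a WARNING. The sketch below reproduces that decision for plain "min:max" ranges; it is an illustration, not the plugin's code, the function names are assumptions, and the other forms of the Nagios range syntax are deliberately not handled.

```python
# Usual Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def parse_range(spec):
    """Parse a plain 'min:max' range such as '2:10' into (low, high)."""
    low, high = spec.split(":")
    return int(low), int(high)


def procs_status(count, warn="2:10", crit="1:100"):
    """Status for a process count, alerting when it is outside a range."""
    crit_low, crit_high = parse_range(crit)
    warn_low, warn_high = parse_range(warn)
    if not crit_low <= count <= crit_high:
        return CRITICAL
    if not warn_low <= count <= warn_high:
        return WARNING
    return OK


for count in (0, 1, 2, 10, 11, 101):
    print(count, procs_status(count))
```

If a single running process should be considered healthy for some service, the warning range of that command would need to start at 1.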

Finally, remember that the configuration changes take effect only after restarting the NRPE server: <code_bash> /etc/init.d/nagios-nrpe-server restart </code>

progetti/cloud-areapd/swift_monitoring.txt · Last modified: 2014/10/17 11:27 by bertocco@infn.it