====== Optimize the MySQL DB used by Keystone over WAN ======

===== Reference =====

  * Ref1: [[https://...]]
  * Ref2: [[https://...]]
  * Ref3: [[https://bugs.launchpad.net/keystone/+bug/1195924|expiration check is using different timezone other than UTC]]
| + | |||
| + | ===== PROBLEM ===== | ||
| + | |||
| + | After one week of activity the ibdata1 file was 8.1G and... growing | ||
| + | < | ||
| + | root@keystone: | ||
| + | -rw-rw---- 1 mysql mysql 8.1G May 12 13:01 / | ||
| + | </ | ||
| + | |||
| + | This caused a "disk full problem" | ||
| + | < | ||
| + | # less / | ||
| + | [...] | ||
| + | InnoDB: Your database may be corrupt or you may have copied the InnoDB | ||
| + | InnoDB: tablespace but not the InnoDB log files. See | ||
| + | InnoDB: http:// | ||
| + | InnoDB: for more information. | ||
| + | 140509 12:51:34 [ERROR] Slave SQL: Could not execute Write_rows event on table keystone.token; | ||
| + | de: 1062; handler error HA_ERR_FOUND_DUPP_KEY; | ||
| + | 140509 12:51:34 [Warning] WSREP: RBR event 2 Write_rows apply warning: 121, 861859 | ||
| + | 140509 12:51:34 [Warning] WSREP: Failed to apply app buffer: seqno: 861859, status: 1 | ||
| + | at galera/ | ||
| + | Retrying 2th time | ||
| + | [...] | ||
| + | 140509 12:51:34 [ERROR] Slave SQL: Could not execute Write_rows event on table keystone.token; | ||
| + | 140509 12:51:34 [Warning] WSREP: RBR event 2 Write_rows apply warning: 121, 861859 | ||
| + | 140509 12:51:34 [ERROR] WSREP: receiving IST failed, node restart required: Failed to apply trx 861859 10 times | ||
| + | 140509 12:51:34 [Note] WSREP: Closing send monitor... | ||
| + | 140509 12:51:34 [Note] WSREP: Closed send monitor. | ||
| + | 140509 12:51:34 [Note] WSREP: gcomm: terminating thread | ||
| + | 140509 12:51:34 [Note] WSREP: gcomm: joining thread | ||
| + | 140509 12:51:34 [Note] WSREP: gcomm: closing backend | ||
| + | 140509 12:51:34 [Note] WSREP: view(view_id(NON_PRIM, | ||
| + | dff534f1-d767-11e3-b8c5-ffab4b83e1d2, | ||
| + | } joined { | ||
| + | } left { | ||
| + | } partitioned { | ||
| + | 157f634e-d5ce-11e3-b07e-265b1d5e5cf0, | ||
| + | 27f583aa-c60a-11e3-af90-6a5bb07181cf, | ||
| + | }) | ||
| + | 140509 12:51:34 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1 | ||
| + | 140509 12:51:34 [Note] WSREP: view((empty)) | ||
| + | 140509 12:51:34 [Note] WSREP: gcomm: closed | ||
| + | 140509 12:51:34 [Note] WSREP: Flow-control interval: [16, 16] | ||
| + | 140509 12:51:34 [Note] WSREP: Received NON-PRIMARY. | ||
| + | 140509 12:51:34 [Note] WSREP: Shifting JOINER -> OPEN (TO: 862345) | ||
| + | 140509 12:51:34 [Note] WSREP: Received self-leave message. | ||
| + | 140509 12:51:34 [Note] WSREP: Flow-control interval: [0, 0] | ||
| + | 140509 12:51:34 [Note] WSREP: Received SELF-LEAVE. Closing connection. | ||
| + | 140509 12:51:34 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 862345) | ||
| + | 140509 12:51:34 [Note] WSREP: RECV thread exiting 0: Success | ||
| + | 140509 12:51:34 [Note] WSREP: recv_thread() joined. | ||
| + | 140509 12:51:34 [Note] WSREP: Closing replication queue. | ||
| + | 140509 12:51:34 [Note] WSREP: Closing slave action queue. | ||
| + | 140509 12:51:34 [Note] WSREP: / | ||
| + | Aborted (core dumped) | ||
| + | 140509 12:51:34 mysqld_safe Number of processes running now: 0 | ||
| + | 140509 12:51:34 mysqld_safe WSREP: not restarting wsrep node automatically | ||
| + | 140509 12:51:34 mysqld_safe mysqld from pid file / | ||
| + | </ | ||
| + | |||
| + | The issue was due to the increased number of tokens, ~20000 in 24h, that are registered in the keystone DB, and not flushed. | ||
| + | |||
| + | < | ||
| + | # mysql -u root x xxxxxx | ||
| + | mysql> use keystone; | ||
| + | mysql> SELECT COUNT(*) FROM token; | ||
| + | +----------+ | ||
| + | | COUNT(*) | | ||
| + | +----------+ | ||
| + | | 18141 | | ||
| + | +----------+ | ||
| + | </ | ||
| + | |||
| + | ==== SOLUTION ==== | ||
| + | |||
| + | In order to reduce the number of tokens registered, to better handle the MySQL DBs it was decided to do the following steps: | ||
| + | * configure mysql with " | ||
| + | * set in Keystone the token expiration to 3h | ||
| + | * define a cron, in one of the hosts to flush tokens and optimize the token table in the keystone db | ||
| + | |||
| + | |||
| + | **STEPS**: | ||
| + | |||
| + | 1. Stop mysql and keystone on all nodes of the cluster | ||
| + | |||
| + | 2. Choose one node and do the following steps: | ||
| + | * start mysql | ||
| + | * mysqldump of all DBs | ||
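<code bash>
# Dump all databases before rebuilding them (illustrative credentials/path)
mysqldump -u root -pxxxxxx --all-databases > /root/all_databases.sql
</code>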
  * stop mysql
  * in my.cnf comment out all the lines containing wsrep* and add innodb_file_per_table in the [mysqld] section:
<code bash>
# grep innodb_file_per_table /etc/my.cnf
innodb_file_per_table=1
</code>
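The wsrep* lines can be commented out in one shot with something like the following (a sketch; it assumes the configuration file is /etc/my.cnf and that the wsrep options start at the beginning of the line):
<code bash>
# Comment out every wsrep* option, keeping a backup of the original file
sed -i.bak 's/^wsrep/#wsrep/' /etc/my.cnf
</code>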
  * start mysql
  * drop all the databases except mysql, information_schema and performance_schema; a minimal sketch (database names other than keystone are site-specific):
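<code bash>
mysql> SHOW DATABASES;
mysql> DROP DATABASE keystone;   # repeat for every DB except mysql, information_schema, performance_schema
</code>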
  * stop mysql
  * in my.cnf uncomment all the wsrep* lines previously commented and change the wsrep_cluster_address to an empty node list (this bootstraps a new cluster starting from this node):
<code bash>
# grep wsrep_cluster_address /etc/my.cnf
wsrep_cluster_address=gcomm://
</code>
  * due to a bug in Havana (see the details below and Ref3, "expiration check is using different timezone other than UTC"), in keystone.conf set the token expiration to 3h (10800 seconds) in the [token] section:
<code bash>
[token]
driver = keystone.token.backends.sql.Token
# Amount of time a token should remain valid (in seconds)
# expiration = 86400
expiration = 10800
</code>
  * restart mysql
  * define a cron job to execute the following script every 3h (the paths and credentials below are a reconstruction and must be adapted to the local setup):
<code bash>
#!/bin/bash
# Flush the expired tokens from the keystone DB
/usr/bin/keystone-manage token_flush
/bin/ls -ltr /var/lib/mysql/keystone/token.ibd
# Reclaim the space left by the deleted rows
echo "OPTIMIZE TABLE keystone.token;" | /usr/bin/mysql -u root -pxxxxxx
/bin/ls -ltr /var/lib/mysql/keystone/token.ibd
</code>
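The cron entry itself could look like this (hypothetical file and script locations):
<code bash>
# /etc/cron.d/keystone_token_flush
0 */3 * * * root /usr/local/bin/flush_keystone_tokens.sh >> /var/log/keystone_token_flush.log 2>&1
</code>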
| + | |||
| + | 3. On the other nodes of the cluster: | ||
| + | * stop mysql | ||
| + | * in my.cnf comment all lines containing wsrep* | ||
| + | * start mysql | ||
| + | * drop all databases except mysql, information_schema, | ||
  * stop mysql
  * in /var/lib/mysql remove the following files, so that the node will perform a full state transfer (SST) when it rejoins the cluster:
<code bash>
# cd /var/lib/mysql
# rm galera.cache grastate.dat ibdata1 ib_logfile* wsrep_recovery*
</code>
  * add innodb_file_per_table in the [mysqld] section:
<code bash>
# grep innodb_file_per_table /etc/my.cnf
innodb_file_per_table=1
</code>
  * start mysql
  * stop mysql
  * in my.cnf uncomment all the wsrep* lines previously commented and set the wsrep_cluster_address back to the full list of cluster nodes:
<code bash>
# grep wsrep_cluster_address /etc/my.cnf
wsrep_cluster_address=gcomm://<node1,node2,...>
</code>
  * start mysql
| + | |||
| + | 4. On the " | ||
| + | <code bash> | ||
| + | wsrep_cluster_address=gcomm://</ | ||
| + | |||
| + | back to | ||
| + | <code bash> | ||
| + | wsrep_cluster_address=gcomm:// | ||
| + | |||
| + | and restart mysql | ||
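
At this point the health of the rebuilt cluster can be checked with the standard Galera status variables, e.g.:
<code bash>
mysql> SHOW STATUS LIKE 'wsrep_cluster_size';
mysql> SHOW STATUS LIKE 'wsrep_local_state_comment';
</code>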
| + | |||
| + | |||
| + | Now there should be directories for each DB, in particular for keystone: | ||
| + | <code bash> | ||
| + | # ls -ltrh / | ||
| + | -rw-rw---- 1 mysql mysql 8.6K May 18 08:00 / | ||
| + | -rw-rw---- 1 mysql mysql 596M May 18 22:26 / | ||
| + | |||
| + | |||
| + | === Details of BUG #1195924 === | ||
| + | |||
| + | [[https:// | ||
| + | |||
| + | * the token are generated with start and expiration hour expresed in UTC, but the client libraries do the verification using the local time (in Italy = UTC +2h) during 1 hour. | ||
| + | * the patch was released for/in Icehouse - for which version we can think to leave the tokens duration of 1 h by default and configure the cron to run every hour. | ||
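
The 2h offset involved can be seen directly on an Italian host (plain shell, just to illustrate the mismatch):
<code bash>
date      # local time, used by the buggy expiration check
date -u   # UTC, used when the token is issued
</code>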
| + | |||
| + | |||
