How to Optimize MySQL DB - WAN Keystone

Reference

PROBLEM

After one week of activity the ibdata1 file was 8.1G and… growing

root@keystone:~# ls -lh /var/lib/mysql/ibdata1 
-rw-rw---- 1 mysql mysql 8.1G May 12 13:01 /var/lib/mysql/ibdata1

This caused a "disk full problem" in LNGS, followed up by a degradation of the DB and the impossibility to restart mysql on the primary node of the cluster, keystone.pd.infn.it

# less  /var/lib/mysql/keystone.err
[...]
InnoDB: Your database may be corrupt or you may have copied the InnoDB
InnoDB: tablespace but not the InnoDB log files. See
InnoDB: http://dev.mysql.com/doc/refman/5.5/en/forcing-innodb-recovery.html
InnoDB: for more information.
140509 12:51:34 [ERROR] Slave SQL: Could not execute Write_rows event on table keystone.token; Duplicate entry '494830e055f015b77457cefe1ff2c9cc' for key 'PRIMARY', Error_co
de: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log FIRST, end_log_pos 30457, Error_code: 1062
140509 12:51:34 [Warning] WSREP: RBR event 2 Write_rows apply warning: 121, 861859
140509 12:51:34 [Warning] WSREP: Failed to apply app buffer: seqno: 861859, status: 1
         at galera/src/replicator_smm.cpp:apply_wscoll():57
Retrying 2th time
[...]
140509 12:51:34 [ERROR] Slave SQL: Could not execute Write_rows event on table keystone.token; Duplicate entry '494830e055f015b77457cefe1ff2c9cc' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log FIRST, end_log_pos 30457, Error_code: 1062
140509 12:51:34 [Warning] WSREP: RBR event 2 Write_rows apply warning: 121, 861859
140509 12:51:34 [ERROR] WSREP: receiving IST failed, node restart required: Failed to apply trx 861859 10 times
140509 12:51:34 [Note] WSREP: Closing send monitor...
140509 12:51:34 [Note] WSREP: Closed send monitor.
140509 12:51:34 [Note] WSREP: gcomm: terminating thread
140509 12:51:34 [Note] WSREP: gcomm: joining thread
140509 12:51:34 [Note] WSREP: gcomm: closing backend
140509 12:51:34 [Note] WSREP: view(view_id(NON_PRIM,157f634e-d5ce-11e3-b07e-265b1d5e5cf0,1461) memb {
        dff534f1-d767-11e3-b8c5-ffab4b83e1d2,
} joined {
} left {
} partitioned {
        157f634e-d5ce-11e3-b07e-265b1d5e5cf0,
        27f583aa-c60a-11e3-af90-6a5bb07181cf,
})
140509 12:51:34 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
140509 12:51:34 [Note] WSREP: view((empty))
140509 12:51:34 [Note] WSREP: gcomm: closed
140509 12:51:34 [Note] WSREP: Flow-control interval: [16, 16]
140509 12:51:34 [Note] WSREP: Received NON-PRIMARY.
140509 12:51:34 [Note] WSREP: Shifting JOINER -> OPEN (TO: 862345)
140509 12:51:34 [Note] WSREP: Received self-leave message.
140509 12:51:34 [Note] WSREP: Flow-control interval: [0, 0]
140509 12:51:34 [Note] WSREP: Received SELF-LEAVE. Closing connection.
140509 12:51:34 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 862345)
140509 12:51:34 [Note] WSREP: RECV thread exiting 0: Success
140509 12:51:34 [Note] WSREP: recv_thread() joined.
140509 12:51:34 [Note] WSREP: Closing replication queue.
140509 12:51:34 [Note] WSREP: Closing slave action queue.
140509 12:51:34 [Note] WSREP: /usr/sbin/mysqld: Terminated.
Aborted (core dumped)
140509 12:51:34 mysqld_safe Number of processes running now: 0
140509 12:51:34 mysqld_safe WSREP: not restarting wsrep node automatically
140509 12:51:34 mysqld_safe mysqld from pid file /var/lib/mysql/keystone.pid ended

The issue was due to the increased number of tokens, ~20000 in 24h, that are registered in the keystone DB, and not flushed.

# mysql -u root x xxxxxx
mysql> use keystone;
mysql> SELECT COUNT(*) FROM token;
+----------+
| COUNT(*) |
+----------+
|    18141 |
+----------+

SOLUTION

In order to reduce the number of tokens registered, to better handle the MySQL DBs it was decided to do the following steps:

configure mysql with "innodb_file_per_table=1" (see reference)
set in Keystone the token expiration to 3h
define a cron, in one of the hosts to flush tokens and optimize the token table in the keystone db

STEPS:

1. Stop mysql and keystone on all nodes of the cluster

2. Choose one node and do the following steps:

start mysql
mysqldump of all DBs
stop mysql
in my.cnf comment all lines containing wsrep* and add innodb_file_per_table in the [mysqld] section

    # grep innodb_file_per_table /etc/mysql/my.cnf 
    innodb_file_per_table=1

start mysql
drop all databases except mysql, information_schema, performance_schema, certs and restore them from the dump previousley done
stop mysql
in my.cnf uncomment all wsrep* previously commented and change the wsrep_cluster_address

    # grep wsrep_cluster_address /etc/mysql/my.cnf |egrep -v '^    #|^#|^$'
    wsrep_cluster_address=gcomm://

due to a bug in Havana (see details bellow and Ref3) - in keystone.conf add "expiration = 10800" in the [token] section

    [token]
    driver = keystone.token.backends.sql.Token 
    # Amount of time a token should remain valid (in seconds)
    # expiration = 86400
    expiration=10800

restart mysql
define a cron to execute the following script every 3h:

    #!/bin/bash
    /usr/bin/keystone-manage -v -d token_flush
    /bin/ls -ltr /var/lib/mysql/keystone/token.ibd
    echo "optimize table token" | /usr/bin/mysql -u root -pxxxxxx keystone
    /bin/ls -ltr /var/lib/mysql/keystone/token.ibd

3. On the other nodes of the cluster:

stop mysql
in my.cnf comment all lines containing wsrep*
start mysql
drop all databases except mysql, information_schema, performance_schema, certs
stop mysql
in /var/lib/mysql remove galera.cache grastate.dat ibdata1 ib_logfile* wsrep_recovery*

    # cd /var/lib/mysql
    # rm galera.cache grastate.dat ibdata1 ib_logfile* wsrep_recovery*

add innodb_file_per_table in the [mysqld] section

     # grep innodb_file_per_table /etc/mysql/my.cnf 
     innodb_file_per_table=1

start mysql
stop mysql
in my.cnf uncomment all wsrep* previously commented and change the wsrep_cluster_address

    # grep wsrep_cluster_address /etc/mysql/my.cnf |egrep -v '^    #|^#|^$'
    wsrep_cluster_address=gcomm://193.206.210.164,90.147.112.3,90.147.102.131

start mysql

4. On the "chosen" server, in my.cnf change:

    wsrep_cluster_address=gcomm://

back to

    wsrep_cluster_address=gcomm://193.206.210.164,90.147.112.3,90.147.102.131

and restart mysql

Now there should be directories for each DB, in particular for keystone:

    # ls -ltrh /var/lib/mysql/keystone/token.*
    -rw-rw---- 1 mysql mysql 8.6K May 18 08:00 /var/lib/mysql/keystone/token.frm
    -rw-rw---- 1 mysql mysql 596M May 18 22:26 /var/lib/mysql/keystone/token.ibd

Details of BUG #1195924

BUG # 1195924 | expiration check is using different timezone other than UTC

the token are generated with start and expiration hour expresed in UTC, but the client libraries do the verification using the local time (in Italy = UTC +2h) during 1 hour.
the patch was released for/in Icehouse - for which version we can think to leave the tokens duration of 1 h by default and configure the cron to run every hour.