MySQL Data Fragmentation - What, When and How March 11, 2011

MySQL tables, including MyISAM and InnoDB, two of the most common types, experience fragmentation as data is inserted and deleted randomly. Fragmentation can leave large holes in your table, blocks which must be read when scanning the table. Optimizing your table can therefore make full table scans and range scans more efficient.

Fragmentation - an example MySQL has quite a few different storage engines to store data in tables. Whenever MySQL deletes rows from your table, the space left behind is then empty. Over time with a lot of DELETEs, this space can grow larger than the used space in your table. When MySQL goes to scan that data, it scans to the high water mark of the table, that is the highest point at which data has been added. If new inserts occur, MySQL will try to use that space, but nevertheless gaps will persist.

This extra fragmented space can make reads against the table less efficient than they might otherwise be. Let's look at an example.

We'll create a database (sometimes called a schema) and a test table:

(root@localhost) [test]> create database frag_test;
Query OK, 1 row affected (0.03 sec)

(root@localhost) [test]> use frag_test;
Database changed

(root@localhost) [frag_test]> create table frag_test (c1 varchar(64));
Query OK, 0 rows affected (0.05 sec)

Next let's add some rows to the table:

(root@localhost) [frag_test]> insert into frag_test values ('this is row 1');
Query OK, 1 row affected (0.01 sec)

(root@localhost) [frag_test]> insert into frag_test values ('this is row 2');
Query OK, 1 row affected (0.00 sec)

(root@localhost) [frag_test]> insert into frag_test values ('this is row 3');
Query OK, 1 row affected (0.00 sec)

Now we'll check for fragmentation:

(root@localhost) [frag_test]> show table status from frag_test\G;
*************************** 1. row ***************************
           Name: frag_test
         Engine: MyISAM
        Version: 10
     Row_format: Dynamic
           Rows: 3
 Avg_row_length: 20
    Data_length: 60
Max_data_length: 281474976710655
   Index_length: 1024
      Data_free: 0
 Auto_increment: NULL
    Create_time: 2011-02-23 14:55:27
    Update_time: 2011-02-23 15:06:55
     Check_time: NULL
      Collation: latin1_swedish_ci
       Checksum: NULL
 Create_options: 
        Comment: 
1 row in set (0.00 sec)

Now let's delete a row and check again:

(root@localhost) [frag_test]> delete from frag_test where c1 = 'this is row 2';
Query OK, 1 row affected (0.00 sec)

(root@localhost) [frag_test]> show table status from frag_test\G;
*************************** 1. row ***************************
           Name: frag_test
         Engine: MyISAM
        Version: 10
     Row_format: Dynamic
           Rows: 2
 Avg_row_length: 20
    Data_length: 60
Max_data_length: 281474976710655
   Index_length: 1024
      Data_free: 20
 Auto_increment: NULL
    Create_time: 2011-02-23 14:55:27
    Update_time: 2011-02-23 15:07:49
     Check_time: NULL
      Collation: latin1_swedish_ci
       Checksum: NULL
 Create_options: 
        Comment: 
1 row in set (0.00 sec)

Notice the “data_free” column shows the space left by the second row that we deleted. Imagine you had 20,000 rows. They would take 400k bytes of space. Now if you deleted 19,999 rows, there are 20bytes of useful space in the table, but MySQL will still read 400k and data_free will show 39980.

Eliminating fragmentation Luckily MySQL comes with a simple way to fix this once you've identified it. It's called optimize table. Take a look:

(root@localhost) [frag_test]> optimize table frag_test;
+---------------------+----------+----------+----------+
| Table               | Op       | Msg_type | Msg_text |
+---------------------+----------+----------+----------+
| frag_test.frag_test | optimize | status   | OK       | 
+---------------------+----------+----------+----------+
1 row in set (0.00 sec)

(root@localhost) [frag_test]> show table status from frag_test\G;
*************************** 1. row ***************************
           Name: frag_test
         Engine: MyISAM
        Version: 10
     Row_format: Dynamic
           Rows: 2
 Avg_row_length: 20
    Data_length: 40
Max_data_length: 281474976710655
   Index_length: 1024
      Data_free: 0
 Auto_increment: NULL
    Create_time: 2011-02-23 14:55:27
    Update_time: 2011-02-23 15:11:05
     Check_time: 2011-02-23 15:11:05
      Collation: latin1_swedish_ci
       Checksum: NULL
 Create_options: 
        Comment: 
1 row in set (0.00 sec)

Performance considerations OPTIMIZE TABLE will lock the entire table while doing its work. For small tables this will work fine because the whole table can be read and written quickly. For very large tables, this can take a very long time and interrupt or reduce available to your application. What to do?

Luckily MySQL has a great feature called Master-Master replication. In this configuration, your backend database is actually two separate databases, one active and one passive. Both of the databases are identical in every way. To perform the operations online - including the OPTIMIZE TABLE operation, simply run them on your PASSIVE database. This will not interrupt your application one iota. Once the operation has completed, switch roles so that your application is pointing to your secondary database and make it active. Then make or original database the passive one.

Now that the roles are switched and the application is happily pointing at db2, perform the same OPTIMIZE TABLE operation on db1. It's now the passive database, so it won't interrupt the primary either.

Other commands Show all the fragmented tables in your database:

(root@localhost) [(none)]> select table_schema, table_name, data_free, engine from information_schema.tables where table_schema not in ('information_schema', 'mysql') and data_free > 0;
+--------------+-----------------------------+-----------+--------+
| table_schema | table_name                  | data_free | engine |
+--------------+-----------------------------+-----------+--------+
| aitc         | wp_comments                 |    346536 | MyISAM | 
| aitc         | wp_options                  |     64308 | MyISAM | 
| aitc         | wp_postmeta                 |       124 | MyISAM | 
| cactidb      | poller_item                 |       160 | MyISAM | 
| cactidb      | poller_output               |       384 | MyISAM | 
| drupal       | sessions                    |     30976 | MyISAM | 
| drupal       | users                       |        92 | MyISAM | 
| drupal       | variable                    |        20 | MyISAM | 
| gg           | wp_comments                 |       232 | MyISAM | 
| gg           | wp_options                  |       696 | MyISAM | 
| gg           | wp_postmeta                 |       560 | MyISAM | 
| ihi          | wp_comments                 |       536 | MyISAM | 
| ihi          | wp_options                  |       444 | MyISAM | 
| ihi          | wp_postmeta                 |       288 | MyISAM | 
| ihi          | wp_redirection_items        |      1292 | MyISAM | 
| ihi          | wp_redirection_logs         |    140352 | MyISAM | 
| nds          | wp_comments                 |      4704 | MyISAM | 
| nds          | wp_options                  |    150580 | MyISAM | 
| nds          | wp_postmeta                 |        76 | MyISAM | 
| oos          | wp_comments                 |    317124 | MyISAM | 
| oos          | wp_options                  |     88196 | MyISAM | 
| oos          | wp_postmeta                 |        76 | MyISAM | 
| phplist      | phplist_listuser            |       252 | MyISAM | 
| phplist      | phplist_sendprocess         |        52 | MyISAM | 
| phplist      | phplist_user_user           |     32248 | MyISAM | 
| phplist      | phplist_user_user_attribute |       120 | MyISAM | 
| phplist      | phplist_user_user_history   |       288 | MyISAM | 
| phplist      | phplist_usermessage         |      1428 | MyISAM | 
| pn_nds       | nuke_session_info           |     12916 | MyISAM | 
| psa          | exp_event                   |     10024 | MyISAM | 
| test         | active_sessions             |     30144 | MyISAM | 
+--------------+-----------------------------+-----------+--------+
31 rows in set (0.26 sec)

You can also cause a table to be defragmented if you change the storage engine. That's because MySQL has to read the entire table, and write it back to disk using the new storage engine, and in the process the rows and data gets compressed down more efficiently.

Here's what that looks like:

(root@localhost) [frag_test]> alter table frag_test engine = innodb;
Query OK, 2 rows affected (0.17 sec)
Records: 2  Duplicates: 0  Warnings: 0

(root@localhost) [frag_test]> show table status from frag_test
    -> \G;
*************************** 1. row ***************************
           Name: frag_test
         Engine: InnoDB
        Version: 10
     Row_format: Compact
           Rows: 2
 Avg_row_length: 8192
    Data_length: 16384
Max_data_length: 0
   Index_length: 0
      Data_free: 0
 Auto_increment: NULL
    Create_time: 2011-02-23 15:41:12
    Update_time: NULL
     Check_time: NULL
      Collation: latin1_swedish_ci
       Checksum: NULL
 Create_options: 
        Comment: InnoDB free: 7168 kB
1 row in set (0.00 sec)

Conclusion Fragmentation is something to keep an eye on but not necessarily to obsess over. If you see a few tables with a very large data_free value, it can be worthwhile to OPTIMIZE those, as read performance on the table will improve afterward.


Source: www.databasejournal.com/features/mysql/article.php/3927871