MySQL: adding an index on a big-ass table

Okay, I'll admit it: SQL and big data things aren't at the top of my skillset.

No earlier than today, I resolved an issue we had in our customer databases: adding a missing index.
Seems easy at first but if the table is HUGE (yea, 8GB+ is big for me) with about 246MM rows (!!), it gets damn slow and the downtime doesn't last for an eye blink.

Reminder: ALTER TABLE allows read operations while altering the table then blocks reads and writes ops when the alteration is done and needs to go live. This may vary according to the configuration but the point is: altering a table has a downtime impact. RTFM.

Altering a ~246MM rows table takes time, unless you sold your soul to the devil to own a kickin'-ass database cluster that is able to stand any kind load.

Besides, in this table, most of the rows were junk rows that needed to be cleared but even performing a massive deletion would take hours (this time per customer, yey!).

With some help, we came up with this first idea:

Create a temporary table
Fill it with rows we want to keep from the big-ass table
Empty the big-ass table
Add the index to the big-ass table
Refill the indexed big-ass table with the temporary table rows
Drop the temporary table

I wrapped the whole thing within a transaction and tested it. Of course, it failed (because of a syntax error). As a result and also because the TRUNCATE is not an atomic operation, I was good to reimport the whole database *table-flip*.

Let's go for another 20min of database import... *cereal-eating*
This left me plenty time to research how to do this correctly.

Since TRUNCATE is not atomic and RENAME is, I tweaked our first idea to this:

Create a big-ass-2 table from current big-ass-1 table schema
Add the index to big-ass-2 table
Fill the big-ass-2 table with rows we want to keep from big-ass-1 table
Rename big-ass-1 table into big-ass-3 table
Rename big-ass-2 table into big-ass-1 table
Drop big-ass-3 table

I could test that the transaction would successfully rollback (since my SQL syntax isn't perfect) and surprisingly, when successful, the whole operation took about 2 seconds on my MacBook Pro (to handle ~246MM rows).

Impressive!

If you are interested into the SQL code:

DROP TABLE IF EXISTS `foo_new`;
CREATE TABLE `foo_new` LIKE `foo`;
ALTER TABLE `foo_new` ADD INDEX `indexName` (`fieldName`);
INSERT INTO `foo_new`
SELECT `foo`.*
FROM `foo`
INNER JOIN `bar` ON `bar`.`pkField` = `foo`.`fkField`
AND `bar`.`stringField` NOT LIKE "pattern%";
RENAME TABLE `foo` TO `foo_old`, `foo_new` TO `foo`;
DROP TABLE `foo_old`;

Thanks for reading and don't hesitate to share.

rand(Ø)

> beautiful code & photos

MySQL: adding an index on a big-ass table

Joris Berthelot