We have a process at Dyn which scales nicely by doing application level partitioning, but the result is a database of well over 100,000 tables. Six figures of tables is probably more than “many” for most MySQL users and to add to total number of artifacts, the tables are also partitioned.
MyISAM makes a separate file for each partition and with the table definition files, the MySQL process required access to well over 1,000,000 files in the database directory. For a process that created tables and partitions on the fly and executed inserts from parallel processes, MyISAM was our performance choice for a long time.
Eventually however, the process nearly halted. Processes averaging over 1000 Queries Per Second (QPS) were brought to their knees, peaking in double digits. What had happened? What threshold had triggered contention? What was the contention?!? The following demonstrates an interesting limit on the MyISAM engine and hopefully provides some lessons to leverage for your own MySQL installations. Read on, MySQL followers!
The application issued dynamic statements to create tables and partitions when needed. In the case of an initial table and partition creation, life was good and data flowed quickly. Once in a while, a job came through to archive data and retire partitions that were no longer within our data retention policy. The application also altered tables with new partitions during data processing.
This activity requires file handles, which impose a lock on global file operations. For a time this was fine, but a quick review of the graph contrasting the growing size of partitions (via file handles) against the time to process data illustrates how those file handles were starting to slow down the process.
Slowing down the process is one thing, but eventually a threshold was reached and the file handle cache was thrashed as more and more requests queued for files. The time to process became run away early in week 26 and it wasn’t until later in the week when we were able to catch back up.
A switch to InnoDB proved to be the solution.
This is interesting because without a particular need for row locking mechanisms, initial testing proved MyISAM as the speedier choice. However, there came a point when the combination of alter statements and so many tables stopped processing from contention for file handles and the locks to modify them. Switching over to InnoDB reduced the number of required file handles by several orders of magnitude and increasing the open table cache pushed throughput back up to well over several thousand queries per second. Testing only at much higher orders of magnitude proved InnoDB to be the more scalable solution.
The lesson is that the classic volatility of partitions and scale can mean sudden behavior you did not expect.
Test and test at scale! Also from this case, be especially aware of the difference in locking behavior between typical data manipulation operations and data definition operations. Dynamic modification of database artifacts fundamentally change the nature of your MySQL instance, make sure your expectations match.
I hope these lessons are helpful. Here’s to engineering excellence and happy scaling!