BlitzDB Primary Key Based Insertion Performance
Like most things, I think storage engine development is about divide and conquer. The first sub-problem that I’m tackling with BlitzDB is squeezing as much juice as possible out of Tokyo Cabinet to achieve fast write performance. This, by the way, happens to be the primary reason I wrote skyload.
Writing skyload turned out to be worthwhile since it helped me find several critical bugs in the engine that only occurred under concurrent insertion load. Thanks to Kazuho Oku for helping me through the issues that I was facing.
I think I’ve now reached a stage where I can share how well BlitzDB can perform insertion from concurrent connections. But before moving ahead, I’d like to emphasize that for a real guideline, I believe that performance comparison should be done by an unbiased third party. So please don’t take the results in this post as the “truth”. Heh, I did write both the storage engine and the load emulator after all :)
So, with the above in mind, here’s a skyload result on inserting one-hundred-thousand rows under different concurrency levels with BlitzDB and MyISAM (both engines under default configuration).
The figures presented above are averages of 5 runs per concurrency level. Admittedly, an average of 5 runs is not enough to make my results credible, since the figures can easily be skewed by dirty buffer flushes between the kernel and the filesystem (ext3 in this particular benchmark). To address this, I plan on extending skyload to repeat an identical test multiple times and compute both the median and the average.
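To illustrate why reporting the median alongside the average helps, here is a minimal Python sketch. It is not skyload code: the function name `summarize_runs` and the timings are made up purely for illustration and are not benchmark results. One run stalled by a buffer flush drags the average up noticeably, while the median barely moves.

```python
from statistics import mean, median


def summarize_runs(timings_by_concurrency):
    """Return {concurrency: (mean, median)} for repeated run timings in seconds.

    `timings_by_concurrency` maps a concurrency level to a list of
    wall-clock timings, one entry per repeated run of the same test.
    """
    summary = {}
    for concurrency, timings in sorted(timings_by_concurrency.items()):
        if not timings:
            raise ValueError(f"no timings recorded for concurrency {concurrency}")
        summary[concurrency] = (mean(timings), median(timings))
    return summary


if __name__ == "__main__":
    # Hypothetical timings, NOT measured results. The 30.0 stands in for
    # a run that happened to coincide with a dirty buffer flush.
    sample = {
        1: [10.0, 10.5, 9.5, 30.0, 10.0],
        4: [4.0, 4.5, 3.5, 4.0, 4.0],
    }
    for concurrency, (avg, med) in summarize_runs(sample).items():
        print(f"concurrency={concurrency} mean={avg:.2f}s median={med:.2f}s")
```

With the hypothetical numbers above, the single slow run at concurrency 1 pulls the average well away from the median, which is the kind of distortion the median is meant to expose.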
For those that are interested, this is what the test table looks like:
CREATE TABLE t1 ( id int PRIMARY KEY, col1 int, col2 double, col3 varchar(255) ) ENGINE=blitz;
I didn’t bother benchmarking anything beyond 32 connections, since I ran both the client and the server on the same quad-core machine and there’s no point in going further. This is probably why you can see a nice curve up to four concurrent connections with BlitzDB in the graph. Yet another reason why you should not believe everything I’ve provided in this entry.
BlitzDB needs you
BlitzDB is still very early in its making and I still have an insane amount of work to do. For example, BlitzDB currently requires you to supply a primary key on your table. I plan on removing this limitation by generating a “fake” primary key internally, but I haven’t got around to it yet.
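As a rough illustration of the “fake” primary key idea, here is a minimal Python sketch of a thread-safe counter that hands out hidden keys to rows that arrive without one. This is only an assumption about one possible approach, not how BlitzDB does or will do it; the names `HiddenKeyGenerator` and `next_key` are hypothetical.

```python
import itertools
import threading


class HiddenKeyGenerator:
    """Hands out unique, monotonically increasing integer keys.

    Hypothetical helper: a lock guards the counter so that concurrent
    inserters never receive the same key.
    """

    def __init__(self, start=1):
        self._counter = itertools.count(start)
        self._lock = threading.Lock()

    def next_key(self):
        with self._lock:
            return next(self._counter)


def collect_keys(generator, n, out):
    """Simulate one connection inserting n rows without a primary key."""
    out.extend(generator.next_key() for _ in range(n))


if __name__ == "__main__":
    gen = HiddenKeyGenerator()
    results = [[] for _ in range(4)]
    threads = [
        threading.Thread(target=collect_keys, args=(gen, 1000, bucket))
        for bucket in results
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    keys = [k for bucket in results for k in bucket]
    print(len(keys), len(set(keys)))  # both counts should match
```

A real engine would also have to persist the counter across restarts, which this sketch leaves out.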
Support for multiple indexes is not done yet, despite having all the necessary components to achieve it. I could do all this on my own but I prefer not to. I’m totally open to ideas and contributors. If you’re interested in this storage engine project, please don’t hesitate to ping me (dev @ this domain) or the Drizzle community. The more eyeballs the merrier :)

