Archive for July, 2009

BlitzDB Primary Key Based Insertion Performance

July 17th, 2009

Like most things, I think storage engine development is about divide and conquer. The first sub-problem that I’m tackling with BlitzDB is squeezing as much juice as possible out of Tokyo Cabinet to achieve fast write performance. This, by the way, happens to be the primary reason that I wrote skyload.

Writing skyload turned out to be worthwhile since it helped me find several critical bugs in the engine that only occurred under concurrent insertion load. Thanks to Kazuho Oku for helping me through the issues that I was facing.

I think I’ve now reached a stage where I can share how well BlitzDB can perform insertion from concurrent connections. But before moving ahead, I’d like to emphasize that for a real guideline, I believe that performance comparison should be done by an unbiased third party. So please don’t take the results in this post as the “truth”. Heh, I did write both the storage engine and the load emulator after all :)

So, with the above in mind, here’s a skyload result on inserting one-hundred-thousand rows under different concurrency levels with BlitzDB and MyISAM (both engines under default configuration).

[Figure: Skyload Result - MyISAM and BlitzDB]

The figures presented above are averages of 5 runs per concurrency level. Admittedly, an average of 5 runs is not enough to make my results credible, since the figures can easily be affected by dirty buffer flushes between the kernel and the filesystem (ext3 in this particular benchmark). To address this, I plan on extending skyload to perform multiple runs of an identical test and compute the median and average.
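To illustrate why the median matters here, below is a minimal sketch of summarizing repeated run times. It is not skyload code; `RunSummary` and `summarize` are hypothetical names made up for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Summary of repeated benchmark runs (e.g. seconds per run).
struct RunSummary {
  double mean;
  double median;
};

// Computes the mean and median of the given run times.
// The vector is taken by value because it is sorted to find the median.
RunSummary summarize(std::vector<double> runs) {
  assert(!runs.empty());
  std::sort(runs.begin(), runs.end());
  const double sum = std::accumulate(runs.begin(), runs.end(), 0.0);
  const std::size_t n = runs.size();
  // Odd count: middle element. Even count: mean of the two middle elements.
  const double median = (n % 2 == 1)
      ? runs[n / 2]
      : (runs[n / 2 - 1] + runs[n / 2]) / 2.0;
  return RunSummary{sum / static_cast<double>(n), median};
}
```

With run times of 1, 2, 3, 4 and 100 seconds, the mean is 22 while the median stays at 3, so a single run that happened to coincide with a buffer flush distorts the average far more than the median.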

For those that are interested, this is what the test table looks like:

CREATE TABLE t1 (
    id int PRIMARY KEY,
    col1 int,
    col2 double,
    col3 varchar(255)
) ENGINE=blitz;

I didn’t bother benchmarking anything beyond 32 connections since I ran both the client and the server on the same quad core machine (there’s no point). This is probably why you can see a nice curve up to four concurrent connections with BlitzDB in the graph. Yet another reason why you should not believe everything I’ve provided in this entry.

BlitzDB needs you

BlitzDB is still very early in its making and I still have an insane amount of work to do. For example, BlitzDB currently requires you to supply a primary key on your table. I plan on removing this limitation by generating a “fake” primary key internally, but I haven’t got around to it yet.
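One way such a “fake” key could be generated is with an atomic counter. This is only a sketch of the idea under my own assumptions, not BlitzDB's implementation; `HiddenKeyGenerator` is a hypothetical name.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical helper: hands out a unique 64-bit key for each row of a
// table that has no user-defined primary key.
class HiddenKeyGenerator {
public:
  explicit HiddenKeyGenerator(std::uint64_t start = 1) : next_(start) {
    assert(start > 0);  // 0 is kept free in this sketch as a "no key" marker
  }

  // Safe to call from concurrent connections: fetch_add is atomic,
  // so no two callers ever receive the same value.
  std::uint64_t next() {
    return next_.fetch_add(1, std::memory_order_relaxed);
  }

private:
  std::atomic<std::uint64_t> next_;
};
```

A real engine would also have to persist the counter (or recover it from the highest stored key) so that keys stay unique across restarts, which is what the `start` argument stands in for here.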

Support for multiple indexes is not done yet despite having all the necessary components to achieve it. I could do all this on my own but I prefer not to. I’m totally open to ideas and contributors. If you’re interested in this storage engine project, please don’t hesitate to ping me (dev @ this domain) or the Drizzle community. The more eyeballs, the merrier :)

Toru Maesaka drizzle, oss

memcached Binary Protocol is here for real

July 13th, 2009

I’m running a little behind but I’d like to spread the word that memcached-1.4.0 has been released.

You can also find blog posts by other memcached developers floating around on the internet. This 1.4 release is quite special for me too since the binary protocol is something I’ve been eager to see out in the open for some time now. I remember printing out the early binary protocol specification (the draft that was made before I joined the community) and reading it on the flight to SFO back in 2007.

Much has happened since then, with a lot of input from both developers and non-developers. Several commercial organizations, from small to large, have helped us evaluate the old experimental branch(es) that we’ve been working on at github. The 1.4 release is the result of a community effort that I’m really proud to be part of.

I recommend that all of you update to 1.4 and also take a look at the latest version of your client library. See if it supports the binary protocol. If it doesn’t, you should bug the author about it ;)
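For those curious what a client has to put on the wire, here is a small sketch that builds a binary protocol GET request: a 24-byte header in network byte order followed by the key. The function name `build_get_request` is my own and does not come from any client library.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Builds a binary protocol GET request: 24-byte header followed by the key.
std::vector<std::uint8_t> build_get_request(const std::string &key,
                                            std::uint32_t opaque = 0) {
  assert(!key.empty() && key.size() <= 250);  // memcached key length limit
  const std::uint16_t key_len = static_cast<std::uint16_t>(key.size());
  const std::uint32_t body_len = key_len;  // GET carries no extras, no value

  std::vector<std::uint8_t> packet(24, 0);
  packet[0] = 0x80;  // magic: request
  packet[1] = 0x00;  // opcode: GET
  packet[2] = static_cast<std::uint8_t>(key_len >> 8);  // key length (big-endian)
  packet[3] = static_cast<std::uint8_t>(key_len & 0xff);
  // packet[4] extras length, packet[5] data type, packet[6..7] reserved: all 0
  packet[8] = static_cast<std::uint8_t>(body_len >> 24);  // total body length
  packet[9] = static_cast<std::uint8_t>((body_len >> 16) & 0xff);
  packet[10] = static_cast<std::uint8_t>((body_len >> 8) & 0xff);
  packet[11] = static_cast<std::uint8_t>(body_len & 0xff);
  packet[12] = static_cast<std::uint8_t>(opaque >> 24);  // opaque, echoed back
  packet[13] = static_cast<std::uint8_t>((opaque >> 16) & 0xff);
  packet[14] = static_cast<std::uint8_t>((opaque >> 8) & 0xff);
  packet[15] = static_cast<std::uint8_t>(opaque & 0xff);
  // packet[16..23] CAS: 0 for a plain GET
  packet.insert(packet.end(), key.begin(), key.end());
  return packet;
}
```

Because every field sits at a fixed offset, the server can parse a request without scanning for line endings the way the text protocol requires.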

Which reminds me, I need to ping Dormando and Hachi about the binary protocol patch that I wrote for Cache::Memcached last year…

Toru Maesaka memcached, oss

Notes on changes made to the Drizzle Storage Subsystem

July 9th, 2009

Yesterday I merged the BlitzDB tree with Drizzle’s trunk for the first time in a long time (yeah…) and discovered some interesting changes made to the storage subsystem while I was away.

Previously, all functions that caused an action on the storage engine were members of the handler class, but various things like table creation and transaction related functions have now moved to the StorageEngine class. These changes are somewhat drastic but make good sense for Drizzle’s growth, since they make the subsystem easier to understand and free Drizzle from an interface design that was strongly affected by MyISAM. For those that are interested, the StorageEngine class is located in “drizzled/plugin/storage_engine.h”.

For me it was pretty easy to update BlitzDB to work with the new subsystem since I don’t have anything special in the engine that required me to use my brain. I only had to move bas_ext() and the table creation and rename functions over to the StorageEngine class and adjust them to the new interface:

int createTableImpl(Session *session, const char *table_name, 
                    Table *table_arg, HA_CREATE_INFO *ha_create_info); 
 
int renameTableImpl(Session *session, const char *from, const char *to);

For a real example, I recommend comparing the old InnobaseEngine class declaration with the updated one. As for where this redesign is going, this is the answer I got on the Drizzle channel from Stewart, who did the actual work for all this.

stewart: tmaesaka: the basic idea is that handler becomes a cursor. the StorageEngine is for actions on the engine.
stewart: tmaesaka: and handler is a cursor on a table.
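To make the split concrete, here is a toy sketch of the two roles. These are simplified stand-ins that I made up for illustration (`SketchEngine`, `SketchCursor`); they are not Drizzle's real classes or signatures.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// NOTE: simplified stand-ins, not Drizzle's real classes or signatures.
using Row = std::string;
using TableData = std::vector<Row>;

// Cursor role: reads and writes the rows of one open table.
class SketchCursor {
public:
  explicit SketchCursor(TableData &rows) : rows_(rows), pos_(0) {}
  void writeRow(const Row &row) { rows_.push_back(row); }
  // Returns the next row, or nullptr at the end of the table.
  const Row *next() { return pos_ < rows_.size() ? &rows_[pos_++] : nullptr; }

private:
  TableData &rows_;
  std::size_t pos_;
};

// Engine role: actions on the engine as a whole (create, rename, open).
class SketchEngine {
public:
  // Returns 0 on success, 1 if the table already exists.
  int createTable(const std::string &name) {
    return tables_.emplace(name, TableData{}).second ? 0 : 1;
  }
  // Returns 0 on success, 1 if `from` is missing or `to` is taken.
  int renameTable(const std::string &from, const std::string &to) {
    auto it = tables_.find(from);
    if (it == tables_.end() || tables_.count(to) != 0) return 1;
    tables_[to] = std::move(it->second);  // map insert keeps `it` valid
    tables_.erase(it);
    return 0;
  }
  // Throws std::out_of_range if the table does not exist.
  SketchCursor openCursor(const std::string &name) {
    return SketchCursor(tables_.at(name));
  }

private:
  std::map<std::string, TableData> tables_;
};
```

Table-level operations live on the engine object, while anything that walks or modifies rows goes through a cursor obtained for a specific table.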

Something to keep in mind if you’re thinking about creating or porting a storage engine to Drizzle :)

Toru Maesaka drizzle, oss

Introducing skyload: a libdrizzle based load emulator

July 7th, 2009

Today, I would like to introduce “skyload”, a small project that I’ve been working on for the last couple of weeks. In brief, skyload is a libdrizzle based load emulation tool that is capable of running concurrent load tests against database instances that can speak the Drizzle and/or MySQL protocol.

Something I’d like to emphasize here is that skyload is not a replacement for mysqlslap or drizzleslap, since it only provides a subset of what they can do. As I’ve stated in the project description, skyload is designed to do a good job at this subset of tasks by giving you more control over how you emulate the load in an intuitive way. For instructions on installing skyload and quickly getting up to speed, take a look at the following URL:

As you will see, the first release only contains the bare minimum (only INSERT load emulation). The next step I want to take is to discuss features that other storage engine developers would actually find useful. This is because I started writing skyload primarily for myself and other storage engine developers (more on this next).
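The general shape of concurrent INSERT load emulation can be sketched as follows. This is not skyload's code: `run_insert_load` and `ExecuteFn` are hypothetical names, the table `t1` with an integer `id` primary key and a `col1` column is assumed, and the callback stands in for whatever client library call actually sends the statement.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Stand-in for "send this statement to the server". A real tool would
// issue the query through a client library (e.g. libdrizzle) here.
using ExecuteFn = std::function<void(const std::string &sql)>;

// Inserts `total_rows` rows using `concurrency` worker threads and
// returns the elapsed wall-clock time in seconds.
double run_insert_load(std::size_t total_rows, std::size_t concurrency,
                       const ExecuteFn &execute) {
  assert(concurrency > 0);
  const auto start = std::chrono::steady_clock::now();

  std::vector<std::thread> workers;
  for (std::size_t w = 0; w < concurrency; ++w) {
    // Worker w handles ids w, w + concurrency, w + 2 * concurrency, ...
    // so primary keys never collide between workers.
    workers.emplace_back([=, &execute] {
      for (std::size_t id = w; id < total_rows; id += concurrency) {
        execute("INSERT INTO t1 (id, col1) VALUES (" + std::to_string(id) +
                ", " + std::to_string(id % 100) + ")");
      }
    });
  }
  for (auto &worker : workers) worker.join();

  const std::chrono::duration<double> elapsed =
      std::chrono::steady_clock::now() - start;
  return elapsed.count();
}
```

The callback is invoked from several threads at once, so whatever it does (in a real tool, most likely one connection per worker) has to be safe under concurrent use.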

Original Intentions

I originally began writing skyload for BlitzDB development since I wanted to see the concurrent insertion performance of the Tokyo Cabinet based row storage mechanism that I wrote. I first tried benchmarking the write performance with drizzleslap but it turned out that drizzleslap’s original code (inherited from MySQL 6.0) is rather buggy and segfaulted quite easily (I’m planning on contributing a fix for this).

So I gave up on drizzleslap for the time being and started looking at the sysbench port for Drizzle that Monty Taylor has been working on:

Sysbench for Drizzle is a lovely piece of software but it couldn’t quite provide what I was looking for (concurrent insertion benchmark). After having a quick conversation with Monty about my requirements on the Drizzle IRC channel, I decided to write a libdrizzle based benchmark tool that can be used for both Drizzle and MySQL.

Future Plans

I don’t want to reinvent existing software that works (or that can be fixed). The positioning that I’m hoping for with skyload is a good mix between (mysql|drizzle)slap and sysbench. Hopefully it will be useful to folks who work on Drizzle and MySQL related projects.

I’m totally open to ideas, patches, and contributors. If this project has caught your attention, please don’t hesitate to ping me or the Drizzle community :)

I haven’t set up a mailing list since I don’t see the need for one yet, so if you’d like to share your thoughts, I think either the Drizzle mailing list or IRC (#drizzle @ irc.freenode.net) is the quickest way for me to get back to you.

Happy Hacking!

Toru Maesaka drizzle, oss