Archive

Posts Tagged ‘oss’

Introducing skyload: a libdrizzle based load emulator

July 7th, 2009

Today, I would like to introduce “skyload”, a small project that I’ve been working on for the last couple of weeks. In brief, skyload is a libdrizzle based load emulation tool that can run concurrent load tests against database instances that speak the Drizzle and/or MySQL protocol.

Something I’d like to emphasize here is that skyload is not a replacement for mysqlslap or drizzleslap, since it only provides a subset of what they can do. As I’ve stated in the project description, skyload is designed to do a good job at this subset of tasks by giving you more intuitive control over how the load is emulated. For instructions on installing skyload and quickly getting up to speed, take a look at the following URL:

As you will see, the first release contains only the bare minimum (INSERT load emulation). The next step I want to take is to discuss features that other storage engine developers would actually find useful, because I started writing skyload primarily for myself and other storage engine developers (more on this next).
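To make the idea of concurrent INSERT load emulation a bit more concrete, here is a minimal C++ sketch of the general pattern. It is not skyload's code: `Executor`, `make_insert`, `emulate_insert_load` and the `t1 (worker_id, seq)` table are all hypothetical, and the executor stands in for whatever client connection (libdrizzle or otherwise) would actually send each statement.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Hypothetical executor: sends one SQL statement, returns true on success.
// It is called from several threads at once, so it must be thread-safe
// (a real tool would typically give each worker its own connection).
using Executor = std::function<bool(const std::string &)>;

// Builds the INSERT for a given worker and sequence number.
// The table "t1" and its columns are placeholders for illustration.
std::string make_insert(std::size_t worker, std::size_t seq) {
  return "INSERT INTO t1 (worker_id, seq) VALUES (" + std::to_string(worker) +
         ", " + std::to_string(seq) + ")";
}

// Runs `concurrency` workers, each issuing `per_worker` INSERTs through
// `exec`, and returns how many statements succeeded.
std::size_t emulate_insert_load(std::size_t concurrency,
                                std::size_t per_worker,
                                const Executor &exec) {
  assert(concurrency > 0);  // zero workers is almost surely a config mistake
  std::atomic<std::size_t> ok{0};
  std::vector<std::thread> workers;
  workers.reserve(concurrency);
  for (std::size_t w = 0; w < concurrency; ++w) {
    workers.emplace_back([w, per_worker, &exec, &ok] {
      for (std::size_t i = 0; i < per_worker; ++i) {
        if (exec(make_insert(w, i))) {
          ok.fetch_add(1, std::memory_order_relaxed);
        }
      }
    });
  }
  for (auto &t : workers) t.join();  // wait for every worker to finish
  return ok.load();
}
```

Timing the call to `emulate_insert_load` with different `concurrency` values is the simplest way to see how insert throughput scales with the number of clients.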

Original Intentions

I originally began writing skyload for BlitzDB development, since I wanted to measure the concurrent insertion performance of the Tokyo Cabinet based row storage mechanism that I wrote. I first tried benchmarking the write performance with drizzleslap, but it turned out that drizzleslap’s original code (inherited from MySQL 6.0) is rather buggy and segfaults quite easily (I’m planning on contributing a fix for this).

So I gave up on drizzleslap for the time being and started looking at the sysbench port for Drizzle that Monty Taylor has been working on:

Sysbench for Drizzle is a lovely piece of software, but it couldn’t quite provide what I was looking for (a concurrent insertion benchmark). After a quick conversation with Monty about my requirements on the Drizzle IRC channel, I decided to write a libdrizzle based benchmark tool that can be used with both Drizzle and MySQL.

Future Plans

I don’t want to reinvent existing software that works (or software that can be fixed). The position I’m hoping skyload will fill is a good mix between (mysql|drizzle)slap and sysbench. Hopefully it will be useful to folks who work on Drizzle and MySQL related projects.

I’m totally open to ideas, patches, and contributors. If this project has caught your attention, please don’t hesitate to ping me or the Drizzle community :)

I haven’t set up a mailing list since I don’t see the need for one yet, so if you’d like to share your thoughts, either the Drizzle mailing list or IRC (#drizzle @ irc.freenode.net) is the quickest way for me to get back to you.

Happy Hacking!

Toru Maesaka drizzle, oss

Some Progress in Drizzle’s Pluggable Query Cache

January 13th, 2009

I’m happy to report that I’ve made some progress in the Query Cache plugin interface that I’ve been working on for Drizzle. The first version of the Query Cache interface definition is now merged to Drizzle’s trunk.

Unfortunately though, you cannot actually write a query cache module for Drizzle yet. I still haven’t implemented the hooks that call this interface from Drizzle’s codebase, since this requires much more thinking and thorough testing (which is my next milestone for this mini project).

So in other words, I’ve only done the easy part, heh. Let me take this opportunity to introduce the interface that just went in, though. Currently the API consists of five functions, which look like this:

 bool (*qcache_try_fetch_and_send)(Session *session, bool transactional);
 bool (*qcache_set)(Session *session, bool transactional);
 bool (*qcache_invalidate_table)(Session *session, bool transactional);
 bool (*qcache_invalidate_db)(Session *session, const char *db_name,
                              bool transactional);
 bool (*qcache_flush)(Session *session);

That’s it! All you need to do to write a query cache module is implement the above functions. Here’s a brief description of what each function is intended for (though the names are pretty self-explanatory):

qcache_try_fetch_and_send
If the record of interest is cached, transmit it back to the client.

qcache_set
Cache a given resultset.

qcache_invalidate_table
Invalidate (delete) every record in the cache that is related to the given table.

qcache_invalidate_db
Invalidate (delete) everything in the cache that is related to the given database.

qcache_flush
Invalidate (delete) everything in the cache.
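For illustration, here is a toy in-process module wired into a function-pointer table with the five signatures above. It is only a sketch under several assumptions: the `Session` struct is a hypothetical stand-in for Drizzle's real class, the `QueryCacheModule` struct and the helper names are made up, a `true` return value is taken to mean "handled" (a hit, for fetches), the tables to invalidate are assumed to be reachable from the session, and statements inside a transaction simply bypass the cache.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for Drizzle's Session (the real class is far richer).
struct Session {
  std::string db;                   // current database
  std::string query;                // SQL text of the current statement
  std::vector<std::string> tables;  // tables the statement reads or writes
  std::string result;               // serialized result set (stand-in)
  std::string sent;                 // what was "transmitted" to the client
};

// Function-pointer table with the five signatures from the interface above.
struct QueryCacheModule {
  bool (*qcache_try_fetch_and_send)(Session *session, bool transactional);
  bool (*qcache_set)(Session *session, bool transactional);
  bool (*qcache_invalidate_table)(Session *session, bool transactional);
  bool (*qcache_invalidate_db)(Session *session, const char *db_name,
                               bool transactional);
  bool (*qcache_flush)(Session *session);
};

namespace {

struct Entry {
  std::string db;
  std::vector<std::string> tables;
  std::string result;
};

// Toy in-process store keyed by (database, query text).
std::map<std::string, Entry> g_cache;

std::string key_for(const Session *s) {
  assert(s != nullptr);
  return s->db + '\0' + s->query;  // '\0' separates db name from SQL text
}

// Assumption: statements inside a transaction bypass the cache.
bool fetch_and_send(Session *s, bool transactional) {
  if (transactional) return false;
  auto it = g_cache.find(key_for(s));
  if (it == g_cache.end()) return false;  // miss: execute the query normally
  s->sent = it->second.result;            // "transmit" the cached result
  return true;
}

bool store(Session *s, bool transactional) {
  if (transactional) return false;
  g_cache[key_for(s)] = Entry{s->db, s->tables, s->result};
  return true;
}

// Drops entries in the session's database that touch any of its tables.
// Invalidation ignores `transactional`: stale data is worse than a miss.
bool invalidate_table(Session *s, bool /*transactional*/) {
  for (auto it = g_cache.begin(); it != g_cache.end();) {
    bool hit = false;
    if (it->second.db == s->db)
      for (const auto &t : s->tables)
        for (const auto &u : it->second.tables)
          if (t == u) hit = true;
    if (hit) it = g_cache.erase(it); else ++it;
  }
  return true;
}

bool invalidate_db(Session * /*session*/, const char *db_name,
                   bool /*transactional*/) {
  for (auto it = g_cache.begin(); it != g_cache.end();) {
    if (it->second.db == db_name) it = g_cache.erase(it); else ++it;
  }
  return true;
}

bool flush_all(Session * /*session*/) {
  g_cache.clear();
  return true;
}

}  // namespace

const QueryCacheModule toy_qcache = {fetch_and_send, store, invalidate_table,
                                     invalidate_db, flush_all};
```

Table-level invalidation like this is coarse (any write to `items` evicts every cached query on `items`), which is exactly the trade-off against row-based invalidation discussed next.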

In an ideal world, we would want to do row-based invalidation, but this is extremely difficult to do in practice, so I will not touch it for now (not sure if I ever will). Instead, I’m planning on taking advantage of primary keys (and possibly the slow query log) and seeing how that goes before attempting to touch the forbidden fruit… I’m going to keep things dead simple to begin with.

So there you go! A preview of work in progress on one of the many projects concerned with modularizing the Drizzle SQL server. Stay tuned for updates :)

Toru Maesaka drizzle, oss

Drizzle’s String Library Diet

December 16th, 2008

Lately I’ve been spending most of my time with Drizzle working towards the Cirrus milestone. Specifically speaking, I’ve been slowly standardizing the codebase by throwing out lots of code in MySQL’s string library and replacing them with appropriate libc and C++ alternatives.

You see, back in the 80s MySQL reinvented a lot of the string functionality provided by libc, for reasons that I do not know (it was before my time). It turns out that most of that code is still in use today. I guess there was a good reason back in the day, but nowadays this doesn’t make much sense, since:

  • Despite the criticisms, glibc works darn well.
  • The priority of optimizing library functions is much higher for standard library developers than it is for you as an application developer.
  • Using the standard library also helps new Drizzle community developers understand the codebase much faster from seeing functions that they are already familiar with.

Arguably, getting back a pointer to the terminating null character, as most of MySQL’s string functions do, makes string appending slightly easier. But if you ask me, many people (including myself) are not comfortable with this style, and it makes the codebase look weird. One example is having to rewind the pointer to the start of the string before passing it to a third-party function.
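Here is a small sketch contrasting the two styles. `copy_return_end` is a hypothetical helper that follows the same contract as MySQL's `strmov()` and POSIX `stpcpy()` (it returns a pointer to the copied string's terminating `'\0'`); the other two function names are made up for the comparison.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Copies src (including its '\0') into dst and returns a pointer to the
// terminating '\0' in dst, i.e. the strmov()/stpcpy() contract.
char *copy_return_end(char *dst, const char *src) {
  assert(dst != nullptr && src != nullptr);
  while ((*dst = *src++) != '\0') ++dst;
  return dst;
}

// End-pointer style: appends chain nicely, but the caller must keep `buf`
// (the start) around, because `end` is useless to any other function.
// The caller is responsible for making `buf` large enough.
std::size_t build_end_pointer_style(char *buf) {
  char *end = buf;
  end = copy_return_end(end, "SELECT ");
  end = copy_return_end(end, "1");
  // `end` now points at the terminator; "rewind" by using `buf` instead.
  return std::strlen(buf);
}

// Standard-library style: the string object tracks its own start and size.
std::string build_std_style() {
  std::string s = "SELECT ";
  s += "1";
  return s;
}
```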

Benefits gained from narrowing to UTF-8

Because UTF-8 is the prominent encoding in the areas that we are targeting (the web and the cloud), Drizzle currently uses only UTF-8 for its internal representation. So, needless to say, support for every other character set was thrown out of the library, which greatly reduced its size.

Interested in how much slimmer the Drizzle string library is compared to the original one in MySQL 5.1? To illustrate the difference, here are the results from counting the files and lines:

$ wc -l mysql-5.1.30/strings/*.c
...
96798 total
 
$ ll mysql-5.1.30/strings/ | wc -l
78
$ wc -l drizzle/mystrings/*.cc
...
24634 total
 
$ ll drizzle/mystrings/ | wc -l
31
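As a rough reading of those totals (`wc -l` counts every line, blank lines and comments included, so this is a size comparison rather than a measure of logic), the Drizzle string library has about a quarter of the original's lines:

```latex
\[
\frac{24634}{96798} \approx 0.254
\quad\Longrightarrow\quad
1 - 0.254 = 0.746 \approx 75\%\ \text{fewer lines}
\]
```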

AWESOME.

Toru Maesaka drizzle, oss

Open Source Conference in Malaysia

October 24th, 2008

So I’ve been bugging Colin Charles to invite me over to Malaysia for the last couple of months, and what does he offer me? An opportunity to speak at an open source conference in Malaysia :)

FOSS.my is a two-day event (the 9th and 10th of next month) and, as stated on the conference homepage, its aim is to cover the technical aspects of various OSS projects, without any business/sales intervention, for those who follow open source technology in South East Asia (and other regions too, of course). The cool thing is that after I told mixi about this event, they liked the idea so much that they decided to sponsor it immediately. I didn’t really expect this but hey, awesome.

At this conference, I will be giving two talks. One will go over how mixi uses various OSS technologies to power the largest social networking service in Japan. The other will cover how memcached works internally and the latest hot topics in its development community, like the upcoming binary protocol.

I’ve never been to Malaysia before so I’m totally looking forward to this trip.

Toru Maesaka oss

Perl, Binary and Memcached

August 15th, 2008

For the last few days I’ve been working on updating the binary protocol test in the latest memcached development branch to comply with the latest binary protocol specification. Prior to this update, the test client was sending an invalid request to the server, which made the test hang and never finish.

In brief, this is the big difference:

Previously, the CAS (compare and swap) value was treated as part of the extra header that is appended/serialized after the request/response header. In the latest specification, the CAS value is a required 8-byte field in the 24-byte request/response header (the header was 16 bytes in the previous version). Other than that, the remaining changes were minor differences in the packet format of the extra fields of certain commands. Easy work :)
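To visualize the new layout, here is a minimal C++ sketch that serializes a 24-byte request header in network (big-endian) byte order, with the CAS value occupying the last 8 bytes. The field order follows the memcached binary protocol specification as later published, but the struct and function names are my own, the 2-byte field at offset 6 is shown as reserved (later revisions use it as a vbucket id), and the actual memcached test suite is Perl, not C++.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Request header fields in wire order: 24 bytes in total, big-endian.
struct RequestHeader {
  std::uint8_t magic = 0x80;            // request magic byte
  std::uint8_t opcode = 0x00;           // 0x00 = GET
  std::uint16_t key_length = 0;
  std::uint8_t extras_length = 0;
  std::uint8_t data_type = 0;
  std::uint16_t reserved = 0;           // vbucket id in later revisions
  std::uint32_t total_body_length = 0;  // extras + key + value
  std::uint32_t opaque = 0;             // echoed back by the server
  std::uint64_t cas = 0;                // CAS now lives in the header
};

using HeaderBytes = std::array<std::uint8_t, 24>;

// Writes the low `width` bytes of `value` at `offset`, most significant first.
void put_be(HeaderBytes &out, std::size_t offset, std::uint64_t value,
            std::size_t width) {
  assert(width >= 1 && width <= 8 && offset + width <= out.size());
  for (std::size_t i = 0; i < width; ++i)
    out[offset + i] =
        static_cast<std::uint8_t>(value >> (8 * (width - 1 - i)));
}

HeaderBytes serialize(const RequestHeader &h) {
  HeaderBytes out{};
  put_be(out, 0, h.magic, 1);
  put_be(out, 1, h.opcode, 1);
  put_be(out, 2, h.key_length, 2);
  put_be(out, 4, h.extras_length, 1);
  put_be(out, 5, h.data_type, 1);
  put_be(out, 6, h.reserved, 2);
  put_be(out, 8, h.total_body_length, 4);
  put_be(out, 12, h.opaque, 4);
  put_be(out, 16, h.cas, 8);  // bytes 16..23
  return out;
}
```

A client that still appends CAS after the header (the old layout) produces a packet the server parses with every field shifted, which is the kind of invalid request that left the old test hanging.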

Here is the actual diff:

http://github.com/tmaesaka/memcached/commit/67b4da9eb855ebe7695a197320232b8d25692f84

As you can see, the test suite currently used by memcached is Perl based. This was fortunate for me, since Perl is my second favorite language after C. I also made the code more “Perl-like” in style by fixing the indentation. Although, heh, I can see a Perl programmer arguing that the use of if/else blocks in the test is not best practice.

You know, fixing the test was pretty meaningful to me, since it forced me to study the binary protocol specification, which I knew almost nothing about at the hackathon in Santa Clara, CA back in April. Hopefully I can make productive suggestions at the upcoming hackathon in Menlo Park, CA in October.

Toru Maesaka memcached, oss

Mac OS X, Ubuntu and Drizzle

July 30th, 2008

So admittedly, Mac OS X is currently not the most friendly platform to work with Drizzle, mostly due to library issues.

OS X has several weird hacks in it due to licensing issues (libreadline comes to mind first). Sure, MacPorts, Darwin Ports, etc. can get around this problem, but should this be necessary? Personally, I dislike resorting to these solutions. Fortunately, I’ve been doing all my Drizzle work on Ubuntu on a dedicated server, so I’ve yet to come across any build-related issues. However, it kind of sucks not being able to take my Mac out to a cafe on the weekend and work there without connectivity.

So to make my life happier, I installed Ubuntu on my MacBook Pro (alongside OS X of course).

I came across a few problems in the process of getting Ubuntu working, like a corrupted partition table, but the following Ubuntu threads helped greatly:

General Instructions

Boot related problems when using Hardy Heron (Ubuntu 8.04)

You know, getting Ubuntu running on my Mac was entertaining, since just yesterday I was talking to Monty Taylor about his view that using a Mac is selling out. So what does this make me now?

Happy Hacking :)

Toru Maesaka oss

Drizzle, out in the open

July 23rd, 2008

So I’ve been fortunate enough to participate in developing Drizzle, which is a microkernel fork of MySQL that you can read more about on Brian Aker’s blog post.

In brief, we are getting rid of components in MySQL that we find unnecessary by default, and instead making them optional by refactoring the server to be modular, aka a microkernel. In other words, we are trying to develop a lean, fast, simple and extensible RDBMS that fits well in mid- and large-scale web applications.

How? Well, take the Query Cache for example. The QC works well for a single database server, but it has very little (if any) effect once we start thinking big, especially in the web industry. So why bother keeping it? What would be better is if we could _optionally_ make Drizzle use a cluster of memcached servers for query caching, which would also allow many database instances to share a common cache. The same can be said about many other components, such as ACL and Stored Procedures. This is exactly why we are moving to a microkernel architecture: if you want something special, you should be able to customize the server relatively easily to satisfy your requirements, rather than having to refactor the server code yourself.
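One practical detail when many database instances share one memcached cluster is that they must all derive the same cache key for the same query. No key scheme is specified here, so the following is a hypothetical one: hash the database name and SQL text with 64-bit FNV-1a, which is defined byte-for-byte and therefore stable across processes and platforms, unlike `std::hash`, whose results are implementation-specific. The `qc:` prefix and function names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// 64-bit FNV-1a over the raw bytes of `data`.
std::uint64_t fnv1a64(const std::string &data) {
  std::uint64_t h = 0xcbf29ce484222325ULL;  // FNV offset basis
  for (unsigned char c : data) {
    h ^= c;
    h *= 0x100000001b3ULL;                  // FNV prime
  }
  return h;
}

// Hypothetical key scheme: "qc:" + 16 hex digits. Hashing keeps keys short
// and free of spaces, since memcached's text protocol rejects keys with
// whitespace or control characters and caps them at 250 bytes.
std::string query_cache_key(const std::string &db, const std::string &sql) {
  // The '\0' separator keeps ("ab", "c") distinct from ("a", "bc").
  std::uint64_t h = fnv1a64(db + '\0' + sql);
  char hex[17];  // 16 hex digits + terminator
  int n = std::snprintf(hex, sizeof hex, "%016llx",
                        static_cast<unsigned long long>(h));
  assert(n == 16);
  (void)n;  // silence unused-variable warnings when NDEBUG is set
  return std::string("qc:") + hex;
}
```

Note that memcached has no "delete by prefix" operation, so a shared cache like this would still need a separate invalidation strategy (for example, per-table version numbers folded into the key).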

Indeed, not everyone needs a microkernel database; in fact, I assume most people won’t. However, there are enough web developers and companies in that small portion of the pie who would love a microkernel database to solve the problems they are facing today. This is exactly why we don’t consider Drizzle to be a MySQL replacement.

If you’d like more information, do check out our project page on Launchpad and browse through the mailing list archive. Drizzle development is done in a true open source fashion using open resources and tools like Bazaar and Launchpad. This means that everyone is free to come up with improvement suggestions/patches and submit them to the Drizzle community.

Drizzle has been very fun and I thank Brian for getting me involved in such a fun project :)

Btw, I wrote a blog post on Drizzle in Japanese on the Mixi engineering blog too.

Toru Maesaka drizzle, oss