
Posts Tagged ‘performance’

BlitzDB Concurrent Testing and Write Performance

May 12th, 2010

Last month, while I was at the MySQL Conference, several people asked me about the status of BlitzDB. Specifically, they wanted to know when I would release it. Fair enough: I’ve been working on this project long enough for people to start asking.

The answer is that BlitzDB’s design is fully implemented. Right now the work is about finding bugs, fixing them, and testing BlitzDB’s stability under concurrent load. Thanks to the motivation boost I gained at the conference, I’ve now fixed the bugs that were slowing me down, and I’m gradually adding more tests to BlitzDB’s test suite. I consider BlitzDB’s initial release to be the day it gets merged into Drizzle’s trunk. That day is getting close, as BlitzDB seems to build fine on Drizzle’s Build Farm infrastructure. However, I won’t take the next step until I’m satisfied with BlitzDB’s stability.

Yesterday I spent some time concurrency testing BlitzDB’s INSERT code with skyload. Needless to say, concurrency testing is also a convenient way to look at the performance of a particular component, so I decided to publish my findings. First, here is the background of the test.

Purpose of the Test

  • Test BlitzDB’s slot-lock mechanism.
  • Confirm that BlitzDB will not crash under concurrent INSERT workload.
  • Confirm that key insertion to the index is working as expected.
  • Confirm that writes to multiple indexes work as expected.
  • Observe the write-performance impact of adding an index.

Two commodity boxes were used: one dedicated to the client and the other to the server (Drizzle + BlitzDB). Both boxes have the same spec: Intel Quad Xeon E5345 (2×4MB L2 cache), 8GB memory, 500GB SATA II disk, gigabit NIC. The boxes were connected through a gigabit switch, and the server’s file system was ext3.

By default, a BlitzDB table is optimized for up to 1 million rows, so each test run inserted 1 million rows into a table, using a different concurrency level per run. The table used in this test contains only three integer columns, and tests were run with up to three indexes. The Linux kernel’s dirty buffers were flushed before each test run, and concurrency was increased until the performance curve flattened.
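To make the setup concrete, here is a rough Python sketch of the workload shape: a fixed number of rows split across N concurrent client threads, with throughput reported as inserts per second. It is an illustration under assumptions, not skyload and not a Drizzle client. `InMemoryTable` and `run_insert_test` are hypothetical names, a locked dict stands in for the networked BlitzDB table, and Python’s GIL means the numbers will not show real multi-core scaling.

```python
import threading
import time


class InMemoryTable:
    """Hypothetical stand-in for a BlitzDB table with three integer columns.

    The real test sent INSERTs over the network to Drizzle + BlitzDB; here a
    dict guarded by a lock plays that role so the sketch runs anywhere.
    """

    def __init__(self):
        self._rows = {}
        self._lock = threading.Lock()

    def insert(self, pk, a, b):
        with self._lock:
            if pk in self._rows:
                raise KeyError(f"duplicate primary key {pk}")
            self._rows[pk] = (a, b)

    def __len__(self):
        with self._lock:
            return len(self._rows)


def run_insert_test(table, total_rows, concurrency):
    """Insert total_rows rows from `concurrency` threads; return (seconds, QPS)."""

    def worker(worker_id):
        # Workers own disjoint keys: worker_id, worker_id + concurrency, ...
        for pk in range(worker_id, total_rows, concurrency):
            table.insert(pk, pk * 2, pk * 3)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(concurrency)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start
    return elapsed, total_rows / elapsed


if __name__ == "__main__":
    # Scaled down from the 1 million rows used in the real test.
    for n in (1, 2, 4, 8, 12, 16):
        table = InMemoryTable()
        elapsed, qps = run_insert_test(table, 100_000, n)
        assert len(table) == 100_000
        print(f"{n:2d} threads: {qps:,.0f} inserts/s")
```

Splitting the key space by stride keeps workers from colliding on primary keys, so a failed insert points at a real concurrency bug rather than a test artifact.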

Result

[Figure: BlitzDB Table Insertion - Multi Index]

As seen above, scalability from 1 thread to 4 threads showed an ideal curve. This is expected, since the server is a 4-core box. Beyond 4 threads, performance kept improving somewhat up to 12 threads. From there on, concurrency greatly exceeds the number of physical cores, so no meaningful performance growth is observed. The highest insert rate achieved in this test was just over 86,000 QPS. With more cores on the server and more clients, I suspect BlitzDB could exceed 100k QPS.
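For a sense of scale, the peak rate puts the whole 1-million-row load at roughly 11.6 seconds, and the hoped-for 100k QPS would bring it down to 10 seconds:

```latex
t_{86\text{k}} \approx \frac{10^{6}\ \text{rows}}{86{,}000\ \text{rows/s}} \approx 11.6\ \text{s},
\qquad
t_{100\text{k}} = \frac{10^{6}\ \text{rows}}{100{,}000\ \text{rows/s}} = 10\ \text{s}
```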

Although this graph looks good at first sight, I’m not happy with it. The performance penalty for adding multiple indexes should be greater than what this result shows, because Tokyo Cabinet’s (TC) B+tree is internally protected by a single lock on writes. I suspect the penalty doesn’t show up in this graph because I didn’t give BlitzDB enough load to make TC work hard. This implies that a bottleneck could exist elsewhere (the network, Drizzle, or BlitzDB’s handler-level code).
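Why should more indexes hurt? The toy model below illustrates the reasoning, under the simplifying assumption that each index is one structure behind one write lock: a single row insert must take every index’s lock in turn, so concurrent writers serialize once per index. A sorted Python list stands in for TC’s B+tree; `LockedIndex` and `Table` are hypothetical and do not mirror TC’s or BlitzDB’s code.

```python
import bisect
import threading


class LockedIndex:
    """Toy index whose writes all go through one lock.

    A sorted list replaces the real B+tree; what matters here is only that
    every writer must take the same lock, so inserts into one index serialize.
    """

    def __init__(self):
        self._keys = []
        self._lock = threading.Lock()
        self.lock_acquisitions = 0

    def insert(self, key):
        with self._lock:
            self.lock_acquisitions += 1
            bisect.insort(self._keys, key)

    def keys(self):
        with self._lock:
            return list(self._keys)


class Table:
    """Toy table with one LockedIndex per indexed column (column i -> index i)."""

    def __init__(self, n_indexes):
        self.indexes = [LockedIndex() for _ in range(n_indexes)]

    def insert_row(self, row):
        # One row write takes every index's single write lock once, so each
        # extra index adds another section that concurrent writers queue on.
        for column, index in enumerate(self.indexes):
            index.insert(row[column])


if __name__ == "__main__":
    table = Table(n_indexes=3)

    def writer(start, step):
        for pk in range(start, 10_000, step):
            table.insert_row((pk, pk * 2, pk * 3))

    threads = [threading.Thread(target=writer, args=(i, 4)) for i in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Each index handled exactly one locked write per row.
    print([idx.lock_acquisitions for idx in table.indexes])  # [10000, 10000, 10000]
```

If the client side cannot keep all writers busy, these locks are rarely contended, which is consistent with the suspicion that the index penalty stays hidden until the load is high enough.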

However, I’m glad that BlitzDB remained stable under this concurrency test, which is what I set out to test in the first place. Admittedly, I need to mix several types of queries to properly test BlitzDB’s stability. I plan to do this next with sysbench and, hopefully, RQG.

Once this is done, I’ll submit a merge proposal to the Drizzle Project :)

Future Development Plans

  • Find bugs, fix bugs, repeat.
  • Write a built-in automatic recovery routine.
  • Eventually add a crash-safe option to BlitzDB.

Toru Maesaka | drizzle, oss

Fascinating libdrizzle benchmark results

April 2nd, 2009

I’d like to spread the word about Jay’s awesome findings from benchmarking libdrizzle against the original client library inherited from MySQL. For those who aren’t familiar with libdrizzle, it is a brand-new, modern, MySQL-compatible client library for Drizzle, started by Eric Day, that leverages asynchronous I/O and smarter memory usage.

You can read how this library came to life in this thread:

As you can see in Jay’s sysbench findings, libdrizzle outperforms the original library at all concurrency levels by a significant margin (e.g. a 41.16% performance increase at only two threads). If you’re interested in getting more performance out of Drizzle or MySQL in the future, you should really start looking into this library now.

This was the first blog entry I read this morning, and it really kick-started my day. Eric, you rock! And thanks to Jay for sharing his findings.

Toru Maesaka | drizzle, oss

Rethinking the Query Cache for Drizzle

October 10th, 2008

There is a shared understanding in the Drizzle community that the MySQL query cache works well for small databases but isn’t sufficient for relatively large-scale use. Does your application involve a lot of database updates? If so, you’ll probably face fragmentation issues in the query cache (though the query cache isn’t well suited to such workloads in the first place).

Caching is a key ingredient in boosting the performance of any software that requires a significant amount of computation, so it can’t be overlooked. So how can we improve on it in Drizzle?

The idea is to create a pluggable query cache subsystem that can work in a large-scale environment. Since Drizzle is a microkernel DBMS, it makes sense to make the cache component pluggable and let the DBA choose the caching solution they prefer. This is exactly what I’m working on at the moment, and my first plugin will allow Drizzle to use memcached as its query cache.

For example, a DBA could hook up their memcached pool to Drizzle and use several gigabytes of fast cache space to cache their results.
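A pluggable query cache boils down to a small interface that the server calls before and after executing a query. The Python sketch below shows that idea only; `QueryCachePlugin`, `InProcessCache` and `Server` are hypothetical names, not Drizzle’s actual plugin API, and invalidation is reduced to a single `invalidate_all` hook.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List, Optional, Tuple

Result = List[Tuple]


class QueryCachePlugin(ABC):
    """Hypothetical cache plugin interface (not Drizzle's real API)."""

    @abstractmethod
    def get(self, sql: str) -> Optional[Result]: ...

    @abstractmethod
    def set(self, sql: str, result: Result) -> None: ...

    @abstractmethod
    def invalidate_all(self) -> None: ...


class InProcessCache(QueryCachePlugin):
    """Simplest possible backend: a dict in the server's own memory."""

    def __init__(self):
        self._store: Dict[str, Result] = {}

    def get(self, sql):
        return self._store.get(sql)

    def set(self, sql, result):
        self._store[sql] = result

    def invalidate_all(self):
        self._store.clear()


class Server:
    """Toy server showing where the cache plugin hooks into query execution."""

    def __init__(self, execute: Callable[[str], Result], cache: QueryCachePlugin):
        self._execute = execute
        self._cache = cache  # chosen by the DBA, e.g. a memcached-backed plugin
        self.executions = 0

    def query(self, sql: str) -> Result:
        cached = self._cache.get(sql)
        if cached is not None:
            return cached  # cache hit: skip execution entirely
        self.executions += 1
        result = self._execute(sql)
        self._cache.set(sql, result)
        return result
```

A memcached-backed plugin would implement the same three methods, and `Server` would not need to change.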

Things to consider

  • Does the DBA really want to cache results?
  • Does the result construction take long enough to care?
  • Do we want to specify a specific SQL statement to always cache?
  • Do we want to enforce a certain table to be cached?
  • How should transactional engines be handled?
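One way to encode these considerations is a per-query policy check that runs before a result is stored. This is a hypothetical sketch: `CachePolicy`, its field names and the 10 ms default threshold are made up, and skipping the cache inside an open transaction is just one possible answer to the transactional-engine question.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class CachePolicy:
    """Hypothetical per-query caching policy for a query cache plugin."""

    enabled: bool = True                               # does the DBA want caching at all?
    min_build_seconds: float = 0.01                    # made-up threshold: skip cheap results
    always_cache_sql: FrozenSet[str] = frozenset()     # statements to always cache
    always_cache_tables: FrozenSet[str] = frozenset()  # tables whose queries are always cached

    def should_cache(self, sql: str, tables: Iterable[str],
                     build_seconds: float, in_transaction: bool) -> bool:
        if not self.enabled:
            return False
        if in_transaction:
            # Assumption: a result read inside an open transaction may include
            # uncommitted changes, so never share it through the cache.
            return False
        if sql in self.always_cache_sql:
            return True
        if self.always_cache_tables.intersection(tables):
            return True
        return build_seconds >= self.min_build_seconds
```

For example, `CachePolicy(always_cache_tables=frozenset({"products"}))` would cache every query touching `products` outside a transaction, and anything else only if it took at least 10 ms to build.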

If we can address the above points and achieve modularity, I think it’s a total win. For those who like diagrams, here is the architecture I have in mind at the moment:

[Figure: Drizzle Query Cache Plugin Example]

Benefits of using memcached

memcached has been proven by various players in the web industry to help scale web applications cost-effectively. It is also fast: fetching a cached result from memcached takes O(1) time on average, an order we all love. Furthermore, by using memcached, the fragmentation issue goes away, since the memcached community faced this very problem in the past and overcame it by developing the slab subsystem.
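One practical detail when using memcached: keys are limited to 250 bytes and, in the text protocol, may not contain spaces or control characters, so raw SQL text cannot be used as a key directly. A common approach, sketched here as an assumption rather than the plugin’s actual scheme, is to hash the schema name and SQL text into a fixed-length key. `cache_key` and the `qc:` prefix are hypothetical.

```python
import hashlib

MEMCACHED_MAX_KEY_BYTES = 250  # memcached's key-length limit


def cache_key(schema: str, sql: str, prefix: str = "qc") -> str:
    """Return a short, whitespace-free memcached key for (schema, sql).

    The SQL text is hashed verbatim (no normalization), so queries that differ
    only in spacing get different keys; the schema is included because the same
    SQL can return different results in different databases.
    """
    digest = hashlib.sha1(f"{schema}\0{sql}".encode("utf-8")).hexdigest()
    key = f"{prefix}:{digest}"
    if len(key.encode("utf-8")) > MEMCACHED_MAX_KEY_BYTES or any(c.isspace() for c in key):
        raise ValueError(f"invalid memcached key: {key!r}")
    return key
```

Hashing keeps every key the same short length regardless of query size, at the cost of not being able to list cached queries by reading the keys.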

Want to scale? With consistent hashing enabled, you can greatly reduce the number of cache misses caused by adding or removing a node from a live pool. Got spare boxes lying around? Hook them up and power up Drizzle! Need support? Both the memcached and Drizzle communities are full of heartwarming people.
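Consistent hashing is done by the memcached client (libmemcached, for example, offers a ketama-style mode), not by the memcached servers. The sketch below is a minimal ring with virtual nodes, not ketama’s exact algorithm; `HashRing` is a hypothetical class. It shows why adding a node only moves the keys that land on the new node’s points, whereas naive `hash % N` placement remaps most keys.

```python
import bisect
import hashlib


class HashRing:
    """Minimal consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes=(), replicas=100):
        self.replicas = replicas  # virtual points per node, to even out load
        self._ring = []           # sorted list of (point, node)
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(value: str) -> int:
        # 64-bit position on the ring taken from an MD5 digest.
        return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")

    def add(self, node: str) -> None:
        for i in range(self.replicas):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove(self, node: str) -> None:
        self._ring = [(p, n) for (p, n) in self._ring if n != node]

    def node_for(self, key: str) -> str:
        if not self._ring:
            raise LookupError("empty ring")
        # First point clockwise from the key's hash, wrapping around at the end.
        i = bisect.bisect(self._ring, (self._hash(key), ""))
        return self._ring[i % len(self._ring)][1]


if __name__ == "__main__":
    keys = [f"qc:{i}" for i in range(10_000)]
    ring = HashRing(["cache1", "cache2", "cache3", "cache4"])
    before = {k: ring.node_for(k) for k in keys}
    ring.add("cache5")
    moved = sum(1 for k in keys if ring.node_for(k) != before[k])
    print(f"consistent hashing: {moved / len(keys):.1%} of keys moved")
    # Naive modulo placement for comparison: going from 4 to 5 nodes.
    h = HashRing._hash
    naive_moved = sum(1 for k in keys if h(k) % 4 != h(k) % 5)
    print(f"modulo hashing:     {naive_moved / len(keys):.1%} of keys moved")
```

With five nodes, roughly a fifth of the keys should move, all of them to the new node. With modulo placement about 80% move, because a key only stays put when its hash gives the same remainder modulo 4 and modulo 5.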

Other Solutions Work Too!

The beauty of modularity is that you can create and use your own solution for your unique requirements. For example, let’s assume there is a webshop that wants to keep the number of physical servers down (e.g. due to limited money or space).

To satisfy that requirement, you could cache to a fantastically fast hash database such as Tokyo Cabinet (much, much faster than BDB; if you haven’t heard of it, you should look at its incredible benchmark comparison). What I really want to say is that the microkernel property of Drizzle will open up a lot of new possibilities for your application and help you tackle the new requirements that seem to come out of nowhere.
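For the single-box webshop case, the same kind of cache plugin could sit on top of a local hash database instead of a memcached pool. This sketch does not rely on any Tokyo Cabinet binding: Python’s standard `dbm` module stands in for TC’s hash database, and `LocalHashDBCache` is a hypothetical name. Results are pickled, which is only acceptable because the cache holds data the server itself wrote.

```python
import dbm
import hashlib
import os
import pickle
import tempfile


class LocalHashDBCache:
    """Query-cache backend on a local on-disk hash database.

    stdlib dbm stands in for Tokyo Cabinet's hash database here. Only store
    trusted data: pickle.loads must never see bytes from an untrusted source.
    """

    def __init__(self, path):
        self._db = dbm.open(path, "c")  # create the file if it does not exist

    @staticmethod
    def _key(sql):
        # Fixed-length key derived from the SQL text.
        return hashlib.sha1(sql.encode("utf-8")).hexdigest().encode("ascii")

    def get(self, sql):
        raw = self._db.get(self._key(sql))
        return None if raw is None else pickle.loads(raw)

    def set(self, sql, result):
        self._db[self._key(sql)] = pickle.dumps(result)

    def close(self):
        self._db.close()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "qcache")
    cache = LocalHashDBCache(path)
    cache.set("SELECT id, name FROM items", [(1, "apple"), (2, "pear")])
    print(cache.get("SELECT id, name FROM items"))
    cache.close()
```

Because the data lives on the local disk, cached results also survive a server restart, which a memcached pool would not give you.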

Where from here?

I’m currently going through the UDF -> Plugin Architecture conversion done by Mark, and I’m planning to base my code on his logging plugin, since it’s fantastically simple. My work will be done in:

  • lp:~tmaesaka/drizzle/pluggable-qcache

I’ll hopefully have something decent to show soon, and I will keep people updated on my blog, IRC and the Mailing List (drizzle-discuss).

So that is all I have to say for now… If you have any suggestions, please do enlighten me :)

Toru Maesaka | drizzle, memcached, oss