Extending CREATE TABLE Syntax in Drizzle

July 21st, 2010

Letting storage engine developers add table-specific options for things like compression, encryption and optimization opens up new possibilities. Here’s what I’m talking about:

CREATE TABLE t1 (
  ...
) ENGINE = my_engine, MY_OPTION = your_arg;

Supporting this is relatively easy in Drizzle, and this API feature (and a bit more) is available in MariaDB as well. Unfortunately, Drizzle’s method for doing this isn’t documented in the Wiki yet, but it should be added once our Storage Engine API becomes stable (as in, no more interface changes).

Implement StorageEngine::doValidateTableOptions()

Here’s the actual interface.

bool StorageEngine::doValidateTableOptions(const std::string &key,
                                           const std::string &state);

This function is called once for each table option given in a CREATE TABLE statement. The first argument, key, is a const reference to a string that holds the option name. The second argument, state, holds the argument given for that option.

Therefore, given COMPRESSION = YES_PLEASE, key would be “COMPRESSION” and state would be “YES_PLEASE”. The objective of this function is to check whether the key/state pair makes sense to your storage engine. If this function returns false, Drizzle will return an error for the CREATE TABLE query. Personally, I think this interface could be a bit more developer friendly, for example by making it easier to validate numeric values without forcing the developer to convert the string by hand. That said, given the pace at which Drizzle is growing, this could be improved before we know it.
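To make the validation step concrete, here is a minimal sketch written as a free function with the same argument list, so that it compiles outside of Drizzle. The option names (COMPRESSION, BLOCK_SIZE), the accepted values and the helper parse_positive_int() are all hypothetical, and the sketch assumes keys arrive exactly as spelled here. It is not taken from any real engine.

```cpp
#include <cerrno>
#include <cstdlib>
#include <string>

// Hypothetical helper: parse a strictly positive decimal integer.
// Returns false on empty input, trailing junk, overflow, or values <= 0.
static bool parse_positive_int(const std::string &state, long &out) {
  if (state.empty()) return false;
  errno = 0;
  char *end = nullptr;
  long value = std::strtol(state.c_str(), &end, 10);
  if (errno != 0 || *end != '\0' || value <= 0) return false;
  out = value;
  return true;
}

// Same argument list as StorageEngine::doValidateTableOptions(), but a free
// function. The option names and values below are made up for illustration.
bool validate_table_option(const std::string &key, const std::string &state) {
  if (key == "COMPRESSION") {
    return state == "YES_PLEASE" || state == "NO_THANKS";
  }
  if (key == "BLOCK_SIZE") {
    long size = 0;
    return parse_positive_int(state, size);
  }
  return false;  // Unknown option: reject it so CREATE TABLE fails.
}
```

In a real engine the body would live in the doValidateTableOptions() override. The numeric branch shows the kind of hand conversion the interface currently leaves to the developer.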

Access Options at StorageEngine::doCreateTable()

Here’s the actual interface for doCreateTable().

int doCreateTable(drizzled::Session &session,
                  drizzled::Table &table_arg,
                  const drizzled::TableIdentifier &identifier,
                  drizzled::message::Table &table_proto);

Given that the options were successfully validated, doCreateTable() is called next. In Drizzle, all information regarding a table (including options) is represented in a Google Protocol Buffer message. A reference to that message object is passed to doCreateTable() as the fourth argument so all you need to do is loop through the options list in the message object and extract what you need. Here’s a minimal example that only takes care of one option.

int n_options = table_proto.engine().options_size();
 
for (int i = 0; i < n_options; i++) {
  if (table_proto.engine().options(i).name() == "my_option_name") {
    // Do whatever you like with this stream.
    std::istringstream stream(table_proto.engine().options(i).state());
  }
}

The above example should be simple to extend to handle multiple options. What’s really important in the above example is that the option name can be accessed with the name() accessor and the state (value) of that option with the state() accessor.
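One possible way to extend the loop to several options is to copy every name/state pair into a map once and look options up afterwards. The sketch below uses a hypothetical EngineOption struct as a stand-in for the protobuf option message so that it builds without Drizzle. Only the name() and state() accessors mirror the post; everything else is an assumption.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for one entry of table_proto.engine().options().
// Only the name() and state() accessors from the post are mirrored.
struct EngineOption {
  std::string name_;
  std::string state_;
  const std::string &name() const { return name_; }
  const std::string &state() const { return state_; }
};

// Copy every option into a name -> state map in a single pass.
std::map<std::string, std::string>
collect_options(const std::vector<EngineOption> &options) {
  std::map<std::string, std::string> result;
  for (const EngineOption &opt : options) {
    result[opt.name()] = opt.state();  // A repeated name keeps the last value.
  }
  return result;
}
```

With the real message, the same loop body would run over the index range 0 to options_size() shown in the original example.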

So, that’s all I have to cover for now. I hope this feature will help storage engine developers create and provide useful table specific features for their engine.

Happy Hacking.

Toru Maesaka drizzle, oss

BlitzDB is now in Drizzle’s Trunk Repository

June 21st, 2010

Happy to announce that BlitzDB has been merged with Drizzle’s Trunk.

As much as I’m excited, it’s time to come back to reality. This merge is merely a beginning. There is much more work to be done on BlitzDB, such as ensuring stability by adding more tests, finding bugs and eliminating them. I’m hoping that the likelihood of bugs being found will increase due to this merge. Admittedly, I want to hack on fancy (yet important) things like auto recovery but I’m going to resist doing this until I’m truly satisfied with the quality of BlitzDB. My plan is to have BlitzDB rock solid by Drizzle’s Beta release.

The review process to get BlitzDB into Drizzle was straightforward and smooth. This is mostly due to the fact that the community was very supportive about testing. Folks like Stewart Smith and Patrick Crews from Rackspace pointed out several bugs that I would not have found myself. I’m certainly lucky to have a supportive professional QA engineer (looking at you Patrick) to test and punish BlitzDB.

All I’ll be doing on BlitzDB for the next couple of weeks is debugging and refactoring to improve readability. What I need more of at the moment is test cases on JOINs that are likely to be used in practice. If you have a good test case, I would greatly appreciate it!

Toru Maesaka drizzle, oss

BlitzDB Concurrent Testing and Write Performance

May 12th, 2010

Last month, while I was at the MySQL Conference, several people asked me about the status of BlitzDB. Specifically, they were interested in when I’ll release it. Fair enough: I’ve been working on this project long enough for people to start asking.

The answer is, BlitzDB is done in terms of implementing the design. Right now it’s about finding bugs, fixing them and testing BlitzDB’s stability under concurrent load. Thanks to the motivation boost I gained at the conference, I’ve now fixed the bugs that were slowing me down and I’m gradually adding more tests into BlitzDB’s test suite. I consider BlitzDB’s initial release to be the day it gets merged into Drizzle’s trunk. This is almost ready as BlitzDB seems to be building fine on Drizzle’s Build Farm infrastructure. However, I won’t move to the next step until I’m satisfied with BlitzDB’s stability.

Yesterday I spent some time doing some concurrency testing on BlitzDB’s INSERT code with skyload. Needless to say, concurrency testing is also a convenient way to look at the performance of a particular component. So, I decided to publish my findings from this test. First, here is the background of the test.

Purpose of the Test

  • Test BlitzDB’s slot-lock mechanism.
  • Confirm that BlitzDB will not crash under concurrent INSERT workload.
  • Confirm that key insertion to the index is working as expected.
  • Confirm that writes to multiple indexes work as expected.
  • Observe the write-performance impact of adding an index.

Two commodity boxes were used: one dedicated to the client and the other dedicated to the server (Drizzle + BlitzDB). Both boxes have the same spec: Intel Quad Xeon E5345 (2×4MB L2 cache), 8GB Memory, 500GB SATA II, gigabit NIC. The boxes were connected by a gigabit switch. The file system on the server was ext3.

By default, a BlitzDB table is optimized for up to 1 million rows, so each run inserted 1 million rows into a table, with a different concurrency level per run. The table used in this test contains only three integer columns. Tests were performed with up to three indexes. The Linux kernel’s dirty buffers were flushed before each test run. Tests were run until the performance curve flattened.

Result

[Figure: BlitzDB Table Insertion - Multi Index]

As seen above, scalability from 1 thread to 4 threads showed an ideal curve. This is expected since the server is a 4-core box. From 4 threads, performance showed some improvement up to 12 threads. From there on, concurrency greatly exceeds the number of physical cores so we can’t observe decent performance growth. The highest insert rate reached in this test was just over 86,000 QPS. With more cores on the server and more clients, I suspect BlitzDB can hit over 100k QPS.

Although this graph looks good at first sight, I’m not happy with it. The performance penalty for adding multiple indexes should be greater than what’s observed in this result. This is because TC’s B+Tree is internally protected by a single lock on writes. I suspect that the performance penalty is not observed in this graph because I didn’t give BlitzDB enough load to make TC work hard. This implies that a bottleneck could exist elsewhere (Network, Drizzle or BlitzDB’s handler level code).

However, I’m glad that BlitzDB stood stable on this concurrency test which was what I wanted to test in the first place. Admittedly I need to mix several types of queries to properly test BlitzDB’s stability. I plan on doing this next with sysbench and hopefully RQG.

Once this is done, I’ll submit a merge proposal to the Drizzle Project :)

Future Development Plans

  • Find bugs, Fix bugs, Repeat.
  • Write an inbuilt auto recovery routine.
  • Eventually add a crash safe option to BlitzDB.

Toru Maesaka drizzle, oss

Testing BlitzDB on Drizzle’s Build Farm

May 6th, 2010

One of many important things that the Drizzle project takes seriously is for the project source code to build successfully on all our target platforms AND pass tests on them. This is not really specific to Drizzle, as most open source projects have the same policy. For example, we do the same thing in memcached thanks to Dustin Sallings’ buildbot kungfu.

Yesterday, Monty Taylor gave me access to Drizzle’s Build Farm Infrastructure so that I could test BlitzDB on various Linux distributions and FreeBSD. Unfortunately most build machines didn’t have Tokyo Cabinet installed so I could only test builds on Ubuntu and Debian. Fortunately the build went fine on those platforms though this was predictable since Ubuntu is my primary development platform. What was disturbing was getting test errors on my index test suite. I guess it’s time to put my thinking cap on and see what the problem is there.

This is a big leap towards getting BlitzDB into Drizzle’s trunk, which I’m steadily working towards. I also want to benchmark BlitzDB in its current state with sysbench’s OLTP tests. This is still low in my priority queue but hopefully I’ll do it in the next couple of months.

Toru Maesaka drizzle, oss

Drizzle Google Summer of Code Projects

April 27th, 2010

This morning, while half asleep, I was delighted to see an announcement email for Drizzle’s Google Summer of Code projects in my inbox. Congratulations not only to those taking part in GSoC via Drizzle but to all of you participating in GSoC this year. Here’s the actual announcement email that contains the list of Drizzle projects that will take place this year.

This year I’m mentoring Djellel Eddine Difallah on “A Memcached Query Cache Plugin for Drizzle”. This happens to be a project I abandoned a long time ago so I was happy to see someone digging it up and taking an interest in it.

I’m excited to work with Djellel over the summer. Looking forward to having lots of fun with all the technical challenges and most importantly hacking under the open community environment.

Toru Maesaka drizzle, oss

DATE type under the hood in Drizzle/MySQL

March 1st, 2010

Learned something new from my own bug in BlitzDB today. The problem was that writing a DATE column index would always return a duplicate key error (regardless of what I feed it). There are two suspicious candidates that can cause this.

  • Comparison Function has a defect.
  • Key Generator has a defect.

The latter suspect was going to be tricky if it was true since BlitzDB currently uses Drizzle’s native “field packer” (except for VARCHAR) inherited from MySQL. This would mean that Drizzle’s field system has a bug in it which was somewhat difficult to believe. Furthermore, you should always blame yourself before you start suspecting other people’s code. So, I decided to look into the comparison function which was completely written by me. Turned out that’s where the bug was.

Comparison Function

Allow me to quickly clarify what I mean by “comparison function” in this context. TC’s B+Tree API has an interface that allows you to provide your own comparison function for all operations that involve traversing the tree.

bool tcbdbsetcmpfunc(TCBDB *bdb, TCCMP cmp, void *cmpop);

BlitzDB’s comparison function callback looks at the data type of the values to be compared, processes the values accordingly and then compares them. You can also look at it as a long switch statement. For those that are interested, this code is in blitzcmp.cc (blitz_keycmp_cb).
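For readers who have not seen such a callback, the following is a rough sketch of the “long switch statement” idea. It is not BlitzDB’s actual blitz_keycmp_cb. The function has the same shape as Tokyo Cabinet’s TCCMP callback type, while the KeyType enum and the convention of passing the type through the op pointer are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstring>

// Hypothetical key-type tag. A real engine would inspect the server's own
// key type information instead.
enum KeyType { KEY_INT32, KEY_BINARY };

// Same shape as Tokyo Cabinet's TCCMP callback:
//   int (*)(const char *aptr, int asiz, const char *bptr, int bsiz, void *op)
// Here op is assumed to point at a KeyType.
int keycmp_sketch(const char *aptr, int asiz,
                  const char *bptr, int bsiz, void *op) {
  const KeyType type = *static_cast<const KeyType *>(op);
  switch (type) {
  case KEY_INT32: {
    // Assumes both keys are at least 4 bytes, stored in host byte order.
    int32_t a, b;
    std::memcpy(&a, aptr, sizeof(a));  // memcpy avoids unaligned reads
    std::memcpy(&b, bptr, sizeof(b));
    return (a < b) ? -1 : (a > b) ? 1 : 0;
  }
  case KEY_BINARY:
  default: {
    // Byte-wise comparison; on a shared prefix the shorter key sorts first.
    int rv = std::memcmp(aptr, bptr, std::min(asiz, bsiz));
    return rv != 0 ? rv : asiz - bsiz;
  }
  }
}
```

A function of this shape is what would be handed to tcbdbsetcmpfunc() as the cmp argument, with the third argument carried through to the callback as op.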

DATE under the hood

After inspecting the “type number” with GDB and looking at the corresponding ha_base_keytype enum, it turns out that the DATE type is internally represented as an unsigned 3 byte integer (HA_KEYTYPE_UINT24). This was pleasant to discover since I’ve been wondering what a 3 byte integer is still used for in Drizzle. The problem I had was that I didn’t take this type into account in the comparator and it also showed how silly I am since the answer was always there.
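A comparator branch for a 3 byte unsigned integer could look roughly like the sketch below. The little-endian byte order is an assumption (it matches how MySQL’s uint3korr macro reads such values), and the function names are hypothetical rather than BlitzDB’s.

```cpp
#include <cstdint>

// Decode a 3-byte unsigned integer into a 32-bit value.
// Assumption: the bytes are stored least significant byte first.
static uint32_t read_uint24(const unsigned char *p) {
  return static_cast<uint32_t>(p[0]) |
         (static_cast<uint32_t>(p[1]) << 8) |
         (static_cast<uint32_t>(p[2]) << 16);
}

// Three-way comparison of two packed 3-byte keys.
int compare_uint24(const char *aptr, const char *bptr) {
  uint32_t a = read_uint24(reinterpret_cast<const unsigned char *>(aptr));
  uint32_t b = read_uint24(reinterpret_cast<const unsigned char *>(bptr));
  return (a < b) ? -1 : (a > b) ? 1 : 0;
}
```

Decoding before comparing matters under this assumption: a plain byte-wise comparison would look at the least significant byte first and order the keys incorrectly.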

Now, the question is should it be kept this way? Respect alignment or reduce total I/O and space by keeping it this way? This should hopefully be a fun discussion to have in the Drizzle community :)

P.S. My two cents is that it should respect alignment since folks that seek performance should have most of their data in memory. Respecting alignment in this environment should make some difference. Although, I can only say this after benchmarking it of course.

Toru Maesaka drizzle, oss

Speaking at the MySQL Conference 2010

February 18th, 2010

I’m a little behind in announcing this but I’m going to be speaking at O’Reilly’s MySQL Conference this year. My presentation is a three hour tutorial titled, Drizzle Storage Engine Development. Practical Example with BlitzDB. Three hours is a long time but I assure you that there will be a break.

This session isn’t solely about going through Drizzle’s Storage Engine API. Various performance topics like B+Tree structure, memory handling and concurrency control will be covered. I will also go through BlitzDB’s design concept and its internals. So, needless to say, I’ll talk a lot about Tokyo Cabinet and its internals as well.

Hopefully those that come along will walk out of the tutorial standing far ahead of the start line. It will help you get started on reading the implementation of other storage engines in the MySQL ecosystem (MyISAM, InnoDB, PBXT, Federated and so forth). Better yet you will start writing one.

Looking forward to seeing you there :)

Toru Maesaka drizzle, event, oss, travel

Progress on BlitzDB’s Index Component

February 18th, 2010

I recently gained some decent momentum on developing the indexing component of BlitzDB. Most of the time I’ve spent on BlitzDB over the last couple of weeks has gone into studying the indexing API and digging into how other engines have implemented it. I even referred back to MySQL 4.x to see how the BDB engine pulls off the Indexing API.

The actual coding wasn’t too bad thanks to Tokyo Cabinet’s awesome B+Tree API. I’ve been busier adding new tests and fixing silly bugs as they arise. I also implemented the Primary Key optimization that I blogged about a while back. As a result of all this, the following goodness has been added to BlitzDB’s Trunk.

  • Index Lookup
  • Forward Index Scan
  • Reverse Index Scan

This means that BlitzDB is now equipped with both a Table Scanner and an Index Scanner which are two essential components for a general purpose storage engine. As much as I’d like to work on optimizing the code and adding features (like recovery), I’m going to take a break and spend the rest of the month working on testing and debugging. There’s no point in adding features if the base has notable flaws in it.
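For readers unfamiliar with the three index operations listed above, here is a small sketch of what each one means, using std::map purely as a stand-in for an ordered index. BlitzDB itself sits on Tokyo Cabinet’s B+Tree, so none of these function names or types come from its code.

```cpp
#include <map>
#include <string>
#include <vector>

// std::map is only a stand-in for an ordered index (key -> row).
typedef std::map<int, std::string> Index;

// Index lookup: returns true and fills row when the exact key exists.
bool index_lookup(const Index &idx, int key, std::string &row) {
  Index::const_iterator it = idx.find(key);
  if (it == idx.end()) return false;
  row = it->second;
  return true;
}

// Forward index scan: keys >= start, in ascending order.
std::vector<int> forward_scan(const Index &idx, int start) {
  std::vector<int> keys;
  for (Index::const_iterator it = idx.lower_bound(start); it != idx.end(); ++it)
    keys.push_back(it->first);
  return keys;
}

// Reverse index scan: keys <= start, in descending order.
std::vector<int> reverse_scan(const Index &idx, int start) {
  std::vector<int> keys;
  // upper_bound(start) is the first key > start, so a reverse iterator
  // built from it begins at the last key <= start.
  for (Index::const_reverse_iterator it(idx.upper_bound(start));
       it != idx.rend(); ++it)
    keys.push_back(it->first);
  return keys;
}
```

In an engine, the same three access patterns are typically driven by a cursor over the underlying tree rather than by returning vectors.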

Challenges Encountered

Writing the Index Scanner itself is easy. The most difficult thing that slowed me down was developing the comparison function for index keys. The end result was a simple piece of code but I had to study various things before I could start writing any code.

  • How to respect collation
  • How keys are represented internally
  • How types are represented internally
  • How to write a custom comparison function for Tokyo Cabinet
  • … and so on

I’ve also started using Evernote to jot down my spontaneous ideas on optimizing BlitzDB. I’ve made these notes public and they will most likely be updated while I’m commuting on the train.

There is much more that I’d like to write about, like how I intend to develop the table recovery routine without simply using TC’s recovery mechanism, but I shall save that for another day.

Toru Maesaka drizzle, oss