Archive

Archive for April, 2009

Happy fourth figure revision Drizzle!

April 30th, 2009

I just updated my Drizzle trunk and what’d you know? our revision has hit four figures.

Drizzle @ Revision 1001

This may not seem like a big deal but to me, it represents how much hard work, effort and thought by those that believe in the Drizzle project has gone into the tree. Of course, we’re not going to slow down and I suspect things are going to comfortably accelerate from here on.

Toru Maesaka drizzle, oss

Web Seminar on memcached 1.3

April 29th, 2009

As I’ve been tweeting the last couple of weeks, I’m going to do a web seminar on the next series of memcached (1.3) that is currently under beta release. I’m afraid there won’t be enough time to cover all the nuts and bolts but I will cover most of the great thoughts and engineering that went into the source tree by the community.

For Blog

Thank you for the opportunity Sun/MySQL and I’ll hopefully see you all there!

Toru Maesaka event, memcached, oss ,

Drizzle’s Storage Engine subsystem

April 28th, 2009

Today I was doing some work on writing a small storage engine for Drizzle that we’re hoping will perform good enough to replace the MEMORY/HEAP engine inherited from MySQL. For those that are interested, this engine is going to be based on Tokyo Cabinet’s on-memory b+tree database.

I was a little worried that this task is going to be heavy since I’ve never hacked on a relational database engine before (other than tracing through InnoDB) but it turns out it’s not so bad. I first looked up the engine interface for MySQL then looked at the engines in the Drizzle tree. The interface itself looks very similar but what’s really cool about Drizzle is how the storage engine is registered to the core server with the new plugin system. Here’s a snippet of how this looks with the Blackhole engine:

blackhole_engine= new BlackholeEngine(engine_name);
registry.add(blackhole_engine);

and the de-initialization:

registry.remove(blackhole_engine);
delete blackhole_engine;

If you’re familiar with how a Drizzle plugin is written, you will notice that this is exactly the same; You implement the interface and boom, you register it with the PluginRegistry object. So a storage engine is treated equally just like any other plugin system.

Needless to say, the storage engine declaration is exactly the same as what you would do with any Drizzle plugin:

drizzle_declare_plugin(blackhole)
{
  "BLACKHOLE",
  "1.0",
  "MySQL AB",
  "/dev/null storage engine (anything you write to it disappears)",
  PLUGIN_LICENSE_GPL,
  blackhole_init, /* Plugin Init */
  blackhole_fini, /* Plugin Deinit */
  NULL,           /* status variables */
  NULL,           /* system variables */
  NULL            /* config options */
}
drizzle_declare_plugin_end;

This is nice! a storage engine is just a plugin after all and Drizzle treats it as it ought to be. I will have some more time to hack on this after my memcached webinar so hopefully I’ll have more details posted about the progress on this work soon.

Toru Maesaka drizzle, oss , , ,

Fun times at MySQL UC and Drizzle Developer Day 09

April 26th, 2009

I’m writing this entry on my way back to Tokyo from Narita. So, I was in the US all week for MySQL UC, Percona Performance Conference and the Drizzle Developer Day. It was great to meet new people and also catch up with developer friends from all over the world. These events are great excuse to bring together folks that work together online and receive the free beers that were promised on IRC. Looking back, the week just flew! I can’t believe I’m back in Japan already.

What wasn’t pleasant however was Drizzle being introduced as “MySQL Drizzle” and described as MySQL’s technology incubator at the opening keynote. The truth is, Drizzle is a community driven project that is not affiliated with any commercial organization. The project is shepherded by the community and that is the whole point of our model. Jay and Baron has expressed this on their blogs too:

Guess there is no point in getting worked up about this so I will say no more.

Drizzle developer day turned out to be a great success with over 60 developers (including new developers) turning up. We covered wide variety of topics from “how to download bzr” for all levels of contributors which I thought was nice. Why? well this means that everyone has something to do and it creates a welcoming atmosphere. As a result, it makes the event active. Here are some photos I uploaded to flickr in the last few minutes I had in the US. I’ll get more up there asap.

Compared to these stimulating events and the discussions we had, Oracle’s acquisition tale that many people seemed to like talking about at the conference breakfast/lunch/boozing was uninteresting.

Toru Maesaka drizzle, travel , ,

Thoughts on tackling InnoDB’s auto increment

April 17th, 2009

As I mentioned in my previous entry, InnoDB has it’s own auto increment counter which it uses to generate the next value for the database kernel (as we call it in Drizzle). At Drizzle project, we came across a suspicion that InnoDB doesn’t increment it’s internal counter on row updates. So what can this mean to you as a database admin?

Well, consider this simple table:

CREATE TABLE t1 (
    a INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    val INT
) ENGINE=InnoDB;

and the following statements:

drizzle> INSERT INTO t1 (val) VALUES (1);
Query OK, 1 row affected (0.01 sec)
 
drizzle> UPDATE t1 SET a=4 WHERE a=1;
Query OK, 1 row affected (0.01 sec)

We should now have a table with one row where the primary key value is 4:

drizzle> SELECT * FROM t1;
+---+------+
| a | val  |
+---+------+
| 4 |    1 | 
+---+------+
1 row IN SET (0.00 sec)

So far so good! Now remember our hypothesis that InnoDB doesn’t temper with the internal counter on an update. This would mean that our next row should have a primary key value of 2 since the counter variable should still be at 1 from our initial INSERT statement:

drizzle> INSERT INTO t1 (val) VALUES (1);
Query OK, 1 row affected (0.01 sec)
 
drizzle> SELECT * FROM t1;
+---+------+
| a | val  |
+---+------+
| 2 |    1 | 
| 4 |    1 | 
+---+------+
2 rows IN SET (0.00 sec)
 
drizzle> SELECT LAST_INSERT_ID();
+------------------+
| LAST_INSERT_ID() |
+------------------+
|                2 | 
+------------------+
1 row IN SET (0.00 sec)

As expected, we have a new row with a primary key value of 2. But hey, what happens if we keep inserting into this table? specifically twice (assuming that you haven’t changed the auto-increment-increment system variable).

drizzle> INSERT INTO t1 (val) VALUES (1);
Query OK, 1 row affected (0.01 sec)
 
drizzle> INSERT INTO t1 (val) VALUES (1);
ERROR 1062 (23000): Duplicate entry '4' FOR KEY 'PRIMARY'

Yep, a duplicate key error! This isn’t really surprising since the increment was resumed from a value less than what we had updated the first value to. I’m not even sure if this sort of operation would be done in production but it shows how this behavior can be dangerous.

Something to note here is that, even on the INSERT error that we just experienced, InnoDB will update the internal count because get_auto_increment() was called when processing the query. So, this means that the next INSERT query will work fine since we’ve now passed the maximum value:

drizzle> INSERT INTO t1 (val) VALUES (1);
Query OK, 1 row affected (0.02 sec)
 
drizzle> SELECT * FROM t1;
+---+------+
| a | val  |
+---+------+
| 2 |    1 | 
| 3 |    1 | 
| 4 |    1 | 
| 5 |    1 | 
+---+------+
4 rows IN SET (0.00 sec)

Great! now we can hopefully keep incrementing happily.

I’ve confirmed the internal behavior of this example with GDB and also verified the same behavior in MySQL 5.1.30. Whether this behavior is inappropriate or not deserves a discussion of it’s own and you could of course, absorb this problem in the application layer but the important question here is, what is the right behavior for Drizzle?

Well, take a look at how MyISAM handles the same set of queries:

drizzle> INSERT INTO t2 (val) VALUES (1);
Query OK, 1 row affected (0.00 sec)
 
drizzle> UPDATE t2 SET a=4 WHERE a=1;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0
 
drizzle> INSERT INTO t2 (val) VALUES (1);
Query OK, 1 row affected (0.00 sec)
 
drizzle> SELECT * FROM t2;
+---+------+
| a | val  |
+---+------+
| 4 |    1 | 
| 5 |    1 | 
+---+------+
2 rows IN SET (0.00 sec)
 
drizzle> SELECT last_insert_id();
+------------------+
| last_insert_id() |
+------------------+
|                5 | 
+------------------+
1 row IN SET (0.00 sec)

As you can see above, MyISAM will resume incrementing from the updated primary key value. To me this is the correct behavior that should be consistently provided to DBAs. So the next question is, how can we make InnoDB behave that way?

Ideas on tackling this issue

A simple solution would be to make InnoDB update the internal count on updates (e.g. ha_innobase::update_row) when it deals with with auto increment. This approach, however can end up being complicated (code wise) and worst of all, it can create a minor overhead to the update operation on InnoDB which may not be favorable for our target users.

So after briefly discussing this issue with Monty Taylor, seems the appropriate fix for this is to leave ha_innobase::update_row() as it is (don’t increment the internal count) and handle this on INSERT. Specifically speaking, on a row that doesn’t have a user specified auto increment value.

This means that we can recalculate the auto increment value on duplication/collision since the user is expecting an auto-generated discrete value. Needless to say, an error must be thrown if a collision occurs on a user-specified value.

So yeah, I’m flying to the US in couple of days so this should be something fun to hack on amongst other work related tasks on my boring flight :)

Toru Maesaka drizzle, oss , , ,

Understanding how auto increment works with InnoDB

April 14th, 2009

Lately I’ve been having lots of fun going through Drizzle and InnoDB‘s sourcecode to get a grasp of how auto increment is processed internally. I think I now have a fairly good grasp of what’s going on so I’m writing this entry as a note for myself. I’m also hoping that this will be helpful to those that are interested in this topic too.

So in MySQL and Drizzle, the storage engine (in this case InnoDB) is responsible for computing the auto increment value. Here’s an abbreviated execution path for a simple INSERT statement to a table with an auto increment column:

mysql_parse() -> mysql_execute_command() -> mysql_insert() ->
write_record() -> handler::ha_write_row() -> ha_innobase::write_row() ->
handler::update_auto_increment() -> ha_innobase::get_auto_increment()

As you may have noticed, “handler” is an abstract class so you can apply this execution path to other engines too. For example, if you look inside MyISAM there is a get_auto_increment() in there too.

InnoDB’s internal counter

InnoDB keeps track of it’s own auto increment count (an unsigned 64bit integer) called “autoinc” inside a structure that represents a database table (dict_table_struct). As you would expect, this is what InnoDB uses to compute the auto increment value that the server will use for further processing.

Diving in a little deeper

As you can see in the execution path, handler::update_auto_increment() function asks InnoDB’s get_auto_increment() function for an incremented value. Looking slightly deeper, you will see that this function:

innobase_get_autoinc(uint64_t *value);

is called inside ha_innobase::get_auto_increment() which will attempt to obtain the autoinc lock for that table (look at ha_innobase::innobase_lock_autoinc()), then given that the lock was successfully obtained, innobase_get_autoinc() will set the provided uint64_t variable with the next auto increment value:

 *value = dict_table_autoinc_read(prebuilt->table);

If you look inside dict_table_autoinc_read() you will notice that it returns table->autoinc which is InnoDB’s internal auto increment counter described in the previous section. We don’t increment table->autoinc before returning it since this value is going to be updated after it is returned. In other words, the next auto increment value is precomputed. Needless to say, this critical region is still protected by the autoinc mutex so you don’t have to worry about updated values to clash.

After obtaining the value from InnoDB

Once we return from ha_innobase::get_auto_increment(), the core server can now store the returned 64bit integer as binary to the next_number_field object for further processing.

If you wish to see the value of this data, you can print it as a primitive type (uint64_t in this case) by obtaining it with the val_int() function. So if you were going through the code with gdb, performing this at an appropriate time (e.g. after the autoinc value is stored) will print the value that will be used to update the auto increment column of your table:

(gdb) print table->next_number_field->val_int()

Finally at the end of handler::update_auto_increment(), the next auto increment value is computed so that the storage engine won’t have to do any work in case the request is a multi row statement (note that this is different to InnoDB’s autoinc precomputation). We then return back to ha_innobase::write_row(), where the row is actually inserted into the table with row_insert_for_mysql().

The rest of the flow is what you would expect: log what just happened, cleanup the mess caused in generating the autoinc value, and respond to the client.

Final words

So, this is all I have to cover in this entry. Admittedly my explanation is really abbreviated and perhaps a little over simplified but I figured this is enough to get back on the sourcecode after I forget my findings from the past couple of days in the future (my memory sucks).

I haven’t covered the auto increment internals for UPDATE statements but covering this would make this entry longer than I want it to be so I will save it for another day. Nevertheless, this entry will hopefully help curious individuals like myself to stare at Drizzle and InnoDB’s source code and enjoy an hour or two there.

Oh yea, and as a warning, do note that this entry could be outdated by the time you read it!

Toru Maesaka drizzle, knowledge, oss , , ,

Saying farewell to drizzleadmin

April 9th, 2009

For the last couple of days I’ve been working on removing drizzleadmin (formerly mysqladmin) from Drizzle‘s source tree and I’m happy to announce that the code is now merged to the trunk.

So why did we decide to throw out a program that’s been around for a long time? Well, the tool wasn’t really useful to drizzle since mysqladmin is designed for MySQL and we wanted to factor out these old command line tools in the tree (explained later). With that in mind, I’ve been gradually removing code from drizzleadmin and by the end of it’s lifetime, only two commands remained: “shutdown” and “ping”. These commands are now moved to the drizzle command line tool.

Here’s the new standard way to shutdown drizzled:

$ drizzle --shutdown --verbose
shutting down drizzled... done

Shutting down drizzled that’s listening to a certain port:

$ drizzle --shutdown --port=12321 --verbose 
shutting down drizzled on port 12321... done

Check whether drizzled is still alive on a certain port:

$ drizzle --ping --port=9306
drizzled is alive

As you can see above, it makes better sense for the user to use the standard CLI tool to perform such simple tasks. For anything more sophisticated, we’re encouraging new open source projects to arise. Now with libdrizzle out in the open, anyone (yes, including you) can easily write a client program for Drizzle. You could even write one in your favorite programming language once the binding for that language comes out.

If you have any cool ideas for a project, please do share it with us on the mailing list!

Toru Maesaka drizzle, oss , ,

Fascinating libdrizzle benchmark results

April 2nd, 2009

Spreading the word about Jay’s awesome findings on the libdrizzle benchmark against the original library inherited from MySQL. For those that aren’t familiar with libdrizzle, it is a fresh new (modern implementation) MySQL compatible client library for Drizzle that leverages asynchronous I/O and smarter memory usage founded by Eric Day.

You can read how this library came to life in this thread:

As you can see in Jay’s findings with sysbench, libdrizzle outperforms the original library in all concurrency levels by a rather significant figure (e.g. 41.16% performance increase at only two threads). If you’re interested in gaining more performance from Drizzle or MySQL in the future, you should really start looking into this library now.

This was the first blog entry I read this morning and hey, it really kick started my day. Eric you rock! and thanks to Jay for sharing his findings.

Toru Maesaka drizzle, oss , , ,