Archive
End of Year Progress on BlitzDB
FURTHER UPDATE: Further thoughts on BlitzDB’s Index Handling
My open source friends might have noticed that I’ve been working quite a bit on BlitzDB lately. To tell the truth, I had a hidden goal to get Version-1 done by Christmas. Unfortunately, it doesn’t look like I can reach that goal. However, looking on the bright side, I got a lot done in the past few weeks, so allow me to “journal” it in this blog post.
Agony of Knowing
The more I understood Drizzle’s storage mechanism and Tokyo Cabinet’s internals, the more I disliked what I previously had. This led me to spend quite a bit of time rewriting BlitzDB’s codebase. I was using pthread’s rwlock for concurrency control, but I decided to design and write BlitzDB’s own lock mechanism to get the best out of TC (in terms of concurrency). I also rewrote the entire table scan code. A table scan is something you’d hope won’t be executed that often (people should use indexes!) but, needless to say, it’s an important component of a relational storage engine, so I’ve put a lot of effort in there.
Rewriting the Table Scanner
In the process of rewriting the table scanner, Jay Pipes gave me some fantastic advice on using Drizzle’s internal atomic type (drizzled::atomics). He gave me this advice because he noticed that my atomic ID generator was securing atomicity with a pthread mutex. You could argue that the mutex was only held for a few CPU instructions, but the philosophy of using the most efficient method available on the platform where BlitzDB runs was appealing enough for me to switch to drizzled::atomics. Mikio did some experiments on this and found that in a competitive/congested environment, using the compiler’s builtin function can gain you 3x throughput.
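To make the difference concrete, here is a minimal sketch of the two approaches to an ID generator. It is not BlitzDB’s actual code and it does not use drizzled::atomics. It assumes standard C++11 std::mutex and std::atomic as stand-ins, and the class names MutexIdGenerator and AtomicIdGenerator are made up for illustration.

```cpp
#include <atomic>
#include <cstdint>
#include <mutex>

// Hypothetical mutex-based generator: every call takes and releases
// a lock just to bump a counter.
class MutexIdGenerator {
public:
  std::uint64_t next() {
    std::lock_guard<std::mutex> guard(mutex_);
    return ++counter_;
  }

private:
  std::mutex mutex_;
  std::uint64_t counter_ = 0;
};

// Hypothetical atomic generator: one atomic read-modify-write and no
// lock. fetch_add returns the previous value, so add 1 to get the
// freshly issued ID.
class AtomicIdGenerator {
public:
  std::uint64_t next() {
    return counter_.fetch_add(1, std::memory_order_relaxed) + 1;
  }

private:
  std::atomic<std::uint64_t> counter_{0};
};
```

Both classes hand out 1, 2, 3, … and both are safe to call from several threads at once. Relaxed ordering is enough here under the assumption that the ID only has to be unique and is not used to publish other data between threads.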
Hacking on Index Support
I’ve finally started hacking on index support and I just finished supporting basic operations on a primary key. By design, BlitzDB’s index is a dense clustered b+tree but in the first release I am going to limit PK to only be a HASH index. This is because I want BlitzDB to treat all PKs as direct keys inside the data dictionary (hash database where the actual rows are stored). So in other words, I want people to use PK for “needle in a haystack” like queries only. An example of a needle in a haystack like query is:
SELECT * FROM TABLE WHERE primary_key_column = whatever;
That said, I don’t like to force people to do things the way I like, so I plan on providing the best of both worlds by supporting both data structures for PKs in Version-2:
CREATE TABLE t1 (id int, PRIMARY KEY(id) USING btree) ENGINE=blitzdb;
CREATE TABLE t1 (id int, PRIMARY KEY(id) USING hash) ENGINE=blitzdb;
BlitzDB’s default configuration will use the PK as a “direct” data dictionary index. If you wish to do range queries on the PK, the solution is to create an index on the PK column.
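The trade-off between the two structures can be sketched with standard containers. This is only an illustration, not how BlitzDB or Tokyo Cabinet stores anything: std::unordered_map stands in for the hash database, std::map (a balanced tree, not a real b+tree) stands in for an ordered index, and Row, find_by_pk and ids_in_range are hypothetical names.

```cpp
#include <map>
#include <string>
#include <unordered_map>
#include <vector>

using Row = std::string;  // stand-in for a packed row

// Hash-style primary key: great for "needle in a haystack" lookups,
// but the keys have no order, so ranges cannot be walked efficiently.
const Row* find_by_pk(const std::unordered_map<int, Row>& table, int id) {
  auto it = table.find(id);
  return it == table.end() ? nullptr : &it->second;
}

// Ordered index: keys are kept sorted, so a range query can seek to
// the lower bound and walk forward until it passes the upper bound.
std::vector<int> ids_in_range(const std::map<int, Row>& index,
                              int low, int high) {
  std::vector<int> ids;
  for (auto it = index.lower_bound(low);
       it != index.end() && it->first <= high; ++it) {
    ids.push_back(it->first);
  }
  return ids;
}
```

In this sketch, a query like WHERE id = 5 maps to find_by_pk, while WHERE id BETWEEN 4 AND 10 needs something like ids_in_range, which is why range queries call for a tree index on the column.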
Primary Key lookup Performance
So, how does my implementation perform? Here’s a quick benchmark with a test-run that randomly fetches 100 thousand rows from a BlitzDB table with 1 million rows. This is the table I used:
CREATE TABLE t1 (id int PRIMARY KEY, a int, b int) ENGINE=blitzdb;
and the query looks like this:
SELECT * FROM t1 WHERE id = random_number_under_one_million;
The hardware I used is the following commodity server: Intel Quad Xeon E5345 (2x4MB L2 cache), 8GB Memory, 500GB SATA II. Unfortunately, I could not prepare a separate client machine today, so both the server and the test program were run on the same machine. Yeah… this sucks, so I can’t claim that this benchmark is 100% credible.
Here is the result I obtained from skyload. Please only view it as a guideline to BlitzDB’s lookup performance. I’ll do a proper benchmark with the Drizzle Community and publish it after I get Version-1 released.
[ READ LOAD EMULATION RESULT ]
SQL File : 100k_select.sql
Concurrent Connections : 1
Task Completion Time : 5.88856 secs
Number of Queries: : 100000
Number of Test Runs: : 1

[ READ LOAD EMULATION RESULT ]
SQL File : 100k_select.sql
Concurrent Connections : 2
Task Completion Time : 6.94474 secs
Number of Queries: : 100000
Number of Test Runs: : 1

[ READ LOAD EMULATION RESULT ]
SQL File : 100k_select.sql
Concurrent Connections : 4
Task Completion Time : 7.04455 secs
Number of Queries: : 100000
Number of Test Runs: : 1
As you can see, “needle in a haystack” queries can be executed pretty efficiently in BlitzDB. Looking at the first result, we can observe that it took an average of about 0.059 milliseconds to process a query (5.88856 seconds divided by 100,000 queries).
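For anyone who wants to redo the arithmetic on the other runs, here is a small sketch. The helper names avg_latency_ms and queries_per_sec are made up, and it assumes the reported completion time covers all of the reported queries. For the single-connection run it gives just under 0.059 milliseconds per query, or a little under 17,000 queries per second.

```cpp
#include <cassert>

// Average time per query in milliseconds, given the total completion
// time in seconds and the number of queries that were run.
double avg_latency_ms(double total_secs, long queries) {
  assert(queries > 0);
  return total_secs * 1000.0 / static_cast<double>(queries);
}

// Throughput in queries per second for the same pair of numbers.
double queries_per_sec(double total_secs, long queries) {
  assert(total_secs > 0.0);
  return static_cast<double>(queries) / total_secs;
}
```

Calling avg_latency_ms(5.88856, 100000) reproduces the figure for the first result; swapping in 6.94474 or 7.04455 does the same for the two and four connection runs.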
Future Plans
Admittedly, primary key support isn’t completely done so I’ll continue working on it. After that, I will start hacking on b+tree indexes and write more tests as I go. Once I support at least two indexes, I’ll ask the Drizzle Community to consider merging BlitzDB into Drizzle’s trunk. This is my goal for BlitzDB at the moment.
I also happen to own blitzdb.com, so I’m planning on putting user documentation (including a tutorial) and architectural notes there. This is currently not so high on my TODO list, so I suspect it won’t happen until I get Version-1 released. All I can say about the release schedule at the moment is, “before the MySQL conference in April”.
So, that’s all I have to summarize for now. Thanks for reading this far. Merry Christmas and have a Happy New Year. Don’t trip on ice :)
Storage Engine Tests in Drizzle. Organized!
Good news for storage engine developers. In Drizzle, you can now place your engine-specific test files (.test and .result) in your engine’s directory. Here’s an example in BlitzDB:
First, let’s look inside BlitzDB’s directory.
$ ls -l blitzdb/
Total 60
-rw-r--r-- 1 maesaka maesaka 649 2009-12-13 20:51 AUTHORS
-rw-r--r-- 1 maesaka maesaka 5878 2009-12-13 20:51 blitzdata.cc
-rw-r--r-- 1 maesaka maesaka 3347 2009-12-13 20:51 blitzlock.cc
-rw-r--r-- 1 maesaka maesaka 18146 2009-12-13 20:51 ha_blitz.cc
-rw-r--r-- 1 maesaka maesaka 8360 2009-12-13 20:51 ha_blitz.h
-rw-r--r-- 1 maesaka maesaka 289 2009-12-13 20:51 plugin.ac
-rw-r--r-- 1 maesaka maesaka 261 2009-12-13 23:51 plugin.ini
drwxr-xr-x 4 maesaka maesaka 4096 2009-12-13 23:51 tests
Notice the final line? That’s where the tests are kept. So, let’s look inside it.
$ ls -l blitzdb/tests/
Total 8
drwxr-xr-x 2 maesaka maesaka 4096 2009-12-13 23:51 r
drwxr-xr-x 2 maesaka maesaka 4096 2009-12-13 23:51 t
As you can see, there are two directories. By now, storage engine developers would have caught on to what’s going on. The r/ directory is where the .result files are kept and t/ is where the .test files are kept. This is exactly the same layout as what we’re used to working on (“src/tests/t/” and “src/tests/r/”).
$ ls -l blitzdb/tests/t/
Total 8
-rw-r--r-- 1 maesaka maesaka 21 2009-12-13 23:51 blitzdb-master.opt
-rw-r--r-- 1 maesaka maesaka 1964 2009-12-13 23:51 blitzdb.test
The .opt file is used to make sure that the server is started with your storage engine enabled. You simply write the startup option inside the .opt file. Here’s what mine looks like at the moment (there’s only a single line in it).
$ less blitzdb/tests/t/blitzdb-master.opt
--plugin_add=blitzdb
blitzdb/tests/t/blitzdb-master.opt (END)
The next step is actually running it. You simply pass your engine name to dtr with the --suite option and you’re done! Unfortunately, the symlink permission for dtr seems broken in my repository, so I’ll call test-run.pl directly in this example.
$ ./test-run.pl --suite=blitzdb
Logging: ./test-run.pl --suite=blitzdb
MySQL Version 2009.12.1245
Use of uninitialized value in scalar assignment at ./test-run.pl line 1416.
Using MTR_BUILD_THREAD = -69.4
Using MASTER_MYPORT = 9306
Using MASTER_MYPORT1 = 9307
Using SLAVE_MYPORT = 9308
Using SLAVE_MYPORT1 = 9309
Using SLAVE_MYPORT2 = 9310
Using MC_PORT = 9316
Killing Possible Leftover Processes
Removing Stale Files
Creating Directories
=======================================================
DEFAULT STORAGE ENGINE: innodb
TEST RESULT TIME (ms)
-------------------------------------------------------
blitzdb.blitzdb [ pass ] 63
-------------------------------------------------------
Stopping All Servers
All 1 tests were successful.
The servers were restarted 1 times
Spent 0.063 of 2 seconds executing testcases
That’s it! I really like this change, since engine-specific tests belong inside the storage engine’s directory. It’s also a good step towards separating the database kernel from the storage engine, which Monty Taylor is actively hacking on. Hopefully he’ll blog more about these changes soon.
Tips on Drizzle Development and Valgrind
In brief, valgrind is a framework of awesome tools that do an amazing job of detecting memory errors. It will catch silly (often unexpected) mistakes and memory leaks that you’ve made in your code. IMHO, it’s a must-have tool for open source hackers who work with Linux. If you develop a plugin or a storage engine for Drizzle/MySQL, you often end up wanting to test your program for memory errors. Actually, it’s not a “want”, it’s a MUST.
Conveniently, by supplying a simple startup option, the Drizzle and MySQL test runners will run the daemon process on valgrind’s virtual machine. I’m not sure about MySQL since I’ve never developed anything for it, but at least with Drizzle you can run a test case independently by supplying the desired test name to the test runner.
$ ./dtr your_test_file_name --valgrind
So, with BlitzDB this is what I do to isolate the test runner to only run my tests:
$ ./dtr blitzdb.test --valgrind
Very simple.
The minor complication here is that the test runner does not print the valgrind report to the console; instead, it writes the output to a file. So where is this file? It’s written to the daemon’s error log, which is located in the source tree:
$ less drizzle_src/tests/var/log/master.err
CURRENT_TEST: main.blitzdb
==24563== Memcheck, a memory error detector
==24563== Copyright (C) 2002-2009, and GNU GPL'd, by Julian Seward et al.
...
Here’s another tip. If you ever wondered where the files that were generated in the test (like table and index files) are stored, they are stored inside the source tree as well. Here’s an example on my machine:
$ ll drizzle_src/tests/var/master-data/
total 20528
-rw-rw---- 1 tmaesaka tmaesaka 10485760 2009-12-01 22:06 ibdata1
-rw-rw---- 1 tmaesaka tmaesaka 5242880 2009-12-01 22:06 ib_logfile0
-rw-rw---- 1 tmaesaka tmaesaka 5242880 2009-12-01 22:06 ib_logfile1
drwxr-xr-x 2 tmaesaka tmaesaka 4096 2009-12-01 22:06 mysql
drwxr-xr-x 2 tmaesaka tmaesaka 4096 2009-12-01 22:06 test
So, with all that in mind, happy hacking :)

