
Posts Tagged ‘hacking’

BlitzDB Crash Safety and Auto Recovery

July 22nd, 2010

Crash safety is a big deal in the database world. Lack of durability can lead to all sorts of terrible things after a catastrophic event. Many projects, especially in the so-called NoSQL world, compromise crash safety in return for higher QPS. The argument there is that availability of the overall system should be achieved through replication, since a database server can’t be rescued if the physical disk breaks anyway. I happen to agree with this philosophy, but I am also aware that it isn’t the right answer for everyone. So, what will I do with BlitzDB?

Several relational database hackers have pointed out that BlitzDB isn’t any safer than MyISAM since it doesn’t guarantee crash safety. This is currently true, but I plan on making BlitzDB much safer than MyISAM by providing the following features:

  1. Auto Recovery Routine (startup option)
  2. Tokyo Cabinet’s Transaction API (table-specific option)

The second feature would actually make BlitzDB crash safe (especially when combined with auto recovery), but I won’t go into depth in this post since the topic deserves a blog post of its own. Let me just say that the feature will be exposed in a form like this:

CREATE TABLE t1 (
  a int PRIMARY KEY,
  b varchar(256)
) ENGINE = BLITZDB, CRASH_SAFE;

From here on, I’ll cover how I plan on hacking auto recovery in BlitzDB.

Auto Recovery Challenges

As I blogged a while back, recovering a Tokyo Cabinet database is relatively simple. However, that alone is not sufficient for BlitzDB, since the data file (the hash database that actually holds the rows) and the index file(s) are independent of each other. That is, the data file and the index file(s) are very likely to be inconsistent with each other after a crash. So, how can we hack on this? Pretty simple.

Indexes Aren’t Important in the Recovery Phase

Because BlitzDB logically separates the data file from its indexes, the index files aren’t that important. If the server crashes, BlitzDB could delete the index file(s) and recompute them from the data file. Needless to say, this process would involve a lot of random access and computation, but it would not dominate the system’s overall running time since it’s a one-time cost. This approach has one flaw, however: the index files can’t be recomputed if the data file itself is broken or unrecoverable.
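The rebuild step can be sketched as a toy in-memory model in Python. This is not BlitzDB’s code: the dict standing in for the data file, the (column value, primary key) index layout, and the names `rebuild_index` and `lookup` are all hypothetical. It illustrates why a stale index is harmless as long as the data file is intact: the index is derived data, and one full scan regenerates it.

```python
import bisect
from typing import Dict, List, Tuple

# Hypothetical model: the "data file" maps primary key -> row (column -> value),
# and a secondary index is a sorted list of (column value, primary key) pairs.
Row = Dict[str, str]
DataFile = Dict[int, Row]
Index = List[Tuple[str, int]]


def rebuild_index(data: DataFile, column: str) -> Index:
    """Throw away the old index and recompute it with one full scan of the data."""
    return sorted((row[column], pk) for pk, row in data.items())


def lookup(index: Index, value: str) -> List[int]:
    """Return primary keys whose indexed column equals `value` (binary search)."""
    # (value,) sorts before every (value, pk), so this finds the first match.
    i = bisect.bisect_left(index, (value,))
    pks = []
    while i < len(index) and index[i][0] == value:
        pks.append(index[i][1])
        i += 1
    return pks


if __name__ == "__main__":
    data = {1: {"b": "x"}, 2: {"b": "y"}, 3: {"b": "x"}}
    # After a crash the on-disk index may be stale, e.g. missing row 3.
    stale = [("x", 1), ("y", 2)]
    print(lookup(stale, "x"))                     # [1]  (row 3 is missed)
    print(lookup(rebuild_index(data, "b"), "x"))  # [1, 3]
```

Sorting after the scan costs O(n log n) once per recovery, which fits the one-time cost described above; the sketch also makes the flaw obvious, since `rebuild_index` can only be as correct as the `data` it scans.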

Therefore, to guarantee crash safety, BlitzDB must ensure that the data file is unbreakable. This is precisely where Tokyo Cabinet’s Transaction API comes in: I’m planning on using it to protect the data file from breaking. If the data file is protected, the table can be rescued. Simple!
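The all-or-nothing property that keeps the data file intact can be illustrated with a toy undo log. This is a hedged sketch, not how Tokyo Cabinet implements transactions (its hash database API exposes them as `tchdbtranbegin`, `tchdbtrancommit` and `tchdbtranabort`): the `TxStore` class, its methods and the in-memory `undo_log` are hypothetical. A real engine would also have to flush the log to disk before touching the data file; this model keeps everything in memory and only shows the rollback logic that recovery relies on.

```python
from typing import Dict, Optional


class TxStore:
    """Toy key-value store whose updates inside a transaction are all-or-nothing.

    `data` stands in for the data file and `undo_log` for a log of pre-images.
    Hypothetical model: a real engine would persist the log before modifying
    the data file; here everything lives in memory.
    """

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}
        # None means "no transaction in progress".
        self.undo_log: Optional[Dict[str, Optional[str]]] = None

    def begin(self) -> None:
        if self.undo_log is not None:
            raise RuntimeError("transaction already in progress")
        self.undo_log = {}

    def put(self, key: str, value: str) -> None:
        if self.undo_log is None:
            raise RuntimeError("put() called outside a transaction")
        # Save the pre-image only on the first write to this key, so rollback
        # restores the value from before the transaction began.
        if key not in self.undo_log:
            self.undo_log[key] = self.data.get(key)  # None: key did not exist
        self.data[key] = value

    def commit(self) -> None:
        # Discarding the undo log makes the new values final.
        self.undo_log = None

    def abort(self) -> None:
        """Undo every write of the pending transaction (no-op if none)."""
        if self.undo_log is None:
            return
        for key, old in self.undo_log.items():
            if old is None:
                self.data.pop(key, None)
            else:
                self.data[key] = old
        self.undo_log = None

    # At startup after a crash, an unfinished transaction is simply rolled back.
    recover = abort


if __name__ == "__main__":
    s = TxStore()
    s.begin()
    s.put("1", "a")
    s.commit()
    s.begin()
    s.put("1", "b")
    s.put("2", "c")
    # Simulated crash here: the second transaction never commits.
    s.recover()
    print(s.data)  # {'1': 'a'}
```

Once the data file can only ever hold fully committed states, the index rebuild sketched for the recovery phase always has a consistent source to scan.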

So, that’s what I have in mind for making BlitzDB a safer engine. Unfortunately, I can’t start hacking on it immediately since I have several bugs to fix first. Nevertheless, I’m looking forward to hacking on it. This challenge should be quite fun to tackle.

Toru Maesaka | drizzle, oss

Tips on Drizzle Development and Valgrind

December 1st, 2009

In brief, Valgrind is a framework of awesome tools that do an amazing job of detecting memory errors. It will catch silly (often unexpected) mistakes and memory leaks in your code. IMHO, it’s a must-have tool for open source hackers who work on Linux. If you develop a plugin or a storage engine for Drizzle/MySQL, you’ll often end up wanting to test your code for memory errors. Actually, it’s not a “want”, it’s a MUST.

Conveniently, by supplying a simple startup option, you can have the Drizzle and MySQL test runners run the daemon process on Valgrind’s virtual machine. I’m not sure about MySQL since I’ve never developed anything for it, but at least with Drizzle you can run a single test case on its own by passing its name to the test runner.

 $ ./dtr your_test_file_name --valgrind

So, with BlitzDB, this is what I do to make the test runner run only my tests:

 $ ./dtr blitzdb.test --valgrind

Very simple.

The minor complication is that the test runner doesn’t print the Valgrind report to the console; it writes the report to a file instead. So where is this file? It’s the daemon’s error log, which is located in the source tree:

$ less drizzle_src/tests/var/log/master.err
CURRENT_TEST: main.blitzdb
==24563== Memcheck, a memory error detector
==24563== Copyright (C) 2002-2009, and GNU GPL'd, by Julian Seward et al.
...
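Since the report lands in a file, a small script can pull out Valgrind’s totals instead of paging through the whole log. This is a hypothetical helper, not part of Drizzle’s test runner: the names `summarize` and `summarize_file` are made up, it assumes Memcheck’s usual “ERROR SUMMARY: N errors from M contexts” line, and it attributes each summary to the most recent `CURRENT_TEST:` line, which is only an approximation because Valgrind prints the summary when the daemon exits.

```python
import re
from typing import Dict, Iterable

# Memcheck ends each run with a line like
#   ==24563== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
SUMMARY_RE = re.compile(r"==\d+== ERROR SUMMARY: ([\d,]+) errors? from [\d,]+ contexts?")
# The test runner writes the name of the running test into the same log.
TEST_RE = re.compile(r"^CURRENT_TEST: (\S+)")


def summarize(lines: Iterable[str]) -> Dict[str, int]:
    """Total Valgrind error counts per test, keyed by the last CURRENT_TEST seen."""
    totals: Dict[str, int] = {}
    current = "<unknown>"
    for line in lines:
        test = TEST_RE.match(line)
        if test:
            current = test.group(1)
            continue
        summary = SUMMARY_RE.search(line)
        if summary:
            # Large counts may be printed with thousands separators.
            errors = int(summary.group(1).replace(",", ""))
            totals[current] = totals.get(current, 0) + errors
    return totals


def summarize_file(path: str) -> Dict[str, int]:
    """Read a daemon error log such as tests/var/log/master.err and summarize it."""
    with open(path, encoding="utf-8", errors="replace") as log:
        return summarize(log)
```

Calling `summarize_file("drizzle_src/tests/var/log/master.err")` after a `--valgrind` run returns a dict keyed by test name; any non-zero count is the cue to open the full log.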

Here’s another tip. If you’ve ever wondered where the files generated during a test (like table and index files) are stored, they are inside the source tree as well. Here’s an example from my machine:

$ ll drizzle_src/tests/var/master-data/
total 20528
-rw-rw---- 1 tmaesaka tmaesaka 10485760 2009-12-01 22:06 ibdata1
-rw-rw---- 1 tmaesaka tmaesaka  5242880 2009-12-01 22:06 ib_logfile0
-rw-rw---- 1 tmaesaka tmaesaka  5242880 2009-12-01 22:06 ib_logfile1
drwxr-xr-x 2 tmaesaka tmaesaka     4096 2009-12-01 22:06 mysql
drwxr-xr-x 2 tmaesaka tmaesaka     4096 2009-12-01 22:06 test

So, with all that in mind, happy hacking :)

Toru Maesaka | drizzle, knowledge, oss