Iterating Tokyo Cabinet in Parallel
Iterating a Tokyo Cabinet database (both B+Tree and Hash Table) is fairly easy as I’ve described it in the past. However, things are different if you want to allow multiple threads to iterate the database individually. This is because iterating TC in a “standard” way limits you to obtain only one iterator per database object.
The Problem
Iterating in a standard way means no matter how hard you try, only one thread can iterate the database at a time. This obviously kills concurrent performance and it is something you want to avoid at all costs. Why do I care? well, this is a pain for developing a relational storage engine because it means that the table scanner can’t be run in parallel. In terms of MySQL/Drizzle internals, threads will have to wait until the scanning thread exits rnd_end(). Not good at all.
The Solution
Fortunately Mikio (the author of TC) was aware of this issue from the beginning and has provided a series of hidden functions that can solve this problem. He actually has other hidden functionalities in TC that is only documented in the header file. He has no plans on deleting those and mentioned that they are only for experts that can actually be bothered reading the header file.
The function we’re interested in is a key-based iterator called tchdbgetnext() and I will introduce my favorite of the series, tchdbgetnext3(). The idea behind tchdbgetnext() is — Given a particular key, TC will return a key/value pair of the next record. The database can then be iterated by continuously throwing the returned key at tchdbgetnext(). The first record in the database can be obtained by providing a NULL key.
You’ll see why Mikio decided to hide this function in the following example with tchdbgetnext3():
const char *fetched_data; char *current_key = NULL; char *last_key = NULL; int fetched_data_len = 0; int current_key_len = 0; int last_key_len = 0; while (true) { last_key = current_key; last_key_len = current_key_len; current_key = tchdbgetnext3(tokyo_hashdb_handle, last_key, last_key_len, ¤t_key_len, &fetched_data, &fetched_data_len); /* This will free the value as well. Explained in the blog entry.*/ free(last_key); /* The entire database has been iterated */ if (current_key == NULL) break; }
You might be wondering why the “last_key” pointer is needed above. The answer is, tchdbgetnext3() returns an allocated pointer to the next key so continuously using the same pointer will result in a memory leak. To avoid this leak, “last_key” is used to remember where “current_key” had pointed to before it was re-pointed by tchdbgetnext3(). Confused? This takes a bit of thinking to understand hence it is only documented in the TC header file.
Another tricky but nice thing about tchdbgetnext3() is that it only calls malloc(3) once for each key/value pair. Usually malloc(3) is called individually for both key/value but with tchdbgetnext3(), TC allocates a buffer with enough space to accommodate both key/value and copies them next to each other. So, the pointers for key and value that tchdbgetnext3() sets on success is actually on the same buffer. This is why I only call free(3) on the key pointer in the above example. It frees the entire buffer which includes the value region.
Again, this takes a little bit of thinking to understand but it can cutdown malloc(3) call by half which can mean a lot to some people.
