Tokyo Cabinet Tip: Protected Database Iteration

May 13th, 2009

Tokyo Cabinet (TC) provides iteration functionality for both its persistent and non-persistent data structures. For example, to iterate through TC’s hash database you can use the tchdbiternext() function. It is straightforward to use:

void *key;
int key_len;

if (!tchdbiterinit(tc_database_handle)) {
  /* failed to initialize iterator */
}

while ((key = tchdbiternext(tc_database_handle, &key_len)) != NULL) {
  /* work with the fetched key and key_len */
  free(key); /* TC allocates the key with malloc, so release it when done */
}

This loop iterates through the entire hash database that the “tc_database_handle” object is responsible for, which is handy whenever you need to walk over every record.

However, using this function has a consequence in a concurrent environment when the order of records _really_ matters. Even though TC is a thread-safe library, the iteration functions aren’t thread-safe in the way you might expect.

For example, if a write operation occurs while the application iterates over the database, you end up iterating over a database in a changed state. This will not make the cursor go crazy and crash your application, since TC handles this internally, but you are still looping over a state that you did not initially intend to loop through.

The solution is simply to block write operations to the database while your application iterates through it. For example, you could use a pthread read-write lock (pthread_rwlock_t) to allow other threads to read while you iterate but block writes until you finish iterating.

I was planning on doing this for a table scanner in the storage engine that I’m currently working on, but it turns out TC has an undocumented function that takes care of this internally. I’ve talked to Mikio about this function, and apparently he has intentionally left it off his specification page. He has no plans to throw it out, so you do not have to worry about it magically disappearing one day. For more information, you can take a look at his header file (tchdb.h for the hash database).

Explanation and Simple Example

The function is called tchdbforeach(). It atomically iterates through your database from beginning to end, supplying each key/value pair to a callback function that you provide. The signature of the callback is the following:

bool callback(const void *kbuf, int ksiz, const void *vbuf,
              int vsiz, void *op);

where the fifth argument, “void *op”, is an opaque pointer to whatever data you want to pass to the callback. Here is a simple example that uses this function to increment an integer counter on each iteration:

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <tchdb.h>

/* Do whatever you like with the provided key/value pair in here */
bool callback(const void *kbuf, int ksiz, const void *vbuf,
              int vsiz, void *op) {
  if (op == NULL)
    return false;

  /* op points at the counter passed to tchdbforeach() */
  *((int *)op) += 1;

  return true;
}

int main(void) {
  int niter = 0;

  ...

  if (!tchdbforeach(tc_database_handle, callback, &niter)) {
    fprintf(stderr, "failed to iterate the database\n");
    return EXIT_FAILURE;
  }

  printf("iterated %d times\n", niter);

  ...

  return EXIT_SUCCESS;
}

If all goes well, the counter will equal the number of records in the database. This function is slightly more complex to use than tchdbiternext(), but you are guaranteed to iterate atomically, which is pretty important for a table scanner.
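The opaque pointer is not limited to a single integer. The sketch below passes a small struct through it and stops the scan early by returning false, which, going by the comments in tchdb.h, is how a callback asks for the iteration to end. The same comments note that the callback runs inside the database's critical section, so it should not call back into the database itself. To stay self-contained, the sketch drives the callback with demo_foreach(), a hypothetical stand-in for tchdbforeach() that walks two string arrays; scan_state, count_until_limit and demo_foreach are names invented for this example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* State handed to the callback through the opaque pointer. */
typedef struct {
  int seen;      /* records visited so far */
  int limit;     /* stop once this many records have been visited */
  size_t bytes;  /* total key + value bytes visited */
} scan_state;

/* Same signature as the tchdbforeach() callback. */
static bool count_until_limit(const void *kbuf, int ksiz, const void *vbuf,
                              int vsiz, void *op) {
  scan_state *st = op;

  (void)kbuf;  /* the contents are not needed here, only the sizes */
  (void)vbuf;

  if (st == NULL)
    return false;

  st->seen += 1;
  st->bytes += (size_t)ksiz + (size_t)vsiz;

  /* Returning false asks the iterator to stop. */
  return st->seen < st->limit;
}

typedef bool (*record_iter)(const void *, int, const void *, int, void *);

/* Hypothetical stand-in for tchdbforeach(): feeds each key/value pair
 * to the callback until it returns false or the records run out. */
static void demo_foreach(const char *const *keys, const char *const *vals,
                         size_t n, record_iter iter, void *op) {
  size_t i;

  for (i = 0; i < n; i++) {
    if (!iter(keys[i], (int)strlen(keys[i]),
              vals[i], (int)strlen(vals[i]), op))
      break;
  }
}
```

With a real database you would pass count_until_limit and a pointer to a scan_state straight to tchdbforeach() in place of demo_foreach().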

I hope this function can help you too.

Toru Maesaka (knowledge, oss)