Home > knowledge, oss > Tokyo Cabinet Tip: Protected Database Iteration

Tokyo Cabinet Tip: Protected Database Iteration

Tokyo Cabinet (TC) provides iteration functionality for both it’s persistent and non-persistent data structures. For example, if you wanted to iterate through TC’s hash database, you can use the tchdbiternext() function. This is really straight forward to use such that:

void *key;
int key_len;
 
if (tchdbiterinit(tc_database_handle) != true) {
  /* failed to initialize iterator */
}
 
while ((key = tchdbiternext(tc_database_handle, &key_len)) != NULL) {
  /* work with the fetched key and key_len */
}

will iterate through the entire hash database that “tc_database_handle” object is responsible for. This can be handy if you need to loop through your database for some arbitrary reason.

However, there is a consequence in using this function in a concurrent environment with a use-case where the order of records _really_ matter. This is because even though TC is a thread-safe library, the iteration functions aren’t thread-safe in a way that we expect.

For example, if a write operation occurs while the application iterates over the database, you will end up iterating over a database that is in a changed state. This will not make the cursor go crazy and crash your application since TC handles this internally but you still end up iterating over a database that is in a state that you did not initially intend on looping through.

Solution to this is to simply block write operations to the database while your application iterates through. For example, you could use pthread’s rw_lock to allow other threads to read while you iterate but block writes until you finish iterating.

I was planning on doing this for a table scanner in the storage engine that I’m currently working on but turns out TC has an undocumented function that will take care of this internally. I’ve talked to Mikio about this function and apparently it is intentional that he hasn’t documented it on his specification page. He has no plans on throwing it out so you do not have to worry about it to magically disappear one day. For more information, you can take a look at his header file (tchdb.h for hash database).

Explanation and Simple Example

The function is called tchdbforeach() which will atomically iterate through your database from beginning to the end by supplying each key/value pair to the callback function that you provide. The signature of the callback is the following:

bool callback(const void *kbuf, int ksiz, const void *vbuf,
              int vsiz, void *op);

where the fifth argument, “void *op” is an opaque pointer to the data that you can pass to the callback. Here is a simple example that will increment a counter integer on each iteration using this function:

/* Do whatever you like with the provided key/value pair in here */
bool callback(const void *kbuf, int ksiz, const void *vbuf,
              int vsiz, void *op) {
  if (op == NULL)
    return false;
 
  *((int *)op) += 1;
 
  return true;
}
 
int main(void) {
  int niter = 0;
 
  ...
 
  if (!tchdbforeach(tc_database_handle, callback, &niter)) {
    fprintf(stderr, "failed to iterate the database\n");
    return EXIT_FAILURE;
  }
 
  printf("iterated %d times\n", niter);
 
  ...
 
  return EXIT_SUCCESS:
}

If all goes well, the counter variable will be set to the number of records in the database. This function is slightly more complex than using tchdbiternext() but you are guaranteed to iterate atomically which is pretty important for a table scanner.

I hope this function can help you too.

Toru Maesaka knowledge, oss , ,

  1. May 13th, 2009 at 14:49 | #1

    Can the key be refetched? MySQL/Drizzle can ask for a key/value a second in cases of ORDER BY. It is expected that the engine can provide the data from a second request. There is row cache that can be used to solve this but if it has to save the entire table there can be performance issues.

  2. May 14th, 2009 at 08:15 | #2

    Agreed, it would suck to burn memory just for that :(

    So, by meaning key refetch, does the core do this with rnd_next()? If so, I would have thought a refetch would be beyond rnd_next()’s responsibility…

    We could keep track of how many times the core iterated inside the engine and return the same key as it returned previously. Either that or we could keep a copy of the “previous key” and fetch the value, which could be more computation friendly.

    I’ll ping you on IRC about this when I come across it.

  3. bbxiong.xiao
    May 11th, 2010 at 17:25 | #3

    Hi, toru, Actually, i can not dump all my keys in either way, my TC/TT is running over 2 months, the records number is over 7,000,000 (uses “tchmgr inform”, i can see “record number: 7xxxxxx” clearly), when i tried to dump all keys inside TC, either way give the correct result, only 100, 000 records dumped and TC begins to complain “no more records”, i just can not believe it. My TC/TT runs well in getting/setting records, but i just can not dump all my keys!
    I tried to optimize my TC file(maybe my TC file is broken?) and it doesn’t work.

    Do you have any clue what happened?

  4. May 12th, 2010 at 05:27 | #4

    @bbxiong.xiao Hi! that’s odd. Can you reach the records that can’t be iterated through? For example, with tchdbget(). I’m currently doing some tests with n million records and my table scanner works 100% as expected.

    It would help if you could either email or write your loop condition here.

  5. bbxiong.xiao
    May 14th, 2010 at 06:06 | #5

    @Toru Maesaka

    Thanks toru for your reply.

    Yes, i can get the records which can not be iterated, (for example, i tested a key “149511433″, tchdbget returns the correct value), i dumped the key list using two methods:

    1. use tchdbiternext:
    while (true)
    {
    iLen = 0;
    if ((pszKey = (char*)tchdbiternext(pstTCHDB, &iLen)) != NULL)
    {
    printf(“%d\n”, atoi(pszKey));
    free(pszKey);
    }
    else if (pszKey == NULL && iLen == 0)
    {
    printf(“tchdbiternext2 returns NULL, err=%d, msg=%s\n”, tchdbecode(pstTCHDB), tchdberrmsg(tchdbecode(pstTCHDB)));
    break;
    }
    else if (pszKey == NULL)
    {
    printf(“should never reach here\n”);
    }
    }

    2. and the second way:
    if (!tchdbforeach(pstTCHDB, DumpKeyCallback, &niter))
    {
    printf(“tchdbforeach failed\n”);
    return -1;
    }

    the result of them are equal, they all dumped 10w+ records, (not include my test key “149511433″).

    Also, after doing optimization on the TC hash-db file, the records number reduced to 10w+ and the file size falls sharply.(before optimize, i uses tchmgr which shows my records number is 879w+)

    I checked all my TC files, half of them are “broken” (records number falls sharply after optimization).

    Yes, this is strange, i also performed some tests before deploying TT+TC in product environment, So i’m trying to find out whether there is someone else met the same problem.

  6. May 14th, 2010 at 08:50 | #6

    @bbxiong.xiao Your loop looks fine. However, getting less records after performing optimization means that your database file is broken (as you mentioned). This is because TC will extract the records that it can access.

    My first thought was that the hash chain in your database is broken but the fact that you can directly access your record means that’s unlikely. I’m not sure how you broke the file but if you know the keys that are in the database (e.g. there’s a sequential pattern), I recommend writing a short code that rebuilds a new database file (so copy each record to the new database at a time).

    Either that or you could directly ask Mikio Hirabayashi for help/consulting since he’s the sole author of Tokyo Cabinet.

  1. No trackbacks yet.