Writing records with duplicate keys to Tokyo Cabinet
Lately I’ve been noticing that people are visiting my blog to find ways to write multiple records with the same key to a Tokyo Cabinet (TC) database.
Well, the answer depends on which data structure you choose to construct a TC database. If you’re interested in TC’s hash database then you’re out of luck but TC’s B+Tree database will allow you to write duplicate keys. If you just want the answer, here’s a compilable source of how to do it. For those that are interested in how it works, keep on reading :)
So here’s how it’s done. You write the record(s) using TC’s tcbdbputdup() function so that upon key collision, TC will write the record next to the existing one. The following snippet will write three records to Tokyo Cabinet using an identical key.
const char *key = "key"; const char *r1 = "record 1"; const char *r2 = "record 2"; const char *r3 = "record 3"; /* store three different records with the same key. note that "database_handle" is a TCBDB object. */ if (!tcbdbputdup(database_handle, key, 3, r1, strlen(r1)) || !tcbdbputdup(database_handle, key, 3, r2, strlen(r2)) || !tcbdbputdup(database_handle, key, 3, r3, strlen(r3))) { fprintf(stderr, "failed to store data\n"); if (!tcbdbclose(database_handle)) { fprintf(stderr, "failed to close the database\n"); return 1; } tcbdbdel(database_handle); return 1; }
Something to watch out here is that because you’ve allowed duplication, running the above code multiple times will respectively keep appending the records to the database.
The next question is, how do we retrieve _only_ the records that corresponds to the key that we just inserted with. Simple! just traverse the tree from the first occurrence of the key and keep retrieving the data as we go until we hit a different key.
First thing that must be done is to create a cursor and move it to the first occurrence of the key.
BDBCUR *cursor; if ((cursor = tcbdbcurnew(db)) == NULL) { /* FAIL. do the right thing for your application */ } /* move the cursor to the first occurrence of the key */ if (!tcbdbcurjump(cursor, key, strlen(key))) { /* FAIL. do the right thing for your application */ }
Now we’re ready to traverse the tree. Remember that we’re only interested in a certain key so we only want to traverse the tree until we hit a different key. The following code snippet will do exactly that and print the discovered record as it traverses the tree. So in our case it would print, “record 1″, “record 2″ and “record 3″.
char *fetched_key; char *fetched_value; /* traverse the tree. terminates if the entire tree is traversed _OR_ if it hits a different key */ while (tcbdbcurkey2(cursor) != NULL) { fetched_key = tcbdbcurkey2(cursor); /* different key so break out of the loop */ if (strcmp(key, fetched_key) != 0) { free(fetched_key); break; } fetched_value = tcbdbcurval2(cursor); if (fetched_value) { fprintf(stdout, "fetched: %s\n", fetched_value); free(fetched_value); } tcbdbcurnext(cursor); }
The above tree traversal requires one additional lookup to terminate (if the entire tree isn’t traversed) but the chances are that the records are stored in the same page so this additional operation is cheap.
Alternatively, TC provides a function called tcbdbget4() which returns an allocated list of records that corresponds to the key you provide. If you decide to take this approach, you should consider whether the memory allocation cost and linked list construction overhead is feasible for your application or not.
