Archive for October, 2009

Fantastic Vacation in Hawaii

October 30th, 2009

Recently I took a week-long late-summer vacation to Oahu, Hawaii with my coworkers. We rented a car (a Dodge to feel more American!) so we were able to visit various places like the Swap Meet, the Farmers Market, Hanauma Bay, Hale’iwa, the North Shore and the mountains in the east.

I totally loved my stay there and I think I can now understand why Hawaii is by far the preferred travel destination among Japanese. Personally, snorkeling at Hanauma Bay was the highlight of the trip. Swimming there was simply astonishing. I’ll gradually add photos to my set on Flickr.

Sunset at Waikiki Beach, Hawaii

The above is a sunset shot I took at Waikiki Beach on my final evening.

Toru Maesaka travel

Farewell GeoCities and Thank You

October 27th, 2009

Today I found out from Leah Culver’s blog post that GeoCities is shutting down. This news saddens me, and here’s why: I remember being a fairly heavy GeoCities user before it was acquired by Yahoo!. The first crappy HTML I ever made public on the web was on GeoCities in its early days. IIRC, my “homepage” was about Japanese RPGs that I was into at the time, and it provided crappy but freely usable images for anyone who ran a website. I guess you could call this my first encounter with the spirit of open source and contribution.

GeoCities certainly had some degree of influence on my life, and because of this I feel saddened by this news. I will not forget you, GeoCities.

Toru Maesaka webservice

Iterating Tokyo Cabinet in Parallel

October 21st, 2009

Iterating a Tokyo Cabinet database (both B+Tree and Hash Table) is fairly easy, as I’ve described in the past. However, things are different if you want to allow multiple threads to iterate the database independently. This is because iterating TC in the “standard” way limits you to one iterator per database object.

The Problem

Iterating in the standard way means that no matter how hard you try, only one thread can iterate the database at a time. This obviously kills concurrent performance and it is something you want to avoid at all costs. Why do I care? Well, this is a pain when developing a relational storage engine because it means that the table scanner can’t run in parallel. In terms of MySQL/Drizzle internals, threads have to wait until the scanning thread exits rnd_end(). Not good at all.

The Solution

Fortunately, Mikio (the author of TC) was aware of this issue from the beginning and has provided a series of hidden functions that solve this problem. TC actually has other hidden functionality that is only documented in the header file. He has no plans to delete these functions and mentioned that they are only for experts who can be bothered to read the header file.

The function we’re interested in is a key-based iterator called tchdbgetnext(), and I will introduce my favorite of the series, tchdbgetnext3(). The idea behind tchdbgetnext() is this: given a particular key, TC returns the key/value pair of the next record. The database can then be iterated by repeatedly passing the returned key back to tchdbgetnext(). The first record in the database can be obtained by providing a NULL key.
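To make the key-based idea concrete without depending on TC, here is a minimal sketch using a toy in-memory store. Everything in it (toy_keys, toy_get_next(), toy_count_all()) is hypothetical and is not part of the TC API. The toy keeps its keys in a fixed sorted array purely so that there is a well-defined "next" record, which is an assumption of the sketch and not a claim about how TC orders records.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a database: a fixed list of keys (hypothetical data). */
static const char *toy_keys[] = {"apple", "banana", "cherry"};
static const size_t toy_key_count = sizeof(toy_keys) / sizeof(toy_keys[0]);

/*
 * Hypothetical analogue of a key-based iterator (NOT the TC API).
 * A NULL key returns the first key. Otherwise the key stored after `key`
 * is returned, or NULL if `key` is the last key or is not in the store.
 */
static const char *toy_get_next(const char *key) {
  if (key == NULL)
    return toy_key_count > 0 ? toy_keys[0] : NULL;

  for (size_t i = 0; i + 1 < toy_key_count; i++) {
    if (strcmp(toy_keys[i], key) == 0)
      return toy_keys[i + 1];
  }
  return NULL;
}

/* Walks the whole store using only a caller-owned cursor. */
static size_t toy_count_all(void) {
  size_t visited = 0;
  for (const char *k = toy_get_next(NULL); k != NULL; k = toy_get_next(k))
    visited++;

  /* The walk can never visit more keys than the store holds. */
  assert(visited <= toy_key_count);
  return visited;
}
```

Because the cursor is just the last key the caller saw, the store itself holds no iterator state. Two callers (or two threads, given a store that is safe to read concurrently) can each keep their own key and walk the records without disturbing each other.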

You’ll see why Mikio decided to hide this function in the following example with tchdbgetnext3():

const char *fetched_data;
char *current_key = NULL;
char *last_key = NULL;
 
int fetched_data_len = 0;
int current_key_len = 0;
int last_key_len = 0;
 
while (true) {
  last_key = current_key;
  last_key_len = current_key_len;
 
  current_key = tchdbgetnext3(tokyo_hashdb_handle, last_key,
                              last_key_len, &current_key_len,
                              &fetched_data, &fetched_data_len);
 
  /* This will free the value as well. Explained in the blog entry.*/
  free(last_key);
 
  /* The entire database has been iterated */
  if (current_key == NULL)
    break;
}

You might be wondering why the “last_key” pointer is needed above. The answer is that tchdbgetnext3() returns a newly allocated pointer to the next key, so overwriting the same pointer without freeing it first would leak memory. To avoid this leak, “last_key” remembers where “current_key” pointed before it was re-pointed by tchdbgetnext3(). Confused? This takes a bit of thinking to understand, which is why it is only documented in the TC header file.

Another tricky but nice thing about tchdbgetnext3() is that it only calls malloc(3) once for each key/value pair. Usually malloc(3) is called separately for the key and the value, but with tchdbgetnext3(), TC allocates one buffer with enough space to accommodate both and copies them next to each other. So, the key and value pointers that tchdbgetnext3() sets on success actually point into the same buffer. This is why I only call free(3) on the key pointer in the above example. It frees the entire buffer, which includes the value region.

Again, this takes a little bit of thinking to understand, but it can cut the number of malloc(3) calls in half, which can mean a lot to some people.
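The single-allocation trick can be sketched in plain C. This is only an illustration of the technique. pack_pair() is a hypothetical helper, and the buffer layout (key, NUL, value, NUL) is my assumption, not necessarily the exact layout TC uses internally.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not part of Tokyo Cabinet): copies a key and a value
 * into ONE heap buffer laid out as [key bytes]['\0'][value bytes]['\0'].
 * Returns the key pointer, which is also the start of the allocation.
 * *value_out is set to point at the value region inside the same buffer.
 * Returns NULL if the allocation fails.
 */
static char *pack_pair(const char *key, int key_len,
                       const char *value, int value_len,
                       const char **value_out) {
  assert(key_len >= 0 && value_len >= 0);

  /* One malloc(3) call covers both regions plus their terminators. */
  char *buf = malloc((size_t)key_len + 1 + (size_t)value_len + 1);
  if (buf == NULL)
    return NULL;

  memcpy(buf, key, (size_t)key_len);
  buf[key_len] = '\0';

  char *value_region = buf + key_len + 1;
  memcpy(value_region, value, (size_t)value_len);
  value_region[value_len] = '\0';

  *value_out = value_region;
  return buf;
}
```

A caller reads both the key and the value through their own pointers but releases the pair with a single free(3) on the key pointer. Calling free(3) on the value pointer would be a bug, since it does not point at the start of the allocation.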

Toru Maesaka oss

BlitzDB and Keyless Tables

October 9th, 2009

Previously, BlitzDB wouldn’t let you create a table without defining a primary key. This actually sounds like a nice constraint since you should always define a primary key. However, I went ahead and made keyless tables possible, since one of the reasons I’m developing BlitzDB is to get a better understanding of how the MySQL/Drizzle storage subsystem works. So, implementing a hidden key generator and using it internally was something I had wanted to do for some time.

Previously this is what you got if you tried to create a table without a primary key:

drizzle> create table t1 (col1 int, col2 int, col3 text) engine=blitz;
ERROR 1173 (42000): This table type requires a primary key

Now:

drizzle> create table t1 (col1 int, col2 int, col3 text) engine=blitz;
Query OK, 0 rows affected (0 sec)

Inserting rows works as you would expect:

drizzle> insert into t1 values (1, 1, "first row");
Query OK, 1 row affected (0 sec)
 
drizzle> insert into t1 values (1, 2, "second row");
Query OK, 1 row affected (0 sec)
 
drizzle> insert into t1 values (1, 3, "third row");
Query OK, 1 row affected (0 sec)
 
drizzle> insert into t1 values (2, 1, "fourth row");
Query OK, 1 row affected (0 sec)
 
drizzle> insert into t1 values (2, 2, "fifth row");
Query OK, 1 row affected (0 sec)
 
drizzle> insert into t1 values (2, 3, "sixth row");
Query OK, 1 row affected (0 sec)

Selecting rows works fine, although since there isn’t a key column in this table, every operation requires a full table scan, which is not sexy:

drizzle> select * from t1;
+------+------+------------+
| col1 | col2 | col3       |
+------+------+------------+
|    1 |    1 | first row  | 
|    1 |    2 | second row | 
|    1 |    3 | third row  | 
|    2 |    1 | fourth row | 
|    2 |    2 | fifth row  | 
|    2 |    3 | sixth row  | 
+------+------+------------+
 
drizzle> select * from t1 where col1 = 1;
+------+------+------------+
| col1 | col2 | col3       |
+------+------+------------+
|    1 |    1 | first row  | 
|    1 |    2 | second row | 
|    1 |    3 | third row  | 
+------+------+------------+
3 rows in set (0 sec)
 
drizzle> select * from t1 where col2 = 2;
+------+------+------------+
| col1 | col2 | col3       |
+------+------+------------+
|    1 |    2 | second row | 
|    2 |    2 | fifth row  | 
+------+------+------------+
2 rows in set (0 sec)

How the Internals Work

BlitzDB does what most people would assume. It atomically generates a sequential unsigned 64-bit integer and, if necessary, converts it to big-endian (network byte order). It then uses that value as the key to store the row in TC. The auto-generated key is always made big-endian because I want BlitzDB tables to work on all platforms. That is, admins should be able to copy the “data files” over to another server and happily keep using the database. Keys are converted and _always_ used as little-endian inside BlitzDB.
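As a rough illustration of the first two steps (atomic sequential ID, then a platform-independent big-endian key), here is a minimal sketch in C11. It is not BlitzDB's actual code. The names next_row_id, generate_row_id(), encode_key_big_endian(), decode_key_big_endian() and check_round_trip() are all hypothetical, and using C11 atomics for the counter is my assumption.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical counter; how BlitzDB really tracks it is not shown here. */
static _Atomic uint64_t next_row_id = 0;

/* Atomically hands out 1, 2, 3, ... even when called from many threads. */
static uint64_t generate_row_id(void) {
  return atomic_fetch_add(&next_row_id, 1) + 1;
}

/*
 * Writes `id` into out[0..7] most-significant byte first (big-endian).
 * Shifting instead of casting makes the result independent of the
 * byte order of the host CPU.
 */
static void encode_key_big_endian(uint64_t id, unsigned char out[8]) {
  for (int i = 0; i < 8; i++)
    out[i] = (unsigned char)(id >> (56 - 8 * i));
}

/* Reverses encode_key_big_endian() on any host. */
static uint64_t decode_key_big_endian(const unsigned char in[8]) {
  uint64_t id = 0;
  for (int i = 0; i < 8; i++)
    id = (id << 8) | in[i];
  return id;
}

/* Small self-check: an ID must survive an encode/decode round trip. */
static void check_round_trip(uint64_t id) {
  unsigned char key[8];
  encode_key_big_endian(id, key);
  assert(decode_key_big_endian(key) == id);
}
```

The 8-byte buffer produced by encode_key_big_endian() is what would be handed to the storage layer as the record key. Since the bytes on disk are the same on every platform, the data files can be copied between machines with different byte orders.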

Next Step

There are still some bits and pieces of update-related code that I need to work on, but in general things are looking good. When I get those tasks done, I can start working on supporting secondary indexes, which I have cool ideas for.

Toru Maesaka drizzle, oss

Yokohama and Chinatown Experience

October 5th, 2009

Despite living in Tokyo for almost three years, I haven’t really had the opportunity to take a good look around Yokohama. So, being an active individual, I decided to go sightseeing around Yokohama and have dinner in Chinatown. Chinatown at night is really nice to walk around, but the problem is that there are too many restaurants to choose from. Unfortunately I chose the wrong restaurant, but hopefully I’ll get it right next time with prior research.

Chinatown, Yokohama

Picture of a cool-looking object outside JR Yokohama Station.

Outside Yokohama Station

Next stop? Hawaii.

Toru Maesaka travel