Archive

Archive for September, 2008

Thoughts on UTF-8 over CJK charsets in Drizzle

September 28th, 2008

Internally, Drizzle will use UTF-8 everywhere and _only_ UTF-8, simply because UTF-8 is the encoding of choice within the Drizzle community at the moment. To me, this decision makes sense since UTF-8 is popular in the areas Drizzle is targeting (the Web and the Cloud). Limiting Drizzle to UTF-8 also means the codebase becomes cleaner and thus easier to maintain. However, there are arguments against it in the community, so this could change in the future.

So, what does this mean for those outside regions that use Latin characters, specifically East Asia? Would this cause an uproar?

A few months ago, Brian Aker asked me about this, and after a brief discussion with Jay Pipes a couple of days ago, I figured I should blog about it, both to keep it as a note for myself and hopefully to gather feedback from anyone who stumbles across this entry. Here are my thoughts, based on my knowledge of the Japanese web industry:

Web Industry Standard in Japan

Looking at the web industry trend in Japan, UTF-8 is becoming the prominent encoding, even though UTF-8 requires more computing power and space for Japanese text than the Japanese CJK charsets. For example, mixi.jp (one of the largest websites in Japan) still uses EUC-JP (a CJK charset) for historical reasons, but if you look at their newer features, such as video sharing, you can see that they've begun adopting UTF-8. Yahoo! JP, COOKPAD, ja.wikipedia and Livedoor are other large Japanese sites worth looking at too.

IMHO, UTF-8 is becoming popular in the .jp domain because:

  • The default encoding of XHTML (as an XML document) is UTF-8 or UTF-16
  • All browsers support UTF-8 nowadays (if yours doesn't, you shouldn't be using it)
  • Theoretically, more characters can be represented in UTF-8
  • In principle, existing ASCII functions can be reused, since UTF-8 is a superset of ASCII
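To make the last bullet a bit more concrete: every byte of a multi-byte UTF-8 sequence is 0x80 or above, so a byte-oriented ASCII function such as `strchr()` can never mistake part of a Japanese character for an ASCII character. Shift_JIS does not have this property; the second byte of 表 is 0x5C, the backslash. The minimal sketch below shows the difference, with the bytes for 表 (U+8868) written out by hand; `count_byte()`, `hyo_utf8` and `hyo_sjis` are hypothetical names for illustration only.

```c
#include <stddef.h>
#include <string.h>

/* 表 (U+8868) encoded by hand in each charset. */
const char hyo_utf8[] = "\xE8\xA1\xA8";  /* UTF-8: every byte is >= 0x80 */
const char hyo_sjis[] = "\x95\x5C";      /* Shift_JIS: trail byte 0x5C is '\' */

/* Hypothetical helper: count occurrences of an ASCII byte using only
 * byte-oriented standard functions, the way legacy ASCII code would. */
size_t count_byte(const char *s, char c)
{
    size_t n = 0;
    for (const char *p = strchr(s, c); p != NULL; p = strchr(p + 1, c))
        n++;
    return n;
}
```

Here `count_byte(hyo_utf8, '\\')` finds no backslash, while `count_byte(hyo_sjis, '\\')` finds one, which is exactly the kind of surprise ASCII escaping routines run into with Shift_JIS text.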

However, there are certainly cases where web developers might need to use their local encoding to support things like mobile devices (Shift-JIS in Japan). IMHO these unique requirements should be handled on the client side: rather than making the DBMS responsible, convert the returned result to whatever encoding you need in the application layer before rendering it.
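A minimal sketch of what that could look like in the application layer, assuming a C application and the POSIX `iconv(3)` interface. `utf8_to_sjis()` is a hypothetical helper, and "SHIFT_JIS" is the charset name glibc and GNU libiconv accept; other iconv implementations may spell it differently.

```c
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper: convert UTF-8 text fetched from the database into
 * Shift_JIS just before rendering. Writes at most outlen bytes to out and
 * returns the number written, or -1 on error. */
long utf8_to_sjis(const char *in, size_t inlen, char *out, size_t outlen)
{
    iconv_t cd = iconv_open("SHIFT_JIS", "UTF-8");   /* (tocode, fromcode) */
    if (cd == (iconv_t)-1)
        return -1;                  /* charset not supported here */

    char *inp = (char *)in;         /* iconv() takes char ** even for input */
    char *outp = out;
    size_t inleft = inlen, outleft = outlen;

    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    if (rc != (size_t)-1)
        rc = iconv(cd, NULL, NULL, &outp, &outleft);  /* emit any reset sequence */
    iconv_close(cd);

    if (rc == (size_t)-1)
        return -1;                  /* invalid input, unmappable char, or out too small */
    return (long)(outlen - outleft);
}
```

Converting 表 (UTF-8 bytes E8 A1 A8) should yield the two Shift_JIS bytes 95 5C. A real application would also have to decide what to do with characters that have no Shift_JIS equivalent; this sketch simply reports an error.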

More overhead per character

Using UTF-8 means an estimated average overhead of about 1 byte per Japanese character (an EUC-JP character is typically 2 bytes, while the same character in UTF-8 is typically 3 bytes). So if you already have a lot of textual data in a CJK encoding, you're definitely going to use more storage, and the more data you have, the more significant the difference.
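The arithmetic behind this estimate: kana and JIS X 0208 kanji take 2 bytes in EUC-JP, while in UTF-8 they fall in the U+0800 to U+FFFF range and take 3 bytes; ASCII stays at 1 byte in both. A small sketch of the UTF-8 side (the function names are hypothetical):

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes UTF-8 needs for one Unicode code point (RFC 3629 ranges). */
int utf8_len(uint32_t cp)
{
    if (cp < 0x80)
        return 1;      /* ASCII: same size as in EUC-JP */
    if (cp < 0x800)
        return 2;
    if (cp < 0x10000)
        return 3;      /* kana and common kanji live in this range */
    return 4;
}

/* Total UTF-8 size of a string given as an array of code points. */
size_t utf8_size(const uint32_t *cps, size_t n)
{
    size_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += (size_t)utf8_len(cps[i]);
    return total;
}
```

For example, 日本語 (U+65E5 U+672C U+8A9E) comes out at 9 bytes in UTF-8, against 6 bytes (C6 FC CB DC B8 EC) in EUC-JP, i.e. one extra byte per character.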

Eating more space may seem significant, but to me, what's more significant is how much the cost of memory and storage media has dropped these days. If you start running into problems because you have too much data, it's probably time to consider horizontal partitioning anyway.

Conclusion

The topic discussed in this entry is very sensitive, and this is merely my personal opinion. Like all things, every encoding has its ups and downs (each was designed for a purpose, after all), so plenty of people hold different opinions. Satisfying everyone is difficult, but who knows? UTF-8 alone may satisfy the majority of the users we are targeting. If it doesn't, then I guess we'll have to think again… We also need to look into internal sort performance if we go pure UTF-8.

The conclusion Jay and I came to in our brief discussion was that providing a conversion tool in the Drizzle package could be a good way to get people onto the UTF-8 boat. There is no specific plan yet, nor have we decided to do this, but if we were to do it, I'm thinking the tool could be something simple that uses GNU libiconv.
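To give an idea of how simple such a tool could be, here is a sketch of its core loop, assuming the POSIX `iconv(3)` interface that GNU libiconv implements. `convert_stream()` is a hypothetical name, not part of Drizzle. The main subtlety is that a multi-byte character can be split across two reads, so the incomplete tail that iconv reports with `EINVAL` is carried over to the next chunk.

```c
#include <errno.h>
#include <iconv.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical core of a conversion tool: copy `in` to `out`, converting
 * from charset `from` to charset `to`. Returns 0 on success, -1 on error. */
int convert_stream(FILE *in, FILE *out, const char *from, const char *to)
{
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return -1;

    char inbuf[4096], outbuf[8192];
    size_t carry = 0;   /* bytes of a character split across two reads */
    int ret = 0;

    for (;;) {
        size_t nread = fread(inbuf + carry, 1, sizeof inbuf - carry, in);
        size_t inleft = carry + nread;
        char *inp = inbuf;

        while (inleft > 0) {
            char *outp = outbuf;
            size_t outleft = sizeof outbuf;
            size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
            int err = (rc == (size_t)-1) ? errno : 0;  /* save before fwrite */

            fwrite(outbuf, 1, sizeof outbuf - outleft, out);
            if (err == 0 || err == EINVAL)
                break;          /* chunk done, or incomplete tail left over */
            if (err != E2BIG) {
                ret = -1;       /* EILSEQ: invalid byte sequence in input */
                goto done;
            }
            /* E2BIG: outbuf was full and has been written; keep going */
        }

        /* Move any incomplete trailing character to the front of inbuf. */
        memmove(inbuf, inp, inleft);
        carry = inleft;

        if (nread == 0) {       /* EOF or read error */
            if (carry > 0 || ferror(in))
                ret = -1;       /* input ended in the middle of a character */
            break;
        }
    }

    if (ret == 0) {             /* flush any shift state (no-op for UTF-8) */
        char *outp = outbuf;
        size_t outleft = sizeof outbuf;
        if (iconv(cd, NULL, NULL, &outp, &outleft) != (size_t)-1)
            fwrite(outbuf, 1, sizeof outbuf - outleft, out);
        else
            ret = -1;
    }
done:
    iconv_close(cd);
    return ret;
}
```

A tool could then call `convert_stream(stdin, stdout, "EUC-JP", "UTF-8")`, so existing dumps could be piped through it before loading them.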

Hey, there is always the brute-force solution of storing text in whatever encoding you choose as binary data ;)

Toru Maesaka drizzle, oss

Why Stats was refactored in memcached-1.3

September 25th, 2008

For those who do not follow memcached's active development, the current excitement in the community is the new binary protocol that will be introduced in the upcoming 1.3 series. If you'd like a quick and easy introduction to the binary protocol, you can see the slides from my presentation.

With such significant advances, the 1.3 codebase is obviously going to look a bit different from the 1.2 codebase, but the overall software architecture is the same. What is significantly different, however, is how the stats operation is implemented. That is why I am writing this entry: to answer in advance the questions people might have.

Background in a nutshell

Looking further ahead, beyond the binary protocol, the memcached community is aiming for a pluggable engine architecture, which will allow memcached to satisfy unique requirements people might have, such as persistent storage, data dumping and server-side replication. All this fancy stuff obviously goes against the original motives of memcached, but I will save that discussion for another day, as it is not appropriate for this entry :)

Supporting third-party engines means that memcached must be able to send engine-specific stats back to the client (most likely a system admin). To achieve this, memcached's stats handling had to be made flexible by splitting the concept of “stats” into two segments, “core server stats” and “engine stats”, hence the refactoring.

The new approach

Previously, stats were collected by incrementing/accumulating values in the stats structure (defined in memcached.h), and the actual increments were done mostly in the server code.

In the new approach, an engine does not have to depend on the stats structure, since that would tie the engine to this structure. Adding an opaque pointer to the structure that points to something engine-specific could get around this problem, but let's not go there… no, no, no.

All non-server stats are pushed out to the slabber code, since it is the closest thing to an engine in memcached at the moment. In this model, if a client asks for something specific (e.g. “stats malloc”), the server will query the engine for “malloc”. If the engine has no clue what the client is asking for, the server simply returns an error.

Likewise, if the client asks for non-specific stats (“stats\r\n” in the ASCII protocol), memcached returns the merged result of its own core-server stats and the general-purpose stats from the slabber (bytes written, number of gets/sets, etc.).
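A minimal sketch of this dispatch, using the ASCII protocol for readability. Everything here is hypothetical: `handle_stats()`, the `engine_stats_fn` hook, the toy `slabber_stats()` engine, the stat names and values, and the `ERROR` reply are placeholders rather than memcached's actual code.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical engine hook: writes the stats group `name` (NULL = the
 * general-purpose set) to out and returns the bytes written, or -1 if
 * the engine does not recognize the group. */
typedef int (*engine_stats_fn)(const char *name, char *out, size_t outlen);

/* Toy stand-in for the slabber; names and values are placeholders. */
int slabber_stats(const char *name, char *out, size_t outlen)
{
    if (name == NULL)
        return snprintf(out, outlen, "STAT bytes 1024\r\nSTAT cmd_get 7\r\n");
    if (strcmp(name, "malloc") == 0)
        return snprintf(out, outlen, "STAT total_malloced 2048\r\n");
    return -1;                          /* no clue what the client wants */
}

/* Server side: "stats" merges core and engine stats, "stats <name>" is
 * delegated to the engine, and an unknown name becomes an error reply.
 * Returns the length of the reply written to out, or -1 if it won't fit. */
int handle_stats(const char *name, engine_stats_fn engine,
                 char *out, size_t outlen)
{
    size_t used = 0;
    int n;

    if (name == NULL) {                 /* core server stats first */
        n = snprintf(out, outlen, "STAT curr_connections %d\r\n", 10);
        if (n < 0 || (size_t)n >= outlen)
            return -1;
        used = (size_t)n;
    }

    n = engine(name, out + used, outlen - used);
    if (n < 0)
        return snprintf(out, outlen, "ERROR\r\n");
    if ((size_t)n >= outlen - used)
        return -1;                      /* engine output was truncated */
    used += (size_t)n;

    n = snprintf(out + used, outlen - used, "END\r\n");
    if (n < 0 || (size_t)n >= outlen - used)
        return -1;
    return (int)(used + (size_t)n);
}
```

With this toy engine, `stats` returns the core line merged with the two slabber lines, `stats malloc` returns only the engine's line, and any other name gets the error reply.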

If you’d like to see the actual code, take a look at this branch:
http://github.com/tmaesaka/memcached/commits/binprot

Make sure you check out the “binprot” branch.

Binary Stats is Packet-Per-Stat

Before I talk about how stats are implemented, I must mention that with the binary protocol, each statistic is returned in its own packet (as described in the documentation). The key contains the name of the statistic and the value contains the associated value. The end of the transmission is signaled by a packet with neither a key nor a value.
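On the receiving side, this means a client simply reads packets until it sees one with neither key nor value. A minimal decoding sketch, assuming the 24-byte response header of the current spec (key length at offset 2, extras length at offset 4, total body length at offset 8, all big-endian) and a body laid out as extras, then key, then value; `parse_stat_packet()`, `count_stats()` and `struct stat_entry` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#define BIN_HEADER_LEN 24

/* One decoded stat; key and val point into the received buffer. */
struct stat_entry {
    const unsigned char *key;
    uint16_t klen;
    const unsigned char *val;
    uint32_t vlen;
};

static uint16_t be16(const unsigned char *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Decode the stat packet at the start of buf. Returns the packet's total
 * size, 0 for the terminator (no key, no value), or -1 if buf is short. */
long parse_stat_packet(const unsigned char *buf, size_t len,
                       struct stat_entry *e)
{
    if (len < BIN_HEADER_LEN)
        return -1;

    uint16_t klen = be16(buf + 2);
    uint8_t extlen = buf[4];
    uint32_t bodylen = be32(buf + 8);

    if (klen == 0 && bodylen == 0)
        return 0;                       /* end of the stats stream */
    if ((uint32_t)klen + extlen > bodylen || len - BIN_HEADER_LEN < bodylen)
        return -1;                      /* malformed or truncated packet */

    e->key = buf + BIN_HEADER_LEN + extlen;
    e->klen = klen;
    e->val = e->key + klen;
    e->vlen = bodylen - extlen - klen;
    return (long)(BIN_HEADER_LEN + bodylen);
}

/* Count the stats in a complete response; -1 if it is malformed. */
long count_stats(const unsigned char *buf, size_t len)
{
    long count = 0;
    struct stat_entry e;

    for (;;) {
        long n = parse_stat_packet(buf, len, &e);
        if (n < 0)
            return -1;
        if (n == 0)
            return count;
        buf += n;
        len -= (size_t)n;
        count++;
    }
}
```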

How it works

So how does this work? The laziest solution would be to make the engine responsible for implementing the data formatting/serialization, but this has a potential pitfall:

  • Server failure due to incorrect formatting/serialization by the engine.

Instead, an engine is given a callback that it can use to format/serialize stats data for returning to the core server. This way we can reduce the likelihood of an engine returning something invalid to the core server (assuming that the implementer uses the callback of course). Specifically, the engine needs to implement the following function:

char *get_stats(const char *stat_name, uint32_t (*add_stats)(
                char *buf, const char *key, const uint16_t klen,
                const char *val, const uint32_t vlen), int *buflen);

Note the callback:

uint32_t (*add_stats)(char *buf, const char *key, const uint16_t klen,
                      const char *val, const uint32_t vlen);

where the buf argument is the buffer the entry will be serialized into, the key argument should be the name of the statistic (e.g. “bytes”), and the val argument should contain the associated value (e.g. “1024”). The remaining klen and vlen arguments are the lengths of the key and value (e.g. strlen(“bytes”)).

This callback returns the number of bytes it appended to the provided buffer, which the engine can use to advance its write pointer for further appending. Just make sure you allocate enough memory in advance (in the binary protocol, each append adds a 24-byte header on top of the key and value).
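For the binary protocol, a callback with this signature could look like the sketch below. It is hypothetical, not memcached's implementation: it writes a 24-byte response header (magic 0x81, Stat opcode 0x10, key length and total body length in network byte order, everything else zero), followed by the key and value, and returns the number of bytes appended. Called with no key and no value, it produces the 24-byte terminator.

```c
#include <stdint.h>
#include <string.h>

#define BIN_HEADER_LEN 24

/* Hypothetical binary-protocol add_stats callback. The caller must make
 * sure buf has room for BIN_HEADER_LEN + klen + vlen bytes. */
uint32_t bin_add_stats(char *buf, const char *key, const uint16_t klen,
                       const char *val, const uint32_t vlen)
{
    unsigned char *p = (unsigned char *)buf;
    uint32_t bodylen = klen + vlen;

    memset(p, 0, BIN_HEADER_LEN);         /* extras, status, opaque, CAS = 0 */
    p[0] = 0x81;                          /* response magic */
    p[1] = 0x10;                          /* stat opcode */
    p[2] = (unsigned char)(klen >> 8);    /* key length, big-endian */
    p[3] = (unsigned char)(klen & 0xff);
    p[8] = (unsigned char)(bodylen >> 24);  /* total body length, big-endian */
    p[9] = (unsigned char)(bodylen >> 16);
    p[10] = (unsigned char)(bodylen >> 8);
    p[11] = (unsigned char)(bodylen & 0xff);

    if (klen > 0)
        memcpy(p + BIN_HEADER_LEN, key, klen);
    if (vlen > 0)
        memcpy(p + BIN_HEADER_LEN + klen, val, vlen);
    return BIN_HEADER_LEN + bodylen;
}
```

For example, appending “bytes” = “1024” takes 24 + 5 + 4 = 33 bytes, and the terminator takes 24.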

Also note that the engine does not have to care whether the returned data is for the ASCII or the binary protocol, since memcached hands the engine the appropriate callback (with logic that corresponds to the protocol type).
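The ASCII counterpart can keep exactly the same signature, which is what lets the engine stay protocol-agnostic. A hypothetical sketch that emits the familiar `STAT <key> <value>\r\n` lines and uses `END\r\n` as the terminator:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical ASCII-protocol add_stats callback with the same signature
 * as the binary one. Returns the number of bytes appended to buf. */
uint32_t ascii_add_stats(char *buf, const char *key, const uint16_t klen,
                         const char *val, const uint32_t vlen)
{
    char *p = buf;

    if (klen == 0 && vlen == 0) {         /* terminator */
        memcpy(p, "END\r\n", 5);
        return 5;
    }
    memcpy(p, "STAT ", 5);
    p += 5;
    memcpy(p, key, klen);
    p += klen;
    *p++ = ' ';
    memcpy(p, val, vlen);
    p += vlen;
    memcpy(p, "\r\n", 2);
    p += 2;
    return (uint32_t)(p - buf);
}
```

Like the binary version, it does not NUL-terminate the buffer; the returned length is what the engine uses to advance its write pointer.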

Once the engine populates the buffer with data that it would like to report, it can then simply return it to the core server, where it will be sent back to the client.

So, get_stats() could look something like this:

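/* assume foo_key = "hello" and foo_val = "world", and that num_of_bytes
 * covers every entry (key + value + a 24-byte header each in the binary
 * protocol) plus the terminator */
char *buf, *ptr;
uint32_t nbytes = 0;

*buflen = 0;
if ((buf = malloc(num_of_bytes)) == NULL)
    return NULL;

ptr = buf;
nbytes = add_stats(ptr, foo_key, strlen(foo_key),
                   foo_val, strlen(foo_val));
if (!nbytes) {
    free(buf);    /* don't leak the buffer on failure */
    return NULL;
}
ptr += nbytes;
*buflen += nbytes;

...

*buflen += add_stats(ptr, NULL, 0, NULL, 0); /* seal with terminator */
return buf;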

That's it! Minimal coding is required from the engine implementer.

Good and the not so Good

Like all things, stats over the binary protocol has its ups and downs. The good thing about the packet-per-stat approach is that client libraries should be easier to write, especially in languages that aren't so string-friendly (e.g. C compared to Perl). I've already heard that it made libmemcached's life happier.

The downside, however, is the network cost of binary stats compared to the ASCII protocol. Because a packet must be created for each statistic, the total number of bytes to transmit can be relatively large. For example, if you want to return ten stat rows to the client, the number of bytes to transmit is:

“264 bytes (eleven 24-byte packet headers: ten stats plus the terminator) + size of each key and value”

whereas with the ASCII protocol it would be just:

“size of each key/value + 80 bytes ("STAT " prefix, separating space and CRLF on each of the ten rows) + 5 bytes ("END\r\n" terminator)”

Sure, the size difference may look trivial and you may not issue the stats command much but some system admins might care…
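The same arithmetic as a sketch, counting the 24-byte header on every binary packet (terminator included) and, for ASCII, the `STAT ` prefix and separating space that every stats row carries alongside its CRLF (the function names are hypothetical):

```c
#include <stddef.h>

#define BIN_HEADER_LEN 24

/* Bytes on the wire for `rows` stats whose keys and values total
 * kv_bytes: one header per stat plus one for the terminator packet. */
size_t binary_stats_bytes(size_t rows, size_t kv_bytes)
{
    return (rows + 1) * BIN_HEADER_LEN + kv_bytes;
}

/* ASCII: "STAT " (5) + space (1) + CRLF (2) per row, then "END\r\n" (5). */
size_t ascii_stats_bytes(size_t rows, size_t kv_bytes)
{
    return rows * (5 + 1 + 2) + 5 + kv_bytes;
}
```

For ten rows, the fixed overhead is 264 bytes in the binary protocol versus 85 bytes in the ASCII protocol, on top of the same key and value bytes.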

Conclusion

As you can see, the memcached community has put a decent amount of thought into the 1.3 series, and as a result, memcached will keep getting better. It will stay as simple as it always has been, and at the same time it will hopefully be able to do new things by accepting external engines in the future.

The stats code refactoring is a small (but important) step towards this goal :)

Toru Maesaka memcached, oss

memcached Night #1 in Tokyo

September 21st, 2008

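I have published the slides I used at memcached Night #1 in Tokyo.

Since development only settled down recently, I hadn't taken any benchmarks at the time of the talk, and I think a few of the slides were less convincing as a result. It comes after the fact, but I have added some simple benchmarks to the slides. To summarize my results: at low request concurrency there is no visible performance difference between the protocols, but as the number of concurrent requests increases, a performance difference starts to appear. It looks like we'll need to test with heavier workloads going forward.

As for the event itself, the other speakers' talks were interesting too, and I really learned a lot, for example that memcached is used in critical places in the game industry. Thank you again to everyone who attended. The after-party was fun as well; what left the strongest impression on me was hearing how COOKPAD supports a service of that scale (on the technical side) with a small, elite team. Let's definitely do this again.

Finally, there are two things I'd like to clarify, so I'll take this opportunity to write about them.

About the protocol documentation

Mr. Yoshioka of MIRACLE LINUX wrote in his blog entry that the fixed-length header is 16 bytes. I thought that was odd, so I looked into it and realized that the document in Six Apart's svn repository is out of date. In the current specification the fixed-length header is 24 bytes, and the latest documentation is currently in Trond Norbye's git repository (sorry for the confusion).

Which branch to check out

Since the talk, many people have been trying out the binary protocol source tree; please check out the branch called binprot (http://github.com/tmaesaka/memcached/tree/binprot), not master.

Next month I'm going to the US to take part in the memcached hackathon, so while I'm there I'll pass on in person the feedback I received from everyone at memcached Night, along with news of the adoption efforts in Japan.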

Toru Maesaka japanese, memcached

Drizzle Article in Japanese

September 4th, 2008

Yesterday, an article I wrote for @IT, a fairly large Japanese IT news portal, was published, and I figured I should blog about it in English so that I can tell my fellow Drizzlers about it. Here is the link to the article, even though it is in Nihongo (Japanese) ;)

http://www.atmarkit.co.jp/fdb/rensai/drzl_pj/drzl01.html

This three-page, multi-byte article starts by covering how Brian Aker launched the project, along with the overall concept and philosophy of Drizzle. I then move on to describe how we are modernizing things, for example by adopting the C99 standard, targeting modern hardware (lots and lots of cores) and moving to a microkernel architecture. I also describe how we intend to work with other open source communities by actively using the open source libraries that are out there, rather than writing our own or using MySQL's existing libraries.

One of the misunderstandings that came up after Drizzle was announced at OSCON was that Drizzle was being compared against SQLite. I was afraid the same could happen in Japan, so I made sure my article would not lead to this misunderstanding. If you're interested in the difference, it is well described in the Drizzle Wiki:

http://drizzle.wikia.com/wiki/Drizzle_compared_with_SQLite

Other than that, I thoroughly explained that we are committed to being open and transparent, and therefore always welcome new people along with any suggestions and patches they might have. Even if your suggestion seems trivial to you, it could turn out to be a breakthrough for the community.

So the point is, let's all stimulate each other, have fun, and make a great piece of software :)

Toru Maesaka drizzle, oss