Archive

Posts Tagged ‘memcached’

Why Stats was refactored in memcached-1.3

September 25th, 2008

For those who do not follow the active development of memcached, the current excitement in the community is the new binary protocol that will be introduced in the upcoming 1.3 series. If you’d like a quick and easy introduction to the binary protocol, you can see the slides from my presentation.

So, with such significant advances, the 1.3 codebase is obviously going to look a bit different from the 1.2 codebase, but the overall software architecture is the same. What is significantly different, however, is how the stats operation is implemented. This is why I am writing this entry: to answer in advance the questions that people might have.

Background in a nutshell

Looking further ahead, beyond the binary protocol, the memcached community is aiming to achieve a pluggable engine architecture, which will allow memcached to satisfy unique requirements that people might have. These unique requirements can be things like persistent storage, data dumping, server-side replication and so on. All this fancy stuff obviously goes against the original motives of memcached, but I will save that discussion for another day, as it is not appropriate for this entry :)

Supporting third-party engines means that memcached must be able to send engine-specific stats back to the client (most likely a system admin). To achieve this, memcached’s stats handling had to be made flexible by splitting the concept of “stats” into two segments, “core server stats” and “engine stats”; hence the refactoring.

The new approach

Previously, stats were gathered by incrementing/accumulating values inside the stats structure (defined in memcached.h). The actual increments were done mostly in the server code.

In the new approach, an engine does not have to depend on the stats structure, because this would limit the engine to that structure. Adding an opaque pointer to the structure that points to something engine-specific could get around this problem, but let’s not go there… no, no, no.

All non-server stats are pushed out to the slabber code, since this is the closest thing to an engine in memcached at the moment. In this model, if a client asks for something unique (e.g. “stats malloc”), then the server will query the engine for “malloc”. If the engine has no clue what the client is asking for, then the server will simply return an error.

Likewise, if the client asks for non-specific stats (“stats\r\n” in the ASCII protocol), memcached will return the merged result of its own core-server stats and the general-purpose stats from the slabber (bytes written, number of gets/sets and so on).
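The dispatch described above can be sketched roughly as follows. This is only an illustration of the control flow, not code from the branch: engine_lookup, handle_stats, the stat names and the zero/one values are all hypothetical placeholders.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

enum stats_result { STATS_OK, STATS_ERROR };

/*
 * Hypothetical engine side: answers the general-purpose request
 * (name == NULL) and the "malloc" sub-command, nothing else.
 * The returned strings are placeholders, not real statistics.
 */
static const char *engine_lookup(const char *name)
{
    if (name == NULL)
        return "get_hits 0\n";
    if (strcmp(name, "malloc") == 0)
        return "arena_size 0\n";
    return NULL; /* the engine has no clue what was asked for */
}

/* Hypothetical core-server side: build the reply for "stats [name]". */
static enum stats_result handle_stats(const char *name, char *out, size_t outlen)
{
    const char *engine_part = engine_lookup(name);

    if (engine_part == NULL)
        return STATS_ERROR; /* unknown sub-command: report an error */

    if (name == NULL) /* plain "stats": core-server stats merged with engine stats */
        snprintf(out, outlen, "%s%s", "pid 1\n", engine_part);
    else              /* named sub-command: engine output only */
        snprintf(out, outlen, "%s", engine_part);

    return STATS_OK;
}
```

Calling handle_stats("malloc", ...) returns only the engine's answer, handle_stats(NULL, ...) returns the merged output, and any name the engine does not recognise yields STATS_ERROR.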

If you’d like to see the actual code, take a look at this branch:
http://github.com/tmaesaka/memcached/commits/binprot

Make sure you check out the “binprot” branch.

Binary Stats is Packet-Per-Stat

Before I talk about how stats is implemented, I must mention that with the binary protocol, each statistic is returned in its own packet (as mentioned in the documentation). The key contains the name of the statistic and the value contains the associated value. The end of transmission is signaled by a packet with no key and no value.
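To make the packet-per-stat idea concrete, here is a minimal sketch of serializing one stat into a response packet. The field offsets, the 0x81 response magic and the 0x10 stat opcode are taken from my reading of the binary protocol specification rather than from this post, and write_stat_packet is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BIN_HEADER_LEN 24   /* fixed header size */
#define BIN_MAGIC_RES  0x81 /* response magic byte (assumed from the spec) */
#define BIN_CMD_STAT   0x10 /* stat opcode (assumed from the spec) */

/*
 * Serialize one stat response packet into buf and return its size.
 * klen == 0 and vlen == 0 produce the terminator packet (header only).
 * The caller must provide at least BIN_HEADER_LEN + klen + vlen bytes.
 */
static size_t write_stat_packet(uint8_t *buf, const char *key, uint16_t klen,
                                const char *val, uint32_t vlen)
{
    uint32_t body = (uint32_t)klen + vlen;

    /* extras length, data type, status, opaque and CAS all stay zero */
    memset(buf, 0, BIN_HEADER_LEN);
    buf[0]  = BIN_MAGIC_RES;
    buf[1]  = BIN_CMD_STAT;
    buf[2]  = (uint8_t)(klen >> 8);   /* key length, network byte order */
    buf[3]  = (uint8_t)(klen & 0xff);
    buf[8]  = (uint8_t)(body >> 24);  /* total body length, network byte order */
    buf[9]  = (uint8_t)(body >> 16);
    buf[10] = (uint8_t)(body >> 8);
    buf[11] = (uint8_t)body;

    if (klen > 0)
        memcpy(buf + BIN_HEADER_LEN, key, klen);
    if (vlen > 0)
        memcpy(buf + BIN_HEADER_LEN + klen, val, vlen);

    return (size_t)BIN_HEADER_LEN + body;
}
```

A “bytes” / “1024” pair therefore costs 24 + 5 + 4 = 33 bytes on the wire, and the terminator is a bare 24-byte header.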

How it works

So how does this work? The laziest solution is to leave the responsibility for, and the implementation of, data formatting/serialization to the engine, but this has a potential pitfall:

  • Server Failure due to incorrect formatting/serialization by the engine.

Instead, an engine is given a callback that it can use to format/serialize stats data for returning to the core server. This way we can reduce the likelihood of an engine returning something invalid to the core server (assuming that the implementer uses the callback of course). Specifically, the engine needs to implement the following function:

char *get_stats(const char *stat_name, uint32_t (*add_stats)(
                char *buf, const char *key, const uint16_t klen,
                const char *val, const uint32_t vlen), int *buflen);

and notice the callback:

uint32_t (*add_stats)(char *buf, const char *key, const uint16_t klen,
                      const char *val, const uint32_t vlen);

where the buf argument is the buffer that the entry will be serialized to, the key argument should be the name of the statistic (e.g. “bytes”) and the val argument should contain the associated value (e.g. “1024”). The remaining klen and vlen arguments should give the lengths of the key and value (e.g. strlen(“bytes”)).

This callback returns the number of bytes it appended to the provided buffer, which the engine can use to advance the write pointer for further appending. Just make sure you allocate enough memory in advance (each append has a 24-byte overhead for the binary protocol).

Another thing to mention is that the engine does not have to worry about whether the returned data is for the ASCII or the binary protocol, since memcached will hand the engine the appropriate callback (with logic that corresponds to the protocol type).
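As an illustration of what a protocol-specific callback might look like, here is a sketch of an ASCII-flavoured one with the same signature as add_stats. It is not the callback from the branch: add_ascii_stats is a hypothetical name, and the only thing assumed is the usual ASCII stats framing of “STAT name value\r\n” lines closed by “END\r\n”.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical ASCII callback. Appends "STAT <key> <value>\r\n" to buf,
 * or "END\r\n" when called with no key, and returns the number of
 * bytes written so the engine can advance its write pointer.
 */
static uint32_t add_ascii_stats(char *buf, const char *key, const uint16_t klen,
                                const char *val, const uint32_t vlen)
{
    char *p = buf;

    if (key == NULL || klen == 0) {
        memcpy(p, "END\r\n", 5); /* terminator */
        return 5;
    }

    memcpy(p, "STAT ", 5);
    p += 5;
    memcpy(p, key, klen);
    p += klen;
    *p++ = ' ';
    if (vlen > 0) {
        memcpy(p, val, vlen);
        p += vlen;
    }
    memcpy(p, "\r\n", 2);
    p += 2;

    return (uint32_t)(p - buf);
}
```

Because the engine only ever sees the function pointer, the same get_stats() body works whether it is handed a callback like this one or a binary one.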

Once the engine populates the buffer with data that it would like to report, it can then simply return it to the core server, where it will be sent back to the client.

So, get_stats() could look something like this:

/* assume, foo_key = "hello" and foo_val = "world" */
char *buf, *ptr;
uint32_t nbytes = 0;

/* num_of_bytes must cover every entry plus the terminator */
if ((buf = malloc(num_of_bytes)) == NULL)
    return NULL;

ptr = buf;
*buflen = 0; /* start counting from zero */
nbytes = add_stats(ptr, foo_key, strlen(foo_key),
                   foo_val, strlen(foo_val));
if (!nbytes) {
    free(buf); /* do not leak the buffer on failure */
    return NULL;
}
ptr += nbytes;
*buflen += nbytes;

...

*buflen += add_stats(ptr, NULL, 0, NULL, 0); /* seal with terminator */
return buf;

That’s it! Minimal coding is required from the engine implementer.
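For readers who want something they can compile, here is one way the fragment above could be filled out into a whole function, under a few assumptions. The stat table, the add_stats_fn typedef and the demo_add_stats stand-in callback are hypothetical and exist only so the sketch is self-contained; the buffer is sized with the 24-byte per-append overhead mentioned earlier, which assumes the callback never adds more framing than that.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define STAT_OVERHEAD 24 /* per-append overhead quoted for the binary protocol */

typedef uint32_t (*add_stats_fn)(char *buf, const char *key, const uint16_t klen,
                                 const char *val, const uint32_t vlen);

/* Hypothetical engine stats, already rendered as strings. */
static const char *const stat_keys[] = { "hello", "bytes" };
static const char *const stat_vals[] = { "world", "1024" };
#define NUM_STATS (sizeof(stat_keys) / sizeof(stat_keys[0]))

/* Stand-in callback for local experiments: "key value\n", "\n" as terminator. */
static uint32_t demo_add_stats(char *buf, const char *key, const uint16_t klen,
                               const char *val, const uint32_t vlen)
{
    char *p = buf;

    if (key != NULL) {
        memcpy(p, key, klen);
        p += klen;
        *p++ = ' ';
        if (vlen > 0) {
            memcpy(p, val, vlen);
            p += vlen;
        }
    }
    *p++ = '\n';
    return (uint32_t)(p - buf);
}

char *get_stats(const char *stat_name, add_stats_fn add_stats, int *buflen)
{
    size_t need = STAT_OVERHEAD; /* room for the terminator */
    size_t i;
    char *buf, *ptr;

    (void)stat_name; /* this sketch ignores sub-commands */

    /* Size the buffer up front: framing overhead plus key and value. */
    for (i = 0; i < NUM_STATS; i++)
        need += STAT_OVERHEAD + strlen(stat_keys[i]) + strlen(stat_vals[i]);

    if ((buf = malloc(need)) == NULL)
        return NULL;

    ptr = buf;
    for (i = 0; i < NUM_STATS; i++) {
        uint32_t nbytes = add_stats(ptr,
                                    stat_keys[i], (uint16_t)strlen(stat_keys[i]),
                                    stat_vals[i], (uint32_t)strlen(stat_vals[i]));
        if (nbytes == 0) {
            free(buf); /* nothing was appended: give up cleanly */
            return NULL;
        }
        ptr += nbytes;
    }

    ptr += add_stats(ptr, NULL, 0, NULL, 0); /* seal with terminator */
    *buflen = (int)(ptr - buf);
    return buf;
}
```

The caller owns the returned buffer, so something like get_stats(NULL, demo_add_stats, &len) should be followed by free() once the bytes have been sent.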

Good and the not so Good

Like all things, stats over the binary protocol has its ups and downs. The good thing about the packet-per-row approach is that the client library should be easier to write, especially for languages that aren’t so string friendly (e.g. C compared to Perl). I’ve already heard that it made libmemcached’s life happier.

The downside, however, is the network cost of binary stats compared to the ASCII protocol. Because a packet must be created for each statistic, the total number of bytes sent over the wire can be relatively large. For example, if you want to return ten stat rows to the client, then the number of bytes to transmit is:

“264 bytes (sum of packet headers, including terminator) + size of each key and value”

whereas with the ASCII protocol it would be just:

“size of each key/value + 20 bytes (sum of CRLF) + 5 bytes (terminator)”

Sure, the size difference may look trivial and you may not issue the stats command much but some system admins might care…
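The arithmetic behind those two figures can be written down as a small sketch. The function names are hypothetical, and the ASCII side deliberately follows the accounting used above (CRLF per row plus a 5-byte “END\r\n”); it does not count the “STAT ” prefix and separator space that an ASCII stat line also carries, so the real gap is somewhat smaller.

```c
#include <stddef.h>

#define BIN_HEADER_LEN 24 /* binary protocol header, one per packet */
#define ASCII_CRLF_LEN  2 /* "\r\n" after each row */
#define ASCII_END_LEN   5 /* "END\r\n" terminator */

/* Framing bytes for nstats rows over the binary protocol (rows + terminator). */
static size_t binary_overhead(size_t nstats)
{
    return (nstats + 1) * BIN_HEADER_LEN;
}

/* Framing bytes for nstats rows, using the ASCII accounting from the text. */
static size_t ascii_overhead(size_t nstats)
{
    return nstats * ASCII_CRLF_LEN + ASCII_END_LEN;
}
```

For ten rows this gives (10 + 1) × 24 = 264 bytes for binary and 10 × 2 + 5 = 25 bytes for ASCII, before adding the keys and values themselves.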

Conclusion

As you can see, a decent amount of thought has been put into the 1.3 series by the memcached community, and as a result, memcached will keep getting better. It will stay as simple as it always was, and at the same time it will hopefully be able to do new things by accepting external engines in the future.

The stats code refactoring is a small (but important) step towards this goal :)

Toru Maesaka memcached, oss

memcached Night #1 in Tokyo

September 21st, 2008

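I have published the slides I used at memcached Night #1 in Tokyo.

Because development had only recently settled down, I had not run any benchmarks by the time of the talk, and I think a few of the slides were less than convincing. It is after the fact, but I have added a simple benchmark to the slides. To summarise my results: when request concurrency is low, there is no visible performance difference between the protocols, but as the number of concurrent requests increases, a difference starts to show. It looks like I will need to test with heavier workloads in the future.

As for the event itself, the other speakers’ talks were interesting, and I learned a great deal, for example that memcached is used in important places in the game industry too. Thank you again to everyone who attended. The social gathering afterwards was fun as well, and what left a strong impression on me was hearing that COOKPAD supports a service of that scale with a small, elite team (on the technical side). Let’s definitely do it again.

Finally, there are two things I would like to make clear, so I will take this opportunity to write them down.

About the protocol documentation

A blog entry by Yoshioka-san of MIRACLE LINUX stated that the fixed-length header is 16 bytes. I thought that was odd, took a look, and realised that the document in Six Apart’s svn repository is out of date. In the current specification the fixed-length header is 24 bytes, and the latest document at this time is in Trond Norbye’s git repository (sorry for the confusion).

Which branch to check out

Since the talk, various people have been trying out the binary protocol source tree. Please check out the branch called binprot (http://github.com/tmaesaka/memcached/tree/binprot), not master.

Next month I will be going to the US to take part in the memcached hackathon, where I will pass on in person the feedback I received from everyone at memcached Night and report on the efforts to spread memcached in Japan.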

Toru Maesaka japanese, memcached

Perl, Binary and Memcached

August 15th, 2008

The last few days I’ve been working on updating the binary protocol test in the latest memcached development branch to comply with the latest binary protocol specification. Prior to this update, the test client was sending an invalid request to the server, which as a consequence made the test hang and never finish.

In brief, this is the big difference:

Previously, the CAS (compare and swap) value was treated as part of the extra header that is appended/serialized after the request/response header. In the latest specification, the CAS value is a required 8-byte field in the 24-byte request/response header (the header size in the previous version was 16 bytes). Other than that, the remaining changes were minor differences in the packet format of extra fields in certain commands. Easy work :)
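To show where the CAS value now lives, here is a small C sketch (the test itself is Perl) that reads it out of a raw 24-byte header. The offset of 16, meaning the last 8 bytes of the header, and the network byte order are my reading of the latest specification; header_cas is a hypothetical helper name.

```c
#include <stdint.h>

#define BIN_HEADER_LEN 24
#define BIN_CAS_OFFSET 16 /* CAS occupies the last 8 bytes (assumed from the spec) */

/* Read the 8-byte CAS value, stored in network byte order, from a raw header. */
static uint64_t header_cas(const uint8_t hdr[BIN_HEADER_LEN])
{
    uint64_t cas = 0;
    int i;

    for (i = 0; i < 8; i++)
        cas = (cas << 8) | hdr[BIN_CAS_OFFSET + i];

    return cas;
}
```

With the older 16-byte header there was no such fixed slot, which is why a client built against the old layout sent requests the server could not parse.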

Here is the actual diff:

http://github.com/tmaesaka/memcached/commit/67b4da9eb855ebe7695a197320232b8d25692f84

As you can see, the test suite currently used by memcached is Perl based. This was fortunate for me, since Perl is my second favourite language after C. I also made the code style more “perl-like” by fixing the indents. Although, heh, I can see a Perl programmer arguing that the use of if/else blocks in the test is not best practice.

You know, fixing the test was pretty meaningful to me, since it forced me to study the binary protocol specification, which I knew almost nothing about at the hackathon in Santa Clara, CA back in April. Hopefully I can make productive suggestions at the upcoming hackathon in Menlo Park, CA in October.

Toru Maesaka memcached, oss

Great Fun at the MySQL Seminar

May 28th, 2008

Last week I spoke at the MySQL seminar in Tokyo as a guest speaker with Brian Aker on “Memcached and MySQL”. The seminar turned out to be larger than I expected, with just over one hundred attendees. You can check out the photos on the official MySQL APAC blog (be warned that it’s in Japanese).


At the end of the seminar, I showed a brief demo of the custom storage engine project that Trond Norbye and I have been working on for the last few weeks. If you’re interested, you can read more about it on his blog, or come on over to the memcached channel on freenode :)

Toru Maesaka knowledge, memcached