Archive for October, 2008
ACM ICPC Asia Regional Contest in Aizu
The last couple of days have been really fun. I was at the University of Aizu for the ACM ICPC Asia Regional Contest, representing mixi as a sponsor. It was really nice to be somewhere in Japan that’s not Tokyo for once, though the three-hour train ride wasn’t the most exciting trip of my life. Fortunately I got a bit of programming done and had Mario Kart DS with me.
Getting to mingle with Computer Science students from all over Japan at the closing party was awesome. Nobody had told me that I would have to make a speech, but I think the crowd liked my last-minute “if you’re not, then you should get involved with open source” speech.
Congratulations to all the contestants and thumbs up to all the staff!
Open Source Conference in Malaysia
So I’ve been bugging Colin Charles to invite me over to Malaysia for the last couple of months and what does he offer me? An opportunity to speak at an open source conference in Malaysia!
FOSS.my is a two-day event (9th & 10th next month) and, as stated on the conference homepage, its aim is to cover the technical aspects of various OSS projects, without any business/sales intervention, for those who follow open source technology in South East Asia (and other regions too, of course). The cool thing is that after I told mixi about the event, they liked the idea so much that they immediately decided to sponsor it. I didn’t really expect this but hey, awesome.
At this conference I will be giving two talks. One will go over how mixi uses various OSS technologies to power the largest social networking service in Japan. The other will cover how the memcached internals work and the latest hot topics in the development community, such as the upcoming binary protocol.
I’ve never been to Malaysia before so I’m totally looking forward to this trip.
memcached Hackathon #5 at Sun Microsystems
Last week I was in the valley for the fifth memcached Hackathon at Sun Microsystems and to visit some friends at Six Apart HQ. The hackathon was so much fun that we ended up leaving at 2am on a weeknight! Thanks to Matt Ingenthron and Sun Microsystems for organizing the event and providing food and space for this hackathon.
In the previous hackathon, we mostly exchanged ideas on the binary protocol and the storage engine interface. This time it was more code oriented and we reviewed and tested the progress everyone had made in the latest binary protocol tree. Unfortunately I couldn’t cover the whole hackathon but here is a summary of discussions from the agenda that I was involved in:
Binary Protocol - Add an engine specific OPCODE
No disagreements here. An opcode is represented by a 1-byte unsigned integer, so the consensus was that we should reserve everything above 127 (0x7F), i.e. 0x80 through 0xFF, for special engine-specific operations.
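A minimal sketch of that rule in Python, just to make the boundary explicit. The constant and function names are hypothetical and are not taken from the memcached source:

```python
# Assumption: "anything over 127 (0x7F)" means the range 0x80..0xFF.
ENGINE_OPCODE_MIN = 0x80


def is_engine_specific(opcode: int) -> bool:
    """Return True if a 1-byte opcode falls in the engine-specific range."""
    if not 0 <= opcode <= 0xFF:
        raise ValueError("an opcode must fit in one unsigned byte")
    return opcode >= ENGINE_OPCODE_MIN
```

With this split, half of the opcode space stays available for core commands and the other half is left to whichever storage engine is loaded.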
Storage Interface
We didn’t get around to discussing the interface in depth since getting the binary protocol released has greater priority at the moment. Trond however showed me some of the interesting work that he has been doing which will hopefully be out in the open soon.
Test Framework
The issue here is that tests aren’t actively being written. The opinion voiced was that some people aren’t comfortable with Perl and therefore find the current Perl-based test system difficult to understand.
Switching to a different test framework in a different language is easy but the problem is that this is a never-ending story. People can easily start demanding other languages that they feel comfortable in (python, java, ruby, lua, …). We briefly discussed that the ideal model is to be able to add tests written in any language but we didn’t go into depth on how we would actually achieve this.
Personally, I have nothing against the current test framework (mind you, I like Perl), but I think that if we were to switch, a purely C-based framework would be a good move. I say this because anyone who would think about opening up the memcached package and editing it can most likely write C (is that too bold an assumption? heh).
Client Libraries
Unfortunately I couldn’t get around to participating in the client talk but client-side replication work was being done for libmemcached and I heard from Brian Aker that there was good progress.
Jonathan (hachi) reviewed my binary protocol patch for Cache::Memcached and found that some protocol negotiation assumptions I made in the code can be improved. He is also looking at optimizing the code by subclassing the patch (reduces the number of conditional selections, perl method calls and hash lookups).
Scaling on Highly Threaded Servers
We didn’t really discuss this in depth since we were busy reviewing and testing the server code, but as far as I know, the talk was about how locking can be improved in memcached. Looking into and preparing for this is a good idea since we are entering a massively concurrent age. On the other hand, the guys from Facebook mentioned that they were getting sufficient throughput with the current locking scheme, which was awesome to hear.
The engine plugin rearchitecture should fit well with this project since we can interchange different versions of the slabber engine with different locking strategies and make them compete to be the next default memcached engine.
Conclusion
The hackathon was fun and we got a lot done in terms of finding things to improve on. It was great to catch up with guys that I communicate with a lot online and to talk tech in person. It was awesome that Brad turned up as well. As for code improvements, Dustin’s test code found an issue where the stats subsystem always returned zero for the opaque value. A little bit of coding looked necessary to get around this problem, since the opaque value is held by the connection structure, which the engine does not (and shouldn’t) have access to, but I was bored on my flight back to Tokyo so this problem is now fixed and pushed to my tree.
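For readers who haven’t met the opaque field: the client picks the value and the server is expected to echo it back unchanged, so a response that always carries zero breaks request/response matching. Here is a rough Python sketch of that round trip. The 24-byte header layout, the magic bytes and the stat opcode below are my assumptions about the binary protocol as it stands, and `fake_response` is a hypothetical stand-in for a server, not memcached code:

```python
import struct

# Assumed header layout: magic, opcode, key length, extras length,
# data type, reserved/status, total body length, opaque, CAS.
HEADER = struct.Struct("!BBHBBHIIQ")
REQUEST_MAGIC = 0x80   # assumption
RESPONSE_MAGIC = 0x81  # assumption
OPCODE_STAT = 0x10     # assumption


def build_request(opcode: int, opaque: int, key: bytes = b"") -> bytes:
    """Pack a request with no extras and an optional key as the body."""
    header = HEADER.pack(REQUEST_MAGIC, opcode, len(key), 0, 0, 0,
                         len(key), opaque, 0)
    return header + key


def opaque_of(packet: bytes) -> int:
    """Read the opaque field (eighth header field) from a packet."""
    return HEADER.unpack_from(packet)[7]


def fake_response(request: bytes) -> bytes:
    """Stand-in server: echo the opcode and opaque with an empty body."""
    _, opcode, _, _, _, _, _, opaque, _ = HEADER.unpack_from(request)
    return HEADER.pack(RESPONSE_MAGIC, opcode, 0, 0, 0, 0, 0, opaque, 0)
```

A test in the spirit of Dustin’s would send a stats request with a non-zero opaque and check that `opaque_of(response)` matches what was sent.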
Rethinking the Query Cache for Drizzle
There is a shared understanding in the Drizzle community that the MySQL query cache works well for a small database but isn’t sufficient for relatively large-scale usage. Does your application involve a lot of database updates? If so, you’ll probably face fragmentation issues in the query cache (though the query cache isn’t really suited to use cases like this in the first place).
Caching is the key ingredient in boosting the performance of any software that requires a significant amount of computation, hence it is something that can’t be overlooked. So how can we improve Drizzle?
The idea is to create a pluggable query cache subsystem that can work in a large-scale environment. Since Drizzle is a microkernel DBMS, it makes sense to make the cache component pluggable and let the DBA pick the caching solution of their choice. This is exactly what I’m working on at the moment, and my first plugin will allow Drizzle to use memcached as its query cache.
For example, a DBA could hook up their memcached pool to Drizzle and use several gigabytes of fast cache space to cache their results.
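To make the idea concrete, here is a rough Python sketch of what “pluggable” means here. Nothing in it is Drizzle’s actual plugin API: `QueryCache`, `DictCache`, `CachedExecutor` and `CountingBackend` are hypothetical names, the dictionary stands in for a memcached pool so the sketch runs anywhere, and invalidation on writes is deliberately left out:

```python
import hashlib
from typing import Callable, Optional, Protocol


class QueryCache(Protocol):
    """Hypothetical plugin interface: any backend with get/set qualifies."""

    def get(self, key: str) -> Optional[bytes]: ...
    def set(self, key: str, value: bytes) -> None: ...


class DictCache:
    """In-memory stand-in for a memcached pool."""

    def __init__(self) -> None:
        self._data = {}

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

    def set(self, key: str, value: bytes) -> None:
        self._data[key] = value


def cache_key(schema: str, sql: str) -> str:
    """Derive a short fixed-length key without whitespace from the query."""
    return hashlib.sha1(f"{schema}:{sql}".encode("utf-8")).hexdigest()


class CachedExecutor:
    """Checks the cache plugin first and only runs the query on a miss."""

    def __init__(self, cache: QueryCache, execute: Callable[[str], bytes]) -> None:
        self.cache = cache
        self.execute = execute

    def query(self, schema: str, sql: str) -> bytes:
        key = cache_key(schema, sql)
        cached = self.cache.get(key)
        if cached is not None:
            return cached               # hit: skip query execution entirely
        result = self.execute(sql)      # miss: run the query, then remember it
        self.cache.set(key, result)
        return result


class CountingBackend:
    """Fake query executor that records how often it is actually called."""

    def __init__(self) -> None:
        self.calls = 0

    def execute(self, sql: str) -> bytes:
        self.calls += 1
        return f"result of {sql}".encode("utf-8")
```

Swapping `DictCache` for a class that talks to a memcached pool (or any other store) would not touch `CachedExecutor` at all, which is the whole point of making the component pluggable.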
Things to consider
- Does the DBA really want to cache results?
- Does the result construction take long enough to care?
- Do we want to specify a specific SQL statement to always cache?
- Do we want to enforce a certain table to be cached?
- Transactional Engines
If we can satisfy the above points and achieve modularity, I think it’s a total win. For those that like diagrams, here is the architecture that I have in mind at the moment:
[Figure: proposed pluggable query cache architecture]
Benefits of using memcached
Various players in the web industry have proven that memcached works and helps scale web applications in a cost-effective fashion. It is also fast. The time complexity of fetching a cached result from memcached by key is O(1), which is an order we all love. Furthermore, by using memcached, the fragmentation issue disappears, since this is a problem that the memcached community had to face in the past and successfully overcame by developing the slab subsystem.
Want to scale? With consistent hashing enabled, you can greatly reduce the number of cache misses caused by adding/removing a node from a live pool. Got spare boxes lying around? Hook them up and power up Drizzle! Need support? Both memcached and Drizzle community members are warm-hearted people.
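Why consistent hashing helps can be shown in a few lines of Python. This is a generic hash ring written for illustration, not the algorithm of any particular memcached client; the class name, the MD5-based hash and the 100 virtual points per node are all my own choices:

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a large integer position on the ring."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Minimal consistent hash ring with virtual points per node."""

    def __init__(self, nodes, replicas: int = 100) -> None:
        self.replicas = replicas
        self._points = []   # sorted ring positions
        self._owner = {}    # ring position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        for i in range(self.replicas):
            point = _hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def node_for(self, key: str) -> str:
        # First ring position clockwise from the key, wrapping at the end.
        idx = bisect.bisect(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


def modulo_node(key: str, nodes) -> str:
    """Naive placement for comparison: hash modulo the number of nodes."""
    return nodes[_hash(key) % len(nodes)]
```

When a fifth node joins a pool of four, the ring only hands the newcomer the keys it now owns (about a fifth of them in expectation), whereas modulo placement reshuffles roughly four keys out of five, and every reshuffled key is a cache miss.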
Other Solutions Work Too!
The beauty of modularity is that you can create and use your own solution for your unique requirements. For example, let’s assume that there is a webshop that wants to keep the number of physical servers down (e.g. limited monetary/space resources).
To satisfy that requirement, you could cache to a fantastically fast hash database such as Tokyo Cabinet (much, much faster than BDB). If you haven’t heard of it, you should look at the incredible benchmark comparison. So, what I really wanted to say is that the microkernel property of Drizzle will open up a lot of new possibilities for your application and help you tackle the new requirements that seem to come out of nowhere.
Where from here?
I am currently going through the UDF -> Plugin Architecture conversion done by Mark, and I plan to base my code on his logging plugin since it’s fantastically simple. My work will be done in:
- lp:~tmaesaka/drizzle/pluggable-qcache
I’ll hopefully have something decent to show soon, and I will keep people updated on my blog, IRC and the Mailing List (drizzle-discuss).
So that is all I have to say for now… If you have any suggestions, please do enlighten me.
Affection towards Beautiful Typography
I came across a service called Wordle this morning that can create a cloud of beautiful text based on the data you provide. Here’s what it generated from my RSS feed:
[Image: Wordle text cloud generated from the RSS feed]
For those that love beautiful typography, you should definitely try it out. The data you provide doesn’t have to be a feed. You can directly type in a bunch of text and get Wordle to generate a text cloud.
Here’s an idea: copy and paste the lyrics of your favorite song or a poem into http://wordle.net/create and see what you get.
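Memo: Studying DTrace a Little
For some reason DTrace never interested me enough to look into it myself, but I have been hearing about it a lot lately, so I studied it a little. The fact that it is available out of the box from Mac OS X (10.5) was a big factor too. To be honest, if it only ran on Solaris, I don’t think I would have taken an interest in DTrace.
A quick Google search shows that people have already written about it in detail at ITMedia and elsewhere, so this ends up being yet another DTrace entry, but please bear with me since this is my personal memo. While looking into DTrace, I found a document called DTrace by Example: Solving a Real World Problem, written by Paul van den Bogaard of Sun, to be very easy to understand.
What is DTrace, in Short?
DTrace is a dynamic tracing facility that Sun developed for Solaris. It lets you trace the behavior of a system at runtime, from the kernel level up. Paul’s document describes it as a framework, and it is currently available on Mac OS X in addition to Solaris.
With DTrace, system administrators and developers can obtain all sorts of information about the program at hand and about the OS underneath it, which helps with profiling and debugging a system. Keep in mind, though, that the single word “DTrace” covers a number of components:
- The OS kernel
- The D language
- The dtrace command
- The dtrace virtual machine
- dtrace probes
- dtrace providers
- The various dtrace libraries
As far as I can tell, the good thing about DTrace is that instead of returning a huge trace full of information the user (a system administrator, say) doesn’t care about, it lets you specify in fine detail, using the D language, what to trace and what to return (for example, return only the number of system calls issued and their execution time). In other words, you can write custom tracers and profilers in D.
How DTrace Works
A tracing script written in D is run with the dtrace command. On receiving the script, the dtrace command converts it into an intermediate form that the dtrace virtual machine built into the OS kernel can understand (a bit like bytecode in Java). The virtual machine then does the aggregation according to the logic written in the user’s script.
How Data is Collected: Probes and Providers
DTrace has the concept of a probe, which is an instrumentation point in the kernel. There are many kinds of probes, and each one fires under a specific condition (a certain system call being issued, for example). So if you specify a particular probe in a D script, you can collect the information reported whenever that probe fires.
By the way, I looked at the total number of probes built into the Darwin kernel on my MacBook Pro (OS X 10.5.5, DTrace API version 1.2.2):
… (omitted)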
21652 plockstat4556 libSystem.B.dylib pthread_rwlock_wrlock$UNIX2003 rw-error
21653 plockstat4556 libSystem.B.dylib pthread_rwlock_unlock rw-release
21654 plockstat4556 libSystem.B.dylib pthread_rwlock_unlock$UNIX2003 rw-release
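More than 20,000 probes were found. That’s quite a number. In practice there are providers that create probes on the fly, so apparently there are even more probes than this.
According to Sun KK’s DTrace documentation (in Japanese), the following information becomes available when a probe fires:
- All arguments passed to the function
- All global variables in the kernel
- A timestamp indicating when the function was called
- A stack trace indicating the section of code that called the function
- The process that was running when the function was called
- The thread that called the function
A kernel module that actually enables probes in the kernel is called a provider. There are various kinds of providers, and the probes associated with a particular provider appear to be grouped under that provider.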
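To see that grouping for yourself, a probe listing can be tallied per provider. The Python sketch below is only an illustration: it assumes each listing row starts with a numeric probe ID followed by the provider name, as in the three rows shown earlier, and the function name is my own:

```python
from collections import Counter

# The three listing rows shown earlier, reused as sample input.
SAMPLE = """\
21652 plockstat4556 libSystem.B.dylib pthread_rwlock_wrlock$UNIX2003 rw-error
21653 plockstat4556 libSystem.B.dylib pthread_rwlock_unlock rw-release
21654 plockstat4556 libSystem.B.dylib pthread_rwlock_unlock$UNIX2003 rw-release
"""


def probes_per_provider(listing: str) -> Counter:
    """Count probes per provider in a whitespace-separated probe listing."""
    counts = Counter()
    for line in listing.splitlines():
        fields = line.split()
        # Skip blank lines and any header row (first column is not a number).
        if len(fields) < 3 or not fields[0].isdigit():
            continue
        counts[fields[1]] += 1
    return counts
```

Feeding it the full listing instead of `SAMPLE` would show how the 20,000-plus probes are distributed across providers.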
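A Google search shows that the list of providers and a description of each type can be found in the DTrace wiki and the Solaris Dynamic Tracing Guide:
http://wikis.sun.com/display/DTrace/Providers
http://docs.sun.com/app/docs/doc/817-6223 (chapters 17 to 32)
Writing D
I don’t know D well enough to teach it to others, and writing up the specification would go beyond the scope of a blog entry, so I’ll refrain. A basic explanation is given in chapter 3 of the DTrace User Guide.
As an easy way out, you don’t have to grind out scripts yourself: the DTraceToolkit is a rich collection of over 200 scripts (thanks to kiyotaka-san for telling me about it). Quite a few of the scripts apparently don’t work on OS X, but the memory-related scripts I tried seem to be fine.
Using DTrace on Mac OS X (One-Liners)
Unless the inspection is tricky, you can try dtrace right from the terminal without creating a script file.
For example, report immediately whenever a process opens a file: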
$ sudo dtrace -n 'syscall::open*:entry { printf("%s %s", execname, copyinstr(arg0)); }'
Report immediately which process called read(2):
$ sudo dtrace -n 'syscall::read:entry { printf("%s", execname); }'
// OUTPUT:
// CPU ID FUNCTION:NAME
// 1 17602 read:entry mDNSResponder
// 1 17602 read:entry mDNSResponder
// 1 17602 read:entry EchoPod
// 1 17602 read:entry EchoPod
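Combine the -p option with a pid == $target predicate to narrow the trace to a specific process (e.g. pid=6455). On its own, -p attaches to the process but does not filter the system-wide syscall probes:
$ sudo dtrace -p 6455 -n 'syscall::read:entry /pid == $target/ { printf("%s", execname); }'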
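As you can see, once you learn a little D you can play around in all sorts of ways, much like with Perl one-liners.
Aside
memcached-1.2.6 has dtrace probes built in for nearly every command, and you can enable them by passing --enable-dtrace at configure time. In other words, you can write a top(1)-like program specialized for memcached in just a few lines.
However, the OS X implementation currently differs from the Solaris one, so the build fails on OS X (it doesn’t recognize dtrace’s -G option). Sun engineers are busy working on this, so I expect it to build on OS X in the 1.3 series, and going forward I’d like to use memcached as a sparring partner to sharpen my dtrace knowledge and skills.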

