Archive for the ‘memcached’ Category
memcached Hackathon #5 at Sun Microsystems
Last week I was in the valley for the fifth memcached Hackathon at Sun Microsystems and visiting some friends at Six Apart HQ. The hackathon was so much fun that we ended up leaving at 2am on a weeknight! Thanks to Matt Ingenthron and Sun Microsystems for organizing the event and providing food and space for this hackathon.
In the previous hackathon, we mostly exchanged ideas on the binary protocol and the storage engine interface. This time it was more code oriented and we reviewed and tested the progress everyone had made in the latest binary protocol tree. Unfortunately I couldn’t cover the whole hackathon but here is a summary of discussions from the agenda that I was involved in:
Binary Protocol - Add an engine specific OPCODE
No disagreements here. An opcode is represented by a 1 byte unsigned integer, so the consensus was that we should dedicate anything over 127 (0x7F), that is 0x80 through 0xFF, to special operations.
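As a rough illustration of that split, here is a minimal sketch of how a server might tell core opcodes from engine specific ones. The function and macro names are my own and hypothetical; only the 0x7F boundary comes from the discussion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; only the 0x7F boundary comes from the discussion. */
#define CORE_OPCODE_MAX   0x7F
#define ENGINE_OPCODE_MIN 0x80

/* True when the opcode is in the range set aside for engine use. */
bool is_engine_opcode(uint8_t opcode)
{
    return opcode > CORE_OPCODE_MAX;
}

/* Zero-based index into a hypothetical per-engine handler table. */
unsigned engine_opcode_slot(uint8_t opcode)
{
    assert(is_engine_opcode(opcode));
    return (unsigned)(opcode - ENGINE_OPCODE_MIN);
}
```

With a one byte opcode this leaves 128 values for the core protocol and 128 for engines.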
Storage Interface
We didn’t get around to discussing the interface in depth since getting the binary protocol released has greater priority at the moment. Trond however showed me some of the interesting work that he has been doing which will hopefully be out in the open soon.
Test Framework
The issue here is that tests aren’t actively being written. The opinion voiced on this issue was that some people aren’t comfortable with Perl and thus find the current Perl based test system difficult to understand.
Switching to a different test framework in a different language is easy but the problem is that this is a never-ending story. People can easily start demanding other languages that they feel comfortable in (python, java, ruby, lua, …). We briefly discussed that the ideal model is to be able to add tests written in any language but we didn’t go into depth on how we would actually achieve this.
Personally, I have nothing against the current test framework (mind you I like Perl) but I think if we were to switch, a solely C based framework is a good move. I am saying this because those that would think about opening up the memcached package and editing it can most likely write C (is this an assertive assumption? heh).
Client Libraries
Unfortunately I couldn’t get around to participating in the client talk but client-side replication work was being done for libmemcached and I heard from Brian Aker that there was good progress.
Jonathan (hachi) reviewed my binary protocol patch for Cache::Memcached and found that some protocol negotiation assumptions I made in the code can be improved. He is also looking at optimizing the code by subclassing the patch (reduces the number of conditional selections, perl method calls and hash lookups).
Scaling on Highly Threaded Servers
We didn’t really discuss this in depth since we were busy reviewing and testing the server code but as far as I know, we talked about how locking can be improved in memcached. Looking into and preparing for this is a good idea since we are entering a massively concurrent age. On the other hand, guys from Facebook mentioned that they were getting sufficient throughput with the current locking scheme, which was awesome to hear.
The engine plugin rearchitecture should fit well with this project since we can interchange different versions of the slabber engine with different locking strategies and make them compete to be the next default memcached engine.
Conclusion
The hackathon was fun and we got a lot done in terms of finding things to improve on. It was great to catch up with guys that I communicate a lot with online and talk tech in person. It was awesome that Brad turned up as well. As for code improvements, Dustin’s test code found an issue in the stats subsystem: it always returned zero for the opaque value. A little bit of coding looked necessary to get around this problem since an opaque value is held by the connection structure, which the engine does not have access to (it shouldn’t), but I was bored on my flight back to Tokyo so this problem is now fixed and pushed to my tree.
Rethinking the Query Cache for Drizzle
There is a mutual understanding in the Drizzle community that the MySQL query cache works well for a small database but isn’t sufficient for relatively large scale usage. Does your application involve a lot of database updates? If so, you’ll probably face fragmentation issues in the query cache (though the query cache isn’t suitable for use cases like this anyway).
Caching is the key ingredient in boosting the performance of any software that requires a significant amount of computation, hence it is something that can’t be overlooked. So how can we improve Drizzle?
The idea is to create a pluggable query cache subsystem that can work in a large scale environment. Since Drizzle is a micro-kernel DBMS, it makes sense to make the cache component pluggable and let the DBA choose the caching solution of their choice. This is exactly what I’m working on at the moment and my first plugin will allow Drizzle to use memcached as its query cache.
For example, a DBA could hook up their memcached pool to Drizzle and use several gigabytes of fast cache space to cache their results.
Things to consider
- Does the DBA really want to cache results?
- Does the result construction take long enough to care?
- Do we want to specify a specific SQL statement to always cache?
- Do we want to enforce a certain table to be cached?
- Transactional Engines
If we can satisfy the above points and achieve modularity, I think it’s a total win. For those that like diagrams, here is the architecture that is on my mind at the moment:
Benefits of using memcached
memcached is proven to work and help scale web applications in a cost effective fashion by various players in the web industry. It is also fast. The time complexity of fetching a cached result from memcached is O(1), which is an order we all love. Furthermore, by using memcached, the fragmentation issue disappears since this is a problem that the memcached community had to face in the past and successfully overcame by developing the slab subsystem.
Want to scale? With consistent hashing enabled, you can greatly reduce the number of cache misses from adding/removing a node from a live pool. Got spare boxes lying around? Hook them up and power up Drizzle! Need support? Both memcached and Drizzle community members are heartwarming people.
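To show why consistent hashing keeps cache misses down when the pool changes, here is a small self-contained sketch of a hash ring. It is not how any particular memcached client implements it; the FNV-1a hash, the 64 virtual points per node and all of the names are assumptions I picked for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_POINTS 1024
#define VNODES     64    /* virtual points per node; an arbitrary choice */

struct point { uint32_t hash; int node; };
struct ring  { struct point pts[MAX_POINTS]; size_t n; };

/* 32-bit FNV-1a; any well-mixed hash would do. */
uint32_t fnv1a(const char *s, size_t len)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++) {
        h ^= (unsigned char)s[i];
        h *= 16777619u;
    }
    return h;
}

static int cmp_point(const void *a, const void *b)
{
    const struct point *x = a, *y = b;
    if (x->hash != y->hash)
        return x->hash < y->hash ? -1 : 1;
    return x->node - y->node;
}

/* Place VNODES points for `node` on the ring and keep the ring sorted. */
void ring_add(struct ring *r, int node)
{
    char label[32];
    assert(r->n + VNODES <= MAX_POINTS);
    for (int v = 0; v < VNODES; v++) {
        int len = snprintf(label, sizeof(label), "node-%d#%d", node, v);
        r->pts[r->n].hash = fnv1a(label, (size_t)len);
        r->pts[r->n].node = node;
        r->n++;
    }
    qsort(r->pts, r->n, sizeof(r->pts[0]), cmp_point);
}

/* Owner of a key: the first point at or after the key's hash. */
int ring_lookup(const struct ring *r, const char *key)
{
    uint32_t h = fnv1a(key, strlen(key));
    size_t lo = 0, hi = r->n;   /* binary search for first hash >= h */
    assert(r->n > 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (r->pts[mid].hash < h)
            lo = mid + 1;
        else
            hi = mid;
    }
    return r->pts[lo == r->n ? 0 : lo].node;   /* wrap around */
}
```

When a node is added, a key either keeps its old owner or moves to the new node. It never moves between two existing nodes, so only the keys taken over by the new node turn into cache misses.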
Other Solutions work Too!
The beauty of modularity is that you can create and use your own solution for your unique requirements. For example, let’s assume that there is a webshop that wants to keep the number of physical servers down (e.g. limited monetary/space resource).
To satisfy the requirement stated above, you could cache to a fantastically fast hash database, such as Tokyo Cabinet (much, much faster than BDB). If you haven’t heard of it, you should look at the incredible benchmark comparison. So, what I really wanted to say is that the microkernel property of Drizzle will open up a lot of new possibilities for your application and help you tackle the new requirements that seem to come out of nowhere.
Where from here?
I am currently going through the UDF -> Plugin Architecture conversion done by Mark, and planning on basing the code on his logging plugin while it’s fantastically simple. My work will be done in:
- lp:~tmaesaka/drizzle/pluggable-qcache
I’ll hopefully have something decent to show soon, and I will keep people updated on my blog, IRC and the Mailing List (drizzle-discuss).
So that is all I have to say for now… If you have any suggestions, please do enlighten me.
Why Stats was refactored in memcached-1.3
For those that do not follow the active development of memcached, the current excitement in the community is the new binary protocol that will be introduced in the upcoming 1.3 series. If you’d like a quick and easy introduction on the binary protocol, you can see the slides from my presentation.
So, with such significant advances, the 1.3 codebase is obviously going to look a bit different to the 1.2 codebase, but even then the overall software architecture is the same. What’s significantly different, however, is how the stats operation is implemented. This is why I am writing this entry, to answer the questions that people might have in advance.
Background in a nutshell
Looking further ahead, beyond the binary protocol, the memcached community is aiming to achieve a pluggable engine architecture, which will allow memcached to satisfy unique requirements that people might have. These unique requirements can be things like persistent storage, data dumping, server-side replication, etc. All this fancy stuff obviously goes against the original motives of memcached but I will save this discussion for another day, as it is not appropriate for this entry.
Supporting third party engines means that memcached must be able to send back engine specific stats to the client (most likely a system admin). To achieve this, memcached’s stats handling had to be made flexible by splitting the concept of “stats” into two segments, “core server stats” and “engine stats”, and hence the refactoring.
The new approach
Previously, stats was done by incrementing/accumulating values inside the stats structure (defined in memcached.h). The actual increments were done mostly in the server code.
In the new approach, an engine does not have to depend on the stats structure because this would limit the engine to this structure. Adding an opaque pointer to the structure for pointing to something engine specific could get around this problem but let’s not go there… no, no, no.
All non-server stats are pushed out to the slabber code since this is the closest thing to an engine in memcached at the moment. In this model, if a client asks for something unique (e.g. “stats malloc”), then the server will query the engine for “malloc”. If the engine has no clue of what the client is asking for, then the server will simply return an error.
Likewise, if the client asks for non-specific stats (“stats\r\n” in the ASCII protocol), memcached will return the merged result of its own stats (core-server stats) and the general purpose stats from the slabber (bytes written, number of gets/sets, etc.).
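The dispatch described above can be sketched roughly as follows. This is not the code from the branch: `engine_stats`, `server_stats`, the stat names, the zero values and the "ERROR" reply are all hypothetical stand-ins, and the output is ASCII-style text purely to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical engine hook. A NULL name asks for the general purpose
 * stats; a known group name returns that group; anything else is -1.
 * All stat names and values here are placeholders. */
int engine_stats(const char *name, char *out, size_t outlen)
{
    if (name == NULL)
        return snprintf(out, outlen, "STAT bytes_written 0\r\n");
    if (strcmp(name, "malloc") == 0)
        return snprintf(out, outlen, "STAT arena_size 0\r\n");
    return -1;   /* the engine has no clue what this is */
}

/* Hypothetical core-server handler for "stats" and "stats <name>".
 * Returns bytes written to out, or -1 after writing an error reply. */
int server_stats(const char *name, char *out, size_t outlen)
{
    size_t used = 0;
    int n;

    assert(out != NULL && outlen >= sizeof("ERROR\r\n"));

    if (name == NULL) {   /* plain "stats": core-server stats first */
        n = snprintf(out, outlen, "STAT pid 0\r\n");
        if (n < 0 || (size_t)n >= outlen)
            goto fail;
        used = (size_t)n;
    }

    /* Merge in (or, for a named group, rely entirely on) the engine. */
    n = engine_stats(name, out + used, outlen - used);
    if (n < 0 || (size_t)n >= outlen - used)
        goto fail;
    used += (size_t)n;

    n = snprintf(out + used, outlen - used, "END\r\n");
    if (n < 0 || (size_t)n >= outlen - used)
        goto fail;
    return (int)(used + (size_t)n);

fail:
    snprintf(out, outlen, "ERROR\r\n");
    return -1;
}
```

The point of the split is that the core server never needs to know which stat groups an engine supports. It only forwards the name and reports an error when the engine declines.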
If you’d like to see the actual code, take a look at this branch:
http://github.com/tmaesaka/memcached/commits/binprot
Make sure you check out the “binprot” branch.
Binary Stats is Packet-Per-Stat
Before I talk about how stats is implemented, I must mention that with the binary protocol, each statistic is returned in its own packet (as mentioned in the documentation). The key contains the name of the statistic and the value contains the associated value. The end of the transmission is signaled with a packet that has no key and no value.
How it works
So how does this work? The laziest solution is to push the responsibility and implementation of data formatting/serialization onto the engine, but this has the potential pitfall of:
- Server Failure due to incorrect formatting/serialization by the engine.
Instead, an engine is given a callback that it can use to format/serialize stats data for returning to the core server. This way we can reduce the likelihood of an engine returning something invalid to the core server (assuming that the implementer uses the callback of course). Specifically, the engine needs to implement the following function:

    char *get_stats(const char *stat_name,
                    uint32_t (*add_stats)(char *buf,
                                          const char *key, const uint16_t klen,
                                          const char *val, const uint32_t vlen),
                    int *buflen);

and notice the callback:

    uint32_t (*add_stats)(char *buf,
                          const char *key, const uint16_t klen,
                          const char *val, const uint32_t vlen);

where the buf argument is the buffer that the entry will be serialized to, the key argument should be the name of the statistic (e.g. “bytes”) and the val argument should contain the associated value (e.g. “1024”). The remaining klen and vlen arguments should represent the length of the key and value (e.g. strlen("bytes")).
This callback returns the number of bytes it appended to the provided buffer, which the engine can use to advance the write pointer for further appending. Just make sure you allocate enough memory in advance (each append has a 24 byte overhead for the binary protocol).
Another thing to mention is that the engine does not have to worry whether the return data is for the ASCII or binary protocol, since memcached will give the appropriate callback (with different logic that corresponds to the protocol type) to the engine.
Once the engine populates the buffer with data that it would like to report, it can then simply return it to the core server, where it will be sent back to the client.
So, get_stats() could look something like this:

    /* assume foo_key = "hello" and foo_val = "world",
     * and that the caller has set *buflen to 0 */
    char *buf, *ptr;
    uint32_t nbytes = 0;

    if ((buf = malloc(num_of_bytes)) == NULL)
        return NULL;
    ptr = buf;

    nbytes = add_stats(ptr, foo_key, strlen(foo_key), foo_val, strlen(foo_val));
    if (!nbytes) {
        free(buf);   /* don't leak the buffer on failure */
        return NULL;
    }
    ptr += nbytes;   /* advance the write pointer past this entry */
    *buflen += nbytes;
    ...
    *buflen += add_stats(ptr, NULL, 0, NULL, 0);   /* seal with terminator */
    return buf;

That’s it! Minimal coding is required from the engine implementer.
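To make the protocol specific callback idea more concrete, here is a sketch of what an ASCII flavoured and a binary flavoured callback with the same signature could look like. This is not the code from the branch. The function names and the `ADD_STAT` typedef are hypothetical, and the header field offsets, the 0x81 response magic and the 0x10 stat opcode reflect my reading of the binary protocol documentation, so treat them as assumptions. The opaque field is left at zero for brevity, whereas a real server would echo the value from the request.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define BIN_HEADER_LEN 24

/* Same shape as the add_stats callback described above. */
typedef uint32_t (*ADD_STAT)(char *buf, const char *key, const uint16_t klen,
                             const char *val, const uint32_t vlen);

/* ASCII flavour: "STAT <key> <val>\r\n", or "END\r\n" as terminator. */
uint32_t add_stats_ascii(char *buf, const char *key, const uint16_t klen,
                         const char *val, const uint32_t vlen)
{
    int n;
    assert(buf != NULL);
    if (klen == 0 && vlen == 0)
        n = sprintf(buf, "END\r\n");
    else
        n = sprintf(buf, "STAT %.*s %.*s\r\n",
                    (int)klen, key, (int)vlen, val);
    return n < 0 ? 0 : (uint32_t)n;
}

/* Binary flavour: 24-byte header, then the key, then the value. */
uint32_t add_stats_binary(char *buf, const char *key, const uint16_t klen,
                          const char *val, const uint32_t vlen)
{
    uint8_t *p = (uint8_t *)buf;
    uint32_t body = (uint32_t)klen + vlen;
    assert(buf != NULL);
    memset(p, 0, BIN_HEADER_LEN);    /* extras, status, opaque, CAS stay 0 */
    p[0]  = 0x81;                    /* response magic */
    p[1]  = 0x10;                    /* stat opcode */
    p[2]  = (uint8_t)(klen >> 8);    /* key length, network byte order */
    p[3]  = (uint8_t)(klen & 0xFF);
    p[8]  = (uint8_t)(body >> 24);   /* total body length, network order */
    p[9]  = (uint8_t)(body >> 16);
    p[10] = (uint8_t)(body >> 8);
    p[11] = (uint8_t)body;
    if (klen) memcpy(p + BIN_HEADER_LEN, key, klen);
    if (vlen) memcpy(p + BIN_HEADER_LEN + klen, val, vlen);
    return BIN_HEADER_LEN + body;
}

/* Engine-side code: identical no matter which callback it was handed. */
uint32_t engine_fill(char *buf, ADD_STAT add_stats)
{
    uint32_t used = 0;
    used += add_stats(buf + used, "bytes", 5, "1024", 4);
    used += add_stats(buf + used, NULL, 0, NULL, 0);   /* terminator */
    return used;
}
```

Calling `engine_fill(buf, add_stats_ascii)` and `engine_fill(buf, add_stats_binary)` runs exactly the same engine code, and only the bytes that end up in the buffer differ.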
Good and the not so Good
Like all things, stats over the binary protocol has its ups and downs. The good thing about the packet-per-row approach is that the client library should be easier to write, especially for languages that aren’t so string friendly (e.g. C compared to Perl). I’ve already heard that it made libmemcached’s life happier.
The downside, however, is the network cost of binary stats compared to the ASCII protocol. Because a packet must be created for each statistic, the total number of bytes to transmit over the wire can be relatively large. For example, if you want to return ten stat rows back to the client, then the number of bytes to transmit is:
“264 bytes (sum of packet headers, including terminator) + size of each key and value”
whereas with the ASCII protocol it would be just:
“size of each key/value + 20 bytes (sum of CRLF) + 5 bytes (terminator)”
Sure, the size difference may look trivial and you may not issue the stats command much but some system admins might care…
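The arithmetic behind those two figures can be written down as a small sketch. The function names are hypothetical. The third function is my own addition: it also counts the “STAT ” prefix and the space between key and value that each ASCII row carries, which the figure above leaves out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BIN_HEADER_LEN 24

/* Binary: one header per stat plus one for the empty terminator packet. */
size_t binary_overhead(size_t nstats)
{
    assert(nstats < SIZE_MAX / BIN_HEADER_LEN);   /* guard against overflow */
    return (nstats + 1) * BIN_HEADER_LEN;
}

/* ASCII, counted as in the text: a CRLF per row plus "END\r\n". */
size_t ascii_overhead(size_t nstats)
{
    return nstats * 2 + 5;
}

/* ASCII including "STAT " (5 bytes) and the key/value separator (1 byte). */
size_t ascii_overhead_full(size_t nstats)
{
    return nstats * (5 + 1 + 2) + 5;
}
```

For ten rows this gives 264 bytes of framing for binary against 25 bytes for ASCII by the count above, or 85 bytes once the per-row prefix and separator are included. Either way the binary framing is several times larger.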
Conclusion
As you can see, a decent amount of thought has been put into the 1.3 series by the memcached community, and as a result, memcached will keep getting better. It will stay simple as it always was and at the same time it will hopefully be able to do new things by accepting external engines in the future.
The stats code refactoring is a small (but important) step towards this goal.
memcached Night #1 in Tokyo
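I have published the slides I used at memcached Night #1 in Tokyo.
Since development had only recently settled down, I had not run any benchmarks in time for the talk, and I think a few of the slides were less than convincing. It comes after the fact, but I have added a simple benchmark to the slides. To summarize my results: when request concurrency is low, there is no visible performance difference between the protocols, but as the number of concurrent requests increases, a difference starts to show. It looks like I will need to test with heavier workloads in the future.
As for the event itself, the talks by the other speakers were interesting too, and I learned a lot, for example that memcached is used in important places in the game industry as well. Thank you again to everyone who attended. The after-party was fun too, and the story that COOKPAD supports a service of that scale with a small, elite team (on the technical side) left a strong impression on me. Let’s definitely do it again.
Finally, there are two things I wanted to make clear, so I will take this opportunity to write them down.
About the protocol documentation
A blog entry by Yoshioka-san of MIRACLE LINUX said that the fixed-length header is 16 bytes. I thought that was odd, and when I looked into it I realized that the document in Six Apart’s svn repository is out of date. In the current specification the fixed-length header is 24 bytes, and the latest document at this time is in Trond Norbye’s git repository (sorry for the confusion).
Which branch to check out
Since the talk, various people have been trying out the binary protocol source tree. Please check out the branch called binprot (http://github.com/tmaesaka/memcached/tree/binprot), not master.
Next month I will be traveling to the US to take part in the memcached hackathon, so I will pass along in person the feedback I received from everyone at memcached Night and report on the adoption efforts in Japan.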
Perl, Binary and Memcached
The last few days I’ve been working on updating the binary protocol test in the latest memcached development branch to comply with the latest binary protocol specification. Prior to this update, the test client was sending an invalid request to the server, which as a consequence made the test hang and never finish.
In brief, this is the big difference:
Previously, the CAS (compare and swap) value was treated as part of the extra header that is appended/serialized behind the request/response header. In the latest specification, the CAS value is a required 8 byte field in the 24 byte request/response header (header size in the previous version was 16 bytes). Other than that, the rest were minor differences in the packet format of extra fields in certain commands. Easy work.
Here is the actual diff:
http://github.com/tmaesaka/memcached/commit/67b4da9eb855ebe7695a197320232b8d25692f84
As you can see, the test suite currently used by memcached is Perl based. This was fortunate for me since Perl is my second favorite language after C. I also made the code style more “perl-like” by fixing the indents. Although heh, I can see a Perl programmer arguing that the use of if/else blocks in the test is not best practice.
You know, fixing the test was pretty meaningful to me since it forced me to study the binary protocol specification, which I knew almost nothing about at the hackathon in Santa Clara, CA back in April. Hopefully I can make productive suggestions at the upcoming hackathon in Menlo Park, CA in October.
Great Fun at the MySQL Seminar
Last week I spoke at the MySQL APAC seminar in Tokyo as a guest speaker with Brian Aker on “Memcached and MySQL”. The seminar turned out to be very fun with just over one hundred attendees. You can check out the photos on the official MySQL APAC blog (text is in Japanese).
At the end of the seminar, I showed a brief demo of the custom storage engine project that Trond Norbye and I have been working on for the last few weeks. If you’re interested, you can read more about it on his blog, or come on over to the memcached channel on freenode.
