Archive for the ‘oss’ Category
Open Source Conference in Malaysia
So I’ve been bugging Colin Charles to invite me over to Malaysia for the last couple of months, and what does he offer me? An opportunity to speak at an open source conference in Malaysia!
FOSS.my is a two-day event (the 9th and 10th of next month). As stated on the conference homepage, its aim is to cover the technical aspects of various OSS projects, without any business/sales intervention, for people who follow open source technology in South East Asia (and other regions too, of course). The cool thing is that after I told mixi about this event, they liked the idea so much that they decided to sponsor it immediately. I didn’t really expect this, but hey, awesome.
At this conference, I will be giving two talks. The first will go over how mixi uses various OSS technologies to power the largest social networking service in Japan. The second will cover how memcached works internally and the latest hot topics in the development community, such as the upcoming binary protocol.
I’ve never been to Malaysia before so I’m totally looking forward to this trip.
memcached Hackathon #5 at Sun Microsystems
Last week I was in the valley for the fifth memcached Hackathon at Sun Microsystems, and I also visited some friends at Six Apart HQ. The hackathon was so much fun that we ended up leaving at 2am on a weeknight! Thanks to Matt Ingenthron and Sun Microsystems for organizing the event and providing food and space for it.
In the previous hackathon, we mostly exchanged ideas on the binary protocol and the storage engine interface. This time it was more code oriented and we reviewed and tested the progress everyone had made in the latest binary protocol tree. Unfortunately I couldn’t cover the whole hackathon but here is a summary of discussions from the agenda that I was involved in:
Binary Protocol - Add an engine specific OPCODE
No disagreements here. An opcode is represented by a 1 byte unsigned integer, so the consensus was that we should dedicate anything over 127 (0x7F) to special operations.
Storage Interface
We didn’t get around to discussing the interface in depth since getting the binary protocol released has greater priority at the moment. Trond however showed me some of the interesting work that he has been doing which will hopefully be out in the open soon.
Test Framework
The issue here is that tests aren’t actively being written. One opinion voiced on this issue was that some people aren’t comfortable with Perl and therefore find the current Perl-based test system difficult to understand.
Switching to a different test framework in a different language is easy but the problem is that this is a never-ending story. People can easily start demanding other languages that they feel comfortable in (python, java, ruby, lua, …). We briefly discussed that the ideal model is to be able to add tests written in any language but we didn’t go into depth on how we would actually achieve this.
Personally, I have nothing against the current test framework (mind you, I like Perl), but I think that if we were to switch, a purely C-based framework would be a good move. I say this because anyone who would think about opening up the memcached package and editing it can most likely write C (is this too bold an assumption? heh).
Client Libraries
Unfortunately I couldn’t get around to participating in the client talk but client-side replication work was being done for libmemcached and I heard from Brian Aker that there was good progress.
Jonathan (hachi) reviewed my binary protocol patch for Cache::Memcached and found that some of the protocol negotiation assumptions I made in the code can be improved. He is also looking at optimizing the code by restructuring the patch as a subclass, which reduces the number of conditional branches, Perl method calls and hash lookups.
Scaling on Highly Threaded Servers
We didn’t really discuss this in depth since we were busy reviewing and testing the server code, but as far as I know, we talked about how locking can be improved in memcached. Looking into and preparing for this is a good idea since we are entering a massively concurrent age. On the other hand, the guys from Facebook mentioned that they were getting sufficient throughput with the current locking scheme, which was awesome to hear.
The engine plugin rearchitecture should fit well with this project since we can interchange different versions of the slabber engine with different locking strategies and make them compete to be the next default memcached engine.
Conclusion
The hackathon was fun, and we got a lot done in terms of finding things to improve. It was great to catch up in person and talk tech with guys I communicate with a lot online, and it was awesome that Brad turned up as well. As for code improvements, Dustin’s test code found that the stats subsystem always returned zero for the opaque value. Fixing this looked like it needed a bit of coding, since the opaque value is held by the connection structure, which the engine does not (and should not) have access to. But I was bored on my flight back to Tokyo, so the problem is now fixed and pushed to my tree.
Rethinking the Query Cache for Drizzle
There is a mutual understanding in the Drizzle community that the MySQL query cache works well for a small database but isn’t sufficient for relatively large-scale usage. Does your application involve a lot of database updates? If so, you’ll probably face fragmentation issues in the query cache (though the query cache isn’t really suitable for use cases like this anyway).
Caching is the key ingredient in boosting the performance of any software that requires a significant amount of computation, so it is something that can’t be overlooked. So how can we improve Drizzle?
The idea is to create a pluggable query cache subsystem that can work in a large-scale environment. Since Drizzle is a microkernel DBMS, it makes sense to make the cache component pluggable and let DBAs choose the caching solution they prefer. This is exactly what I’m working on at the moment, and my first plugin will allow Drizzle to use memcached as its query cache.
For example, a DBA could hook up their memcached pool to Drizzle and use several gigabytes of fast cache space to cache their results.
Things to consider
- Does the DBA really want to cache results?
- Does the result construction take long enough to care?
- Do we want to specify a specific SQL statement to always cache?
- Do we want to enforce a certain table to be cached?
- Transactional Engines
If we can satisfy the above points and achieve modularity, I think it’s a total win. For those that like diagrams, here is the architecture that is on my mind at the moment:
Benefits of using memcached
memcached has been proven by various players in the web industry to work and to help scale web applications cost-effectively. It is also fast: fetching a cached result from memcached is O(1), which is an order we all love. Furthermore, using memcached makes the fragmentation issue disappear, since the memcached community faced this problem in the past and successfully overcame it by developing the slab subsystem.
Want to scale? With consistent hashing enabled, you can greatly reduce the number of cache misses caused by adding or removing a node from a live pool. Got spare boxes lying around? Hook them up and power up Drizzle! Need support? Both the memcached and Drizzle communities are full of warm-hearted people.
Other Solutions work Too!
The beauty of modularity is that you can create and use your own solution for your unique requirements. For example, let’s assume that there is a webshop that wants to keep the number of physical servers down (e.g. because of a limited budget or limited space).
To satisfy the requirement stated above, you could cache to a fantastically fast hash database such as Tokyo Cabinet (much, much faster than BDB; if you haven’t heard of it, you should look at the incredible benchmark comparison). So, what I really wanted to say is that the microkernel property of Drizzle will open up a lot of new possibilities for your application and help you tackle the new requirements that seem to come out of nowhere.
Where from here?
I’m currently going through the UDF -> Plugin Architecture conversion done by Mark, and I’m planning to base the code on his logging plugin since it’s fantastically simple. My work will be done in:
- lp:~tmaesaka/drizzle/pluggable-qcache
I’ll hopefully have something decent to show soon, and I will keep people updated on my blog, IRC and the Mailing List (drizzle-discuss).
So that is all I have to say for now… If you have any suggestions, please do enlighten me!
Thoughts on UTF-8 over CJK charsets in Drizzle
Internally, Drizzle will use UTF-8 everywhere, and _only_ UTF-8. This is simply because UTF-8 is the encoding of choice within the Drizzle community at the moment. To me, this decision makes sense since UTF-8 is popular in the areas that Drizzle is targeting (the Web and the Cloud). Limiting Drizzle to UTF-8 also means that the codebase becomes cleaner and thus easier to maintain. However, there are arguments against it in the community, so this could change in the future.
So, what does this mean for people outside the regions that use Latin characters, specifically East Asia? Would this cause an uproar?
A few months ago, Brian Aker asked me about this, and after a brief discussion with Jay Pipes a couple of days ago, I figured I should blog about it so I can keep it as a note for myself and hopefully get feedback from those who stumble across this entry. Here are my thoughts, based on my knowledge of the Japanese web industry:
Web Industry Standard in Japan
Looking at the web industry trend in Japan, UTF-8 is becoming the prominent encoding, despite the fact that UTF-8 requires more computation power and space than the Japanese CJK charsets. For example, mixi.jp (one of the largest websites in Japan) still uses EUC-JP (a Japanese CJK encoding) for historical reasons, but if you look at their newer features like video sharing, you can see that they’ve begun adopting UTF-8. Yahoo! JP, COOKPAD, ja.wikipedia and Livedoor are other great examples of large Japanese sites using UTF-8.
IMHO, the reasons UTF-8 is becoming popular in the .jp domain are:
- The default encoding of XHTML is UTF-8/UTF-16
- All browsers support UTF-8 nowadays (if yours doesn’t, you shouldn’t be using it)
- Theoretically, more characters can be represented in UTF-8
- Theoretically, existing ASCII functions can be used
However, there are certainly cases where web developers might need to use their local encoding to support things like mobile devices (Shift-JIS in Japan). IMHO, these unique requirements should be handled by the client. Rather than making the DBMS responsible, you should convert the returned result to whatever encoding you need in the application layer before rendering it.
More overhead per character
Using UTF-8 means an estimated average overhead of 1 byte per Japanese character (a typical EUC-JP character is 2 bytes, while kana and kanji take 3 bytes in UTF-8). Hence, if you already have a lot of textual data in a CJK encoding, you’re definitely going to use more storage, and the more data you have, the more significant this becomes.
Eating more space may seem significant, but to me, what’s more significant is how much cheaper memory and storage media have become nowadays. If you begin facing problems due to having too much data, it’s probably time to consider horizontal partitioning anyway.
Conclusion
The topic discussed in this entry is very sensitive, and this is merely my personal opinion. Like all things, every encoding has its ups and downs (each was designed for a purpose, after all), so many people hold different opinions. Satisfying everyone is difficult, but who knows? UTF-8 alone may satisfy the majority of the users that we are targeting. If it doesn’t, then I guess we’ll have to think again… We also need to look into internal sort performance if we go pure UTF-8.
The conclusion Jay and I came to in our brief discussion was that providing a conversion tool in the Drizzle package could be a good start to get people onto the UTF-8 boat. There is no specific plan yet, nor have we decided to do this, but if we were to do it, I’m thinking that the tool could be something simple that uses GNU libiconv.
Hey, there is always the brute-force solution of storing your textual data, in whatever encoding you choose, as binary!
Why Stats was refactored in memcached-1.3
For those that do not follow the active development of memcached, the current excitement in the community is the new binary protocol that will be introduced in the upcoming 1.3 series. If you’d like a quick and easy introduction on the binary protocol, you can see the slides from my presentation.
With such significant advances, the 1.3 codebase is obviously going to look a bit different from the 1.2 codebase, even though the overall software architecture is the same. What is significantly different, however, is how the stats operation is implemented. That is why I’m writing this entry: to answer in advance the questions that people might have.
Background in a nutshell
Looking further ahead, beyond the binary protocol, the memcached community is aiming for a pluggable engine architecture. This will allow memcached to satisfy unique requirements that people might have, such as persistent storage, data dumping and server-side replication. All this fancy stuff obviously goes against the original motives of memcached, but I will save that discussion for another day, as it is not appropriate for this entry.
Supporting third-party engines means that memcached must be able to send engine-specific stats back to the client (most likely a system admin). To achieve this, memcached’s stats handling had to be made flexible by splitting the concept of “stats” into two parts, “core server stats” and “engine stats”, hence the refactoring.
The new approach
Previously, stats was done by incrementing/accumulating values inside the stats structure (defined in memcached.h). The actual increments were done mostly in the server code.
In the new approach, an engine does not have to depend on the stats structure, because that would tie the engine to this structure. Adding an opaque pointer to the structure that points to something engine-specific could get around this problem, but let’s not go there… no, no, no.
All non-server stats are moved out to the slabber code, since it is the closest thing to an engine in memcached at the moment. In this model, if a client asks for something unique (e.g. “stats malloc”), the server will query the engine for “malloc”. If the engine has no idea what the client is asking for, the server will simply return an error.
Likewise, if the client asks for non-specific stats (“stats\r\n” in the ASCII protocol), memcached will return the merged result of its own core server stats and the general purpose stats from the slabber (bytes written, number of gets/sets, etc.).
If you’d like to see the actual code, take a look at this branch:
http://github.com/tmaesaka/memcached/commits/binprot
Make sure you checkout the “binprot” branch.
Binary Stats is Packet-Per-Stat
Before I talk about how stats is implemented, I must mention that with the binary protocol, each statistic is returned in its own packet (as mentioned in the documentation). The key contains the name of the statistic and the value contains the associated value. The end of the transmission is signaled by a packet with no key and no value.
How it works
So how does this work? The laziest solution would be to push the responsibility for data formatting/serialization onto the engine, but this has a potential pitfall:
- Server Failure due to incorrect formatting/serialization by the engine.
Instead, an engine is given a callback that it can use to format/serialize stats data for returning to the core server. This way we can reduce the likelihood of an engine returning something invalid to the core server (assuming that the implementer uses the callback of course). Specifically, the engine needs to implement the following function:
char *get_stats(const char *stat_name, uint32_t (*add_stats)( char *buf, const char *key, const uint16_t klen, const char *val, const uint32_t vlen), int *buflen);
and notice the callback:
uint32_t (*add_stats)(char *buf, const char *key, const uint16_t klen, const char *val, const uint32_t vlen);
Here, the buf argument is the buffer that the entry will be serialized to, the key argument is the name of the statistic (e.g. “bytes”) and the val argument is the associated value (e.g. “1024”). The remaining klen and vlen arguments are the lengths of the key and the value (e.g. strlen(“bytes”)).
This callback returns the number of bytes it had appended to the provided buffer, which the engine can use to forward the write pointer for further appending. Just make sure you allocate enough memory in advance (each append has a 24 byte overhead for the binary protocol).
Another thing to mention is that the engine does not have to worry about whether the returned data is for the ASCII or the binary protocol, since memcached will give the engine the appropriate callback (with logic that corresponds to the protocol type).
Once the engine populates the buffer with data that it would like to report, it can then simply return it to the core server, where it will be sent back to the client.
So, get_stats() could look something like this:
    /* assume foo_key = "hello", foo_val = "world", and that num_of_bytes is
       large enough for every entry plus the 24 byte per-packet overhead */
    char *buf, *ptr;
    uint32_t nbytes = 0;

    if ((buf = malloc(num_of_bytes)) == NULL)
        return NULL;
    ptr = buf;
    *buflen = 0;

    nbytes = add_stats(ptr, foo_key, strlen(foo_key), foo_val, strlen(foo_val));
    if (!nbytes) {
        free(buf);    /* don't leak the buffer on failure */
        return NULL;
    }
    ptr += nbytes;
    *buflen += nbytes;
    ...
    *buflen += add_stats(ptr, NULL, 0, NULL, 0); /* seal with terminator */
    return buf;
That’s it! Minimal coding is required from the engine implementer.
Good and the not so Good
Like all things, stats over the binary protocol has its ups and downs. The good thing about the packet-per-row approach is that the client library should be easier to write, especially for languages that aren’t so string friendly (e.g. C compared to Perl). I’ve already heard that it made libmemcached’s life happier.
The downside, however, is the network cost of binary stats compared to the ASCII protocol. Because a packet must be created for each statistic, the total number of bytes to transmit over the wire can be relatively large. For example, if you want to return ten stat rows to the client, the number of bytes to transmit is:
“264 bytes (sum of packet headers, including terminator) + size of each key and value”
whereas with the ASCII protocol it would be just:
“size of each key/value + 80 bytes (the "STAT " prefix, a separating space and CRLF on each of the ten lines) + 5 bytes (the END\r\n terminator)”
Sure, the size difference may look trivial and you may not issue the stats command much but some system admins might care…
Conclusion
As you can see, the memcached community has put a decent amount of thought into the 1.3 series, and as a result, memcached will keep getting better. It will stay as simple as it always has been, and at the same time it will hopefully be able to do new things by accepting external engines in the future.
The stats code refactoring is a small (but important) step towards this goal.
Drizzle Article in Japanese
Yesterday, an article I wrote for @IT, a fairly large Japanese IT news portal, was made public, and I figured I should blog about it in English so that I can tell my fellow Drizzlers about it. Here is the link to the article, even though it is in Nihongo (Japanese):
http://www.atmarkit.co.jp/fdb/rensai/drzl_pj/drzl01.html
This three-page, multi-byte article starts by covering how the project was launched by Brian Aker, along with the overall concept and philosophy of Drizzle. I then moved on to describing how we are modernizing things, for example by adopting the C99 standard, targeting modern hardware (lots and lots of cores) and using a microkernel architecture. I also described how we intend to work with other open source communities by actively using the open source libraries that are out there, rather than writing our own or using MySQL’s existing libraries.
One of the misunderstandings that came up after the announcement of Drizzle at OSCON was that Drizzle was being compared against SQLite. I was afraid that the same could happen in Japan so I made sure that this misunderstanding wouldn’t happen in my article. If you’re interested in the difference, it is well described in the Drizzle Wiki:
http://drizzle.wikia.com/wiki/Drizzle_compared_with_SQLite
Other than that, I thoroughly explained how we are committed to being open and transparent, and therefore always welcome new people and any suggestions and patches they might have. Even if your suggestion seems trivial to you, it could turn out to be a breakthrough for the community.
So the point is, let’s all stimulate each other, have fun, and make a great piece of software!
Perl, Binary and Memcached
The last few days I’ve been working on updating the binary protocol test in the latest memcached development branch to comply with the latest binary protocol specification. Prior to this update, the test client was sending an invalid request to the server, which as a consequence made the test hang and never finish.
In brief, this is the big difference:
Previously, the CAS (compare and swap) value was treated as part of the extra header that is appended/serialized after the request/response header. In the latest specification, the CAS value is a required 8 byte field in the 24 byte request/response header (the header in the previous version was 16 bytes). Other than that, the remaining changes were minor differences in the packet format of the extra fields for certain commands. Easy work.
Here is the actual diff:
http://github.com/tmaesaka/memcached/commit/67b4da9eb855ebe7695a197320232b8d25692f84
As you can see, the test suite currently used by memcached is Perl based. This was fortunate for me, since Perl is my favorite language after C. I also made the code style more “Perl-like” by fixing the indentation. Although, heh, I can see a Perl programmer arguing that the use of if/else blocks in the test is not best practice.
You know, fixing the test was pretty meaningful to me, since it forced me to study the binary protocol specification, which I knew almost nothing about at the hackathon in Santa Clara, CA back in April. Hopefully I can make productive suggestions at the upcoming hackathon in Menlo Park, CA in October.
Mac OS X, Ubuntu and Drizzle
So admittedly, Mac OS X is currently not the most friendly platform to work with Drizzle, mostly due to library issues.
OS X has several weird hacks in it due to licensing issues (libreadline comes to mind first). Sure, MacPorts, DarwinPorts, etc. could get around this problem, but should this be necessary? Personally, I dislike resorting to these solutions. Fortunately, I’ve been doing all my Drizzle work with Ubuntu on a dedicated server, so I’ve yet to come across any build-related issues. However, it kind of sucks not being able to take my Mac out to a cafe on the weekend and work there without connectivity.
So to make my life happier, I installed Ubuntu on my MacBook Pro (alongside OS X of course).
I came across a few problems, like a corrupted partition table, in the process of getting Ubuntu working, but the following Ubuntu threads helped greatly:
General Instructions
Boot related problems when using Hardy Heron (Ubuntu 8.04)
You know, getting Ubuntu running on my Mac was entertaining, since just yesterday I was talking to Monty Taylor about his view that using a Mac is selling out. So what does this make me now?
Happy Hacking!
Drizzle, out in the open
So I’ve been fortunate enough to participate in developing Drizzle, which is a microkernel fork of MySQL that you can read more about on Brian Aker’s blog post.
In brief, we are removing components of MySQL that we consider unnecessary by default and instead making them optional, by refactoring the server to be modular (aka microkernel). In other words, we are trying to develop a lean, fast, simple and extensible RDBMS that fits well in mid and large-scale web applications.
How? Well, take the Query Cache (QC) for example. The QC works well for a one-man database, but it has very little (if any) effect when we start thinking big, especially in the web industry. So why bother keeping it? What would be better is if we could _optionally_ make Drizzle use a cluster of memcached servers for query caching, which would also allow many database instances to share a common cache. The same can be said about many other components, such as ACL and Stored Procedures. This is exactly why we are moving to a microkernel architecture. If you want something special, you should be able to customize the server relatively easily to satisfy your requirements, rather than having to refactor the server code yourself.
Indeed, not everyone needs a microkernel database, in fact I assume most people won’t. However, there are enough web developers and companies in the small portion of the pie that would love a microkernel database to solve the problems that they are facing today. This is exactly why we don’t consider Drizzle to be a MySQL replacement.
If you’d like more information, do check out our project page on Launchpad and browse through the mailing list archive. Drizzle development is done in a true open source fashion, using open resources and tools like Bazaar and Launchpad. This means that everyone is free to come up with improvement suggestions/patches and submit them to the Drizzle community.
Drizzle has been a lot of fun, and I thank Brian for getting me involved in such a fun project.
Btw, I wrote a blog post on Drizzle in Japanese on the Mixi engineering blog too.
Great Fun at the MySQL Seminar
Last week I spoke at the MySQL APAC seminar in Tokyo as a guest speaker with Brian Aker on “Memcached and MySQL”. The seminar turned out to be a lot of fun, with just over one hundred attendees. You can check out the photos on the official MySQL APAC blog (text is in Japanese).
At the end of the seminar, I showed a brief demo of the custom storage engine project that Trond Norbye and I have been working on for the last few weeks. If you’re interested, you can read more about it on his blog, or come on over to the memcached channel on freenode.
