For those who do not follow the active development of memcached, the current excitement in the community is the new binary protocol that will be introduced in the upcoming 1.3 series. If you’d like a quick and easy introduction to the binary protocol, you can see the slides from my presentation.
So, with such significant advances, the 1.3 codebase is obviously going to look a bit different from the 1.2 codebase, but the overall software architecture remains the same. What is significantly different, however, is how the stats operation is implemented. That is why I am writing this entry: to answer, in advance, the questions people might have.
Background in a nutshell
Looking further ahead, beyond the binary protocol, the memcached community is aiming for a pluggable engine architecture, which will allow memcached to satisfy the unique requirements some people have, such as persistent storage, data dumping and server-side replication. All this fancy stuff obviously goes against the original motives of memcached, but I will save that discussion for another day, as it is not appropriate for this entry :)
Supporting third-party engines means that memcached must be able to send engine-specific stats back to the client (most likely a system admin). To achieve this, memcached’s stats handling had to be made flexible by splitting the concept of “stats” into two segments, “core server stats” and “engine stats”; hence the refactoring.
The new approach
Previously, stats were collected by incrementing/accumulating values inside the stats structure (defined in memcached.h). The increments themselves were done mostly in the server code.
In the new approach, an engine does not depend on the stats structure, because that would limit the engine to the fields defined in it. Adding an opaque pointer to the structure for pointing at something engine-specific could get around this problem, but let’s not go there… no, no, no.
All non-server stats are pushed out to the slabber code, since this is the closest thing to an engine that memcached has at the moment. In this model, if a client asks for something specific (e.g. “stats malloc”), the server will query the engine for “malloc”. If the engine does not recognize what the client is asking for, the server simply returns an error.
Likewise, if the client asks for non-specific stats (“stats\r\n” in the ASCII protocol), memcached will return the merged result of its own core-server stats and the general-purpose stats from the slabber (bytes written, number of gets/sets, etc.).
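To make these dispatch rules concrete, here is a minimal sketch in C of how a server could route a stats request. Everything in it is hypothetical and for illustration only: the engine hook type engine_stats_fn, the toy engine toy_engine_stats with its made-up “malloc” group and values, the handle_stats function, and the exact “ERROR\r\n” reply are assumptions, not the code in the binprot branch.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical engine hook: writes stats for `name` into buf and returns
 * the number of characters written, or -1 if the group is unknown.
 * name == NULL asks for the engine's general-purpose stats. */
typedef int (*engine_stats_fn)(const char *name, char *buf, size_t size);

/* Toy engine (illustrative values only). */
int toy_engine_stats(const char *name, char *buf, size_t size)
{
    if (name == NULL)
        return snprintf(buf, size, "STAT bytes 1024\r\nSTAT cmd_get 7\r\n");
    if (strcmp(name, "malloc") == 0)
        return snprintf(buf, size, "STAT arena_size 4096\r\n");
    return -1; /* the engine has no clue what was asked for */
}

/* Server side: plain "stats" merges core-server stats with the engine's
 * general stats; "stats <name>" is delegated to the engine; unknown
 * names produce an error. */
void handle_stats(const char *name, engine_stats_fn engine,
                  char *out, size_t size)
{
    char engine_buf[256] = "";

    if (name == NULL || *name == '\0') {
        engine(NULL, engine_buf, sizeof(engine_buf));
        /* "uptime" stands in for the core-server stats */
        snprintf(out, size, "STAT uptime 42\r\n%sEND\r\n", engine_buf);
        return;
    }
    if (engine(name, engine_buf, sizeof(engine_buf)) < 0) {
        snprintf(out, size, "ERROR\r\n");
        return;
    }
    snprintf(out, size, "%sEND\r\n", engine_buf);
}
```

The point of the design is that the server never needs to know which named groups an engine supports; it simply forwards the name and treats a refusal as an error.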
If you’d like to see the actual code, take a look at this branch:
http://github.com/tmaesaka/memcached/commits/binprot
Make sure you check out the “binprot” branch.
Binary Stats is Packet-Per-Stat
Before I talk about how stats are implemented, I must mention that with the binary protocol, each statistic is returned in its own packet (as mentioned in the documentation). The key contains the name of the statistic and the value contains the associated value. The end of the transmission is signaled by a packet with neither a key nor a value.
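The packet-per-stat layout can be sketched as follows, assuming the standard 24-byte binary response header: magic byte 0x81, the STAT opcode 0x10, a big-endian key length, a big-endian total body length, and extras, status, opaque and CAS fields that are simply zeroed here. The function name encode_stat_packet is hypothetical, and a real server would also echo the opaque value from the client’s request, which this sketch leaves at zero.

```c
#include <stdint.h>
#include <string.h>

#define BIN_HEADER_LEN 24
#define BIN_RES_MAGIC  0x81 /* response magic byte */
#define BIN_OP_STAT    0x10 /* STAT opcode */

/* Write one STAT response packet (24-byte header + key + value) into buf
 * and return the number of bytes written. klen == vlen == 0 produces the
 * empty terminator packet. buf must hold at least 24 + klen + vlen bytes. */
uint32_t encode_stat_packet(unsigned char *buf, const char *key, uint16_t klen,
                            const char *val, uint32_t vlen)
{
    uint32_t bodylen = (uint32_t)klen + vlen;

    memset(buf, 0, BIN_HEADER_LEN);           /* extras, status, opaque, CAS = 0 */
    buf[0] = BIN_RES_MAGIC;
    buf[1] = BIN_OP_STAT;
    buf[2] = (unsigned char)(klen >> 8);      /* key length, big-endian */
    buf[3] = (unsigned char)(klen & 0xff);
    buf[8] = (unsigned char)(bodylen >> 24);  /* total body length, big-endian */
    buf[9] = (unsigned char)(bodylen >> 16);
    buf[10] = (unsigned char)(bodylen >> 8);
    buf[11] = (unsigned char)(bodylen & 0xff);

    if (klen > 0)
        memcpy(buf + BIN_HEADER_LEN, key, klen);
    if (vlen > 0)
        memcpy(buf + BIN_HEADER_LEN + klen, val, vlen);
    return BIN_HEADER_LEN + bodylen;
}
```

With this layout, a client reads packets until it sees one whose key length and body length are both zero.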
How it works
So how does this work? The laziest solution would be to push the responsibility for data formatting/serialization onto the engine, but this has a potential pitfall:
- Server failure due to incorrect formatting/serialization by the engine.
Instead, the engine is given a callback that it can use to format/serialize stats data before returning it to the core server. This reduces the likelihood of an engine returning something invalid to the core server (assuming, of course, that the implementer uses the callback). Specifically, the engine needs to implement the following function:
char *get_stats(const char *stat_name, uint32_t (*add_stats)(
                    char *buf, const char *key, const uint16_t klen,
                    const char *val, const uint32_t vlen), int *buflen);
and notice the callback:
uint32_t (*add_stats)(char *buf, const char *key, const uint16_t klen,
                      const char *val, const uint32_t vlen);
where the buf argument is the buffer that the entry will be serialized into, the key argument is the name of the statistic (e.g. “bytes”) and the val argument is the associated value (e.g. “1024”). The remaining klen and vlen arguments give the lengths of the key and value (e.g. strlen(“bytes”)).
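For the ASCII protocol, a callback with exactly this signature could look like the sketch below. It assumes the familiar “STAT <key> <value>\r\n” line format, with “END\r\n” emitted when both lengths are zero; the name ascii_add_stats is hypothetical and not necessarily what the 1.3 code uses.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical ASCII-protocol add_stats callback. Appends
 * "STAT <key> <val>\r\n" to buf, or "END\r\n" when klen and vlen are both
 * zero, and returns the number of bytes appended. The caller must ensure
 * buf has room for klen + vlen + 8 bytes. No NUL terminator is written. */
uint32_t ascii_add_stats(char *buf, const char *key, const uint16_t klen,
                         const char *val, const uint32_t vlen)
{
    char *p = buf;

    if (klen == 0 && vlen == 0) {
        memcpy(p, "END\r\n", 5);
        return 5;
    }
    memcpy(p, "STAT ", 5);
    p += 5;
    memcpy(p, key, klen);
    p += klen;
    *p++ = ' ';
    if (vlen > 0)
        memcpy(p, val, vlen);
    p += vlen;
    *p++ = '\r';
    *p++ = '\n';
    return (uint32_t)(p - buf);
}
```

Because the callback reports how many bytes it wrote, the engine never has to know the line format; it only advances its write pointer by the returned amount.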
The callback returns the number of bytes it has appended to the provided buffer, which the engine can use to advance its write pointer for further appends. Just make sure you allocate enough memory in advance (in the binary protocol, each append carries a 24-byte packet header on top of the key and value).
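One way to size the buffer up front is to sum key length + value length + 24 bytes for every entry, then add 24 more for the terminator packet. The helper below is a hypothetical sketch of that calculation, not part of memcached; assuming the ASCII format adds fewer than 24 bytes per line, a buffer sized this way is also large enough for ASCII output.

```c
#include <stddef.h>
#include <string.h>

#define BIN_HEADER_LEN 24 /* per-packet overhead in the binary protocol */

/* Hypothetical helper: bytes needed for n binary stat packets built from
 * keys[i]/vals[i], plus the empty terminator packet. */
size_t bin_stats_buffer_size(const char *keys[], const char *vals[], size_t n)
{
    size_t total = BIN_HEADER_LEN; /* terminator: header only */

    for (size_t i = 0; i < n; i++)
        total += BIN_HEADER_LEN + strlen(keys[i]) + strlen(vals[i]);
    return total;
}
```

For example, the two entries “bytes”/“1024” and “curr_items”/“3” need 24 + 33 + 35 = 92 bytes.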
Another thing to mention is that the engine does not have to worry about whether the returned data is for the ASCII or binary protocol, since memcached passes the engine the appropriate callback (with logic corresponding to the protocol type).
Once the engine populates the buffer with data that it would like to report, it can then simply return it to the core server, where it will be sent back to the client.
So, get_stats() could look something like this:
/* assume foo_key = "hello" and foo_val = "world" */
char *buf, *ptr;
uint32_t nbytes = 0;
/* num_of_bytes must cover every entry (key + value + 24-byte header each)
 * plus 24 bytes for the terminator packet */
if ((buf = malloc(num_of_bytes)) == NULL)
    return NULL;
ptr = buf;
*buflen = 0;
nbytes = add_stats(ptr, foo_key, strlen(foo_key),
                   foo_val, strlen(foo_val));
if (!nbytes) {
    free(buf); /* don't leak the buffer on failure */
    return NULL;
}
ptr += nbytes;
*buflen += nbytes;
...
*buflen += add_stats(ptr, NULL, 0, NULL, 0); /* seal with terminator */
return buf;
That’s it! Minimal coding is required from the engine implementer.
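For reference, here is a self-contained version of the get_stats() example that compiles on its own. The stat names and values, the demo_add_stats callback (standing in for the callback the core server would pass in, and assuming an ASCII-style “STAT <key> <value>\r\n” format), and the choice to reject every named stats group are all assumptions made for illustration.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* The callback type from the signature above. */
typedef uint32_t (*add_stat_fn)(char *buf, const char *key, const uint16_t klen,
                                const char *val, const uint32_t vlen);

/* Hypothetical stand-in for the server's ASCII callback. */
uint32_t demo_add_stats(char *buf, const char *key, const uint16_t klen,
                        const char *val, const uint32_t vlen)
{
    if (klen == 0 && vlen == 0) {
        memcpy(buf, "END\r\n", 5);
        return 5;
    }
    memcpy(buf, "STAT ", 5);
    memcpy(buf + 5, key, klen);
    buf[5 + klen] = ' ';
    memcpy(buf + 6 + klen, val, vlen);
    memcpy(buf + 6 + klen + vlen, "\r\n", 2);
    return (uint32_t)(8 + klen + vlen);
}

/* Engine-side get_stats: reports two hypothetical stats plus a terminator.
 * Returns a malloc'd buffer (caller frees) or NULL on error/unknown group. */
char *get_stats(const char *stat_name, add_stat_fn add_stats, int *buflen)
{
    static const char *const keys[] = { "hello", "bytes" };
    static const char *const vals[] = { "world", "1024" };
    const size_t nstats = sizeof(keys) / sizeof(keys[0]);
    size_t need = 24; /* terminator; 24 bytes per header is the binary worst case */
    char *buf, *ptr;

    if (stat_name != NULL) /* this toy engine knows no named stats groups */
        return NULL;

    for (size_t i = 0; i < nstats; i++)
        need += 24 + strlen(keys[i]) + strlen(vals[i]);

    if ((buf = malloc(need)) == NULL)
        return NULL;

    ptr = buf;
    *buflen = 0;
    for (size_t i = 0; i < nstats; i++) {
        uint32_t nbytes = add_stats(ptr, keys[i], (uint16_t)strlen(keys[i]),
                                    vals[i], (uint32_t)strlen(vals[i]));
        if (nbytes == 0) { /* callback failed: don't leak buf */
            free(buf);
            return NULL;
        }
        ptr += nbytes;
        *buflen += (int)nbytes;
    }
    *buflen += (int)add_stats(ptr, NULL, 0, NULL, 0); /* seal with terminator */
    return buf;
}
```

Because the engine only talks to the callback, passing a binary-protocol callback with the same signature would produce binary packets from the same engine code.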
The Good and the Not So Good
Like all things, stats over the binary protocol has its ups and downs. The good thing about the packet-per-stat approach is that client libraries should be easier to write, especially in languages that aren’t so string-friendly (e.g. C compared to Perl). I’ve already heard that it made libmemcached’s life happier.
The downside, however, is the network cost of binary stats compared to the ASCII protocol. Because a packet must be created for each statistic, the total number of bytes sent over the wire can be relatively large. For example, if you want to return ten stat rows to the client, the number of bytes to transmit is:
“264 bytes (11 packet headers of 24 bytes each, including the terminator) + size of each key and value”
whereas with the ASCII protocol it would be just:
“size of each key/value + 60 bytes (the “STAT ” prefix and separating space on each row) + 20 bytes (sum of CRLF) + 5 bytes (the “END\r\n” terminator)”
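The fixed overhead of each protocol can be checked with a short calculation. This sketch assumes every binary packet carries a 24-byte header and every ASCII row has the form “STAT <key> <value>\r\n”, followed by a final “END\r\n”; the function names are hypothetical.

```c
#include <stddef.h>

#define BIN_HEADER_LEN 24 /* binary protocol packet header */

/* Bytes of framing (excluding keys and values) for n stat rows. */
size_t binary_stats_overhead(size_t n)
{
    return (n + 1) * BIN_HEADER_LEN; /* one packet per row + terminator */
}

size_t ascii_stats_overhead(size_t n)
{
    /* "STAT " (5) + separating space (1) + "\r\n" (2) per row, then "END\r\n" */
    return n * (5 + 1 + 2) + 5;
}
```

For ten rows this gives 264 bytes for the binary protocol versus 85 bytes for ASCII, before counting the keys and values themselves.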
Sure, the size difference may look trivial, and you may not issue the stats command often, but some system admins might care…
Conclusion
As you can see, the memcached community has put a decent amount of thought into the 1.3 series, and as a result memcached will keep getting better. It will stay as simple as it always has been, and at the same time it will hopefully be able to do new things by accepting external engines in the future.
The stats code refactoring is a small (but important) step towards this goal :)
Toru Maesaka