Toru Maesaka

Web addict and a hackaholic based in Tokyo

Archive for November, 2008

Google App Engine and its Memcache API


Google App Engine (GAE) is something I’ve been meaning to look into out of personal interest, but I kept putting it off until now due to laziness and being relatively busy.

Specifically, I’m interested in the Datastore API and the Memcache API since, well, that’s what I do. For those who aren’t familiar with GAE, it is a platform provided by Google that lets you run your web application on Google’s infrastructure. You use that infrastructure through a set of provided APIs, and Google takes care of scaling and high availability (HA) for you. This means you don’t have to invest in hardware (running costs are elastic) or repair anything (other than your code, of course). So, it’s a typical example of PaaS (Platform as a Service).

Taking a look at the Memcache API

Nowadays it’s gradually becoming common knowledge in the web industry that memcached can help your site scale and dramatically reduce response times in a cost-efficient fashion (compare the cost of adding a DB slave with adding a memcached node). The question is, what’s behind Google’s Memcache API? The App Engine documentation only states that:

The Memcache API has similar features to and is compatible with memcached by Danga Interactive.

So, it’s actually not stated that the backend is powered by memcached, despite the name. This means the backend could be anything, such as a distributed Google Sparse Hash accessed over the wire. I guess what’s important is not so much the cache daemon but that, by keeping the interface consistent with memcached, developers who are familiar with memcached can use GAE without allergic reactions. Not to mention, memcached has a brilliant interface for a distributed cache.

Caching your data on GAE is über simple. You first import the ‘memcache’ module from the GAE package:

from google.appengine.api import memcache

then call the appropriate API method for whatever it is that you want to do.

Just for fun, I tried setting a value using a key that’s longer than 250 bytes, since the maximum key length memcached accepts over the ASCII protocol is 250 bytes (i.e. 250 ASCII characters). So how about App Engine?

from google.appengine.api import memcache
 
memcache.flush_all()
test_key = 'x' * 300
 
if not memcache.set(test_key, 'some_val'):
    print 'Failed to set'
    quit()
 
print "Looks like we're good = " + memcache.get(test_key)

Well, it turns out this code didn’t run; my local app server rejected it with this error message:

Keys may not be more than 250 bytes in length, received 300 bytes

Hehe, this looks very memcached-like to me, but who knows; the limit could also be deliberate, to keep things consistent with memcached.
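To make the limit concrete, here’s a small sketch of a client-side key check that mimics the error above. It’s a hypothetical helper of my own (validate_key and MAX_KEY_BYTES are not part of the GAE SDK), written in Python 3, and it assumes keys are UTF-8 encoded before being measured, so the limit is about bytes rather than characters. The whitespace/control-character check reflects memcached’s ASCII protocol rules, not anything the GAE docs state.

```python
MAX_KEY_BYTES = 250  # memcached's key length limit in the ASCII protocol


def validate_key(key):
    """Return the key as bytes, or raise ValueError if memcached would reject it.

    Assumptions for this sketch: str keys are UTF-8 encoded before being
    measured, and keys may not contain spaces or control characters
    (a memcached ASCII protocol rule).
    """
    if isinstance(key, str):
        key = key.encode('utf-8')
    if len(key) > MAX_KEY_BYTES:
        raise ValueError('Keys may not be more than %d bytes in length, '
                         'received %d bytes' % (MAX_KEY_BYTES, len(key)))
    # Iterating over bytes in Python 3 yields ints (the byte values).
    if any(b <= 32 or b == 127 for b in key):
        raise ValueError('Keys may not contain spaces or control characters')
    return key
```

For example, `validate_key('x' * 300)` raises a ValueError worded like the SDK message above, and a key of 200 × ‘é’ is rejected too, because each ‘é’ takes 2 bytes in UTF-8 (400 bytes in total).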

Memcache API and Datastore API in Action

Okay, so to see whether the Memcache API + Datastore API combination performs just as you would expect from memcached + MySQL, I wrote a simple GAE web application. Here is the source code and screenshots of the application actually running on Google:

[Screenshots: gae_memcache_api, gae_datastore_api]

All it does is populate the cache and the persistent storage with 64 rows of 4KB each (so, 256KB in total) and measure how long it takes to bring them over to the application layer. This is obviously not enough to simulate data transfer in a real-world web application, but I figured it’s enough to make a point.
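The test app’s source isn’t reproduced in this post, so here’s a rough, hypothetical sketch of the same kind of measurement: write 64 values of 4KB each, then time how long it takes to read them all back. populate and timed_fetch are made-up names; a plain dict stands in for the backend, and on GAE you would presumably pass memcache.set and memcache.get (or your own Datastore lookup) as the put/fetch callables.

```python
import time

ROWS = 64
ROW_SIZE = 4 * 1024  # 4KB per row -> 64 * 4KB = 256KB in total


def populate(put):
    """Write ROWS values of ROW_SIZE bytes via put(key, value); return the keys."""
    keys = ['row%02d' % i for i in range(ROWS)]
    for key in keys:
        put(key, b'x' * ROW_SIZE)
    return keys


def timed_fetch(fetch, keys):
    """Read every key via fetch(key); return (total bytes read, elapsed seconds)."""
    start = time.perf_counter()
    total = sum(len(fetch(key)) for key in keys)
    return total, time.perf_counter() - start


if __name__ == '__main__':
    cache = {}  # stand-in backend for this sketch
    keys = populate(cache.__setitem__)
    total, seconds = timed_fetch(cache.get, keys)
    print('read %d bytes in %.6f s' % (total, seconds))
```

Passing plain callables keeps the timing loop identical for both backends, so any difference in the numbers comes from the backend rather than the harness.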

As expected, retrieving data is faster with the Memcache API, and in theory this performance should stay constant rather than degrade as concurrent connections and requests increase. On the other hand, the performance of the Datastore API _could_ degrade. I’m saying “could” because, as much as I’d like to prove this point, I didn’t really want to hammer Google with ab (ApacheBench).
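For reference, the usual way memcached sits in front of MySQL (and presumably how you’d pair the Memcache API with the Datastore) is the cache-aside pattern: read from the cache, and only on a miss go to persistent storage and populate the cache. Here’s a minimal sketch; cached_fetch and load_from_storage are hypothetical names, and a plain dict plays the cache in the demo.

```python
def cached_fetch(key, cache_get, cache_set, load_from_storage):
    """Cache-aside read: serve from the cache, fall back to storage on a miss.

    cache_get/cache_set follow the memcache convention of get(key) returning
    None on a miss; load_from_storage is a hypothetical callable standing in
    for a Datastore (or MySQL) lookup that returns None if the row is absent.
    """
    value = cache_get(key)
    if value is None:                  # cache miss: go to persistent storage
        value = load_from_storage(key)
        if value is not None:
            cache_set(key, value)      # populate the cache for the next reader
    return value


if __name__ == '__main__':
    storage = {'user:1': 'toru'}
    storage_reads = []

    def load(key):
        storage_reads.append(key)
        return storage.get(key)

    cache = {}
    cached_fetch('user:1', cache.get, cache.__setitem__, load)  # miss -> storage
    cached_fetch('user:1', cache.get, cache.__setitem__, load)  # hit -> cache only
    print(storage_reads)  # ['user:1']: storage was read only once
```

On writes you would typically delete the cached key (e.g. with memcache.delete) so the next read repopulates it. Note that with this convention a value of None can’t be cached.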

By the way, after a quick look at the caching code in the SDK, it seems Memcache is emulated with a Python dictionary in the local development environment.
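Just to illustrate the idea of a dictionary-backed cache, here’s a toy class with a memcache-like interface. This is my own hypothetical sketch, not the SDK’s actual stub code: it keeps items in a dict, counts hits and misses, uses simplified return values, and reports ‘bytes’ as the sum of key and value lengths (assuming ASCII keys and byte-string values).

```python
class DictMemcache(object):
    """Toy dictionary-backed cache with a memcache-like interface (hypothetical)."""

    def __init__(self):
        self._items = {}
        self._hits = 0
        self._misses = 0

    def set(self, key, value):
        self._items[key] = value
        return True

    def get(self, key):
        if key in self._items:
            self._hits += 1
            return self._items[key]
        self._misses += 1
        return None

    def delete(self, key):
        # Simplified: True if the key existed, False otherwise.
        return self._items.pop(key, None) is not None

    def flush_all(self):
        self._items.clear()
        return True

    def get_stats(self):
        # 'bytes' counts raw key + value lengths only, with no per-item overhead.
        return {
            'items': len(self._items),
            'bytes': sum(len(k) + len(v) for k, v in self._items.items()),
            'hits': self._hits,
            'misses': self._misses,
        }
```

With this accounting, storing one 128-byte value under a 5-byte key reports 133 bytes.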

Taking a look into Cached Bytes

Conveniently, the Memcache API provides a simple way to fetch the number of bytes currently being cached for you:

from google.appengine.api import memcache
 
stats = memcache.get_stats()
if stats: print stats['bytes']

Being a curious individual and a great stalker, I decided to use this information to compare whatever it is that’s behind the Memcache API with memcached. You see, memcached doesn’t report the exact number of key/value bytes you sent over the wire; it reports the total number of bytes it has consumed, including per-item overhead (as it should). In other words, the number memcached reports is specific to its own internal memory layout.

So, below is what I got from comparing the Memcache API (on Google’s infrastructure) with the latest release of memcached at the time of writing (1.2.6):

1 x 128 byte value with a 5 byte key
Memcache API: 133 bytes
memcached-1.2.6: 184 bytes

64 x 128 byte values with 5 byte keys
Memcache API: 8512 bytes
memcached-1.2.6: 11776 bytes

128 x 128 byte values with 5 byte keys
Memcache API: 17024 bytes
memcached-1.2.6: 23552 bytes
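Working through the arithmetic makes the difference clearer. This snippet just divides each total above by the number of items: the Memcache API comes out at exactly 133 bytes per item (the raw 5-byte key plus the 128-byte value), while memcached-1.2.6 comes out at 184 bytes per item, a constant 51 bytes of overhead. What those 51 bytes consist of (presumably memcached’s per-item header and bookkeeping) isn’t measured here.

```python
KEY_LEN, VALUE_LEN = 5, 128
PAYLOAD = KEY_LEN + VALUE_LEN  # raw key + value bytes per item

# (items, Memcache API bytes, memcached-1.2.6 bytes), copied from the results above
RESULTS = [
    (1, 133, 184),
    (64, 8512, 11776),
    (128, 17024, 23552),
]


def per_item(results):
    """Return (items, API bytes/item, memcached bytes/item, memcached overhead/item)."""
    rows = []
    for items, api_bytes, mc_bytes in results:
        api, mc = api_bytes // items, mc_bytes // items
        rows.append((items, api, mc, mc - PAYLOAD))
    return rows


for row in per_item(RESULTS):
    print('%3d items: API %d B/item, memcached %d B/item, overhead %d B/item' % row)
```

Every row works out to the same per-item figures, which is why the totals scale perfectly linearly with the item count.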

Wow, according to the above results, Google’s Memcache backend reports exactly the raw key + value bytes (133 = 5 + 128 per item), with no per-item overhead. Maybe it is a sparse map over the wire after all. But like I mentioned earlier, it doesn’t really matter what’s behind the API; what’s actually important is that it’s easy for us end users to use and that it performs in an O(1) manner.

Conclusion

The Google App Engine documentation rocks! Like I mentioned on Twitter, the team that worked on the documentation deserves a medal. It got me started in no time and gave me just enough information to start doing my own thing without getting frustrated by excessive information.

There are still unresolved questions, like how sharding works for the Memcache API. I mean, does each application get dedicated server instance(s), or are keys prefixed or suffixed with an app_id in the background? The latter approach sounds simple and effective, but it raises another question: how are stats managed per application? I guess a housekeeping index for each application would get around this issue, but there is no programmatic way to confirm this from the outside.
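To make the second option more concrete, here’s a purely speculative sketch of one shared cache serving several applications: keys are prefixed with the app_id, and a per-application housekeeping index of live keys lets stats be reported per app. NamespacedCache is a hypothetical class, not how App Engine is known to work, and the sketch assumes app_ids never contain ‘:’.

```python
class NamespacedCache(object):
    """Speculative sketch: one shared store, keys namespaced by app_id."""

    def __init__(self):
        self._store = {}  # shared backend: 'app_id:key' -> value
        self._index = {}  # housekeeping index: app_id -> set of raw keys

    def _ns(self, app_id, key):
        # Assumes app_id never contains ':', otherwise namespaces could collide.
        return '%s:%s' % (app_id, key)

    def set(self, app_id, key, value):
        self._store[self._ns(app_id, key)] = value
        self._index.setdefault(app_id, set()).add(key)
        return True

    def get(self, app_id, key):
        return self._store.get(self._ns(app_id, key))

    def delete(self, app_id, key):
        self._index.get(app_id, set()).discard(key)
        return self._store.pop(self._ns(app_id, key), None) is not None

    def app_stats(self, app_id):
        # Per-app stats come from the index; 'bytes' counts the raw key (no prefix).
        keys = self._index.get(app_id, set())
        return {
            'items': len(keys),
            'bytes': sum(len(k) + len(self._store[self._ns(app_id, k)]) for k in keys),
        }
```

One side effect of this design: the prefix eats into the key budget, so if the shared backend itself enforced memcached’s 250-byte limit, a 250-byte user key would no longer fit once prefixed.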

On a different note, I should stop being a stalker and just enjoy what’s been provided (though that’s really difficult to do once you dive into the world of engineering) :)

Written by tmaesaka

November 24th, 2008 at 5:31 am

Posted in memcached


Kuala Lumpur Conference Trip


Despite almost missing my flight to Malaysia for unfortunate reasons, I managed to get on the plane and make it to Kuala Lumpur last Friday. My advice from this experience: don’t arrive at the airport only 30 minutes before your flight…

It was pretty cool to arrive in Kuala Lumpur on the same day as RPK’s release, and having the delicious Ipoh white coffee at KL airport was a great start to my conference trip.

The conference was fun, and it was nice meeting new people in this region. As for my sessions, the interest in memcached was astonishing, and it was great to get mixi’s name out there. Hopefully “slow” media sites in Southeast Asia (which shall remain nameless) will start utilizing memcached ;)

I also managed to spread the word about Drizzle in this region, which was also fun, with lots of questions coming my way… the funniest was being asked “hang on, is that David Axmark?” with a totally surprised look.

Here are the slides from my memcached talk, though my presentation style is fewer words on the slides and more talking, so you may not find them very informative by themselves. Other than that, here are my thoughts from the stay:

  • People are friendly and helpful
  • Hotel prices there are awesome
  • Great to hear that the Malaysian Government supports Open Source
  • You can get around with just English in Kuala Lumpur
  • Their curry didn’t seem hot at first but gradually kicked in
  • mixi loads much faster than Facebook in KL (I still love you guys)

For those who are interested, I’ve taken lots of photos and they are now up on Flickr. Thumbs up to the event organizers, and thank you for taking care of me while I was there :)

Written by tmaesaka

November 13th, 2008 at 10:29 pm

Posted in event, travel


Studying the Malaysian Railway System


I’m flying to Malaysia tomorrow to speak at FOSS.my and hey, until ten minutes ago I hadn’t even bothered to find out how to get to the hotel from Kuala Lumpur International Airport.

Originally I was hoping for a courtesy shuttle from the airport to the hotel, since the hotel looks fairly upscale, or else to just grab a taxi. The problem is that there is no courtesy shuttle, and I don’t know what to tell the taxi driver. Sure, a printout of Google Maps could help, but I didn’t feel too comfortable relying on it.

So, after searching for ways to get to Mid Valley (where I’m staying), it turns out there’s a fancy train station there. I can get to it by first taking the KLIA Ekspres from the International Airport to KL Sentral and then transferring to the Rawang-Seremban Line, where Mid Valley is the next station.

I think I’ll give this route a try tomorrow and, needless to say, I’m going to print this entry and carry it with me. Programmers love to solve problems, and finding your way to the hotel in a country you’ve never been to, where you can’t speak the language, seems to be no exception. Totally looking forward to this mini adventure :)

Written by tmaesaka

November 7th, 2008 at 12:08 am

Posted in event, travel
