<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Toru Maesaka &#187; google</title>
	<atom:link href="http://torum.net/tag/google/feed/" rel="self" type="application/rss+xml" />
	<link>http://torum.net</link>
	<description>Hackaholic and a Web Addict based in Tokyo</description>
	<lastBuildDate>Sat, 01 Oct 2011 18:46:45 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1.2</generator>
		<item>
		<title>Google App Engine and its Memcache API</title>
		<link>http://torum.net/2008/11/gae-memcache-api/</link>
		<comments>http://torum.net/2008/11/gae-memcache-api/#comments</comments>
		<pubDate>Sun, 23 Nov 2008 20:31:40 +0000</pubDate>
		<dc:creator>Toru Maesaka</dc:creator>
				<category><![CDATA[memcached]]></category>
		<category><![CDATA[gae]]></category>
		<category><![CDATA[google]]></category>

		<guid isPermaLink="false">http://torum.net/?p=676</guid>
		<description><![CDATA[Google App Engine (GAE) is something I&#8217;ve been meaning to look into for personal interest but have been failing to do up until now due to laziness and being relatively busy. So specifically, I&#8217;m interested in the Datastore API and the Memcache API since well, that&#8217;s what I do. For those that aren&#8217;t familiar with [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://code.google.com/appengine/">Google App Engine</a> (GAE) is something I&#8217;ve been meaning to look into for personal interest but have been failing to do up until now due to laziness and being relatively busy.</p>
<p>So specifically, I&#8217;m interested in the <a href="http://code.google.com/appengine/docs/datastore/">Datastore API</a> and the <a href="http://code.google.com/appengine/docs/memcache/">Memcache API</a> since well, that&#8217;s what I do. For those that aren&#8217;t familiar with GAE, it is a platform provided by Google that allows you to run your web application on their infrastructure. You use the Google infrastructure through a set of <a href="http://code.google.com/appengine/docs/">provided APIs</a>, and they take care of scaling and <a href="http://en.wikipedia.org/wiki/High_availability">HA</a> issues for you. This means you don&#8217;t have to invest in hardware (elastic running cost) or repair anything (other than your code of course). So, it&#8217;s a typical example of <a href="http://en.wikipedia.org/wiki/Platform_as_a_service">PaaS</a>.</p>
<h3>Taking a look at the Memcache API</h3>
<p>Nowadays it&#8217;s gradually becoming common knowledge in the web industry that using <a href="http://www.danga.com/memcached">memcached</a> can help your site scale and reduce the response time dramatically in a cost-efficient fashion (compare adding a DB slave with adding a memcached node). The question is, what&#8217;s behind Google&#8217;s Memcache API? In the App Engine documentation, it is only stated that:</p>
<blockquote><p>The Memcache API has similar features to and is compatible with memcached by Danga Interactive.</p></blockquote>
<p>So, it&#8217;s actually not stated that the backend is powered by memcached despite the name. This means that the backend could be anything, like a distributed <a href="http://code.google.com/p/google-sparsehash/">Google Sparse Hash</a> over the wire. I guess what&#8217;s important is not so much the cache daemon but that, by keeping the interface consistent with memcached, developers who are familiar with memcached can use GAE without allergic reactions. Not to mention, memcached has a brilliant interface for a distributed cache.</p>
<p>Caching your data on GAE is uber simple. You first import the &#8216;memcache&#8217; module from the GAE package:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">from</span> google.<span style="color: black;">appengine</span>.<span style="color: black;">api</span> <span style="color: #ff7700;font-weight:bold;">import</span> memcache</pre></div></div>

<p>then call the appropriate <a href="http://code.google.com/appengine/docs/memcache/clientclass.html">API method</a> for whatever it is that you want to do.</p>
<p>Just for fun I tried setting a value using a key that&#8217;s longer than 250 bytes, since the maximum length of a key that memcached will accept over the <a href="http://code.sixapart.com/svn/memcached/trunk/server/doc/protocol.txt">ASCII protocol</a> is 250 bytes (aka 250 ASCII characters). So how about the App Engine?</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">from</span> google.<span style="color: black;">appengine</span>.<span style="color: black;">api</span> <span style="color: #ff7700;font-weight:bold;">import</span> memcache
&nbsp;
memcache.<span style="color: black;">flush_all</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
test_key = <span style="color: #483d8b;">'x'</span> <span style="color: #66cc66;">*</span> <span style="color: #ff4500;">300</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">if</span> <span style="color: #ff7700;font-weight:bold;">not</span> memcache.<span style="color: #008000;">set</span><span style="color: black;">&#40;</span>test_key, <span style="color: #483d8b;">'some_val'</span><span style="color: black;">&#41;</span>:
    <span style="color: #ff7700;font-weight:bold;">print</span> <span style="color: #483d8b;">'Failed to set'</span>
    quit<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">print</span> <span style="color: #483d8b;">&quot;Looks like we're good = &quot;</span> + memcache.<span style="color: black;">get</span><span style="color: black;">&#40;</span>test_key<span style="color: black;">&#41;</span></pre></div></div>

<p>Well, it turns out this code didn&#8217;t run; it failed with this error message from my local app server:</p>
<blockquote><p><strong>&lt;type 'exceptions.ValueError'&gt;Keys may not be more than 250 bytes in length, received 300 bytes</strong>
</p></blockquote>
<p>Hehe, this looks very memcached to me but who knows, this could also be deliberate to keep things consistent with memcached.</p>
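<p>To make the limit concrete, here is a minimal sketch of the kind of client-side check that would produce such an error. It is not the SDK&#8217;s actual code: the function name <code>validate_key</code> and the constant <code>MAX_KEY_BYTES</code> are made up for illustration, and only the 250 byte limit and the message wording come from the error above.</p>

```python
MAX_KEY_BYTES = 250  # limit taken from the error message above


def validate_key(key):
    """Return the key unchanged, or raise ValueError if it is too long.

    Hypothetical helper for illustration; not the App Engine SDK's code.
    """
    # Measure encoded bytes, not characters, since the limit is in bytes.
    size = len(key.encode('utf-8'))
    if size > MAX_KEY_BYTES:
        raise ValueError(
            'Keys may not be more than %d bytes in length, received %d bytes'
            % (MAX_KEY_BYTES, size))
    return key
```

<p>With this sketch, a key of exactly 250 bytes passes and the 300 byte <code>test_key</code> from the snippet above is rejected before anything would be sent over the wire.</p>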
<h3>Memcache API and Datastore API in Action</h3>
<p>Okay, so to see if the Memcache API + Datastore API perform just like what you would expect from memcached + MySQL, I wrote a simple GAE Web Application. Here is <a href="http://torum.net/dist/data_api_example.tar.gz">the source code</a> and screenshots of the application actually running on Google:</p>
<p style="text-align: center;"><a href="http://www.flickr.com/photos/tmaesaka/3045435635/" title="gae_memcache_api by tmaesaka, on Flickr"><img src="http://farm4.static.flickr.com/3159/3045435635_8f71466620_m.jpg" width="240" height="142" alt="gae_memcache_api" /></a> <a href="http://www.flickr.com/photos/tmaesaka/3046270024/" title="gae_datastore_api by tmaesaka, on Flickr"><img src="http://farm4.static.flickr.com/3204/3046270024_d39f1058a3_m.jpg" width="240" height="141" alt="gae_datastore_api" /></a></p>
<p>All it does is populate your Cache and Persistent Storage with 64 rows that are 4KB each (so, 256KB in total) and measure how long it takes to bring the data over to the application layer. This is obviously not enough to simulate data transfer in a real world web application but I figured it&#8217;s enough to make a point.</p>
<p>So as expected, retrieving data is faster using the Memcache API, and in theory this performance should not degrade but stay constant even with increased concurrent connections and requests. On the other hand, performance of the Datastore API <em>could</em> degrade. I&#8217;m saying &#8220;could&#8221; because as much as I&#8217;d like to prove this point, I didn&#8217;t really want to <a href="http://httpd.apache.org/docs/2.0/programs/ab.html">ab</a> Google.</p>
<p>Btw, after quickly looking at the caching code in the SDK, it seems Memcache is emulated using <a href="http://www.python.org/doc/2.5.2/tut/node7.html">Python&#8217;s Dictionary</a> on the local development environment.</p>
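<p>For a feel of what a dictionary-backed emulation can look like, here is a toy sketch. The class name <code>DictCache</code> is my own invention and this is not the SDK&#8217;s stub; it only mimics a few method names from the Memcache API and assumes that the reported bytes are simply key length plus value length.</p>

```python
class DictCache(object):
    """Toy in-process cache backed by a dict (hypothetical, not the SDK stub)."""

    def __init__(self):
        self._items = {}

    def set(self, key, value):
        self._items[key] = value
        return True  # truthy on success, like the set() check earlier

    def get(self, key):
        return self._items.get(key)  # None on a miss

    def flush_all(self):
        self._items.clear()
        return True

    def get_stats(self):
        # Count only key and value lengths, with no per-item overhead.
        total = sum(len(k) + len(v) for k, v in self._items.items())
        return {'items': len(self._items), 'bytes': total}


cache = DictCache()
cache.set('key01', 'v' * 128)
stats = cache.get_stats()
```

<p>Something this small is plenty for local development, since nothing has to cross the network or be shared between machines.</p>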
<h3>Taking a look into Cached Bytes</h3>
<p>Conveniently, the Memcache API provides a simple way to fetch the number of bytes that are currently being cached for you:</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">from</span> google.<span style="color: black;">appengine</span>.<span style="color: black;">api</span> <span style="color: #ff7700;font-weight:bold;">import</span> memcache
&nbsp;
stats = memcache.<span style="color: black;">get_stats</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
<span style="color: #ff7700;font-weight:bold;">if</span> stats: <span style="color: #ff7700;font-weight:bold;">print</span> stats<span style="color: black;">&#91;</span><span style="color: #483d8b;">'bytes'</span><span style="color: black;">&#93;</span></pre></div></div>

<p>Being a curious individual and a great stalker, I decided to use this information to compare whatever it is that&#8217;s behind the Memcache API with memcached. You see, with memcached you don&#8217;t get the exact number of key/value bytes that you sent over the wire because memcached reports the total number of bytes it has consumed, including overheads per item (as it should). In other words, the figure memcached reports is specific to its own implementation.</p>
<p>So, below is what I got from comparing the Memcache API (on Google&#8217;s infrastructure) and the latest release of memcached (1.2.6) at the time of this blog entry:</p>
<div style="margin-left: 30px;">
<p><strong>1 x 128 byte value with a 5 byte key</strong><br />
Memcache API: 133 bytes<br />
memcached-1.2.6: 184 bytes</p>
<p><strong>64 x 128 byte values with 5 byte keys</strong><br />
Memcache API: 8512 bytes<br />
memcached-1.2.6: 11776 bytes</p>
<p><strong>128 x 128 byte values with 5 byte keys</strong><br />
Memcache API: 17024 bytes<br />
memcached-1.2.6: 23552 bytes</p>
</div>
<p>Wow, according to the above results, Google&#8217;s Memcache backend is not showing any overhead in its report: 133 bytes is exactly the 128 byte value plus the 5 byte key. Maybe it is a sparse map over the wire after all. But like I mentioned earlier, it doesn&#8217;t really matter what&#8217;s behind the API because what&#8217;s actually important is that it&#8217;s easy for us end-users to use and that it performs in an O(1) manner.</p>
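<p>The arithmetic behind that observation can be checked in a few lines. This is only a sanity check on the numbers reported above; the variable names are mine.</p>

```python
KEY_BYTES = 5
VALUE_BYTES = 128
PAYLOAD = KEY_BYTES + VALUE_BYTES  # 133 bytes per item

# (item count, Memcache API bytes, memcached-1.2.6 bytes) as measured above
results = [(1, 133, 184), (64, 8512, 11776), (128, 17024, 23552)]

overheads = []
for count, api_bytes, memcached_bytes in results:
    # The Memcache API figure is exactly the raw payload.
    assert api_bytes == count * PAYLOAD
    # Whatever memcached reports beyond the payload is per-item overhead.
    overheads.append((memcached_bytes - count * PAYLOAD) // count)

print(overheads)
```

<p>This prints <code>[51, 51, 51]</code>: in these measurements memcached 1.2.6 accounts for a constant 51 bytes on top of each 133 byte item, while the Memcache API reports the payload and nothing else.</p>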
<h3>Conclusion</h3>
<p>The Google App Engine Documentation rocks! Like <a href="http://twitter.com/tmaesaka/status/1008943172">I mentioned on Twitter</a>, the team that worked on the documentation should get a medal. It got me started in no time and gave me just enough information to start doing my own thing without getting frustrated from excessive information.</p>
<p>There are still unresolved questions like how sharding works for the Memcache API. I mean, does each application get dedicated server instance(s) or are keys appended/prepended with an app_id in the background? The latter approach sounds simple and effective but it opens up another question of stats management. I guess a housekeeping index for each application would get around this issue but there is no programmatic way from the outside to confirm this.</p>
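<p>Just to illustrate the second idea, here is a rough sketch of key prefixing with a per-application housekeeping index. Everything in it is speculation on my part: the class <code>NamespacedCache</code>, the <code>app_id:key</code> format and the byte accounting are all made up, and none of it says anything about how Google actually does it.</p>

```python
class NamespacedCache(object):
    """Shared store where every key is prefixed with an app_id.

    Purely hypothetical; nothing here is known about Google's backend.
    """

    def __init__(self):
        self._shared = {}     # one store shared by all applications
        self._app_bytes = {}  # housekeeping index: app_id -> cached bytes

    def set(self, app_id, key, value):
        full_key = app_id + ':' + key
        old = self._shared.get(full_key)
        if old is not None:
            # Replacing an item: drop its old size from the app's total.
            self._app_bytes[app_id] -= len(key) + len(old)
        self._shared[full_key] = value
        self._app_bytes[app_id] = (
            self._app_bytes.get(app_id, 0) + len(key) + len(value))
        return True

    def get(self, app_id, key):
        return self._shared.get(app_id + ':' + key)

    def get_stats(self, app_id):
        # Answered from the index, without scanning other apps' keys.
        return {'bytes': self._app_bytes.get(app_id, 0)}
```

<p>The point of the index is that per-application stats stay cheap even though all applications share one key space. A real system would also have to deal with evictions and with app_ids that contain the separator, which this sketch ignores.</p>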
<p>On a different note, I should stop being a stalker and just enjoy what&#8217;s been provided (though this is a really difficult thing to do once you dive into the world of engineering) :)</p>
]]></content:encoded>
			<wfw:commentRss>http://torum.net/2008/11/gae-memcache-api/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
	</channel>
</rss>

