<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Toru Maesaka</title>
	<atom:link href="http://torum.net/feed/" rel="self" type="application/rss+xml" />
	<link>http://torum.net</link>
	<description>Hackaholic and a Web Addict based in Tokyo</description>
	<lastBuildDate>Fri, 27 Aug 2010 15:13:38 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
		<item>
		<title>Two Weeks in Review</title>
		<link>http://torum.net/2010/08/two-weeks-in-review/</link>
		<comments>http://torum.net/2010/08/two-weeks-in-review/#comments</comments>
		<pubDate>Fri, 27 Aug 2010 09:58:24 +0000</pubDate>
		<dc:creator>Toru Maesaka</dc:creator>
				<category><![CDATA[random]]></category>
		<category><![CDATA[life]]></category>
		<category><![CDATA[phone]]></category>

		<guid isPermaLink="false">http://torum.net/?p=2372</guid>
		<description><![CDATA[It&#8217;s been almost two weeks since I left mixi where I literally had a great time for the past 3 years and 8 months. Here&#8217;s what I&#8217;ve been up to lately. Having dinner every night with life-long friends that I made through working at mixi. BBQ in the Mountains of Chiba with friends. Riding my [...]]]></description>
			<content:encoded><![CDATA[<p>It&#8217;s been almost two weeks since I left mixi where I literally had a great time for the past 3 years and 8 months. Here&#8217;s what I&#8217;ve been up to lately.</p>
<ul>
<li>Having dinner every night with life-long friends that I made through working at mixi.</li>
<li>BBQ in the Mountains of <a href="http://en.wikipedia.org/wiki/Chiba_Prefecture">Chiba</a> with friends.</li>
<li>Riding my new <a href="http://www.orbea.com/en/bicis/">road bike</a> (bicycle).</li>
<li>Day trip to <a href="http://en.wikipedia.org/wiki/Kamakura,_Kanagawa">Kamakura</a>.</li>
<li>Studying relevant technologies for my next gig.</li>
<li>Searching for a new apartment to rent in Tokyo.</li>
<li>Throwing things out to make moving easier.</li>
<li>Cancelled my gym membership due to moving.</li>
<li>Writing final evaluation for Djellel on Google Summer of Code.</li>
<li>Retired my iPhone 3G that I&#8217;ve been using for over two years.</li>
</ul>
<p>My free time comes from a surprise stack of unconsumed vacation time that I had left at mixi. You see, I hardly used my paid leave while I was at mixi due to working on exciting projects and having fun people around me. Time just flew.</p>
<p>As a replacement for my iPhone 3G, I bought a BlackBerry Bold 9700 which now makes me a BlackBerry + Android user. I don&#8217;t have anything against the iPhone as a product (although I have improvement suggestions) but I wanted a change after using it for over two years. I&#8217;m also hoping that the new iPod Touch will come with a camera, which would eliminate my desire for an iPhone 4. I still actively use my HTC Magic for mobile web surfing and <a href="http://torum.net/2010/06/better-mobile-internet-life-in-japan/">tethering my b-mobile</a> in the field.</p>
<p>My opinion on the BlackBerry so far is that it&#8217;s a powerful email machine combined with <a href="http://www.google.com/mobile/sync/">Google Sync</a> and <a href="http://mail.google.com/support/bin/topic.py?hl=en&#038;topic=12867">Google Contacts</a> (now part of Gmail). I&#8217;m certain that I&#8217;ve been replying to more emails while I&#8217;m out than when I was using the iPhone. The web browsing experience on the 9700 is poor but I have an Android to fill that void.</p>
<p>Anyhow, I just wanted to let my readers know that I&#8217;m doing fine. My vacation ends next week so until then, please feel free to ping me for a hand on your project, drinking or whatever :)</p>
]]></content:encoded>
			<wfw:commentRss>http://torum.net/2010/08/two-weeks-in-review/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>BlitzDB Crash Safety and Auto Recovery</title>
		<link>http://torum.net/2010/07/blitzdb-crash-safety-and-auto-recovery/</link>
		<comments>http://torum.net/2010/07/blitzdb-crash-safety-and-auto-recovery/#comments</comments>
		<pubDate>Thu, 22 Jul 2010 09:43:14 +0000</pubDate>
		<dc:creator>Toru Maesaka</dc:creator>
				<category><![CDATA[drizzle]]></category>
		<category><![CDATA[oss]]></category>
		<category><![CDATA[blitzdb]]></category>
		<category><![CDATA[hacking]]></category>
		<category><![CDATA[recovery]]></category>
		<category><![CDATA[storage]]></category>

		<guid isPermaLink="false">http://torum.net/?p=2369</guid>
		<description><![CDATA[Crash Safety is a big deal in the database league. Lack of durability can lead to all sorts of terrible things upon a catastrophic event. Many projects, especially in the so-called NoSQL world, compromise crash safety in return for higher QPS. The argument there is that the availability of the overall system should be [...]]]></description>
			<content:encoded><![CDATA[<p>Crash Safety is a big deal in the database league. Lack of durability can lead to all sorts of terrible things upon a catastrophic event. Many projects, especially in the so-called NoSQL world, compromise crash safety in return for higher QPS. The argument there is that the availability of the overall system should be accomplished by replication since a database server can&#8217;t be rescued if the physical disk breaks. I happen to agree with this philosophy but I am also aware that this isn&#8217;t the right answer for everyone. So, what will I do with BlitzDB?</p>
<p>Several relational database hackers have pointed out that BlitzDB isn&#8217;t any safer than MyISAM since it doesn&#8217;t guarantee crash safety. This is currently true but I plan on making BlitzDB much safer than MyISAM by providing the following features.</p>
<ol>
<li>Auto Recovery Routine (startup option)</li>
<li>Tokyo Cabinet&#8217;s Transaction API (table-specific option)</li>
</ol>
<p>The second feature above would actually guarantee that BlitzDB is crash safe (especially combined with auto recovery) but I won&#8217;t go into depth in this post since this topic deserves a blog post of its own. Let me just state that this feature will be provided in a form like this:</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;"><span style="color: #993333; font-weight: bold;">CREATE</span> <span style="color: #993333; font-weight: bold;">TABLE</span> t1 <span style="color: #66cc66;">&#40;</span>
  a int <span style="color: #993333; font-weight: bold;">PRIMARY</span> <span style="color: #993333; font-weight: bold;">KEY</span><span style="color: #66cc66;">,</span>
  b varchar<span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">256</span><span style="color: #66cc66;">&#41;</span>
<span style="color: #66cc66;">&#41;</span> ENGINE <span style="color: #66cc66;">=</span> BLITZDB<span style="color: #66cc66;">,</span> CRASH_SAFE;</pre></div></div>

<p>From here on, I&#8217;ll cover how I plan on hacking auto recovery in BlitzDB.</p>
<h3>Auto Recovery Challenges</h3>
<p>As I blogged a while back, <a href="http://torum.net/2010/01/how-to-recover-a-tokyo-cabinet-database-file/">recovering Tokyo Cabinet</a> is relatively simple. However, this is not a sufficient solution for BlitzDB since the data file (the hash database that actually holds the rows) and the index file(s) are independent of each other. That is, the data file and the index file(s) are very likely to be inconsistent after a crash. So, how can we hack around this? Pretty simple.</p>
<h3>Indexes aren&#8217;t Important at Recovery Phase</h3>
<p>Because BlitzDB logically separates the data file from its indexes, index files aren&#8217;t that important. If the server crashes, BlitzDB can delete the index file(s) and recompute them from the data file. Needless to say, this process would involve a lot of random access and computation but it would not dominate the running time of the system since it&#8217;s a one-time cost. This approach has one flaw, however: the index files can&#8217;t be recomputed if the data file is broken or unrecoverable.</p>
<p>Therefore to guarantee crash safety, BlitzDB must ensure that the data file is unbreakable. This is precisely where Tokyo Cabinet&#8217;s Transaction API comes in. I&#8217;m planning on using it to protect the data file from breaking. If the data file is protected, the table can be rescued. Simple!</p>
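<p>As a rough illustration of the rebuild idea (this is not BlitzDB&#8217;s actual code), the sketch below uses plain Python dictionaries as stand-ins for the data file and an index file. The names <em>rebuild_index</em> and <em>key_column</em>, as well as the sample rows, are made up for this example.</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;">def rebuild_index(data_file, key_column):
    """Recompute a secondary index by scanning every row in the data file.

    data_file maps each primary key to its row (a dict of column values).
    The result maps each indexed value to a sorted list of primary keys.
    """
    index = {}
    for primary_key, row in data_file.items():
        index.setdefault(row[key_column], []).append(primary_key)
    for primary_keys in index.values():
        primary_keys.sort()
    return index


# Hypothetical table contents that survived the crash.
data_file = {
    1: {"a": 1, "b": "tokyo"},
    2: {"a": 2, "b": "chiba"},
    3: {"a": 3, "b": "tokyo"},
}

# Throw the old index away and recompute it from the data file alone.
index_on_b = rebuild_index(data_file, "b")
print(index_on_b)</pre></div></div>

<p>The point of the sketch is that the index is derived entirely from the rows, which is why only the data file has to survive.</p>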
<p>So, that&#8217;s what I have in mind for making BlitzDB a safer engine. Unfortunately I can&#8217;t start hacking on it immediately since I have several bugs to fix first. Nevertheless I&#8217;m looking forward to hacking on it. This challenge should be quite fun to tackle.</p>
]]></content:encoded>
			<wfw:commentRss>http://torum.net/2010/07/blitzdb-crash-safety-and-auto-recovery/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Extending CREATE TABLE Syntax in Drizzle</title>
		<link>http://torum.net/2010/07/extending-create-table-syntax-in-drizzle/</link>
		<comments>http://torum.net/2010/07/extending-create-table-syntax-in-drizzle/#comments</comments>
		<pubDate>Wed, 21 Jul 2010 17:37:42 +0000</pubDate>
		<dc:creator>Toru Maesaka</dc:creator>
				<category><![CDATA[drizzle]]></category>
		<category><![CDATA[oss]]></category>
		<category><![CDATA[api]]></category>

		<guid isPermaLink="false">http://torum.net/?p=2367</guid>
		<description><![CDATA[The flexibility to add table-specific options for things like compression, encryption and optimization can be useful to storage engine developers as this flexibility can open up new possibilities. Here&#8217;s what I&#8217;m talking about: CREATE TABLE t1 &#40; ... &#41; ENGINE = my_engine, MY_OPTION = your_arg; Supporting this is relatively easy in Drizzle and this API [...]]]></description>
			<content:encoded><![CDATA[<p>The flexibility to add table-specific options for things like compression, encryption and optimization can be useful to storage engine developers as this flexibility can open up new possibilities. Here&#8217;s what I&#8217;m talking about:</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;"><span style="color: #993333; font-weight: bold;">CREATE</span> <span style="color: #993333; font-weight: bold;">TABLE</span> t1 <span style="color: #66cc66;">&#40;</span>
  <span style="color: #66cc66;">...</span>
<span style="color: #66cc66;">&#41;</span> ENGINE <span style="color: #66cc66;">=</span> my_engine<span style="color: #66cc66;">,</span> MY_OPTION <span style="color: #66cc66;">=</span> your_arg;</pre></div></div>

<p>Supporting this is relatively easy in Drizzle and this API feature (and a bit more) is <a href="http://askmonty.org/wiki/Manual:Extending_CREATE_TABLE">available in MariaDB</a> as well. Unfortunately Drizzle&#8217;s method to do this isn&#8217;t documented in the Wiki yet but it should be added when our Storage Engine API becomes stable (as in, no interface changes).</p>
<h3>Implement StorageEngine::doValidateTableOptions()</h3>
<p>Here&#8217;s the actual interface.</p>

<div class="wp_syntax"><div class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">bool</span> StorageEngine<span style="color: #008080;">::</span><span style="color: #007788;">doValidateTableOptions</span><span style="color: #008000;">&#40;</span><span style="color: #0000ff;">const</span> std<span style="color: #008080;">::</span><span style="color: #007788;">string</span> <span style="color: #000040;">&amp;</span>key,
                                           <span style="color: #0000ff;">const</span> std<span style="color: #008080;">::</span><span style="color: #007788;">string</span> <span style="color: #000040;">&amp;</span>state<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span></pre></div></div>

<p>This function is called for each table option given when the CREATE TABLE statement is executed. The first argument, <em>key</em>, is a const reference to a string that represents the option name. The second argument, <em>state</em>, represents the argument given for that option.</p>
<p>Therefore, given: <em>COMPRESSION = YES_PLEASE</em>, <em>key</em> would be &#8220;COMPRESSION&#8221; and <em>state</em> would be &#8220;YES_PLEASE&#8221;. The objective of this function is to check whether the key/state pair makes sense to your storage engine. If this function returns false, Drizzle will return an error for the CREATE TABLE query. Personally I think this interface can be improved to be a bit more developer-friendly, such as making it easier to validate numeric values without forcing the developer to convert the string by hand. That said, given the pace at which Drizzle is growing, this could be improved before we know it.</p>
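<p>To make the numeric-validation point concrete, here is a small sketch of the checking logic. It is written in Python rather than C++, so it is not Drizzle&#8217;s interface. The function name <em>validate_table_option</em>, the option names and the accepted values are all hypothetical; the only idea taken from the text above is that a key/state pair arrives as two strings and the function answers true or false.</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;">def validate_table_option(key, state):
    """Return True if the key/state pair makes sense to this toy engine."""
    key = key.upper()
    if key == "COMPRESSION":
        # Only a fixed set of words is accepted for this option.
        return state.upper() in ("YES_PLEASE", "NO_THANKS")
    if key == "CACHE_SIZE":
        # Numeric options arrive as strings, so they must be parsed first.
        try:
            return int(state) in range(1, 1025)
        except ValueError:
            return False
    # Unknown options are rejected, which would make CREATE TABLE fail.
    return False


print(validate_table_option("COMPRESSION", "YES_PLEASE"))
print(validate_table_option("CACHE_SIZE", "sixty-four"))</pre></div></div>

<p>Having to parse the number by hand is exactly the kind of chore a friendlier interface could take off the engine developer&#8217;s plate.</p>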
<h3>Access Options at StorageEngine::doCreateTable()</h3>
<p>Here&#8217;s the actual interface for doCreateTable().</p>

<div class="wp_syntax"><div class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">int</span> doCreateTable<span style="color: #008000;">&#40;</span>drizzled<span style="color: #008080;">::</span><span style="color: #007788;">Session</span> <span style="color: #000040;">&amp;</span>session,
                  drizzled<span style="color: #008080;">::</span><span style="color: #007788;">Table</span> <span style="color: #000040;">&amp;</span>table_arg,
                  <span style="color: #0000ff;">const</span> drizzled<span style="color: #008080;">::</span><span style="color: #007788;">TableIdentifier</span> <span style="color: #000040;">&amp;</span>identifier,
                  drizzled<span style="color: #008080;">::</span><span style="color: #007788;">message</span><span style="color: #008080;">::</span><span style="color: #007788;">Table</span> <span style="color: #000040;">&amp;</span>table_proto<span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span></pre></div></div>

<p>Given that the options were successfully validated, doCreateTable() is called next. In Drizzle, all information regarding a table (including options) is represented in a Google Protocol Buffer message. A reference to that message object is passed to doCreateTable() as the fourth argument so all you need to do is loop through the options list in the message object and extract what you need. Here&#8217;s a minimal example that only takes care of one option.</p>

<div class="wp_syntax"><div class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #0000ff;">int</span> n_options <span style="color: #000080;">=</span> table_proto.<span style="color: #007788;">engine</span><span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>.<span style="color: #007788;">options_size</span><span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
&nbsp;
<span style="color: #0000ff;">for</span> <span style="color: #008000;">&#40;</span><span style="color: #0000ff;">int</span> i <span style="color: #000080;">=</span> <span style="color: #0000dd;">0</span><span style="color: #008080;">;</span> i <span style="color: #000080;">&lt;</span> n_options<span style="color: #008080;">;</span> i<span style="color: #000040;">++</span><span style="color: #008000;">&#41;</span> <span style="color: #008000;">&#123;</span>
  <span style="color: #0000ff;">if</span> <span style="color: #008000;">&#40;</span>table_proto.<span style="color: #007788;">engine</span><span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>.<span style="color: #007788;">options</span><span style="color: #008000;">&#40;</span>i<span style="color: #008000;">&#41;</span>.<span style="color: #007788;">name</span><span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span> <span style="color: #000080;">==</span> <span style="color: #FF0000;">&quot;my_option_name&quot;</span><span style="color: #008000;">&#41;</span> <span style="color: #008000;">&#123;</span>
    <span style="color: #666666;">// Do whatever you like with this stream.</span>
    std<span style="color: #008080;">::</span><span style="color: #007788;">istringstream</span> stream<span style="color: #008000;">&#40;</span>table_proto.<span style="color: #007788;">engine</span><span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>.<span style="color: #007788;">options</span><span style="color: #008000;">&#40;</span>i<span style="color: #008000;">&#41;</span>.<span style="color: #007788;">state</span><span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
  <span style="color: #008000;">&#125;</span>
<span style="color: #008000;">&#125;</span></pre></div></div>

<p>The above example should be simple to extend to handle multiple options. What&#8217;s really important in the above example is that the option name can be accessed with the name() accessor and the state (value) of that option with the state() accessor.</p>
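<p>One possible way to handle several options is to collect every name/state pair into a map first and then look up the ones the engine cares about. The sketch below shows that shape in Python, with a list of tuples standing in for the options list in the message object. The function <em>collect_options</em>, the option names and the default values are hypothetical; a real engine would run the same kind of loop in C++ with the name() and state() accessors.</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;">def collect_options(options):
    """Turn a list of (name, state) pairs into a dict keyed by option name."""
    collected = {}
    for name, state in options:
        collected[name] = state
    return collected


# Stand-in for the options carried by the table message.
options = [("compression", "YES_PLEASE"), ("cache_size", "64")]

table_options = collect_options(options)

# Fall back to a default when an option was not given at CREATE TABLE time.
compression = table_options.get("compression", "NO_THANKS")
cache_size = int(table_options.get("cache_size", "16"))
print(compression, cache_size)</pre></div></div>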
<p>So, that&#8217;s all I have to cover for now. I hope this feature will help storage engine developers create and provide useful table specific features for their engine.</p>
<p>Happy Hacking.</p>
]]></content:encoded>
			<wfw:commentRss>http://torum.net/2010/07/extending-create-table-syntax-in-drizzle/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>BlitzDB is now in Drizzle&#8217;s Trunk Repository</title>
		<link>http://torum.net/2010/06/blitzdb-drizzle-merge/</link>
		<comments>http://torum.net/2010/06/blitzdb-drizzle-merge/#comments</comments>
		<pubDate>Mon, 21 Jun 2010 11:20:45 +0000</pubDate>
		<dc:creator>Toru Maesaka</dc:creator>
				<category><![CDATA[drizzle]]></category>
		<category><![CDATA[oss]]></category>
		<category><![CDATA[blitzdb]]></category>

		<guid isPermaLink="false">http://torum.net/?p=2362</guid>
		<description><![CDATA[Happy to announce that BlitzDB has been merged with Drizzle&#8217;s Trunk. As much as I&#8217;m excited, it&#8217;s time to come back to reality. This merge is merely a beginning. There is much more work that needs to be done to BlitzDB such as ensuring stability by adding more tests, finding bugs, and eliminating them. I&#8217;m [...]]]></description>
			<content:encoded><![CDATA[<p>Happy to announce that BlitzDB has <a href="http://bazaar.launchpad.net/~drizzle-developers/drizzle/development/revision/1626">been merged</a> with Drizzle&#8217;s Trunk.</p>
<p>As much as I&#8217;m excited, it&#8217;s time to come back to reality. This merge is merely a beginning. There is much more work that needs to be done to BlitzDB such as ensuring stability by adding more tests, finding bugs, and eliminating them. I&#8217;m hoping that the likelihood of bugs being found will increase due to this merge. Admittedly, I want to hack on fancy (yet important) things like auto recovery but I&#8217;m going to resist doing this until I&#8217;m truly satisfied with the quality of BlitzDB. My plan is to have BlitzDB rock solid by Drizzle&#8217;s Beta release.</p>
<p>The review process to get BlitzDB into Drizzle was straightforward and smooth. This is mostly due to the fact that the community was very supportive about testing. Folks like Stewart Smith and Patrick Crews from Rackspace pointed out several bugs that I would not have found myself. I&#8217;m certainly lucky to have a supportive professional QA engineer (looking at you Patrick) to test out and give punishment to BlitzDB.</p>
<p>All I&#8217;ll be doing on BlitzDB for the next couple of weeks is debugging and refactoring to improve readability. What I need more of at the moment is test cases on JOINs that are likely to be used in practice. If you have a good test case, I would greatly appreciate it!</p>
]]></content:encoded>
			<wfw:commentRss>http://torum.net/2010/06/blitzdb-drizzle-merge/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Better Mobile Internet Life in Japan</title>
		<link>http://torum.net/2010/06/better-mobile-internet-life-in-japan/</link>
		<comments>http://torum.net/2010/06/better-mobile-internet-life-in-japan/#comments</comments>
		<pubDate>Wed, 16 Jun 2010 12:42:31 +0000</pubDate>
		<dc:creator>Toru Maesaka</dc:creator>
				<category><![CDATA[technology]]></category>
		<category><![CDATA[3g]]></category>
		<category><![CDATA[connectivity]]></category>
		<category><![CDATA[internet]]></category>
		<category><![CDATA[mobile]]></category>

		<guid isPermaLink="false">http://torum.net/?p=2360</guid>
		<description><![CDATA[Having mobile internet connectivity was always something hot among tech geeks. What&#8217;s interesting though is that this luxury is gradually becoming a normal day to day lifestyle in urban Japan. Nowadays pretty much every major mobile carrier provides unlimited 3G Data SIM package for a competitive price. Pricing currently ranges between 2000 to 3000 JPY [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.flickr.com/photos/tmaesaka/4706090010/" title="b-mobile U300 by tmaesaka, on Flickr"><img src="http://farm2.static.flickr.com/1278/4706090010_dc699f9379_m.jpg" width="165" height="240" alt="b-mobile U300" align="left" style="margin-right:8px;" /></a> Mobile internet connectivity has always been a hot topic among tech geeks. What&#8217;s interesting though is that this luxury is gradually becoming a normal part of day-to-day life in urban Japan. Nowadays pretty much every major mobile carrier provides an unlimited 3G data SIM package for a competitive price. Pricing currently ranges from 2000 to 3000 JPY (<a href="http://www.google.com/search?rls=en&#038;q=2500+jpy+in+usd&#038;ie=UTF-8&#038;oe=UTF-8">Google Currency Conversion</a>).</p>
<p>This movement in the mobile market is understandable since most modern lightweight laptops and netbooks sold in Japan have an internal 3G modem, which allows you to gain instant internet connectivity as long as there&#8217;s antenna coverage. Sony Vaio is a great example of this use-case. Needless to say, the iPad and Android are contributing factors to the market shift as well.</p>
<p>The catch with these products is that you&#8217;re usually obliged to sign a contract for two years. This wouldn&#8217;t be a problem if you&#8217;re traveling to Japan and planning to stick around for that long but otherwise this can turn out to cost you unnecessary fees for cancellation.</p>
<h3>Traveling to Japan? Read This</h3>
<p>I&#8217;ve always been a fan of the prepaid model (as long as it&#8217;s not overpriced). So, last weekend I bought a 6 month package from b-mobile, which is a service that provides 3G internet connectivity (unlimited packets) for a finite period that you choose. The awesome thing about this service is that you don&#8217;t have to sign any contracts or register your personal information with the service provider (Japan Communications). All you need to do is prepay for a certain period and they&#8217;ll give you a SIM for it. No questions asked. Another good thing about b-mobile is that it runs on docomo&#8217;s <a href="http://en.wikipedia.org/wiki/FOMA">FOMA network</a> which is arguably the strongest mobile network in Japan.</p>
<p>In case you&#8217;re interested, I bought mine at <a href="http://maps.google.com/maps?f=q&#038;source=s_q&#038;hl=en&#038;geocode=&#038;q=bic+camera+shibuya&#038;sll=35.674018,139.707814&#038;sspn=0.010668,0.015171&#038;ie=UTF8&#038;hq=bic+camera&#038;hnear=Shibuya+Ward,+T%C5%8Dky%C5%8D+Metropolis,+Japan&#038;ll=35.660522,139.701226&#038;spn=0.01067,0.015171&#038;z=16&#038;iwloc=A">Bic Camera in Shibuya</a> and paid around 14,000 JPY for 6 months (which works out to around <a href="http://www.google.com/search?rls=en&#038;q=2333+jpy+in+usd&#038;ie=UTF-8&#038;oe=UTF-8">2333 JPY</a> per month). Their sales model is great for us consumers but my first impression was that this could be pretty dodgy if the product gets into the hands of folks with malicious intent.</p>
<h3>Perfect with Android, Especially Nexus One</h3>
<p>I decided to throw my new SIM into my HTC Magic (Dev Phone courtesy of Google) and set up tethering on it (both WiFi and USB cable). Unfortunately the particular Android 2.1 kernel I was using wasn&#8217;t compatible with b-mobile so I had to go through several workarounds and ask my Android guru colleagues for help to get it working. The funny thing is that b-mobile works out of the box with a Nexus One running Froyo.</p>
<p>Despite the obstacles I&#8217;m happy with the outcome and I hope this blog entry turns out to be helpful to those who are planning on traveling to and staying in Japan for a while.</p>
]]></content:encoded>
			<wfw:commentRss>http://torum.net/2010/06/better-mobile-internet-life-in-japan/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Notes on Loading Data to Google App Engine</title>
		<link>http://torum.net/2010/06/loading-data-to-google-app-engine/</link>
		<comments>http://torum.net/2010/06/loading-data-to-google-app-engine/#comments</comments>
		<pubDate>Tue, 15 Jun 2010 14:30:00 +0000</pubDate>
		<dc:creator>Toru Maesaka</dc:creator>
				<category><![CDATA[knowledge]]></category>
		<category><![CDATA[technology]]></category>
		<category><![CDATA[datastore]]></category>
		<category><![CDATA[gae]]></category>
		<category><![CDATA[memo]]></category>

		<guid isPermaLink="false">http://torum.net/?p=2358</guid>
		<description><![CDATA[Google has fantastic documentation on this topic but at the time I wrote this blog entry, the documentation covered how to download and upload data using appcfg.py but not with bulkloader.py (there is also bulkload_client.py). So, I decided to play around with the nifty bulkloader and keep notes on my findings. Prepare the [...]]]></description>
			<content:encoded><![CDATA[<p>Google has <a href="http://code.google.com/appengine/docs/python/tools/uploadingdata.html">fantastic documentation</a> on this topic but at the time I wrote this blog entry, the documentation covered how to download and upload data using appcfg.py but not with bulkloader.py (there is also bulkload_client.py). So, I decided to play around with the nifty bulkloader and keep notes on my findings.</p>
<h3>Prepare the End Point for Loading Data</h3>
<p>Loading data to the Data Store is accomplished by sending data to the application over <a href="http://en.wikipedia.org/wiki/Http">HTTP</a>. This means that your application needs a uniquely identifiable URI for you to send your data to. Creating a valid URI is just a matter of <a href="http://code.google.com/appengine/docs/python/tools/uploadingdata.html#Setting_Up_remote_api">setting up a handler</a> for it in the <strong>app.yaml</strong> config file. GAE takes care of the import logic with its own handler. There&#8217;s nothing special in this step and the documentation covers how to do this concisely.</p>
<h3>Test Data for Demo Purpose</h3>
<p>For this blog entry, I decided to prepare a CSV with four rows that represent users. In reality, there would be more information related to a user but I decided to keep things minimal for this blog entry. I saved this data as <strong>user.csv</strong>.</p>

<div class="wp_syntax"><div class="code"><pre class="null" style="font-family:monospace;">1, Daniel, Bernstein, xxxxxxx
2, Donald, Knuth, xxxxxxx
3, Bjarne, Stroustrup, xxxxxxx
4, Robert, Sedgewick, xxxxxxx</pre></div></div>

<p>You can also represent your table in XML but I decided to use CSV for its simplicity.</p>
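<p>If you want to sanity-check a file like this before loading it, a few lines of Python with the standard csv module are enough. This sketch is independent of the bulk loader and assumes Python 3. It reads an in-memory copy of the rows above so that it runs on its own, and it passes skipinitialspace=True because the sample has a space after each comma.</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;">import csv
import io

# In-memory copy of user.csv so the example runs on its own.
USER_CSV = """1, Daniel, Bernstein, xxxxxxx
2, Donald, Knuth, xxxxxxx
3, Bjarne, Stroustrup, xxxxxxx
4, Robert, Sedgewick, xxxxxxx
"""

# skipinitialspace drops the blank that follows each comma.
reader = csv.reader(io.StringIO(USER_CSV), skipinitialspace=True)
rows = [row for row in reader]

for row in rows:
    # Every record should have exactly four columns.
    assert len(row) == 4
    print(row)</pre></div></div>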
<h3>Create a Bulk Loader Configuration File or Not</h3>
<p>In addition to the CSV file, the bulk loader needs to know how each record in the CSV file should be represented as a Data Store entity. The modeling as far as I know can be done in two ways. One is to write a <a href="http://code.google.com/intl/en/appengine/docs/python/tools/uploadingdata.html#Creating_Loader_Classes">loader class in Python</a> that the bulkloader can use. Another approach is to get bulkloader.py to <a href="http://code.google.com/appengine/docs/python/tools/uploadingdata.html#Configuring_the_Bulk_Loader">generate a configuration file</a> (in <a href="http://en.wikipedia.org/wiki/Yaml">YAML</a>).</p>
<p>I decided to write my own Python class for this step since, according to the documentation at the time this blog post was written, the generated configuration file approach doesn&#8217;t work with the local development server.</p>
<p>With the above in mind, here is my loader class. You would usually keep the Data Model definition (the User class) in a separate file but for demo purposes, I decided to keep it in one file.</p>

<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;"><span style="color: #ff7700;font-weight:bold;">from</span> google.<span style="color: black;">appengine</span>.<span style="color: black;">ext</span> <span style="color: #ff7700;font-weight:bold;">import</span> db
<span style="color: #ff7700;font-weight:bold;">from</span> google.<span style="color: black;">appengine</span>.<span style="color: black;">tools</span> <span style="color: #ff7700;font-weight:bold;">import</span> bulkloader
&nbsp;
<span style="color: #ff7700;font-weight:bold;">class</span> User<span style="color: black;">&#40;</span>db.<span style="color: black;">Model</span><span style="color: black;">&#41;</span>:
  <span style="color: #008000;">id</span> = db.<span style="color: black;">IntegerProperty</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
  firstname = db.<span style="color: black;">StringProperty</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
  lastname = db.<span style="color: black;">StringProperty</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
  some_text = db.<span style="color: black;">StringProperty</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>
&nbsp;
<span style="color: #ff7700;font-weight:bold;">class</span> UserLoader<span style="color: black;">&#40;</span>bulkloader.<span style="color: black;">Loader</span><span style="color: black;">&#41;</span>:
  <span style="color: #ff7700;font-weight:bold;">def</span> <span style="color: #0000cd;">__init__</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span><span style="color: black;">&#41;</span>:
    bulkloader.<span style="color: black;">Loader</span>.<span style="color: #0000cd;">__init__</span><span style="color: black;">&#40;</span><span style="color: #008000;">self</span>, <span style="color: #483d8b;">'User'</span>,
                               <span style="color: black;">&#91;</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'id'</span>, <span style="color: #008000;">int</span><span style="color: black;">&#41;</span>,
                                <span style="color: black;">&#40;</span><span style="color: #483d8b;">'firstname'</span>, <span style="color: #008000;">str</span><span style="color: black;">&#41;</span>,
                                <span style="color: black;">&#40;</span><span style="color: #483d8b;">'lastname'</span>, <span style="color: #008000;">str</span><span style="color: black;">&#41;</span>,
                                <span style="color: black;">&#40;</span><span style="color: #483d8b;">'some_text'</span>, <span style="color: #008000;">str</span><span style="color: black;">&#41;</span><span style="color: black;">&#93;</span><span style="color: black;">&#41;</span>
loaders = <span style="color: black;">&#91;</span>UserLoader<span style="color: black;">&#93;</span></pre></div></div>

<p>What this class does is <a href="http://code.google.com/intl/en/appengine/docs/python/tools/uploadingdata.html#Creating_Loader_Classes">described</a> in the documentation. I saved this script as <strong>user_loader.py</strong>.</p>
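<p>To make the column list above more concrete, here is a minimal sketch of the idea behind it: each CSV field is passed through the converter in the same position. This is plain Python 3 using only the standard library, not the bulkloader implementation. The convert_row helper and the sample rows are hypothetical and exist only for illustration.</p>

```python
import csv
import io

# Column spec mirroring the list passed to bulkloader.Loader.__init__ above.
COLUMNS = [('id', int), ('firstname', str), ('lastname', str), ('some_text', str)]


def convert_row(row):
    """Apply each converter to the CSV field in the same position (hypothetical helper)."""
    if len(row) != len(COLUMNS):
        raise ValueError('expected %d fields, got %d' % (len(COLUMNS), len(row)))
    return dict((name, conv(value)) for (name, conv), value in zip(COLUMNS, row))


# Hypothetical sample data standing in for user.csv.
sample = io.StringIO('1,Toru,Maesaka,hello\n2,Foo,Bar,some text\n')
rows = [convert_row(r) for r in csv.reader(sample)]
```

<p>Since every CSV field arrives as a string, the converters are what turn the first column into an integer while leaving the rest as text.</p>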
<h3>Load your Data to the Data Store</h3>
<p>For demo purposes, I used my local development server on port 8083 to load the CSV file. Given that the application is running and that the API endpoint is active, it&#8217;s just a matter of providing bulkloader.py with the essential information. For the available options, I recommend reading the help output by running &#8216;bulkloader.py -h&#8217;.</p>
<p>The following command attempts to load entities of &#8216;kind=User&#8217; from user.csv to the endpoint, using our loader class (user_loader.py).</p>

<div class="wp_syntax"><div class="code"><pre class="null" style="font-family:monospace;">$ bulkloader.py --filename=user.csv --config_file=user_loader.py \
--kind=User --url=http://localhost:8083/import --app_id=your_app_id</pre></div></div>

<p>Note that it&#8217;s essential to provide the &#45;&#45;app_id option when uploading data to the local server. When asked for credentials, you can type anything you like. You only need to supply valid credentials when uploading to production.</p>
<p>Here&#8217;s the output from executing the above command.</p>

<div class="wp_syntax"><div class="code"><pre class="null" style="font-family:monospace;">[INFO    ] Logging to bulkloader-log-20100615.213842
[INFO    ] Throttling transfers:
[INFO    ] Bandwidth: 250000 bytes/second
[INFO    ] HTTP connections: 8/second
[INFO    ] Entities inserted/fetched/modified: 20/second
[INFO    ] Batch Size: 10
[INFO    ] Opening database: bulkloader-progress-20100615.213842.sql3
Please enter login credentials for localhost
Email: foo
Password for foo: 
[INFO    ] Connecting to localhost:8083/import
[INFO    ] Starting import; maximum 10 entities per post
[INFO    ] 4 entites total, 0 previously transferred
[INFO    ] 4 entities (933 bytes) transferred in 4.0 seconds
[INFO    ] All entities successfully transferred</pre></div></div>

<p>Success!</p>
]]></content:encoded>
			<wfw:commentRss>http://torum.net/2010/06/loading-data-to-google-app-engine/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>BlitzDB Concurrent Testing and Write Performance</title>
		<link>http://torum.net/2010/05/blitzdb-concurrency-testing/</link>
		<comments>http://torum.net/2010/05/blitzdb-concurrency-testing/#comments</comments>
		<pubDate>Wed, 12 May 2010 06:42:05 +0000</pubDate>
		<dc:creator>Toru Maesaka</dc:creator>
				<category><![CDATA[drizzle]]></category>
		<category><![CDATA[oss]]></category>
		<category><![CDATA[blitzdb]]></category>
		<category><![CDATA[performance]]></category>

		<guid isPermaLink="false">http://torum.net/?p=2353</guid>
		<description><![CDATA[Last month at the MySQL Conference, several people asked me about the status of BlitzDB. Specifically, they were interested in when I would release BlitzDB. Fair enough &#8211; I&#8217;ve been working on this project long enough for people to start questioning this. The answer is, BlitzDB is done in terms of implementing the design. [...]]]></description>
			<content:encoded><![CDATA[<p>Last month at the MySQL Conference, several people asked me about the status of BlitzDB. Specifically, they were interested in when I would release BlitzDB. Fair enough &#8211; I&#8217;ve been working on this project long enough for people to start questioning this.</p>
<p>The answer is, BlitzDB is done in terms of implementing the design. Right now it&#8217;s about finding bugs, fixing them and testing BlitzDB&#8217;s stability under concurrent load. Thanks to the motivation boost I gained at the conference, I&#8217;ve now fixed the bugs that were slowing me down and I&#8217;m gradually adding more tests to BlitzDB&#8217;s test suite. I consider BlitzDB&#8217;s initial release to be the day it gets merged into Drizzle&#8217;s trunk. This is almost ready, as BlitzDB seems to be building fine on Drizzle&#8217;s Build Farm infrastructure. However, I won&#8217;t move to the next step until I&#8217;m satisfied with BlitzDB&#8217;s stability.</p>
<p>Yesterday I spent some time doing some concurrency testing on BlitzDB&#8217;s INSERT code with skyload. Needless to say, concurrency testing is also a convenient way to look at the performance of a particular component. So, I decided to publish my findings from this test. First, here is the background of the test.</p>
<h3>Purpose of the Test</h3>
<ul>
<li>Test BlitzDB&#8217;s slot-lock mechanism.</li>
<li>Confirm that BlitzDB will not crash under concurrent INSERT workload.</li>
<li>Confirm that key insertion to the index is working as expected.</li>
<li>Confirm that writes to multiple indexes work as expected.</li>
<li>Observe the write-performance impact of adding an index.</li>
</ul>
<p>Two commodity boxes were used, one dedicated to the client and the other to the server (Drizzle + BlitzDB). Both boxes have the same spec: Intel Quad Xeon E5345 (2×4MB L2 cache), 8GB Memory, 500GB SATA II, gigabit NIC. The boxes were connected by a gigabit switch. The file system on the server was ext3.</p>
<p>By default, a BlitzDB table is optimized for up to 1 million rows. Therefore each run of this test inserted 1 million rows into a table, with a different concurrency level per run. The table used in this test only contains three integer columns. Tests were performed with up to three indexes. The Linux kernel&#8217;s dirty buffers were flushed before each test run. Tests were run until the performance curve flattened.</p>
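<p>The shape of such a run can be sketched in a few lines of Python 3. This is not skyload and it does not talk to Drizzle: the run_insert_test function and the CountingTable stand-in are hypothetical, and the row count is reduced to keep the example quick. It only shows how a fixed number of inserts is divided among worker threads, with one concurrency level per run.</p>

```python
import threading


class CountingTable(object):
    """Hypothetical stand-in for a table: counts inserts under a lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self.rows = 0

    def insert(self):
        with self._lock:
            self.rows += 1


def run_insert_test(total_rows, concurrency, insert_row):
    """Divide total_rows as evenly as possible among `concurrency` threads."""
    base, extra = divmod(total_rows, concurrency)
    # The first `extra` threads take one additional row each.
    counts = [base + 1] * extra + [base] * (concurrency - extra)

    def worker(count):
        for _ in range(count):
            insert_row()

    threads = [threading.Thread(target=worker, args=(n,)) for n in counts]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


# One run per concurrency level, each against a fresh table.
results = {}
for concurrency in (1, 2, 4, 8, 12):
    table = CountingTable()
    run_insert_test(10000, concurrency, table.insert)
    results[concurrency] = table.rows
```

<p>Checking that every run ends with exactly the requested number of rows is the same kind of sanity check as confirming that key insertion works as expected under concurrent writers.</p>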
<h3>Result</h3>
<p align="center"><a href="http://www.flickr.com/photos/tmaesaka/4598572902/" title="BlitzDB Table Insertion - Multi Index by tmaesaka, on Flickr"><img src="http://farm2.static.flickr.com/1324/4598572902_c1e45d7ac5.jpg" width="500" height="294" alt="BlitzDB Table Insertion - Multi Index" /></a></p>
<p>As seen above, scalability from 1 thread to 4 threads showed an ideal curve. This is expected since the server is a 4-core box. From 4 threads, performance showed some improvement up to 12 threads. From there on, concurrency greatly exceeds the number of physical cores, so we can&#8217;t observe decent performance growth. The highest insert rate achieved in this test was <strong>just over 86,000 QPS</strong>. With more cores on the server and more clients, I suspect BlitzDB can hit over 100k QPS.</p>
<p>Although this graph looks good at first sight, I&#8217;m not happy with it. The performance penalty for adding multiple indexes should be greater than what&#8217;s observed in this result. This is because TC&#8217;s B+Tree is internally protected by a single lock on writes. I suspect that the performance penalty is not observed in this graph because I didn&#8217;t give BlitzDB enough load to make TC work hard. This implies that a bottleneck could exist elsewhere (Network, Drizzle or BlitzDB&#8217;s handler level code).</p>
<p>However, I&#8217;m glad that BlitzDB remained stable in this concurrency test, which was what I wanted to test in the first place. Admittedly, I need to mix several types of queries to properly test BlitzDB&#8217;s stability. I plan on doing this next with sysbench and hopefully <a href="https://launchpad.net/randgen">RQG</a>.</p>
<p>Once this is done, I&#8217;ll submit a merge proposal to the Drizzle Project :)</p>
<h3>Future Development Plans</h3>
<ul>
<li>Find bugs, Fix bugs, Repeat.</li>
<li>Write an inbuilt auto recovery routine.</li>
<li>Eventually add a crash safe option to BlitzDB.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://torum.net/2010/05/blitzdb-concurrency-testing/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Testing BlitzDB on Drizzle&#8217;s Build Farm</title>
		<link>http://torum.net/2010/05/blitzdb-on-build-farm/</link>
		<comments>http://torum.net/2010/05/blitzdb-on-build-farm/#comments</comments>
		<pubDate>Thu, 06 May 2010 10:37:36 +0000</pubDate>
		<dc:creator>Toru Maesaka</dc:creator>
				<category><![CDATA[drizzle]]></category>
		<category><![CDATA[oss]]></category>
		<category><![CDATA[blitzdb]]></category>
		<category><![CDATA[hudson]]></category>

		<guid isPermaLink="false">http://torum.net/?p=2349</guid>
		<description><![CDATA[One of the many things the Drizzle project takes seriously is that the project&#8217;s source code builds successfully on all our target platforms AND passes its tests on them. This is not really specific to Drizzle, as most open source projects have the same policy. For example, we do the same thing in memcached [...]]]></description>
			<content:encoded><![CDATA[<p>One of the many things the Drizzle project takes seriously is that the project&#8217;s source code builds successfully on all our target platforms AND passes its tests on them. This is not really specific to Drizzle, as most open source projects have the same policy. For example, we do the same thing in memcached thanks to Dustin Sallings&#8217; buildbot kungfu.</p>
<p>Yesterday, Monty Taylor gave me access to Drizzle&#8217;s Build Farm Infrastructure so that I could test BlitzDB on various Linux distributions and FreeBSD. Unfortunately most build machines didn&#8217;t have Tokyo Cabinet installed so I could only test builds on Ubuntu and Debian. Fortunately the build went fine on those platforms though this was predictable since Ubuntu is my primary development platform. What was disturbing was getting test errors on my index test suite. I guess it&#8217;s time to put my thinking cap on and see what the problem is there.</p>
<p>This is a big leap towards getting BlitzDB into Drizzle&#8217;s trunk, which I&#8217;m steadily working towards. I also want to benchmark BlitzDB in its current state with <a href="http://sysbench.sourceforge.net/">sysbench</a>&#8217;s OLTP tests. This is still low in my priority queue but hopefully I&#8217;ll do it in the next couple of months.</p>
]]></content:encoded>
			<wfw:commentRss>http://torum.net/2010/05/blitzdb-on-build-farm/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Drizzle Google Summer of Code Projects</title>
		<link>http://torum.net/2010/04/drizzle-summer-of-code/</link>
		<comments>http://torum.net/2010/04/drizzle-summer-of-code/#comments</comments>
		<pubDate>Tue, 27 Apr 2010 06:12:03 +0000</pubDate>
		<dc:creator>Toru Maesaka</dc:creator>
				<category><![CDATA[drizzle]]></category>
		<category><![CDATA[oss]]></category>
		<category><![CDATA[gsoc]]></category>

		<guid isPermaLink="false">http://torum.net/?p=2346</guid>
		<description><![CDATA[This morning, while half asleep, I was delighted to see an announcement email for Drizzle&#8217;s Google Summer of Code projects in my inbox. Congratulations not only to those taking part in GSoC via Drizzle but to all of you participating in GSoC this year. Here&#8217;s the actual announcement email that contains the [...]]]></description>
			<content:encoded><![CDATA[<p>This morning, while half asleep, I was delighted to see an announcement email for Drizzle&#8217;s Google Summer of Code projects in my inbox. Congratulations not only to those taking part in GSoC via Drizzle but to all of you participating in GSoC this year. Here&#8217;s the actual announcement email that contains the list of Drizzle projects that will take place this year.</p>
<ul>
<li><a href="https://lists.launchpad.net/drizzle-discuss/msg06568.html">https://lists.launchpad.net/drizzle-discuss/msg06568.html</a></li>
</ul>
<p>This year I&#8217;m mentoring <a href="http://www.ucs.louisiana.edu/~ded5797/">Djellel Eddine Difallah</a> on &#8220;<strong>A Memcached Query Cache Plugin for Drizzle</strong>&#8221;. This happens to be a project I abandoned a <a href="http://torum.net/2008/10/rethink-query-cache-drizzle/">long time ago</a>, so I was happy to see someone dig it up and take an interest in it.</p>
<p>I&#8217;m excited to work with Djellel over the summer. I&#8217;m looking forward to having lots of fun with all the technical challenges and, most importantly, hacking in an open community environment.</p>
]]></content:encoded>
			<wfw:commentRss>http://torum.net/2010/04/drizzle-summer-of-code/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>RainDB.org up for Donation</title>
		<link>http://torum.net/2010/04/raindb-org-up-for-donation/</link>
		<comments>http://torum.net/2010/04/raindb-org-up-for-donation/#comments</comments>
		<pubDate>Mon, 19 Apr 2010 22:29:03 +0000</pubDate>
		<dc:creator>Toru Maesaka</dc:creator>
				<category><![CDATA[oss]]></category>
		<category><![CDATA[drizzle]]></category>

		<guid isPermaLink="false">http://torum.net/2010/04/raindb-org-up-for-donation/</guid>
		<description><![CDATA[This time last year I obtained a domain called raindb.org which I was intending on using for my storage engine project. RainDB was the project name I had in mind for BlitzDB at the time. Since I now have a different project name, I no longer have any use for this domain. So, rather than [...]]]></description>
			<content:encoded><![CDATA[<p>This time last year I obtained a domain called raindb.org which I was intending on using for my storage engine project. RainDB was the project name I had in mind for BlitzDB at the time. Since I now have a different project name, I no longer have any use for this domain.</p>
<p>So, rather than letting it go to waste, I&#8217;d like to donate this domain to yet another potential open source database project. Your project can be anything &#8211; MySQL Storage Engine, Drizzle Storage Engine, Embedded Library, Stand Alone Server, whatever. RainDB would be a good name for a highly concurrent database since the analogy is &#8211; &#8220;it can be rained on&#8221;.</p>
<p>If you&#8217;re interested, please feel free to email, <a href="http://twitter.com/tmaesaka">tweet</a>, or even just leave a comment on this blog entry.</p>
]]></content:encoded>
			<wfw:commentRss>http://torum.net/2010/04/raindb-org-up-for-donation/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
