<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Toru Maesaka &#187; btree</title>
	<atom:link href="http://torum.net/tag/btree/feed/" rel="self" type="application/rss+xml" />
	<link>http://torum.net</link>
	<description>Hackaholic and a Web Addict based in Tokyo</description>
	<lastBuildDate>Tue, 28 Feb 2012 10:52:29 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1.2</generator>
		<item>
		<title>TC Concurrency Model and BlitzDB Part 1</title>
		<link>http://torum.net/2009/11/tc-concurrency-model-and-blitzdb-1/</link>
		<comments>http://torum.net/2009/11/tc-concurrency-model-and-blitzdb-1/#comments</comments>
		<pubDate>Sat, 07 Nov 2009 18:18:32 +0000</pubDate>
		<dc:creator>Toru Maesaka</dc:creator>
				<category><![CDATA[drizzle]]></category>
		<category><![CDATA[knowledge]]></category>
		<category><![CDATA[algorithm]]></category>
		<category><![CDATA[blitzdb]]></category>
		<category><![CDATA[btree]]></category>
		<category><![CDATA[hashtable]]></category>

		<guid isPermaLink="false">http://torum.net/?p=2304</guid>
		<description><![CDATA[Recently I started rewriting BlitzDB because a better understanding of the Drizzle Storage API and Tokyo Cabinet internals has made me realize the mistakes I made the first time around. Admittedly, &#8220;rewrite&#8221; is an exaggeration: I&#8217;ll be reusing most of the components, just restructured in a more C++ way. One decision I&#8217;ve made is [...]]]></description>
			<content:encoded><![CDATA[<p>Recently I started rewriting BlitzDB because a better understanding of the <a href="https://launchpad.net/drizzle">Drizzle</a> Storage API and <a href="http://1978th.net/tokyocabinet/">Tokyo Cabinet</a> internals has made me realize the mistakes I made the first time around. Admittedly, &#8220;rewrite&#8221; is an exaggeration: I&#8217;ll be reusing most of the components, just restructured in a more C++ way.</p>
<p>One decision I&#8217;ve made is that BlitzDB&#8217;s first release will only support a BTREE index, implemented via <a href="http://1978th.net/tokyocabinet/spex-en.html#tcbdbapi">TC&#8217;s B+Tree API</a>. BlitzDB aside, several people I&#8217;ve talked to about key/value data structures ask why I love <a href="http://en.wikipedia.org/wiki/B%2Btree">B+Tree</a> so much when a <a href="http://en.wikipedia.org/wiki/Hash_table">hash table</a> is faster to work with. Don&#8217;t get me wrong: O(1) operations are beautiful and I love hash tables, but treating &#8220;key/value structure&#8221; as a synonym for &#8220;hash table&#8221; is a mistake. Everything has its ups and downs, and hash tables are no exception. In this blog entry, I will describe why a B+Tree is good for index scanning.</p>
<h3>Why a B+Tree Index</h3>
<p>An O(1) lookup such as hashing is clearly faster than an O(log n) search, unless there&#8217;s something fishy about the implementation or the dataset is too small for the time complexity to matter. However, that advantage only applies to looking up and fetching a single value. If that&#8217;s all you need, a hash table is probably the best you can ask for. Things are different once you go beyond point lookups, for example when fetching or scanning a range of keys.</p>
<p>To do this with a typical hash table, either the data structure must be able to provide a list of stored keys, OR your application must do some housekeeping and save a list of relevant keys elsewhere for future use. Your application would then compute the subset of keys you&#8217;re interested in and fetch them one by one in a loop. Algorithmically speaking, each fetch is O(1). What&#8217;s expensive is that the fetches are random accesses, because a hash function deliberately scatters neighboring keys. This is obviously going to kill your performance, especially when you need to chew through a heavy workload (though this _could_ change when SSDs become standard).</p>
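<p>The following is a minimal in-memory sketch of the housekeeping approach that makes the cost concrete. It is not Tokyo Cabinet code. The toy open-addressing table, the separate key list and the names (table_t, ht_put(), ht_get(), range_fetch()) are hypothetical. A range query has to filter the whole key list and then make one independent lookup per matching key. In an on-disk hash database, each of those lookups would be a random access.</p>

```c
#include <assert.h>
#include <stddef.h>

/* Toy hash table (hypothetical, not Tokyo Cabinet): open addressing with
   linear probing, non-negative int keys and int values. */
#define CAPACITY 256
#define EMPTY_KEY (-1)

typedef struct {
  int keys[CAPACITY];
  int vals[CAPACITY];
  /* the application's "housekeeping" list of stored keys, since the
     table itself has no notion of key order */
  int key_list[CAPACITY];
  size_t key_count;
} table_t;

/* Multiplicative hash: neighbouring keys land in unrelated slots. */
static size_t slot_of(int key) {
  return ((unsigned)key * 2654435761u) % CAPACITY;
}

void ht_init(table_t *t) {
  for (size_t i = 0; i < CAPACITY; i++) t->keys[i] = EMPTY_KEY;
  t->key_count = 0;
}

/* Insert a key that is not already present (not checked here). One slot
   is always left empty so that a lookup for a missing key terminates. */
void ht_put(table_t *t, int key, int val) {
  assert(key >= 0 && t->key_count + 1 < CAPACITY);
  size_t s = slot_of(key);
  while (t->keys[s] != EMPTY_KEY) s = (s + 1) % CAPACITY;
  t->keys[s] = key;
  t->vals[s] = val;
  t->key_list[t->key_count++] = key;
}

/* Expected O(1) point lookup: returns 1 and stores the value if found. */
int ht_get(const table_t *t, int key, int *val) {
  size_t s = slot_of(key);
  while (t->keys[s] != EMPTY_KEY) {
    if (t->keys[s] == key) {
      *val = t->vals[s];
      return 1;
    }
    s = (s + 1) % CAPACITY;
  }
  return 0;
}

/* Range query lo <= key <= hi: filter the whole key list, then issue one
   independent lookup per matching key. Returns the number of rows fetched
   and accumulates their values into *sum. */
size_t range_fetch(const table_t *t, int lo, int hi, long *sum) {
  size_t fetched = 0;
  *sum = 0;
  for (size_t i = 0; i < t->key_count; i++) {
    int k = t->key_list[i], v;
    if (k >= lo && k <= hi && ht_get(t, k, &v)) {
      *sum += v;
      fetched++;
    }
  }
  return fetched;
}
```

<p>Suppose keys 0 to 99 are loaded. A query for keys 10 to 20 still walks all 100 entries of the key list before making its 11 lookups. Nothing about the slot layout helps those lookups land near each other.</p>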
<p>A B+Tree, on the other hand, is fantastic for this use-case. The actual data are stored in the leaf nodes, and the leaves are usually linked to one another. As a result, you don&#8217;t need to re-traverse the tree to get the next greater key. When you run out of relevant records in one leaf, you simply move on to its neighbor. When leaf pages sit close together on disk, a range scan becomes mostly sequential access. Another bonus is that the internal nodes are usually small enough to keep entirely in memory, which makes searching inexpensive yet effective.</p>
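<p>The leaf-chain idea can be sketched in a few lines. This is a hypothetical, heavily simplified model rather than TC&#8217;s B+Tree. It bulk-loads sorted, unique keys into fixed-size leaves, never splits, and reduces the internal nodes to an array of separator keys. The names (leaf_t, index_t, build(), range_scan()) are made up for illustration. The scan descends once and then follows next pointers from leaf to leaf.</p>

```c
#include <assert.h>
#include <stddef.h>

#define LEAF_CAP 4   /* records per leaf (tiny, for illustration) */
#define MAX_LEAVES 16

/* Hypothetical leaf node: sorted keys plus a link to the right neighbour. */
typedef struct leaf {
  int keys[LEAF_CAP];
  int vals[LEAF_CAP];
  int n;
  struct leaf *next;
} leaf_t;

/* Stand-in for the internal nodes: the first key of each leaf, in order.
   This part is small, which is why it can usually stay in memory. */
typedef struct {
  int first_key[MAX_LEAVES];
  leaf_t *leaf[MAX_LEAVES];
  int n;
} index_t;

/* Bulk-load sorted, unique keys into a chain of leaves (no splits). */
void build(index_t *ix, leaf_t *leaves, const int *keys, const int *vals, int n) {
  int nleaves = (n + LEAF_CAP - 1) / LEAF_CAP;
  assert(n > 0 && nleaves <= MAX_LEAVES);
  ix->n = nleaves;
  for (int l = 0; l < nleaves; l++) {
    leaf_t *lf = &leaves[l];
    lf->n = 0;
    lf->next = (l + 1 < nleaves) ? &leaves[l + 1] : NULL;
    for (int i = l * LEAF_CAP; i < n && lf->n < LEAF_CAP; i++) {
      lf->keys[lf->n] = keys[i];
      lf->vals[lf->n] = vals[i];
      lf->n++;
    }
    ix->first_key[l] = lf->keys[0];
    ix->leaf[l] = lf;
  }
}

/* Descend once: pick the last leaf whose first key is <= lo
   (linear search for brevity; a real B+Tree searches each node). */
static leaf_t *find_leaf(const index_t *ix, int lo) {
  int i = 0;
  while (i + 1 < ix->n && ix->first_key[i + 1] <= lo) i++;
  return ix->leaf[i];
}

/* Range scan lo <= key <= hi: one descent, then follow the leaf links
   without returning to the root. */
size_t range_scan(const index_t *ix, int lo, int hi, long *sum) {
  size_t found = 0;
  *sum = 0;
  for (const leaf_t *lf = find_leaf(ix, lo); lf != NULL; lf = lf->next) {
    for (int i = 0; i < lf->n; i++) {
      if (lf->keys[i] < lo) continue;
      if (lf->keys[i] > hi) return found;  /* keys are sorted: done */
      *sum += lf->vals[i];
      found++;
    }
  }
  return found;
}
```

<p>For example, suppose the even keys 0 to 38 are split across five leaves. A scan for 5 to 17 starts in the first leaf and stops at 18 in the third, without revisiting the separator array.</p>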
<p>My solution? Combine the two data structures and take advantage of their different characteristics. In BlitzDB, the actual rows are stored in TC&#8217;s Hash Database. The B+Tree index stores keys that point to those rows rather than the rows themselves.</p>
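<p>Here is a rough sketch of that split, under several assumptions. This is not BlitzDB&#8217;s code. The row store is a plain array addressed by row id. It stands in for TC&#8217;s hash database, where lookups are by key and the storage order is of no use to a range query. A sorted array of (index key, row id) pairs stands in for the B+Tree. All type and function names are hypothetical. The index scan walks entries in key order and fetches each row with a point lookup.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MAX_ROWS 64

/* Hypothetical row and row store. The store is addressed directly by row
   id: it stands in for a hash database (fast lookup by id, no useful order). */
typedef struct { int id; char name[16]; int age; } row_t;
typedef struct { row_t rows[MAX_ROWS]; int used[MAX_ROWS]; } row_store_t;

/* Hypothetical index entry (index key -> row id); a sorted array of these
   stands in for the B+Tree index. */
typedef struct { int age; int id; } index_entry_t;

void store_init(row_store_t *s) { memset(s, 0, sizeof *s); }

const row_t *store_get(const row_store_t *s, int id) {
  return (id >= 0 && id < MAX_ROWS && s->used[id]) ? &s->rows[id] : NULL;
}

/* Store a row and append its index entry; call index_sort() afterwards. */
void add_row(row_store_t *s, index_entry_t *ix, size_t *n, int id,
             const char *name, int age) {
  assert(id >= 0 && id < MAX_ROWS && *n < MAX_ROWS);
  row_t *r = &s->rows[id];
  r->id = id;
  r->age = age;
  strncpy(r->name, name, sizeof r->name - 1);
  r->name[sizeof r->name - 1] = '\0';
  s->used[id] = 1;
  ix[*n].age = age;
  ix[*n].id = id;
  (*n)++;
}

/* Order index entries by age, then by row id. */
static int cmp_entry(const void *a, const void *b) {
  const index_entry_t *x = a, *y = b;
  if (x->age != y->age) return (x->age > y->age) - (x->age < y->age);
  return (x->id > y->id) - (x->id < y->id);
}

void index_sort(index_entry_t *ix, size_t n) {
  qsort(ix, n, sizeof ix[0], cmp_entry);
}

/* Index scan for lo <= age <= hi: binary-search the first entry, walk the
   index in key order, and fetch each row from the store by id. */
size_t scan_age(const row_store_t *s, const index_entry_t *ix, size_t n,
                int lo, int hi, const row_t **out) {
  size_t a = 0, b = n;
  while (a < b) {  /* lower bound: first entry with age >= lo */
    size_t mid = a + (b - a) / 2;
    if (ix[mid].age < lo) a = mid + 1; else b = mid;
  }
  size_t found = 0;
  for (size_t i = a; i < n && ix[i].age <= hi; i++) {
    const row_t *r = store_get(s, ix[i].id);
    if (r != NULL) out[found++] = r;
  }
  return found;
}
```

<p>Suppose the store holds rows for alice (40), bob (25), carol (31) and dave (25). A scan for ages 25 to 35 returns bob, dave and carol in index order.</p>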
<p>Everything I&#8217;ve said so far is theoretical, since I haven&#8217;t provided any benchmark results, but all I&#8217;m trying to say is: <strong>it&#8217;s all about access patterns and use-cases</strong>. My current interest is in index scans, hence the decision. However, if enough people ask me for a HASH index, I can add that functionality relatively easily later on :)</p>
<h3>Next Stop</h3>
<p>I would love to keep writing, but it&#8217;s past 3am in Japan and I&#8217;m dozing off here. Apologies for not covering Tokyo Cabinet&#8217;s concurrency model in depth. I will cover it, along with how it impacts BlitzDB&#8217;s design, in the next post of this series.</p>
]]></content:encoded>
			<wfw:commentRss>http://torum.net/2009/11/tc-concurrency-model-and-blitzdb-1/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Writing records with duplicate keys to Tokyo Cabinet</title>
		<link>http://torum.net/2009/09/writing-duplicate-keys-to-tokyocabinet/</link>
		<comments>http://torum.net/2009/09/writing-duplicate-keys-to-tokyocabinet/#comments</comments>
		<pubDate>Wed, 09 Sep 2009 14:56:08 +0000</pubDate>
		<dc:creator>Toru Maesaka</dc:creator>
				<category><![CDATA[knowledge]]></category>
		<category><![CDATA[oss]]></category>
		<category><![CDATA[btree]]></category>
		<category><![CDATA[tokyocabinet]]></category>

		<guid isPermaLink="false">http://torum.net/?p=2287</guid>
		<description><![CDATA[Lately I&#8217;ve been noticing that people are visiting my blog to find ways to write multiple records with the same key to a Tokyo Cabinet (TC) database. Well, the answer depends on which data structure backs your TC database. If you&#8217;re using TC&#8217;s hash database then you&#8217;re out of luck, but [...]]]></description>
			<content:encoded><![CDATA[<p>Lately I&#8217;ve been noticing that people are visiting my blog to find ways to write multiple records with the same key to a <a href="http://1978th.net/tokyocabinet/">Tokyo Cabinet</a> (TC) database.</p>
<p>Well, the answer depends on which data structure backs your TC database. If you&#8217;re using TC&#8217;s hash database then you&#8217;re out of luck, but <a href="http://1978th.net/tokyocabinet/spex-en.html#tcbdbapi">TC&#8217;s B+Tree database</a> will allow you to write duplicate keys. If you just want the answer, here&#8217;s a <a href="http://torum.net/code/c/tc_dupkey.c">compilable source</a> of how to do it. For those who are interested in how it works, keep on reading :)</p>
<p>So here&#8217;s how it&#8217;s done. You write the record(s) using TC&#8217;s tcbdbputdup() function. On a key collision, TC then places the new record after the existing one(s) instead of overwriting them. The following snippet writes three records to Tokyo Cabinet using an identical key.</p>
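<p>The following tiny in-memory model illustrates that behaviour. It is not TC&#8217;s implementation, and sorted_db_t and dup_put() are hypothetical. Records sit in an array sorted by key. A duplicate-allowing put inserts the new record after every existing record with an equal key, found with an upper-bound binary search. Duplicates therefore stay in insertion order.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_RECS 32

/* Hypothetical record and sorted store (keys/values are borrowed strings). */
typedef struct { const char *key; const char *val; } rec_t;
typedef struct { rec_t recs[MAX_RECS]; size_t n; } sorted_db_t;

/* Upper bound: index of the first record whose key is greater than key. */
static size_t upper_bound(const sorted_db_t *db, const char *key) {
  size_t lo = 0, hi = db->n;
  while (lo < hi) {
    size_t mid = lo + (hi - lo) / 2;
    if (strcmp(db->recs[mid].key, key) <= 0) lo = mid + 1; else hi = mid;
  }
  return lo;
}

/* Insert allowing duplicates: the new record goes after any existing
   records with an equal key. Returns 0 when the array is full. */
int dup_put(sorted_db_t *db, const char *key, const char *val) {
  assert(key != NULL && val != NULL);
  if (db->n == MAX_RECS) return 0;
  size_t pos = upper_bound(db, key);
  /* shift the tail right by one slot to make room at pos */
  memmove(&db->recs[pos + 1], &db->recs[pos],
          (db->n - pos) * sizeof db->recs[0]);
  db->recs[pos].key = key;
  db->recs[pos].val = val;
  db->n++;
  return 1;
}
```

<p>Put &#8220;record 1&#8221;, &#8220;record 2&#8221; and &#8220;record 3&#8221; under &#8220;key&#8221;, interleaved with puts under other keys. The three records still end up adjacent and in that order.</p>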

<div class="wp_syntax"><div class="code"><pre class="c" style="font-family:monospace;"><span style="color: #993333;">const</span> <span style="color: #993333;">char</span> <span style="color: #339933;">*</span>key <span style="color: #339933;">=</span> <span style="color: #ff0000;">&quot;key&quot;</span><span style="color: #339933;">;</span>
<span style="color: #993333;">const</span> <span style="color: #993333;">char</span> <span style="color: #339933;">*</span>r1 <span style="color: #339933;">=</span> <span style="color: #ff0000;">&quot;record 1&quot;</span><span style="color: #339933;">;</span>
<span style="color: #993333;">const</span> <span style="color: #993333;">char</span> <span style="color: #339933;">*</span>r2 <span style="color: #339933;">=</span> <span style="color: #ff0000;">&quot;record 2&quot;</span><span style="color: #339933;">;</span>
<span style="color: #993333;">const</span> <span style="color: #993333;">char</span> <span style="color: #339933;">*</span>r3 <span style="color: #339933;">=</span> <span style="color: #ff0000;">&quot;record 3&quot;</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #808080; font-style: italic;">/* store three different records with the same key.
   note that &quot;database_handle&quot; is a TCBDB pointer that
   has already been opened for writing (BDBOWRITER). */</span>
<span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span><span style="color: #339933;">!</span>tcbdbputdup<span style="color: #009900;">&#40;</span>database_handle<span style="color: #339933;">,</span> key<span style="color: #339933;">,</span> strlen<span style="color: #009900;">&#40;</span>key<span style="color: #009900;">&#41;</span><span style="color: #339933;">,</span> r1<span style="color: #339933;">,</span> strlen<span style="color: #009900;">&#40;</span>r1<span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">||</span>
    <span style="color: #339933;">!</span>tcbdbputdup<span style="color: #009900;">&#40;</span>database_handle<span style="color: #339933;">,</span> key<span style="color: #339933;">,</span> strlen<span style="color: #009900;">&#40;</span>key<span style="color: #009900;">&#41;</span><span style="color: #339933;">,</span> r2<span style="color: #339933;">,</span> strlen<span style="color: #009900;">&#40;</span>r2<span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">||</span>
    <span style="color: #339933;">!</span>tcbdbputdup<span style="color: #009900;">&#40;</span>database_handle<span style="color: #339933;">,</span> key<span style="color: #339933;">,</span> strlen<span style="color: #009900;">&#40;</span>key<span style="color: #009900;">&#41;</span><span style="color: #339933;">,</span> r3<span style="color: #339933;">,</span> strlen<span style="color: #009900;">&#40;</span>r3<span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
  fprintf<span style="color: #009900;">&#40;</span>stderr<span style="color: #339933;">,</span> <span style="color: #ff0000;">&quot;failed to store data<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
  <span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span><span style="color: #339933;">!</span>tcbdbclose<span style="color: #009900;">&#40;</span>database_handle<span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
    fprintf<span style="color: #009900;">&#40;</span>stderr<span style="color: #339933;">,</span> <span style="color: #ff0000;">&quot;failed to close the database<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #b1b100;">return</span> <span style="color: #0000dd;">1</span><span style="color: #339933;">;</span>
  <span style="color: #009900;">&#125;</span>   
  tcbdbdel<span style="color: #009900;">&#40;</span>database_handle<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
  <span style="color: #b1b100;">return</span> <span style="color: #0000dd;">1</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></div></div>

<p>Something to watch out for: because you&#8217;ve allowed duplication, every run of the above code appends another copy of the three records to the database.</p>
<p>The next question is how to retrieve _only_ the records that correspond to the key we just inserted them with. Simple! Just traverse the tree from the first occurrence of the key, retrieving the data as we go, until we hit a different key.</p>
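<p>The first thing to do is to create a cursor and move it to the first occurrence of the key. Note what tcbdbcurjump() does when no record matches: it places the cursor on the next greater key instead. A successful jump alone therefore does not prove that the key exists.</p>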

<div class="wp_syntax"><div class="code"><pre class="c" style="font-family:monospace;">BDBCUR <span style="color: #339933;">*</span>cursor<span style="color: #339933;">;</span>
&nbsp;
<span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span><span style="color: #009900;">&#40;</span>cursor <span style="color: #339933;">=</span> tcbdbcurnew<span style="color: #009900;">&#40;</span>database_handle<span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">==</span> NULL<span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
  <span style="color: #808080; font-style: italic;">/* FAIL. do the right thing for your application */</span> 
<span style="color: #009900;">&#125;</span>
&nbsp;
<span style="color: #808080; font-style: italic;">/* move the cursor to the first occurrence of the key */</span>
<span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span><span style="color: #339933;">!</span>tcbdbcurjump<span style="color: #009900;">&#40;</span>cursor<span style="color: #339933;">,</span> key<span style="color: #339933;">,</span> strlen<span style="color: #009900;">&#40;</span>key<span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
  <span style="color: #808080; font-style: italic;">/* FAIL. do the right thing for your application */</span> 
<span style="color: #009900;">&#125;</span></pre></div></div>

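<p>Now we&#8217;re ready to traverse the tree. Remember that we&#8217;re only interested in a certain key, so we only want to traverse the tree until we hit a different key. The following code snippet does exactly that and prints each record it discovers along the way. In our case (assuming the insertion snippet ran only once) it would print &#8220;fetched: record 1&#8221;, &#8220;fetched: record 2&#8221; and &#8220;fetched: record 3&#8221;. Keep in mind that tcbdbcurkey2() and tcbdbcurval2() both return newly allocated strings that the caller must free().</p>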

<div class="wp_syntax"><div class="code"><pre class="c" style="font-family:monospace;"><span style="color: #993333;">char</span> <span style="color: #339933;">*</span>fetched_key<span style="color: #339933;">;</span>
<span style="color: #993333;">char</span> <span style="color: #339933;">*</span>fetched_value<span style="color: #339933;">;</span>
&nbsp;
<span style="color: #808080; font-style: italic;">/* traverse the tree. terminates if the entire tree is
   traversed _OR_ if it hits a different key */</span>
<span style="color: #b1b100;">while</span> <span style="color: #009900;">&#40;</span><span style="color: #009900;">&#40;</span>fetched_key <span style="color: #339933;">=</span> tcbdbcurkey2<span style="color: #009900;">&#40;</span>cursor<span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">!=</span> NULL<span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
  <span style="color: #808080; font-style: italic;">/* different key so break out of the loop */</span>
  <span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span>strcmp<span style="color: #009900;">&#40;</span>key<span style="color: #339933;">,</span> fetched_key<span style="color: #009900;">&#41;</span> <span style="color: #339933;">!=</span> <span style="color: #0000dd;">0</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
    free<span style="color: #009900;">&#40;</span>fetched_key<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #000000; font-weight: bold;">break</span><span style="color: #339933;">;</span>
  <span style="color: #009900;">&#125;</span>
  <span style="color: #808080; font-style: italic;">/* tcbdbcurkey2() returns a malloc'd copy, so free it on every pass */</span>
  free<span style="color: #009900;">&#40;</span>fetched_key<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
  fetched_value <span style="color: #339933;">=</span> tcbdbcurval2<span style="color: #009900;">&#40;</span>cursor<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
  <span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span>fetched_value<span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
    fprintf<span style="color: #009900;">&#40;</span>stdout<span style="color: #339933;">,</span> <span style="color: #ff0000;">&quot;fetched: %s<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span><span style="color: #339933;">,</span> fetched_value<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    free<span style="color: #009900;">&#40;</span>fetched_value<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
  <span style="color: #009900;">&#125;</span>
  tcbdbcurnext<span style="color: #009900;">&#40;</span>cursor<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span>
<span style="color: #808080; font-style: italic;">/* release the cursor once the scan is done */</span>
tcbdbcurdel<span style="color: #009900;">&#40;</span>cursor<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span></pre></div></div>

<p>The traversal above needs one additional key read to terminate (unless it runs off the end of the tree). However, chances are that the next record sits in the same leaf page, so this extra step is cheap.</p>
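<p>A stand-alone model shows where that extra step comes from. This hypothetical sketch works over a sorted array of keys and is not TC&#8217;s cursor API. Here jump() mimics tcbdbcurjump() by returning the first position whose key is greater than or equal to the target, and count_dups() counts every key it reads. Take a key with k records that is followed by another key. The scan reads k + 1 keys, the last one only to notice that the run has ended.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Lower bound: first index whose key is >= target, or n if none. Like a
   cursor jump, it lands on the next greater key when nothing matches. */
static size_t jump(const char *const *keys, size_t n, const char *target) {
  size_t lo = 0, hi = n;
  while (lo < hi) {
    size_t mid = lo + (hi - lo) / 2;
    if (strcmp(keys[mid], target) < 0) lo = mid + 1; else hi = mid;
  }
  return lo;
}

/* Count the records stored under target. *reads receives the number of
   keys examined, including the one that ends the run (if any). */
size_t count_dups(const char *const *keys, size_t n, const char *target,
                  size_t *reads) {
  assert(reads != NULL);
  size_t matches = 0;
  *reads = 0;
  for (size_t i = jump(keys, n, target); i < n; i++) {
    (*reads)++;
    if (strcmp(keys[i], target) != 0) break;  /* different key: stop */
    matches++;
  }
  return matches;
}
```

<p>With the keys &#8220;apple&#8221;, &#8220;key&#8221; (three times) and &#8220;zebra&#8221;, counting &#8220;key&#8221; reads four keys to find three matches. Counting &#8220;zebra&#8221; reads just one, because the scan runs off the end.</p>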
<p>Alternatively, TC provides a function called tcbdbget4(). It returns a newly allocated list (a TCLIST, to be released with tclistdel()) holding all the values that correspond to the key you provide. If you take this approach, consider whether the cost of allocating memory and building the list is acceptable for your application.</p>
]]></content:encoded>
			<wfw:commentRss>http://torum.net/2009/09/writing-duplicate-keys-to-tokyocabinet/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

