<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Toru Maesaka &#187; tip</title>
	<atom:link href="http://torum.net/tag/tip/feed/" rel="self" type="application/rss+xml" />
	<link>http://torum.net</link>
	<description>Hackaholic and a Web Addict based in Tokyo</description>
	<lastBuildDate>Thu, 22 Jul 2010 09:59:54 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
		<item>
		<title>Tokyo Cabinet Tip: Protected Database Iteration</title>
		<link>http://torum.net/2009/05/tokyo-cabinet-protected-database-iteration/</link>
		<comments>http://torum.net/2009/05/tokyo-cabinet-protected-database-iteration/#comments</comments>
		<pubDate>Wed, 13 May 2009 06:29:17 +0000</pubDate>
		<dc:creator>Toru Maesaka</dc:creator>
				<category><![CDATA[knowledge]]></category>
		<category><![CDATA[oss]]></category>
		<category><![CDATA[storage]]></category>
		<category><![CDATA[tip]]></category>
		<category><![CDATA[tokyocabinet]]></category>

		<guid isPermaLink="false">http://torum.net/?p=1688</guid>
		<description><![CDATA[Tokyo Cabinet (TC) provides iteration functionality for both its persistent and non-persistent data structures. For example, if you want to iterate through TC&#8217;s hash database, you can use the tchdbiternext() function. This is really straightforward to use, such that: void *key; int key_len; &#160; if &#40;tchdbiterinit&#40;tc_database_handle&#41; != true&#41; &#123; /* failed to initialize iterator [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://tokyocabinet.sourceforge.net">Tokyo Cabinet</a> (TC) provides iteration functionality for both its persistent and non-persistent data structures. For example, if you want to iterate through TC&#8217;s hash database, you can use the tchdbiternext() function. This is really straightforward to use, such that:</p>

<div class="wp_syntax"><div class="code"><pre class="c" style="font-family:monospace;"><span style="color: #993333;">void</span> <span style="color: #339933;">*</span>key<span style="color: #339933;">;</span>
<span style="color: #993333;">int</span> key_len<span style="color: #339933;">;</span>
&nbsp;
<span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span>tchdbiterinit<span style="color: #009900;">&#40;</span>tc_database_handle<span style="color: #009900;">&#41;</span> <span style="color: #339933;">!=</span> <span style="color: #000000; font-weight: bold;">true</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
  <span style="color: #808080; font-style: italic;">/* failed to initialize iterator */</span>
<span style="color: #009900;">&#125;</span>
&nbsp;
<span style="color: #b1b100;">while</span> <span style="color: #009900;">&#40;</span><span style="color: #009900;">&#40;</span>key <span style="color: #339933;">=</span> tchdbiternext<span style="color: #009900;">&#40;</span>tc_database_handle<span style="color: #339933;">,</span> <span style="color: #339933;">&amp;</span>key_len<span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">!=</span> NULL<span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
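  <span style="color: #808080; font-style: italic;">/* work with the fetched key and key_len */</span>
  free<span style="color: #009900;">&#40;</span>key<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span> <span style="color: #808080; font-style: italic;">/* the returned key is allocated with malloc, so release it */</span>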
<span style="color: #009900;">&#125;</span></pre></div></div>

<p>will iterate through the entire hash database that the &#8220;tc_database_handle&#8221; object is responsible for. This can be handy if you need to loop through your database for some arbitrary reason.</p>
<p>However, using this function has a consequence in a concurrent environment where the order of records <em>really</em> matters. Even though TC is a thread-safe library, the iteration functions aren&#8217;t thread-safe in the way that we might expect.</p>
<p>For example, if a write operation occurs while the application iterates over the database, you will end up iterating over a database that is in a changed state. This will not make the cursor go crazy and crash your application, since TC handles this internally, but you still end up iterating over a database in a state that you did not initially intend to loop through.</p>
<p>The solution is to simply block write operations to the database while your application iterates through it. For example, you could use a pthread <a href="http://en.wikipedia.org/wiki/Readers-writer_lock">read-write lock</a> to allow other threads to read while you iterate but block writes until you finish iterating.</p>
<p>I was planning on doing this for a table scanner in the <a href="https://launchpad.net/blitzdb">storage engine</a> that I&#8217;m currently working on, but it turns out that TC has an undocumented function that takes care of this internally. I&#8217;ve talked to Mikio about this function and apparently it is intentional that he hasn&#8217;t documented it on his <a href="http://tokyocabinet.sourceforge.net/spex-en.html">specification page</a>. He has no plans to throw it out, so you do not have to worry about it magically disappearing one day. For more information, you can take a look at his header file (tchdb.h for the hash database).</p>
<h4>Explanation and Simple Example</h4>
<p>The function is called <strong>tchdbforeach()</strong>. It atomically iterates through your database from beginning to end, supplying each key/value pair to the callback function that you provide. The signature of the callback is the following:</p>

<div class="wp_syntax"><div class="code"><pre class="c" style="font-family:monospace;">bool callback<span style="color: #009900;">&#40;</span><span style="color: #993333;">const</span> <span style="color: #993333;">void</span> <span style="color: #339933;">*</span>kbuf<span style="color: #339933;">,</span> <span style="color: #993333;">int</span> ksiz<span style="color: #339933;">,</span> <span style="color: #993333;">const</span> <span style="color: #993333;">void</span> <span style="color: #339933;">*</span>vbuf<span style="color: #339933;">,</span>
              <span style="color: #993333;">int</span> vsiz<span style="color: #339933;">,</span> <span style="color: #993333;">void</span> <span style="color: #339933;">*</span>op<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span></pre></div></div>

<p>where the fifth argument, &#8220;void *op&#8221; is an opaque pointer to the data that you can pass to the callback. Here is a simple example that will increment a counter integer on each iteration using this function:</p>

<div class="wp_syntax"><div class="code"><pre class="c" style="font-family:monospace;"><span style="color: #808080; font-style: italic;">/* Do whatever you like with the provided key/value pair in here */</span>
bool callback<span style="color: #009900;">&#40;</span><span style="color: #993333;">const</span> <span style="color: #993333;">void</span> <span style="color: #339933;">*</span>kbuf<span style="color: #339933;">,</span> <span style="color: #993333;">int</span> ksiz<span style="color: #339933;">,</span> <span style="color: #993333;">const</span> <span style="color: #993333;">void</span> <span style="color: #339933;">*</span>vbuf<span style="color: #339933;">,</span>
              <span style="color: #993333;">int</span> vsiz<span style="color: #339933;">,</span> <span style="color: #993333;">void</span> <span style="color: #339933;">*</span>op<span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
  <span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span>op <span style="color: #339933;">==</span> NULL<span style="color: #009900;">&#41;</span>
    <span style="color: #b1b100;">return</span> <span style="color: #000000; font-weight: bold;">false</span><span style="color: #339933;">;</span>
&nbsp;
  <span style="color: #339933;">*</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#40;</span><span style="color: #993333;">int</span> <span style="color: #339933;">*</span><span style="color: #009900;">&#41;</span>op<span style="color: #009900;">&#41;</span> <span style="color: #339933;">+=</span> <span style="color: #0000dd;">1</span><span style="color: #339933;">;</span>
&nbsp;
  <span style="color: #b1b100;">return</span> <span style="color: #000000; font-weight: bold;">true</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span>
&nbsp;
<span style="color: #993333;">int</span> main<span style="color: #009900;">&#40;</span><span style="color: #993333;">void</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
  <span style="color: #993333;">int</span> niter <span style="color: #339933;">=</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">;</span>
&nbsp;
  ...
&nbsp;
  <span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span><span style="color: #339933;">!</span>tchdbforeach<span style="color: #009900;">&#40;</span>tc_database_handle<span style="color: #339933;">,</span> callback<span style="color: #339933;">,</span> <span style="color: #339933;">&amp;</span>niter<span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
    fprintf<span style="color: #009900;">&#40;</span>stderr<span style="color: #339933;">,</span> <span style="color: #ff0000;">&quot;failed to iterate the database<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #b1b100;">return</span> EXIT_FAILURE<span style="color: #339933;">;</span>
  <span style="color: #009900;">&#125;</span>
&nbsp;
  <span style="color: #000066;">printf</span><span style="color: #009900;">&#40;</span><span style="color: #ff0000;">&quot;iterated %d times<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span><span style="color: #339933;">,</span> niter<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
  ...
&nbsp;
  <span style="color: #b1b100;">return</span> EXIT_SUCCESS<span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></div></div>

<p>If all goes well, the counter variable will be set to the number of records in the database. This function is slightly more complex to use than tchdbiternext(), but you are guaranteed to iterate atomically, which is pretty important for a table scanner.</p>
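<p>The opaque pointer does not have to be a plain counter. As a sketch, the callback below receives a small context struct and stops at the first matching key. It assumes, as far as the comments in tchdb.h indicate, that returning false from the callback stops the iteration, and that the callback must not perform database operations itself because it runs while the database is locked. search_ctx, find_key() and foreach_stub() are hypothetical names; foreach_stub() is only a stand-in driver so that the snippet compiles without TC installed.</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Same shape as the callback signature shown earlier. */
typedef bool (*record_iter_fn)(const void *kbuf, int ksiz,
                               const void *vbuf, int vsiz, void *op);

/* Hypothetical context handed to the callback through the opaque pointer. */
typedef struct {
  const char *wanted; /* key to look for */
  int visited;        /* number of records seen so far */
  bool found;         /* set once the wanted key is seen */
} search_ctx;

/* Counts visited records and asks the driver to stop at the wanted key. */
bool find_key(const void *kbuf, int ksiz, const void *vbuf, int vsiz,
              void *op) {
  (void)vbuf; /* the value is not needed for this lookup */
  (void)vsiz;
  assert(op != NULL);

  search_ctx *ctx = op;
  ctx->visited++;

  if ((size_t)ksiz == strlen(ctx->wanted) &&
      memcmp(kbuf, ctx->wanted, (size_t)ksiz) == 0) {
    ctx->found = true;
    return false; /* assumed to stop the iteration */
  }
  return true; /* keep going */
}

/* Stand-in driver, NOT part of Tokyo Cabinet: feeds string pairs to the
   callback and stops as soon as the callback returns false. */
void foreach_stub(const char **keys, const char **vals, size_t n,
                  record_iter_fn iter, void *op) {
  for (size_t i = 0; i < n; i++) {
    if (!iter(keys[i], (int)strlen(keys[i]),
              vals[i], (int)strlen(vals[i]), op))
      break;
  }
}
```

<p>With TC, the same callback and context would be passed as the second and third arguments of tchdbforeach(), just like the counter in the example above.</p>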
<p>I hope this function can help you too.</p>
]]></content:encoded>
			<wfw:commentRss>http://torum.net/2009/05/tokyo-cabinet-protected-database-iteration/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
	</channel>
</rss>
