<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Tokyo Cabinet Tip: Protected Database Iteration</title>
	<atom:link href="http://torum.net/2009/05/tokyo-cabinet-protected-database-iteration/feed/" rel="self" type="application/rss+xml" />
	<link>http://torum.net/2009/05/tokyo-cabinet-protected-database-iteration/</link>
	<description>Hackaholic and a Web Addict based in Tokyo</description>
	<lastBuildDate>Tue, 27 Jul 2010 02:42:30 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
	<item>
		<title>By: Toru Maesaka</title>
		<link>http://torum.net/2009/05/tokyo-cabinet-protected-database-iteration/comment-page-1/#comment-10926</link>
		<dc:creator>Toru Maesaka</dc:creator>
		<pubDate>Fri, 14 May 2010 08:50:14 +0000</pubDate>
		<guid isPermaLink="false">http://torum.net/?p=1688#comment-10926</guid>
		<description>&lt;a href=&quot;#comment-10922&quot; rel=&quot;nofollow&quot;&gt;@bbxiong.xiao&lt;/a&gt; Your loop looks fine. However, getting fewer records after performing optimization means that your database file is broken (as you mentioned). This is because TC will only extract the records that it can access.

My first thought was that the hash chain in your database is broken, but the fact that you can directly access your record means that&#039;s unlikely. I&#039;m not sure how you broke the file, but if you know the keys that are in the database (e.g. there&#039;s a sequential pattern), I recommend writing a short program that rebuilds a new database file by copying the records to it one at a time.

Either that or you could directly ask Mikio Hirabayashi for help/consulting since he&#039;s the sole author of Tokyo Cabinet.</description>
		<content:encoded><![CDATA[<p><a href="#comment-10922" rel="nofollow">@bbxiong.xiao</a> Your loop looks fine. However, getting fewer records after performing optimization means that your database file is broken (as you mentioned). This is because TC will only extract the records that it can access.</p>
<p>My first thought was that the hash chain in your database is broken, but the fact that you can directly access your record means that&#8217;s unlikely. I&#8217;m not sure how you broke the file, but if you know the keys that are in the database (e.g. there&#8217;s a sequential pattern), I recommend writing a short program that rebuilds a new database file by copying the records to it one at a time.</p>
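<p>A minimal sketch of that rebuild idea, assuming the keys are decimal integers in a known range. The <code>Store</code> type and the <code>store_get</code>, <code>store_put</code> and <code>rebuild_by_probing</code> helpers are hypothetical in-memory stand-ins, not Tokyo Cabinet calls. Against real files the lookup would presumably be tchdbget() on the old database and the write tchdbput() on the new one.</p>

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical in-memory stand-in for a key/value store (NOT the Tokyo Cabinet API). */
#define STORE_CAP 64
#define KEY_MAX   16
#define VAL_MAX   32

typedef struct {
    char keys[STORE_CAP][KEY_MAX];
    char vals[STORE_CAP][VAL_MAX];
    int  count;
} Store;

/* Point lookup: returns the stored value, or NULL if the key is absent. */
static const char *store_get(const Store *s, const char *key)
{
    for (int i = 0; i < s->count; i++)
        if (strcmp(s->keys[i], key) == 0)
            return s->vals[i];
    return NULL;
}

/* Insert or overwrite a record; returns 0 on success, -1 if full or too long. */
static int store_put(Store *s, const char *key, const char *val)
{
    if (strlen(key) >= KEY_MAX || strlen(val) >= VAL_MAX)
        return -1;
    for (int i = 0; i < s->count; i++) {
        if (strcmp(s->keys[i], key) == 0) {
            strcpy(s->vals[i], val);
            return 0;
        }
    }
    if (s->count >= STORE_CAP)
        return -1;
    strcpy(s->keys[s->count], key);
    strcpy(s->vals[s->count], val);
    s->count++;
    return 0;
}

/*
 * Probe every candidate key in [first, last] with a point lookup instead of
 * relying on the iterator, and copy each record found into the fresh store.
 * Returns the number of records copied, or -1 if a write fails.
 */
static long rebuild_by_probing(const Store *old_db, Store *new_db,
                               long first, long last)
{
    assert(first <= last);
    long copied = 0;
    for (long k = first; k <= last; k++) {
        char key[KEY_MAX];
        snprintf(key, sizeof key, "%ld", k);
        const char *val = store_get(old_db, key);
        if (val == NULL)
            continue;               /* gap in the sequence: nothing to copy */
        if (store_put(new_db, key, val) != 0)
            return -1;
        copied++;
    }
    return copied;
}
```

<p>The point of probing by key is that it never touches the iterator, so records that are still reachable by direct lookup can be copied even when iteration stops early.</p>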
<p>Either that or you could directly ask Mikio Hirabayashi for help/consulting since he&#8217;s the sole author of Tokyo Cabinet.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: bbxiong.xiao</title>
		<link>http://torum.net/2009/05/tokyo-cabinet-protected-database-iteration/comment-page-1/#comment-10922</link>
		<dc:creator>bbxiong.xiao</dc:creator>
		<pubDate>Fri, 14 May 2010 06:06:43 +0000</pubDate>
		<guid isPermaLink="false">http://torum.net/?p=1688#comment-10922</guid>
		<description>&lt;a href=&quot;#comment-10865&quot; rel=&quot;nofollow&quot;&gt;@Toru Maesaka&lt;/a&gt; 

Thanks for your reply, Toru.

Yes, I can get the records which cannot be iterated (for example, I tested the key &quot;149511433&quot; and tchdbget returns the correct value). I dumped the key list using two methods:

1. Using tchdbiternext:
    while (true)
    {
        iLen = 0;
        if ((pszKey = (char*)tchdbiternext(pstTCHDB, &amp;iLen)) != NULL)
        {
            printf(&quot;%d\n&quot;, atoi(pszKey));
            free(pszKey);
        }
        else if (pszKey == NULL &amp;&amp; iLen == 0)
        {
            printf(&quot;tchdbiternext returns NULL, err=%d, msg=%s\n&quot;, tchdbecode(pstTCHDB), tchdberrmsg(tchdbecode(pstTCHDB)));
            break;
        }
        else if (pszKey == NULL)
        {
            printf(&quot;should never reach here\n&quot;);
            break; /* avoid spinning forever if this branch is ever taken */
        }
    }

2. Using tchdbforeach:
    if (!tchdbforeach(pstTCHDB, DumpKeyCallback, &amp;niter))
    {
        printf(&quot;tchdbforeach failed\n&quot;);
        return -1;
    }


Both methods give the same result: each dumped 100,000+ records, not including my test key &quot;149511433&quot;.


Also, after optimizing the TC hash database file, the record count dropped to 100,000+ and the file size fell sharply. (Before optimizing, tchmgr reported a record count of 8,790,000+.)

I checked all my TC files; half of them are &quot;broken&quot; (the record count falls sharply after optimization).

Yes, this is strange. I also ran some tests before deploying TT+TC in our production environment, so I&#039;m trying to find out whether anyone else has met the same problem.</description>
		<content:encoded><![CDATA[<p><a href="#comment-10865" rel="nofollow">@Toru Maesaka</a> </p>
<p>Thanks for your reply, Toru.</p>
<p>Yes, I can get the records which cannot be iterated (for example, I tested the key &#8220;149511433&#8221; and tchdbget returns the correct value). I dumped the key list using two methods:</p>
<p>1. Using tchdbiternext:<br />
    while (true)<br />
    {<br />
        iLen = 0;<br />
        if ((pszKey = (char*)tchdbiternext(pstTCHDB, &amp;iLen)) != NULL)<br />
        {<br />
            printf(&quot;%d\n&quot;, atoi(pszKey));<br />
            free(pszKey);<br />
        }<br />
        else if (pszKey == NULL &amp;&amp; iLen == 0)<br />
        {<br />
            printf(&quot;tchdbiternext returns NULL, err=%d, msg=%s\n&quot;, tchdbecode(pstTCHDB), tchdberrmsg(tchdbecode(pstTCHDB)));<br />
            break;<br />
        }<br />
        else if (pszKey == NULL)<br />
        {<br />
            printf(&quot;should never reach here\n&quot;);<br />
            break; /* avoid spinning forever if this branch is ever taken */<br />
        }<br />
    }</p>
<p>2. Using tchdbforeach:<br />
    if (!tchdbforeach(pstTCHDB, DumpKeyCallback, &amp;niter))<br />
    {<br />
        printf(&quot;tchdbforeach failed\n&quot;);<br />
        return -1;<br />
    }</p>
<p>Both methods give the same result: each dumped 100,000+ records, not including my test key &#8220;149511433&#8221;.</p>
<p>Also, after optimizing the TC hash database file, the record count dropped to 100,000+ and the file size fell sharply. (Before optimizing, tchmgr reported a record count of 8,790,000+.)</p>
<p>I checked all my TC files; half of them are &#8220;broken&#8221; (the record count falls sharply after optimization).</p>
<p>Yes, this is strange. I also ran some tests before deploying TT+TC in our production environment, so I&#8217;m trying to find out whether anyone else has met the same problem.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Toru Maesaka</title>
		<link>http://torum.net/2009/05/tokyo-cabinet-protected-database-iteration/comment-page-1/#comment-10865</link>
		<dc:creator>Toru Maesaka</dc:creator>
		<pubDate>Wed, 12 May 2010 05:27:11 +0000</pubDate>
		<guid isPermaLink="false">http://torum.net/?p=1688#comment-10865</guid>
		<description>&lt;a href=&quot;#comment-10856&quot; rel=&quot;nofollow&quot;&gt;@bbxiong.xiao&lt;/a&gt; Hi! That&#039;s odd. Can you reach the records that can&#039;t be iterated through, for example with tchdbget()? I&#039;m currently doing some tests with n million records and my table scanner works 100% as expected.

It would help if you could either email or write your loop condition here.</description>
		<content:encoded><![CDATA[<p><a href="#comment-10856" rel="nofollow">@bbxiong.xiao</a> Hi! That&#8217;s odd. Can you reach the records that can&#8217;t be iterated through, for example with tchdbget()? I&#8217;m currently doing some tests with n million records and my table scanner works 100% as expected.</p>
<p>It would help if you could either email or write your loop condition here.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: bbxiong.xiao</title>
		<link>http://torum.net/2009/05/tokyo-cabinet-protected-database-iteration/comment-page-1/#comment-10856</link>
		<dc:creator>bbxiong.xiao</dc:creator>
		<pubDate>Tue, 11 May 2010 17:25:43 +0000</pubDate>
		<guid isPermaLink="false">http://torum.net/?p=1688#comment-10856</guid>
		<description>Hi Toru. Actually, I cannot dump all my keys either way. My TC/TT has been running for over 2 months and the record count is over 7,000,000 (&quot;tchmgr inform&quot; clearly shows &quot;record number: 7xxxxxx&quot;). When I tried to dump all the keys inside TC, neither way gave the correct result: only 100,000 records were dumped before TC began to complain &quot;no more records&quot;. I just cannot believe it. My TC/TT runs well when getting/setting records, but I just cannot dump all my keys!
I tried to optimize my TC file (maybe my TC file is broken?) and it didn&#039;t help.

Do you have any clue what happened?</description>
		<content:encoded><![CDATA[<p>Hi Toru. Actually, I cannot dump all my keys either way. My TC/TT has been running for over 2 months and the record count is over 7,000,000 (&#8220;tchmgr inform&#8221; clearly shows &#8220;record number: 7xxxxxx&#8221;). When I tried to dump all the keys inside TC, neither way gave the correct result: only 100,000 records were dumped before TC began to complain &#8220;no more records&#8221;. I just cannot believe it. My TC/TT runs well when getting/setting records, but I just cannot dump all my keys!<br />
I tried to optimize my TC file (maybe my TC file is broken?) and it didn&#8217;t help.</p>
<p>Do you have any clue what happened?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: tmaesaka</title>
		<link>http://torum.net/2009/05/tokyo-cabinet-protected-database-iteration/comment-page-1/#comment-1195</link>
		<dc:creator>tmaesaka</dc:creator>
		<pubDate>Thu, 14 May 2009 08:15:35 +0000</pubDate>
		<guid isPermaLink="false">http://torum.net/?p=1688#comment-1195</guid>
		<description>Agreed, it would suck to burn memory just for that :(

So, by key refetch, do you mean the core does this with rnd_next()? If so, I would have thought a refetch would be beyond rnd_next()&#039;s responsibility...

We could keep track of how many times the core iterated inside the engine and return the same key as it returned previously. Either that or we could keep a copy of the &quot;previous key&quot; and fetch the value, which could be more computation friendly.

I&#039;ll ping you on IRC about this when I come across it.</description>
		<content:encoded><![CDATA[<p>Agreed, it would suck to burn memory just for that :(</p>
<p>So, by key refetch, do you mean the core does this with rnd_next()? If so, I would have thought a refetch would be beyond rnd_next()&#8217;s responsibility&#8230;</p>
<p>We could keep track of how many times the core iterated inside the engine and return the same key as it returned previously. Either that or we could keep a copy of the &#8220;previous key&#8221; and fetch the value, which could be more computation friendly.</p>
<p>I&#8217;ll ping you on IRC about this when I come across it.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Brian Aker</title>
		<link>http://torum.net/2009/05/tokyo-cabinet-protected-database-iteration/comment-page-1/#comment-1186</link>
		<dc:creator>Brian Aker</dc:creator>
		<pubDate>Wed, 13 May 2009 14:49:15 +0000</pubDate>
		<guid isPermaLink="false">http://torum.net/?p=1688#comment-1186</guid>
		<description>Can the key be refetched? MySQL/Drizzle can ask for a key/value a second time in cases of ORDER BY. It is expected that the engine can provide the data from a second request. There is a row cache that can be used to solve this, but if it has to save the entire table there can be performance issues.</description>
		<content:encoded><![CDATA[<p>Can the key be refetched? MySQL/Drizzle can ask for a key/value a second time in cases of ORDER BY. It is expected that the engine can provide the data from a second request. There is a row cache that can be used to solve this, but if it has to save the entire table there can be performance issues.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
