Introducing Digest::MurmurHash for Perl
I noticed that an interface to the MurmurHash algorithm wasn’t available on CPAN, so I quickly whipped up an XS module for it and shipped it to CPAN. The module itself is very simple and exports only one function, which returns a uint32_t hash value for the data you feed it. All you need is a C99-compliant compiler and a build environment to get this module up and running (and Perl, of course).
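Here is a minimal sketch of how the single exported function, murmur_hash, might be used. The bucket count, the bucket_for helper and the sample keys are made up for illustration and are not part of the module.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use Digest::MurmurHash qw(murmur_hash);

# Assumption for illustration only: we want to spread keys over 8 buckets.
my $num_buckets = 8;

# Hypothetical helper (not part of the module): map a key to a bucket index.
sub bucket_for {
    my ($key) = @_;
    # murmur_hash returns an unsigned 32-bit integer for the given data.
    return murmur_hash($key) % $num_buckets;
}

for my $key (qw(apple banana cherry)) {
    printf "%-6s -> hash %10u, bucket %d\n",
        $key, murmur_hash($key), bucket_for($key);
}
```

The same input always yields the same value, so the result can be used wherever a cheap, well-distributed integer key is handy.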
http://search.cpan.org/~tmaesaka/Digest-MurmurHash-0.10/
If you’ve never heard of MurmurHash, it is a hash function that provides excellent speed, collision resistance and distribution characteristics. If you’re interested in the details, Appleby’s experiment results are a good place to look.
To illustrate the speed of this module, I benchmarked it against various other hash modules on CPAN in the following way, which produced this outcome.
    #!/usr/bin/perl
    use strict;
    use warnings;

    use Benchmark qw(timethese);

    use Digest::FNV qw(fnv);
    use Digest::JHash qw(jhash);
    use Digest::MD5 qw(md5);
    use Digest::MurmurHash qw(murmur_hash);
    use Digest::Pearson qw(pearson);
    use Digest::SHA1 qw(sha1);
    use String::CRC32;    # exports crc32 by default

    my $data = "some_random_string_to_hash";

    # Code refs are used instead of interpolated strings so that each
    # function receives $data as a real string rather than a bareword.
    timethese(100000000, {
        crc32   => sub { crc32($data) },
        fowler  => sub { fnv($data) },
        jenkins => sub { jhash($data) },
        md5     => sub { md5($data) },
        murmur  => sub { murmur_hash($data) },
        pearson => sub { pearson($data) },
        sha1    => sub { sha1($data) },
    });
To be honest, being the fastest in this particular benchmark isn’t so important, but I was glad to see that it is fast enough to be a candidate for anyone looking for a hash function in Perl. FYI, it looks like the folks at Ruby are considering the algorithm (watch out, it’s in Japanese!) for their internal hash function.
Oh, and this happens to be my first ever CPAN module too! Thanks to lyokato and bonnu for guiding me through the process of packaging and shipping the module, and to tokuhirom for helping me with XS issues.
I know, it’s a tiny module, but it was enough to get me excited :)
