Drizzle’s String Library Diet
Lately I’ve been spending most of my Drizzle time working towards the Cirrus milestone. Specifically, I’ve been slowly standardizing the codebase by throwing out a lot of code in MySQL’s string library and replacing it with appropriate libc and C++ alternatives.
You see, back in the 80s MySQL reinvented a lot of the string functionality provided by libc, for reasons that I do not know (it was before my time). Most of that code is still in use today. I guess there was a good reason back in the day, but nowadays it doesn’t seem to make much sense, since:
- Despite the criticisms, glibc works darn well.
- The priority of optimizing library functions is much higher for standard library developers than it is for you as an application developer.
- Using the standard library also helps new Drizzle community developers understand the codebase much faster, because they see functions they are already familiar with.
Arguably, being handed a pointer to the terminating NUL, as most of MySQL’s string functions do, makes string appending slightly easier. But if you ask me, many people (including myself) are not comfortable with this, and it makes the codebase look weird, IMHO. One example is having to rewind the pointer to the start of the string before passing it to a third-party function.
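To make the trade-off concrete, here is a minimal sketch of the end-pointer convention next to a std::string version. The helper copy_returning_end and the two wrapper functions are hypothetical names made up for illustration; they are not taken from the MySQL or Drizzle sources.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper in the style described above: copies src into dst and
// returns a pointer to the terminating NUL it just wrote (not to dst).
char *copy_returning_end(char *dst, const char *src)
{
  assert(dst != nullptr && src != nullptr);
  while ((*dst = *src++) != '\0')
    ++dst;
  return dst;
}

// Appending is cheap because each call continues from the previous end
// pointer, but the caller must remember the start of the buffer to hand the
// finished string to any other function (here, strlen).
std::size_t length_via_end_pointer(char *buf)
{
  char *start = buf;                              // keep the beginning around
  char *end = copy_returning_end(buf, "drizzle");
  end = copy_returning_end(end, "/mystrings");    // append without rescanning
  (void) end;
  return std::strlen(start);                      // "rewind" to the start
}

// The same concatenation with std::string: no pointer bookkeeping at all.
std::string concat_with_std_string()
{
  std::string path("drizzle");
  path.append("/mystrings");
  return path;
}
```

Both versions build the same string. The first saves a rescan on every append; the second is what most C++ developers would expect to read.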
Benefits gained from narrowing to UTF-8
Because UTF-8 is the predominant encoding in the areas we are targeting (the web and the cloud), Drizzle currently uses only UTF-8 for its internal representation. So, needless to say, support for anything other than UTF-8 was thrown out of the library, which greatly reduced its size.
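As a rough illustration of why a single encoding keeps things small, a character-counting routine that only has to understand UTF-8 fits in a few lines. This is a sketch, not code from Drizzle; the function name utf8_code_points is hypothetical, and it assumes the input is already valid UTF-8.

```cpp
#include <cstddef>
#include <string>

// Counts code points in a UTF-8 string by skipping continuation bytes,
// which always have the bit pattern 10xxxxxx. Every other byte starts a
// new code point. Assumes the input is valid UTF-8 (no validation here).
std::size_t utf8_code_points(const std::string &s)
{
  std::size_t count = 0;
  for (unsigned char byte : s)
  {
    if ((byte & 0xC0) != 0x80)
      ++count;
  }
  return count;
}
```

With several character sets in play, a routine like this would need one variant per encoding plus some way to pick between them.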
Interested in how much slimmer the Drizzle string library is compared to the original one in MySQL 5.1? To illustrate the difference, here are the results from counting the files and lines:
$ wc -l mysql-5.1.30/strings/*.c
...
96798 total
$ ll mysql-5.1.30/strings/ | wc -l
78

$ wc -l drizzle/mystrings/*.cc
...
24634 total
$ ll drizzle/mystrings/ | wc -l
31
AWESOME.

hey, I think I’m doing pretty much the same thing in my libmyisam attempt as you do above. I have successfully reduced ctype-*.c into ctype-utf8.c.
take a look at the following if you are interested. note that mystrings and mysys are incorporated into mysys.
http://code.launchpad.net/~moriyoshi/+junk/libmyisam