Archive for May, 2009

Storage Engine Dev Journal #2 : Command Line Options

May 22nd, 2009

If you’re working on developing a Drizzle plugin, you may come across situations where you want to accept user options for it at server startup. For example, if you design your plugin to create files for activity logging, you may want to allow the DBA to specify where to write those files out.

In my case, I decided to provide a command line option to BlitzDB for row-based query caching. This option is intended for special use cases where the read/write ratio is 9:1. For those who are interested, row caching is disabled by default because the read-through logic and cache invalidation add overhead to the engine, and that overhead only pays off when read requests significantly outnumber update requests.

There are situations where BlitzDB’s row cache can be helpful but this is beyond the scope of this entry so I will save it for another day :)

Adding startup options to your plugin

Drizzle allows you to add command line options to your plugin without editing the server code. But before you start hacking away, there are a few not-so-obvious things that you need to understand.

So, let us first look at the data types that your plugin can accept:

  • DRIZZLE_SYSVAR_BOOL
  • DRIZZLE_SYSVAR_STR
  • DRIZZLE_SYSVAR_INT
  • DRIZZLE_SYSVAR_UINT
  • DRIZZLE_SYSVAR_LONG
  • DRIZZLE_SYSVAR_ULONG
  • DRIZZLE_SYSVAR_LONGLONG
  • DRIZZLE_SYSVAR_ULONGLONG
  • DRIZZLE_SYSVAR_ENUM

As you can see, there is a wide range of types that you can choose from. What you should choose depends on what you want to use the value for.

Pick your data type

So let's take my row cache option as an example. Caching over 4 billion rows in one physical server is very unlikely and since we’re not interested in negative numbers, we’re going to pick:

  • DRIZZLE_SYSVAR_UINT

whose value we can store as a uint32_t in the plugin.

Declare that your plugin accepts options

Every plugin must declare itself as a plugin which looks like this for BlitzDB:

drizzle_declare_plugin(blitz) {
  "BLITZ",
  "0.3",
  "Toru Maesaka",
  "Non-transactional General Purpose Engine",
  PLUGIN_LICENSE_GPL,
  blitz_init,             /*  Plugin Init      */
  blitz_deinit,           /*  Plugin Deinit    */
  NULL,                   /*  status variables */
  blitz_system_variables, /*  system variables */
  NULL                    /*  config options   */
}
drizzle_declare_plugin_end;

Here, we’re interested in the second-to-last argument, which is called blitz_system_variables in the above example. Feel free to call this whatever you like for your plugin.

So what exactly is blitz_system_variables? It’s a null-terminated array of system variables that your plugin accepts. This is what it looks like for BlitzDB:

static struct st_mysql_sys_var *blitz_system_variables[] = { 
  DRIZZLE_SYSVAR(row_cache),
  NULL
};

As you can see, BlitzDB only supports one option at the moment so there is only one entry called row_cache.

Define your options

You must define every option that you’ve added to the system variable array. We decided to use DRIZZLE_SYSVAR_UINT earlier and called it row_cache so it is defined like this:

static DRIZZLE_SYSVAR_UINT (
  row_cache, /* option name */
  blitz_row_cache_size, /* variable to set the value to */
  PLUGIN_VAR_READONLY, /* mode */
  N_("Enable row caching for BlitzDB tables."),
  NULL,       /*  check func    */
  NULL,       /*  update func   */
  0,          /*  default value */
  0,          /*  minimum value */
  UINT32_MAX, /*  maximum value */
  0           /*  block size    */
);

The comments pretty much explain what the arguments are, but for more details you should take a look at the macros in drizzled/plugin.h. You could also look at what other plugins do by grepping for the system variable type that you’re interested in.
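The definition above writes the option value into blitz_row_cache_size, so that variable needs to be declared before the macro is used. Here is a minimal sketch of what the backing variable might look like. This is not the actual BlitzDB source: the helper blitz_row_cache_enabled() is a hypothetical name, and treating 0 as "disabled" is an assumption based on the default value of 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Backing variable for the row_cache option. The uint32_t type follows
   the choice made earlier in this entry; 0 matches the default value. */
static uint32_t blitz_row_cache_size = 0;

/* Hypothetical helper (not part of Drizzle): treat a size of 0 as
   "row cache disabled" and anything else as the number of rows to cache. */
static bool blitz_row_cache_enabled(void) {
  assert(blitz_row_cache_size <= UINT32_MAX); /* maximum from the definition */
  return blitz_row_cache_size > 0;
}
```

The engine could then consult blitz_row_cache_enabled() when it opens a table and skip the cache setup entirely when the option was not given.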

Test your new startup option

If all goes well you should be able to compile Drizzle and check whether command line options are visible from the plugin. An option takes the following form:

--<name_of_plugin>-<option_name>

So, in the row cache example, row cache can be enabled like this:

/usr/local/sbin/drizzled --blitz-row_cache=10000

Also note that you can replace the underscore with a hyphen:

/usr/local/sbin/drizzled --blitz-row-cache=10000
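To make the naming rule concrete, here is a small sketch that builds the hyphenated form of an option from a plugin name and an option name. It only illustrates the convention described above; build_option_name() is a hypothetical helper and has nothing to do with how Drizzle parses options internally.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes "--<plugin>-<option>" into out, replacing
   underscores in the names with hyphens. Returns 0 on success or -1 if
   the buffer is too small. */
static int build_option_name(const char *plugin, const char *option,
                             char *out, size_t out_size) {
  assert(plugin != NULL && option != NULL && out != NULL);

  int written = snprintf(out, out_size, "--%s-%s", plugin, option);
  if (written < 0 || (size_t)written >= out_size)
    return -1;

  /* Skip the leading "--" so only the name part is normalized. */
  for (char *p = out + 2; *p != '\0'; p++) {
    if (*p == '_')
      *p = '-';
  }
  return 0;
}
```

Calling build_option_name("blitz", "row_cache", buf, sizeof(buf)) fills buf with the second form shown above, without the value part.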

That’s it! It should be relatively easy to add more options once you successfully get your first one done.

Toru Maesaka drizzle, knowledge, oss

Tokyo Cabinet Tip: Protected Database Iteration

May 13th, 2009

Tokyo Cabinet (TC) provides iteration functionality for both its persistent and non-persistent data structures. For example, if you wanted to iterate through TC’s hash database, you can use the tchdbiternext() function. This is really straightforward to use, such that:

void *key;
int key_len;

if (!tchdbiterinit(tc_database_handle)) {
  /* failed to initialize iterator */
}

while ((key = tchdbiternext(tc_database_handle, &key_len)) != NULL) {
  /* work with the fetched key and key_len */
  free(key); /* the key is allocated with malloc(), so release it */
}

will iterate through the entire hash database that the “tc_database_handle” object is responsible for. This can be handy if you need to loop through your database for some arbitrary reason.

However, there is a consequence to using this function in a concurrent environment with a use case where the order of records _really_ matters. This is because even though TC is a thread-safe library, the iteration functions aren’t thread-safe in the way that we expect.

For example, if a write operation occurs while the application iterates over the database, you will end up iterating over a database that is in a changed state. This will not make the cursor go crazy and crash your application since TC handles this internally but you still end up iterating over a database that is in a state that you did not initially intend on looping through.

The solution to this is to simply block write operations to the database while your application iterates through it. For example, you could use a pthread read-write lock (pthread_rwlock_t) to allow other threads to read while you iterate but block writes until you finish iterating.
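Here is a minimal sketch of that locking pattern. To keep it self-contained, a plain array stands in for the database, so rows, insert_row() and scan_sum() are all hypothetical names. In a real engine the body of the scan would be the tchdbiterinit()/tchdbiternext() loop shown earlier, and the writer would call one of TC's put functions.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Stand-in for a database: a fixed array of row values and a row count. */
#define MAX_ROWS 16
static int rows[MAX_ROWS];
static size_t row_count = 0;

static pthread_rwlock_t table_lock = PTHREAD_RWLOCK_INITIALIZER;

/* Writers take the exclusive lock, so they wait for running scans. */
static int insert_row(int value) {
  int result = -1;
  int rc = pthread_rwlock_wrlock(&table_lock);
  assert(rc == 0);
  (void)rc;
  if (row_count < MAX_ROWS) {
    rows[row_count++] = value;
    result = 0;
  }
  pthread_rwlock_unlock(&table_lock);
  return result;
}

/* Scanners take the shared lock: other readers may proceed, but no
   writer can change the table until the scan has finished. */
static long scan_sum(void) {
  long sum = 0;
  int rc = pthread_rwlock_rdlock(&table_lock);
  assert(rc == 0);
  (void)rc;
  for (size_t i = 0; i < row_count; i++)
    sum += rows[i];
  pthread_rwlock_unlock(&table_lock);
  return sum;
}
```

The point of the design is that every write path has to go through the same lock. A single writer that bypasses it brings back the original problem.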

I was planning on doing this for a table scanner in the storage engine that I’m currently working on, but it turns out that TC has an undocumented function that will take care of this internally. I’ve talked to Mikio about this function and apparently it is intentional that he hasn’t documented it on his specification page. He has no plans to throw it out, so you do not have to worry about it magically disappearing one day. For more information, you can take a look at his header file (tchdb.h for the hash database).

Explanation and Simple Example

The function is called tchdbforeach(). It atomically iterates through your database from beginning to end, supplying each key/value pair to the callback function that you provide. The signature of the callback is the following:

bool callback(const void *kbuf, int ksiz, const void *vbuf,
              int vsiz, void *op);

where the fifth argument, “void *op” is an opaque pointer to the data that you can pass to the callback. Here is a simple example that will increment a counter integer on each iteration using this function:

/* Do whatever you like with the provided key/value pair in here */
bool callback(const void *kbuf, int ksiz, const void *vbuf,
              int vsiz, void *op) {
  if (op == NULL)
    return false;
 
  *((int *)op) += 1;
 
  return true;
}
 
int main(void) {
  int niter = 0;
 
  ...
 
  if (!tchdbforeach(tc_database_handle, callback, &niter)) {
    fprintf(stderr, "failed to iterate the database\n");
    return EXIT_FAILURE;
  }
 
  printf("iterated %d times\n", niter);
 
  ...
 
  return EXIT_SUCCESS;
}

If all goes well, the counter variable will be set to the number of records in the database. This function is slightly more complex than using tchdbiternext() but you are guaranteed to iterate atomically which is pretty important for a table scanner.
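The opaque pointer does not have to be a single integer. Here is a sketch that passes a small struct through it and stops the scan early once a limit is reached. It assumes, as I understand the comment in tchdb.h, that returning false from the callback stops the iteration. The names scan_state, limited_scan() and fake_foreach() are hypothetical, and fake_foreach() is only a stand-in driver so the example runs without Tokyo Cabinet. With the real library you would pass limited_scan to tchdbforeach() instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical scan state handed to the callback via the opaque pointer. */
struct scan_state {
  int visited;      /* records seen so far */
  int limit;        /* stop once this many records were seen */
  long value_bytes; /* total size of the values seen */
};

/* Same signature as the tchdbforeach() callback. Returning false asks
   the caller to stop iterating. */
static bool limited_scan(const void *kbuf, int ksiz, const void *vbuf,
                         int vsiz, void *op) {
  (void)kbuf; (void)ksiz; (void)vbuf; /* unused in this sketch */
  struct scan_state *state = op;
  if (state == NULL)
    return false;

  state->visited++;
  state->value_bytes += vsiz;
  return state->visited < state->limit;
}

/* Stand-in for tchdbforeach(), NOT part of Tokyo Cabinet: feeds four
   fixed records to the callback until it returns false. */
static void fake_foreach(bool (*iter)(const void *, int, const void *,
                                      int, void *), void *op) {
  static const char *keys[] = {"a", "b", "c", "d"};
  static const char *vals[] = {"one", "two", "three", "four"};
  assert(iter != NULL);
  for (size_t i = 0; i < 4; i++) {
    if (!iter(keys[i], (int)strlen(keys[i]), vals[i],
              (int)strlen(vals[i]), op))
      break;
  }
}
```

With a limit of 3, the callback sees the first three records and the fourth is never visited.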

I hope this function can help you too.

Toru Maesaka knowledge, oss

Journal of Storage Engine Development on Drizzle

May 12th, 2009

I’ve decided to start a series of blog entries on not-so-obvious things that I’ve found while working on my new project. By archiving these findings, I’m hoping that I can help those who are looking into developing a storage engine for the MySQL family in the future.

Accumulating these small pieces of knowledge would also be useful for me since I can refer back to them when I forget something. Also, once I write enough entries I’m planning on summarizing them and making them available on the Drizzle Wiki. If MySQL is interested in updating the engine documentation, I would be more than happy to help there too.

So to begin with, I’ll describe something trivial that I stumbled across while trying to catch an error on duplicate primary key insertion to the data table.

Background

In brief, the database kernel does not care if the INSERT query contains a duplicate primary key for a given table or not. It is the storage engine’s job to tell the kernel that the request was invalid due to key collision. If a storage engine fails to do this, the kernel will acknowledge that the query was successful (given that no other errors were thrown) and will keep doing what it needs to do.

Mechanics

Data insertion is handled inside the write_row() function that your engine must implement. The return value of this function is an integer that represents the status of the work it has done. After looking through the possible error statuses in “drizzled/base.h”, I immediately found this:

#define HA_ERR_FOUND_DUPP_KEY 121 /* Dupplicate key on write */

I also looked through MyISAM and InnoDB to confirm that this was indeed the correct error status to return on duplicate primary key. Here is the snippet of my row insertion at the time:

/* TC's tchdbputkeep will not insert a row to the table if there
   was a collision */
if (tchdbputkeep(data_table, primary_key, primary_key_length, buf,
                 table->s->reclength) == false) {
  my_errno = HA_ERR_GENERIC;
 
  /* check for primary key collision */
  if (tchdbecode(data_table) == TCEKEEP)
    my_errno = HA_ERR_FOUND_DUPP_KEY;
 
  return my_errno;
}

At first glance this seems right, but the error I was getting at the command line prompt always differed from the one MyISAM and InnoDB produce, despite returning the same error status. Specifically, this is what I was getting:

ERROR 1022 (23000): Can't write; duplicate key in table 't1'

whereas I was getting this error on other engines:

ERROR 1062 (23000): Duplicate entry '1' for key 'PRIMARY'

At this stage I couldn’t make sense of what I was doing wrong but it turned out that the solution was pretty simple.

Solution

After talking to Stewart Smith about my issue in #drizzle @ freenode, it turned out that I am supposed to keep track of which key the duplication was found on in write_row() and report it to the kernel via the info() function.

You can do this by setting the errkey integer variable to the key number that is used internally by the kernel. So, obtaining the internal primary key number with this call in write_row():

share->errkey = table->s->primary_key;

and adding the following code to info():

if (flag & HA_STATUS_ERRKEY) {
  errkey = share->errkey;
}

happily fixed the issue I was experiencing. Yay.
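To show how the two pieces fit together, here is a self-contained sketch of the flow. None of it is Drizzle code: fake_share, fake_handler, fake_write_row() and fake_info() are hypothetical stand-ins, the one-row "table" only exists to trigger a collision, and the value used for HA_STATUS_ERRKEY is a placeholder for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define HA_ERR_FOUND_DUPP_KEY 121 /* value quoted above from drizzled/base.h */
#define HA_STATUS_ERRKEY 32       /* placeholder value for this sketch */

/* Hypothetical stand-ins for the engine's share and handler objects. */
struct fake_share {
  unsigned int errkey; /* key number of the last duplicate */
};

struct fake_handler {
  struct fake_share *share;
  unsigned int errkey;      /* what the kernel reads after info() */
  unsigned int primary_key; /* internal number of the primary key */
  bool has_row;             /* one-row "table" to trigger a collision */
  int stored_key;
};

/* On a collision, remember which key was duplicated, then report it. */
static int fake_write_row(struct fake_handler *h, int key) {
  assert(h != NULL && h->share != NULL);
  if (h->has_row && h->stored_key == key) {
    h->share->errkey = h->primary_key;
    return HA_ERR_FOUND_DUPP_KEY;
  }
  h->has_row = true;
  h->stored_key = key;
  return 0;
}

/* When asked for the error key, hand the remembered number back. */
static void fake_info(struct fake_handler *h, unsigned int flag) {
  assert(h != NULL && h->share != NULL);
  if (flag & HA_STATUS_ERRKEY)
    h->errkey = h->share->errkey;
}
```

My understanding is that the kernel calls info() with the error key flag after write_row() reports a duplicate, and then uses the key number to build the more specific "Duplicate entry" message.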

I guess reading the section on info() in the documentation gives a hint that this is where you supply the key number on a key error, but frankly, this is really easy to forget or miss since its importance isn’t emphasized much.

Anyhow, that’s all I have to say in the first entry of this series and hopefully I’ll write something more interesting in the upcoming entries. Until then, happy hacking ;)

Toru Maesaka drizzle, knowledge, oss

Playing with Drizzle’s new plugin subsystem

May 7th, 2009

Something notable that has changed in Drizzle this week is the build system for plugins.

Previously we were using the old plugin system that was inherited from MySQL, but Drizzle now uses a Python-based system that allows us to aggregate your plugin build rules into the top-level Makefile. This change also gets rid of the nasty behavior that was giving people like Monty Taylor and other build system hackers headaches.

But hey, as a plugin developer you’re not so interested in how things are handled cleanly inside, right? You’re more interested in how to create or port your plugin over to the new build system! As a developer, you are interested in the following three files:

  • plugin.ini
  • plugin.ac
  • plugin.am

where the mandatory file is plugin.ini. This file is where you write the basic details of the plugin like the name, source files and relevant compiler options. For example this is what plugin.ini looks like for the Blackhole storage engine:

[plugin]
name=blackhole
title=Blackhole Storage Engine
description=Basic Write-only Read-never tables
sources=ha_blackhole.cc
headers=ha_blackhole.h

You can also specify the plugin to be loaded by default with this line: “load_by_default=yes”. If you don’t add this line, the plugin is enabled by specifying it with the “--plugin_load” server startup option.

As for the optional plugin.ac and plugin.am files, these are where you can add your own autoconf and automake rules for the plugin. For example you might want to check/search for a library or build an internal library for your plugin.

Example of Linking an external library

If you write a plugin, you’ll most likely want to link a particular library to your program. After all, that’s one of the major points of writing a plugin, right? To bring the external goodness over to the database server for solving a particular need/requirement in your application.

For those who are interested, I’ll leave a snippet of how I linked Tokyo Cabinet to the storage engine I am currently working on. Firstly, you want to check whether Tokyo Cabinet exists in the environment that you’re building in. Clearly this is what configure is for, so I added this to my plugin.ac:

AC_LIB_HAVE_LINKFLAGS(tokyocabinet,,
  [#include <tchdb.h>],
  [
     TCHDB hdb;
  ])  
  AS_IF([test "x$ac_cv_libtokyocabinet" = "xno"],
        AC_MSG_WARN([tokyocabinet not found: not building plugin.]))
DRIZZLED_PLUGIN_DEP_LIBS="${DRIZZLED_PLUGIN_DEP_LIBS} ${LTLIBTOKYOCABINET}"

The above will check for Tokyo Cabinet and whether the TCHDB structure exists. If it doesn’t exist then it will print a warning but if it does exist, it will add the linker option to plugin dependencies. You can now tell the build system to link Tokyo Cabinet, which you do by assigning the LTLIBTOKYOCABINET variable to ldflags in plugin.ini:

ldflags=${LTLIBTOKYOCABINET}

You could directly write “ldflags = -ltokyocabinet” in plugin.ini but you really want to take advantage of configure. configure (or rather, autotools) is your friend.

So, this is all I have to cover in this entry and I hope this entry will be helpful to those that are looking into working on a Drizzle plugin. If you would like more information, the Drizzle Wiki should be updated with more detailed explanation soon.

Toru Maesaka drizzle, oss