Progress on the SQL Parser work

March 16th, 2009

Continued from my previous post on making the SQL parser pluggable in Drizzle.

So despite mentioning that I wanted to get the prototype done by the MySQL Users Conference 09 in April, it only took three motivated days to get done what I stated. One day on code reading (grasping the execution flow), one day on hacking it, and another day on testing and debugging. If you’re interested in the outcome, you can see the branch on Launchpad.

Pushing out the SQL parser from the core was easy since all I had to do was override the mysql_parse() entry point. The only issue I came across was how MySQL is designed to execute the query inside the parser. Needless to say, this was troublesome since a given component must be completely decoupled from the system for it to be separated.

To get around this obstacle, I ended up ripping the query execution routine out of the parser. To compensate for this, I introduced a new wrapper function in the Drizzle core called sql_parse_and_execute().

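static void sql_parse_and_execute(Session *session, const char *query,
                                  const size_t query_len,
                                  const char **found_semicolon)
{
  /* The parser only parses now; it returns true on error. */
  bool error= sql_parse(session, query, query_len, found_semicolon);

  /* Execute only if parsing succeeded and the session holds no error. */
  if (!error && !session->is_error())
  {
    mysql_execute_command(session);
  }

  /* Clean up the parsed tree whether or not the query was executed. */
  clean_parsed_tree(session);
}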

Admittedly, this solution is rough and I wasn’t 100% confident in it, but it passed all the test cases, so I pushed it to my experimental branch and threw it at the list anyway. This was a really good move since it brought up various discussions and the fundamental question,

“is MySQL doing the right thing?”

from fellow Drizzle and MySQL developers.

As I previously mentioned, the current SQL Parser does too much by design. Ideally, a parser should just create and return a Parse Tree from the query it was given. The core would then do whatever it needs to do with the tree (like create an execution plan) and free it when it’s done. Whether the execution planner should be part of the Parser Module or not is a different story, of course.
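To make that ideal concrete, here is a minimal sketch of the split, not Drizzle code. ParseTree, parse_query() and handle_query() are names I made up for illustration, and the "parsing" is a toy that only picks out the first word of the statement. The point is ownership: the parser hands back a tree and nothing else, and the core decides what to do with it and frees it.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical parse tree; a real one would hold far more than this.
struct ParseTree {
  std::string statement_type;  // e.g. "SELECT"
};

// The parser's only job: build a tree from the query text.
// Returns a null pointer when there is nothing to parse.
std::unique_ptr<ParseTree> parse_query(const std::string &query) {
  const std::size_t first = query.find_first_not_of(" \t\r\n");
  if (first == std::string::npos) return nullptr;

  const std::size_t last = query.find_first_of(" \t\r\n;", first);
  const std::size_t len =
      (last == std::string::npos) ? std::string::npos : last - first;

  std::unique_ptr<ParseTree> tree(new ParseTree());
  tree->statement_type = query.substr(first, len);
  return tree;
}

// The core owns the tree: it would plan and execute here, then free the tree.
// Returns false when parsing failed.
bool handle_query(const std::string &query, std::string *executed) {
  assert(executed != nullptr);
  std::unique_ptr<ParseTree> tree = parse_query(query);
  if (!tree) return false;
  *executed = tree->statement_type;  // stand-in for planning and execution
  return true;
}  // the tree is released here, when the unique_ptr goes out of scope
```

In this shape the parser never sees a session or an executor, which is the kind of decoupling a pluggable module needs.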

The current parser is also tightly coupled with the Session object (known as THD class in MySQL) which needs to be dealt with. There are many other issues pointed out by the community and for those that are interested, you can view the thread here:

Doesn’t this show how transparent the Drizzle project is? :)

Admittedly, I’m too inexperienced at this stage to go any further on my own, so I’m now planning on working closely with the veterans in the community and learning slowly as I go.

Happy Hacking!

Toru Maesaka drizzle, oss

Next Mission, Pluggable SQL Parser

February 25th, 2009

Lately I’ve been fairly busy tackling bugs in Drizzle that I wanted to fix before we start rolling out tarballs, which should be announced by the community soon. I’ve now committed fixes for those bugs, so I am going to spend the spare time I’ve gained on something else, namely making the SQL parser pluggable.

It is a known fact that most of the time spent processing a query in MySQL and Drizzle is in the SQL parser, and naturally, many people are eager to improve it. For example, take a look at this blog post by Jay on how expensive the current parser is.

When people approach me about Drizzle in Japan, most of them seem to request a pluggable query parser as well, simply because they know that it can be improved. So it makes sense for Drizzle, as a project, to provide an easy way to improve the damn thing without forcing developers to study extra things like server architecture and concurrency control (though it’s nice to know these things!).

The first version is going to be very simple. Everything behind the current entry point of the SQL Parser will be modularized and pushed out from the core, leaving only the plugin hook. Ultimately, a parser module will only have to do the following:

  • Check if the given SQL string is malicious
  • Process/Parse the given SQL string
  • Update/Populate the Session object (known as THD in MySQL)
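Those three responsibilities can be sketched as a tiny module interface. This is only an illustration under my own assumptions: SessionSketch, ParserModule and ToyParser are hypothetical names, not Drizzle's plugin API, and the "malicious" check is a toy rule (empty input or an embedded NUL byte). Like mysql_parse(), parse() returns true on error.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for the Session object (THD in MySQL).
struct SessionSketch {
  std::string command;  // filled in by the parser
  bool error = false;   // set when the parser rejects the query
};

// Hypothetical hook that the core would call; modules implement parse().
class ParserModule {
public:
  virtual ~ParserModule() {}
  // Returns true on error.
  virtual bool parse(SessionSketch *session, const char *query,
                     std::size_t query_len) = 0;
};

class ToyParser : public ParserModule {
public:
  bool parse(SessionSketch *session, const char *query,
             std::size_t query_len) override {
    assert(session != nullptr && query != nullptr);
    const std::string sql(query, query_len);

    // 1. Check if the given SQL string is malicious (toy rule only).
    if (sql.empty() || sql.find('\0') != std::string::npos) {
      session->error = true;
      return true;
    }

    // 2. Process/parse the string (toy: take the first word).
    const std::string first_word = sql.substr(0, sql.find(' '));

    // 3. Update/populate the session with the result.
    session->command = first_word;
    session->error = false;
    return false;
  }
};
```

A real module would do actual grammar work in step 2, but the contract with the core stays this small.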

As for the interface, I suspect that it will remain the same as the mysql_parse() function:

bool sql_parse(Session *session, const char *query, const size_t query_len,
               const char **found_semicolon);

But I can’t say for sure at this stage of course. Also as mentioned by Brian, we ideally need a multi-stage interface that will take care of parser failure, which I’m hoping to introduce in the second version.
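I don’t know yet what the multi-stage interface will look like, so treat the following as one possible reading of it rather than a design: the core keeps an ordered list of parser stages and moves on to the next one when a stage fails. ParserStage and parse_with_fallback() are hypothetical names, and each stage returns true on error to match the convention above.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical parser stage: writes its output to *result and
// returns true on error.
typedef std::function<bool(const std::string &query, std::string *result)>
    ParserStage;

// Tries each stage in order and stops at the first one that succeeds.
// Returns the index of that stage, or -1 if every stage failed.
int parse_with_fallback(const std::vector<ParserStage> &stages,
                        const std::string &query, std::string *result) {
  assert(result != nullptr);
  for (std::size_t i = 0; i < stages.size(); ++i) {
    std::string attempt;  // keep partial output away from the caller
    if (!stages[i](query, &attempt)) {
      *result = attempt;
      return static_cast<int>(i);
    }
  }
  return -1;
}
```

With something like this, an experimental parser could sit in front of the stock one and hand over any query it cannot deal with.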

This task is really high in my priority queue so hopefully this entry will help pressure me into concentrating on it (I’m terrible at this since I tend to hop between OSS projects).

My goal is to get the first version done by the MySQL Conference in Santa Clara, CA in April.

Toru Maesaka drizzle, oss