Tuesday, 6 October 2015

Learning MySQL: Find In Set and Bulk Insert Options

With a number of different data-driven projects on my plate both at home and at work, I find myself ever more impressed with the built-in abilities of MySQL.  Like many open source software products, it’s available at no cost as long as it’s used under the guidelines of the provided OS license.  But MySQL has proven again and again that it’s no ordinary piece of OS software.  Where other database software companies are charging hundreds or thousands of dollars for licensing fees, MySQL continues to offer enterprise-level features and performance without the typically-associated costs.

I’m a bit of a newcomer to MySQL, I’ll admit.  I only jumped aboard the PHP-MySQL ship some three years back.  Since that time, the speed and depth of my learning has varied from day to day.  Much of what I learn is based on the particular requirements of the project at hand.  The more demanding and intricate the project, the more digging I’m forced to do.  Not to sound to “self-help-ish”, but the end result of this process is typically a better, more knowledgeable me.  Some of the oddest requests in the past have actually turned out to be some of the coolest features in an app.  And I’m learning more in the process so it’s really a win-win.

In a never-ending effort to improve the “product”, I am always seeking new knowledge.  From a purely logical standpoint, the more I know, the better the application should, in turn, operate and perform.  As an example, recent changes to my Scaffolder utility resulted in a highly-improved product and some cool new tools for my coding arsenal.  A coworker of mine recently pointed out the possibility of performing multiple record inserts using a single SQL statement.  Although the concept seemed like a good one, I’d always assumed this was not possible.  As it turns out, MySQL has built-in support for this very feature.  Some brief testing resulted in a startling revelation: the performance difference between inserting 1000 records as separate queries versus a single query string was absolutely amazing!

For those of you who are unfamiliar with this built-in capability, I highly recommend reading up on it and then immediately converting any of your bulk insert queries to use this new format.  The basic gist from the MySQL site is as follows:

    INSERT statements that use VALUES syntax can insert multiple rows. To do this, include multiple lists of column values, each enclosed within parentheses and separated by commas. The values list for each row must be enclosed within parentheses. Example:

    INSERT INTO tbl_name (a,b,c) VALUES(1,2,3),(4,5,6),(7,8,9)

Another little tip I recently picked up was the FIND_IN_SET option.  This built-in function allows you to search and match a comma-separated string of values (saved within a database field) for one particular value.  Up ’til now, I had relied on the following alternative, but much more complicated, syntax:

    field REGEXP CONCAT('(^|,)(',REPLACE(value,',','|'),')($|,)')

FIND_IN_SET accomplishes the exact same thing without the overly-bloated syntax and unnecessary conversions/operations. The above statement can easily be converted to use FIND_IN_SET in the following manner:

    FIND_IN_SET(value,field)

I don’t have the tools required to determine whether there is a significant difference in performance between these two methods. But based on what I’ve heard about MySQL’s REGEXP capabilities, combined with the reduction in syntax and operations, I’m assuming the latter option will result in a speedier delivery.  I did test the difference between multiple individual insert statements and a single bulk insert statement and the performance difference was night and day.  Where a large amount of inserts are required, the single, bulk insert statement will reduce performance overhead tremendously, saving developers and end users a lot of time.

There’s not a whole lot more detail on either of these two options but if you’re interested in viewing the documentation direct from the source you can read about these options using the following links:

    MySQL’s BULK INSERT
    MySQL’s FIND_IN_SET

Since learning of these new options, the Scaffolder project has been updated accordingly, incorporating both of these options into its source code. The performance of the Bulk Insert option within Scaffolder is now significantly improved. Testing showed that 10,000 random and data-type specific records could be inserted in roughly 40 seconds flat. And those tests were run on a development desktop platform. Server performance would undoubtedly be even better.

I guess my lesson here is that there’s never a point where we stop learning. My projects provide me with a guide and driver that takes me to new places each and every day. And I sincerely hope the newness never goes away. I love a challenge and constantly learning something new. I’ve given up on the idea of “catching up” with the existing development community. I realize now that this process is constantly evolving and will never really reach a stopping point. I guess that’s what makes it so much fun.

0 comments:

Post a Comment