Monday, 27 August 2018

MySQLdb returns "not all arguments converted" with "ON DUPLICATE KEY UPDATE"


With the MySQLdb package in Python, I want to insert records while checking some unique keys. I am using executemany, passing it an SQL statement and a tuple of rows. When I execute it, it raises an error that says "not all arguments converted". The code is as follows:
dData = [[u'Daniel', u'00-50-56-C0-00-12', u'Daniel']]
sql = "INSERT INTO app_network_white_black_list (biz_id, shop_id, type, mac_phone, remarks, create_time) " \
      "VALUES ({bsid}, {shop_id}, {type}, %s, %s, NOW()) " \
      "ON DUPLICATE KEY UPDATE type={type}, remarks=%s, create_time=NOW()".format(bsid=bsid, shop_id=shop_id, type=dType)
cur.executemany(sql, tuple(dData))

Someone said this is a bug, but they didn't give me a way to work around it. Please suggest a workaround if this is a bug.

What's going wrong

After checking the link in your comment below and doing some more research and testing, I was able to reproduce the error with MySQLdb versions 1.2.4b4 and 1.2.5. As explained in unubtu's answer, this has to do with the limitations of a regular expression that appears in cursors.py. The exact regular expression is slightly different in each release, probably because people keep finding cases it doesn't handle and adjusting the expression instead of looking for a better approach entirely.
What the regular expression does is try to match the VALUES ( ... ) clause of the INSERT statement and identify the beginning and end of the tuple expression it contains. If the match succeeds, executemany tries to convert the single-row insert statement template into a multiple-row insert statement so that it runs faster. I.e., instead of executing this for every row you want to insert:
INSERT INTO table
  (foo, bar, ...)
VALUES
  (%s, %s, ...);

It tries to rewrite the statement so that it only has to execute once:
INSERT INTO table
  (foo, bar, ...)
VALUES
  (1, 2, ...),
  (3, 4, ...),
  (5, 6, ...),
  ...;

The problem you're running into is that executemany assumes you only have parameter placeholders in the tuple immediately after VALUES. When you also have placeholders later on, it takes this:
INSERT INTO table
  (foo, bar, ...)
VALUES
  (%s, %s, ...)
ON DUPLICATE KEY UPDATE baz=%s;

And tries to rewrite it like this:
INSERT INTO table
  (foo, bar, ...)
VALUES
  (1, 2, ...),
  (3, 4, ...),
  (5, 6, ...),
  ...
ON DUPLICATE KEY UPDATE baz=%s;

The problem here is that MySQLdb is trying to do string formatting at the same time that it's rewriting the query. Only the VALUES ( ... ) clause needs to be rewritten, so MySQLdb tries to put all your parameters into the matching group (%s, %s, ...), not realizing that some parameters need to go into the UPDATE clause instead.
If you only send parameters for the VALUES clause to executemany, you'll avoid the TypeError but run into a different problem. Notice that the rewritten INSERT ... ON DUPLICATE KEY UPDATE query has literal values in the VALUES clause, but there's still a %s placeholder in the UPDATE clause. That's going to throw a syntax error when it reaches the MySQL server.
When I first tested your sample code, I was using MySQLdb 1.2.3c1 and couldn't reproduce your problem. Amusingly, the reason that particular version of the package avoids these problems is that the regular expression is broken and doesn't match the statement at all. Since it doesn't match, executemany doesn't attempt to rewrite the query, and instead just loops through your parameters calling execute repeatedly.
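To make the two failure modes concrete, here is a minimal sketch of the kind of rewrite described above. It is not MySQLdb's code: VALUES_RE and naive_rewrite are hypothetical names, the pattern is a deliberately simplified stand-in for the one in cursors.py, and values are quoted with repr() rather than properly escaped.

```python
import re

# Hypothetical, simplified stand-in for the pattern in cursors.py.
# It only handles a VALUES tuple without nested parentheses.
VALUES_RE = re.compile(r"\bVALUES\s*(\([^()]*\))", re.IGNORECASE)


def naive_rewrite(sql, rows):
    """Expand the VALUES (...) template once per row, leaving the rest alone."""
    match = VALUES_RE.search(sql)
    if match is None:
        return None  # no match: a driver would fall back to looping over execute
    template = match.group(1)  # e.g. "(%s, %s)"
    # repr() is for illustration only; it is not safe SQL escaping.
    rendered = ", ".join(template % tuple(repr(v) for v in row) for row in rows)
    return sql[:match.start(1)] + rendered + sql[match.end(1):]


sql = "INSERT INTO t (foo, bar) VALUES (%s, %s) ON DUPLICATE KEY UPDATE bar=%s"

# Three parameters per row, but the matched template has only two placeholders.
try:
    naive_rewrite(sql, [("a", "b", "b")])
    error = None
except TypeError as exc:
    error = str(exc)

# Two parameters per row: formatting succeeds, but a stray %s is left behind.
rewritten = naive_rewrite(sql, [("a", "b"), ("c", "d")])
```

In the first call, Python's string formatting raises the same "not all arguments converted" TypeError. In the second, the statement comes out with literal rows in the VALUES clause and an unfilled %s in the UPDATE clause.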

What to do about it

First of all, don't go back and install 1.2.3c1 to make this work. You want to be using updated code where possible.
You could move to another package, as unubtu suggests in the linked Q&A, but that would involve some amount of adjustment and possibly changes to other code.
What I would recommend instead is to rewrite your query in a way that is more straightforward and takes advantage of the VALUES() function in your UPDATE clause. This function allows you to refer back to the values that you would have inserted in the absence of a duplicate key violation, by column name (examples are in the MySQL docs).
With that in mind, here's one way to do it:
dData = [[u'Daniel', u'00-50-56-C0-00-12', u'Daniel']]  # exact input you gave

sql = """
INSERT INTO app_network_white_black_list
  (biz_id, shop_id, type, mac_phone, remarks, create_time)
VALUES
  (%s, %s, %s, %s, %s, NOW())
ON DUPLICATE KEY UPDATE
  type=VALUES(type), remarks=VALUES(remarks), create_time=VALUES(create_time);
"""  # keep parameters in one part of the statement

# generator expression takes care of the repeated values; the third item in
# each row was only needed for the old UPDATE placeholder, so it is ignored
cur.executemany(sql, ((bsid, shop_id, dType, mac, rem) for mac, rem, _ in dData))

This approach should work because there are no parameters in the UPDATE clause, meaning MySQLdb will be able to successfully convert the single-row insert template with parameters into a multi-row insert statement with literal values.
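Before calling executemany, it can help to check that every parameter tuple has exactly as many items as the statement has placeholders. The following is a small sketch of that check. build_params is a hypothetical helper, the ids and rows are made-up values, and it assumes the first two items of each input row are mac_phone and remarks.

```python
sql = """
INSERT INTO app_network_white_black_list
  (biz_id, shop_id, type, mac_phone, remarks, create_time)
VALUES
  (%s, %s, %s, %s, %s, NOW())
ON DUPLICATE KEY UPDATE
  type=VALUES(type), remarks=VALUES(remarks), create_time=VALUES(create_time);
"""


def build_params(bsid, shop_id, dtype, rows):
    """Yield one five-item tuple per input row, repeating the shared values."""
    for row in rows:
        # Assumption: mac_phone and remarks come first; extra items are ignored.
        mac, remarks = row[0], row[1]
        yield (bsid, shop_id, dtype, mac, remarks)


# Made-up values for illustration only.
rows = [["00-50-56-C0-00-12", "Daniel"], ["00-50-56-C0-00-13", "Erin"]]
params = list(build_params(1, 2, 0, rows))

# Every tuple must fill exactly the placeholders in the statement.
placeholders = sql.count("%s")
assert all(len(p) == placeholders for p in params)
```

If the assertion passes, params can be handed to cur.executemany(sql, params) as-is.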
Some things to note:
  • You don't have to supply a tuple to executemany; any iterable is fine.
  • Multiline strings make for much more readable SQL statements in your Python code than implicitly concatenated strings; when you separate the statement from the string delimiters, it's easy to quickly grab the statement and copy it into a client application for testing.
  • If you're going to parameterize part of your query, why not parameterize all of your query? Even if only part of it is user input, it's more readable and maintainable to handle all your input values the same way.
  • That said, I didn't parameterize NOW(). My preferred approach here would be to use CURRENT_TIMESTAMP as the column default and take advantage of DEFAULT in the statement. Others might prefer to generate this value in the application and supply it as a parameter. If you're not worried about version compatibility, it's probably fine as-is.
  • If you can't avoid having parameter placeholders in the UPDATE clause – e.g., because the UPDATE value(s) can't be hard-coded in the statement or derived from the VALUES tuple – you'll have to iterate over execute instead of using executemany.
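For that last case, here is a minimal sketch of looping over execute. upsert_each is a hypothetical helper, RecordingCursor is a stand-in that only records calls so the example runs without a database, and the row values are made up.

```python
def upsert_each(cur, sql, rows):
    """Run the statement once per row with cur.execute; return the row count."""
    count = 0
    for row in rows:
        cur.execute(sql, tuple(row))
        count += 1
    return count


class RecordingCursor(object):
    """Hypothetical stand-in for a DB-API cursor; it only records the calls."""

    def __init__(self):
        self.calls = []

    def execute(self, sql, args=None):
        self.calls.append((sql, args))


sql = """
INSERT INTO app_network_white_black_list
  (biz_id, shop_id, type, mac_phone, remarks, create_time)
VALUES
  (%s, %s, %s, %s, %s, NOW())
ON DUPLICATE KEY UPDATE
  type=%s, remarks=%s, create_time=NOW();
"""

# Seven placeholders, so each row carries seven values: five for the VALUES
# tuple and two for the UPDATE clause (made-up data).
rows = [(1, 2, 0, "00-50-56-C0-00-12", "Daniel", 0, "Daniel (updated)")]

cur = RecordingCursor()
processed = upsert_each(cur, sql, rows)
```

With a real connection you would pass its cursor instead and commit afterwards. This gives up the multi-row speedup, but each execute call formats the whole statement, so placeholders anywhere in it are filled.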
