Friday, 2 November 2018

Mysql: “INSERT IGNORE” vs “INSERT … ON DUPLICATE KEY UPDATE”

While executing an INSERT statement with many rows, I want to skip duplicate entries that would otherwise cause failure. After some research, my options appear to be the use of either:
  • ON DUPLICATE KEY UPDATE which implies an unnecessary update at some cost, or
  • INSERT IGNORE which implies an invitation for other kinds of failure to slip in unannounced.
Am I right in these assumptions? What's the best way to simply skip the rows that might cause duplicates and just continue on to the other rows?

 Answers


I would recommend using INSERT...ON DUPLICATE KEY UPDATE.
If you use INSERT IGNORE, then the row won't actually be inserted if it results in a duplicate key. But the statement won't generate an error. It generates a warning instead. These cases include:
  • Inserting a duplicate key in columns with PRIMARY KEY or UNIQUE constraints.
  • Inserting a NULL into a column with a NOT NULL constraint.
  • Inserting a row to a partitioned table, but the values you insert don't map to a partition.
If you use REPLACE, MySQL actually does a DELETE followed by an INSERT internally, which has some unexpected side effects:
  • A new auto-increment ID is allocated.
  • Dependent rows with foreign keys may be deleted (if you use cascading foreign keys) or else prevent the REPLACE.
  • Triggers that fire on DELETE are executed unnecessarily.
  • Side effects are propagated to replication slaves too.
correction: both REPLACE and INSERT...ON DUPLICATE KEY UPDATE are non-standard, proprietary inventions specific to MySQL. ANSI SQL 2003 defines a MERGE statement that can solve the same need (and more), but MySQL does not support the MERGE statement.

A user tried to edit this post (the edit was rejected by moderators). The edit tried to add a claim that INSERT...ON DUPLICATE KEY UPDATE causes a new auto-increment id to be allocated. It's true that the new id is generated, but it is not used in the changed row.
See demonstration below, tested with Percona Server 5.5.28. The configuration variable innodb_autoinc_lock_mode=1 (the default):
mysql> create table foo (id serial primary key, u int, unique key (u));
mysql> insert into foo (u) values (10);
mysql> select * from foo;
+----+------+
| id | u    |
+----+------+
|  1 |   10 |
+----+------+

mysql> show create table foo\G
CREATE TABLE `foo` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `u` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `u` (`u`)
) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=latin1

mysql> insert into foo (u) values (10) on duplicate key update u = 20;
mysql> select * from foo;
+----+------+
| id | u    |
+----+------+
|  1 |   20 |
+----+------+

mysql> show create table foo\G
CREATE TABLE `foo` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `u` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `u` (`u`)
) ENGINE=InnoDB AUTO_INCREMENT=3 DEFAULT CHARSET=latin1
The above demonstrates that the IODKU statement detects the duplicate, and invokes the update to change the value of u. Note the AUTO_INCREMENT=3 indicates an id was generated, but not used in the row.
Whereas REPLACE does delete the original row and inserts a new row, generating and storing a new auto-increment id:
mysql> select * from foo;
+----+------+
| id | u    |
+----+------+
|  1 |   20 |
+----+------+
mysql> replace into foo (u) values (20);
mysql> select * from foo;
+----+------+
| id | u    |
+----+------+
|  3 |   20 |
+----+------+



Something important to add: When using INSERT IGNORE and you do have key violations, MySQL does NOT raise a warning!
If you try for instance to insert 100 records at a time, with one faulty one, you would get in interactive mode:
Query OK, 99 rows affected (0.04 sec)
Records: 100 Duplicates: 1 Warnings: 0
As you see: No Warnings! This behavior is even wrongly described in the official Mysql Documentation.
If your script needs to be informed, if some records have not been added (due to key violations) you have to call mysql_info() and parse it for the "Duplicates" value.



I routinely use INSERT IGNORE, and it sounds like exactly the kind of behavior you're looking for as well. As long as you know that rows which would cause index conflicts will not be inserted and you plan your program accordingly, it shouldn't cause any trouble.



Replace Into seems like an option. Or you can check with
IF NOT EXISTS(QUERY) Then INSERT
This will insert or delete then insert. I tend to go for a IF NOT EXISTS check first.



If using insert ignore having a SHOW WARNINGS; statement at the end of your query set will show a table with all the warnings, including which IDs were the duplicates.

0 comments:

Post a Comment