Wednesday, 25 July 2018

Delete all duplicate rows in MySQL

In this post, I am sharing a demonstration on how to remove all rows except for one in MySQL.
Before a few days ago, I was working in one of our production reports and found that there are a lot of duplicate records.
This table has more than 2,00,00,000 records and I removed duplicate based upon two varchar column and required to store one record for each group.
I prepared a demo with a different solution because I have also to take care of performance.
The Solutions are:
Using Self – Join to check every next row with current row and create a delete flag for duplicate rows.
Swapping of the table – copy unique records in the temp table and restore back to the original table after deleting of duplicate records.
But this didn’t work for me because this solution requires a massive amount of DML operation which will degrade performance.

After some research, I found that MySQL has a feature called ALTER IGNORE.
Using ALTER IGNORE, you can apply unique key on those columns which required for removing duplicates. Once you execute this DDL command, it creates a unique key constraint and removes all duplicate records from the database.
This is an excellent option in MySQL and also worked for me.
Let me demonstrate this:
First, create demo table and data for varchar datatype:
SET old ALTER table behavior:
Execute ALTER IGNORE:
Find result:
MySQL Alter Ignore Statement
As you can see in the above result, all duplicate records removed.

If your column has a BLOB data type, you cannot directly insert any unique key constraint on it.
If you have a BLOB data type column like TEXT, so temporary, you have to create one MD5 column and use this MD5 column in ALTER IGNORE.
Below is a demonstration for deleting BLOB duplicate records.
First, create a demo table and data for TEXT datatype:
Create MD5 column for EmpID(Integer) + EmpName(Text):
Execute ALTER IGNORE: 
Now check the result:
Remove MD5 column:

0 comments:

Post a Comment