In this post, I am sharing a demonstration on how to remove all rows except for one in MySQL.
Before a few days ago, I was working in one of our production reports and found that there are a lot of duplicate records.This table has more than 2,00,00,000 records and I removed duplicate based upon two varchar column and required to store one record for each group.
I prepared a demo with a different solution because I have also to take care of performance.
The Solutions are:
Using Self – Join to check every next row with current row and create a delete flag for duplicate rows.
Swapping of the table – copy unique records in the temp table and restore back to the original table after deleting of duplicate records.
Swapping of the table – copy unique records in the temp table and restore back to the original table after deleting of duplicate records.
But this didn’t work for me because this solution requires a massive amount of DML operation which will degrade performance.
After some research, I found that MySQL has a feature called ALTER IGNORE.
Using ALTER IGNORE, you can apply unique key on those columns which required for removing duplicates. Once you execute this DDL command, it creates a unique key constraint and removes all duplicate records from the database.
Using ALTER IGNORE, you can apply unique key on those columns which required for removing duplicates. Once you execute this DDL command, it creates a unique key constraint and removes all duplicate records from the database.
This is an excellent option in MySQL and also worked for me.
Let me demonstrate this:
First, create demo table and data for varchar datatype:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
|
CREATE DATABASE Employee;
CREATE TABLE Employee.tbl_DuplicateEmployee_VarcharDatatype
(
EmpID INTEGER
,EmpName VARCHAR(250)
);
INSERT INTO Employee.tbl_DuplicateEmployee_VarcharDatatype
VALUES
(1,'ABC'),(2,'XYZ'),(3,'PQR')
,(1,'ABC'),(2,'XYZ'),(3,'PQR')
,(1,'ABC'),(2,'XYZ'),(3,'PQR')
,(1,'ABC'),(2,'XYZ'),(3,'PQR')
,(1,'ABC'),(2,'XYZ'),(3,'PQR')
,(1,'ABC'),(2,'XYZ'),(3,'PQR')
,(1,'ABC'),(2,'XYZ'),(3,'PQR');
|
SET old ALTER table behavior:
1
|
SET SESSION old_alter_table=1;
|
Execute ALTER IGNORE:
1
2
3
4
|
ALTER IGNORE
TABLE Employee.tbl_DuplicateEmployee_VarcharDatatype
ADD UNIQUE INDEX idx_tbl_DuplicateEmployee_VarcharDatatype
(EmpID,EmpName);
|
Find result:
1
2
|
SELECT *FROM
Employee.tbl_DuplicateEmployee_VarcharDatatype;
|
As you can see in the above result, all duplicate records removed.
If your column has a BLOB data type, you cannot directly insert any unique key constraint on it.
If you have a BLOB data type column like TEXT, so temporary, you have to create one MD5 column and use this MD5 column in ALTER IGNORE.
Below is a demonstration for deleting BLOB duplicate records.
First, create a demo table and data for TEXT datatype:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
|
CREATE TABLE Employee.tbl_DuplicateEmployee_BLOBDataType
(
EmpID INTEGER
,EmpName TEXT
);
INSERT INTO Employee.tbl_DuplicateEmployee_BLOBDataType
VALUES
(1,'ABC'),(2,'XYZ'),(3,'PQR')
,(1,'ABC'),(2,'XYZ'),(3,'PQR')
,(1,'ABC'),(2,'XYZ'),(3,'PQR')
,(1,'ABC'),(2,'XYZ'),(3,'PQR')
,(1,'ABC'),(2,'XYZ'),(3,'PQR')
,(1,'ABC'),(2,'XYZ'),(3,'PQR')
,(1,'ABC'),(2,'XYZ'),(3,'PQR');
|
Create MD5 column for EmpID(Integer) + EmpName(Text):
1
2
3
4
|
ALTER TABLE Employee.tbl_DuplicateEmployee_BLOBDataType ADD MD5Column VARCHAR(50);
UPDATE Employee.tbl_DuplicateEmployee_BLOBDataType
SET MD5Column = MD5(CONCAT(EmpID,EmpName));
|
Execute ALTER IGNORE:
1
|
ALTER IGNORE TABLE Employee.tbl_DuplicateEmployee_BLOBDataType ADD UNIQUE INDEX idx_tbl_DuplicateEmployee_BLOBDataType(MD5Column);
|
Now check the result:
1
2
|
SELECT *FROM
Employee.tbl_DuplicateEmployee_BLOBDataType;
|
Remove MD5 column:
1
|
ALTER TABLE Employee.tbl_DuplicateEmployee_BLOBDataType DROP COLUMN MD5Column;
|
0 comments:
Post a Comment