In this article we look at ways to remove all duplicate rows except one in an SQL database. For all examples in this article, we'll be using MySQL and the following table called
user
with names of users:+----+--------+ | id | name | +----+--------+ | 1 | john | | 2 | david | | 3 | john | | 4 | john | | 5 | wayne | | 6 | david | +----+--------+
CREATE TABLE user ( id INT NOT NULL AUTO_INCREMENT, name VARCHAR(25) NOT NULL, PRIMARY KEY (id) ); INSERT INTO user(name) VALUES ('john'), ('david'), ('john'), ('john'), ('wayne'), ('david');
Create A New Table With Unique Values Copied From Original Table
You can simply create a new table with unique values copied over from the original table like so:
CREATE TABLE user2 AS ( SELECT id, name FROM user GROUP BY name )
Result:
+----+--------+ | id | name | +----+--------+ | 1 | john | | 2 | david | | 5 | wayne | +----+--------+
You will have to add original table's constraints and indexes on the new table as they aren't copied over.
Use Temporary Table To Fill Original Table With Unique Rows
You can use a temporary table to copy over unique values from the original table, and add them back to the original table. This can be useful when a primary key does not exist.
# Step 1: Copy distinct values to temporary table CREATE TEMPORARY TABLE tmp_user ( SELECT id, name FROM user GROUP BY name ); # Step 2: Remove all rows from original table DELETE FROM user; # Step 3: Remove all rows from original table INSERT INTO user (SELECT * FROM tmp_user); # Step 4: Remove temporary table DROP TABLE tmp_user;
Be careful when using this method because if there are foreign key constraints referencing the original table, rows deleted in step 2 may delete referenced rows in child tables as well to maintain referential integrity.
Result:
+----+--------+ | id | name | +----+--------+ | 1 | john | | 2 | david | | 5 | wayne | +----+--------+
Add Unique Constraint And Copy Unique Rows To Original Table
To make sure there are no duplicates added in future, we can extend the previous set of queries to add a
UNIQUE
constraint to the column.# Step 1: Copy distinct values to temporary table CREATE TEMPORARY TABLE tmp_user ( SELECT id, name FROM user GROUP BY name ); # Step 2: Remove all rows from original table DELETE FROM user; # Step 3: Add Unique constraint ALTER TABLE user ADD UNIQUE(name); # Step 4: Remove all rows from original table INSERT IGNORE INTO user (SELECT * FROM tmp_user); # Step 5: Remove temporary table DROP TABLE tmp_user;
Be careful when using this method because if there are foreign key constraints referencing the original table, rows deleted in step 2 may delete referenced rows in child tables as well to maintain referential integrity.
Result:
+----+--------+ | id | name | +----+--------+ | 1 | john | | 2 | david | | 5 | wayne | +----+--------+
Remove Duplicates And Keep Row With Lowest ID
Using the methods listed below, row with the lowest id is kept, all other duplicates are removed.
Expected Result:
+----+--------+ | id | name | +----+--------+ | 1 | john | | 2 | david | | 5 | wayne | +----+--------+
Method #1:
DELETE u1 FROM user u1, user u2 WHERE u1.id > u2.id AND u1.name = u2.name
Method #2:
DELETE u1 FROM user u1 INNER JOIN user u2 ON u1.id > u2.id AND u1.name = u2.name;
Method #3:
DELETE FROM user WHERE id NOT IN ( SELECT * FROM ( SELECT MIN(id) FROM user GROUP BY name ) temp )
Method 4:
# Step 1: Copy all unique rows with lowest id CREATE TEMPORARY TABLE tmp_user ( SELECT MIN(id) id FROM user GROUP BY name ); # Step 2: Delete all rows in original table that are not in the temporary table DELETE FROM user WHERE id NOT IN (SELECT id FROM tmp_user); # Step 3: Remove temporary table DROP TABLE tmp_user;
Remove Duplicates And Keep Row With Highest ID
Using the methods listed below, row with the highest id is kept, all other duplicates are removed.
Expected Result:
+----+--------+ | id | name | +----+--------+ | 4 | john | | 5 | wayne | | 6 | david | +----+--------+
Method #1:
DELETE u1 FROM user u1, user u2 WHERE u1.id < u2.id AND u1.name = u2.name
Method #2:
DELETE u1 FROM user u1 INNER JOIN user u2 ON u1.id < u2.id AND u1.name = u2.name;
Method #3:
DELETE FROM user WHERE id NOT IN ( SELECT * FROM ( SELECT MAX(id) FROM user GROUP BY name ) temp )
Method 4:
# Step 1: Copy all unique rows with lowest id CREATE TEMPORARY TABLE tmp_user ( SELECT MAX(id) id FROM user GROUP BY name ); # Step 2: Delete all rows in original table that are not in the temporary table DELETE FROM user WHERE id NOT IN (SELECT id FROM tmp_user); # Step 3: Remove temporary table DROP TABLE tmp_user;
0 comments:
Post a Comment