How to delete duplicates on a MySQL table? ~ Tech Blog

Friday, 16 November 2018

How to delete duplicates on a MySQL table?

I need to DELETE duplicated rows for specified sid on a MySQL table.

How can I do this with an SQL query?

DELETE (DUPLICATED TITLES) FROM table WHERE SID = "1"

Something like this, but I don't know how to do it.

Answers

this removes duplicates in place, without making a new table

ALTER IGNORE TABLE `table_name` ADD UNIQUE (title, SID)

note: only works well if index fits in memory

Following remove duplicates for all SID-s, not only single one.

With temp table

CREATE TABLE table_temp AS
SELECT * FROM table GROUP BY title, SID;

DROP TABLE table;
RENAME TABLE table_temp TO table;

Since temp_table is freshly created it has no indexes. You'll need to recreate them

after removing duplicates. You can check what indexes you have in the table with

SHOW INDEXES IN table

Without temp table:

DELETE FROM `table` WHERE id IN (
  SELECT all_duplicates.id FROM (
    SELECT id FROM `table` WHERE (`title`, `SID`) IN (
      SELECT `title`, `SID` FROM `table` GROUP BY `title`, `SID` having count(*) > 1
    )
  ) AS all_duplicates 
  LEFT JOIN (
    SELECT id FROM `table` GROUP BY `title`, `SID` having count(*) > 1
  ) AS grouped_duplicates 
  ON all_duplicates.id = grouped_duplicates.id 
  WHERE grouped_duplicates.id IS NULL
)

This always seems to work for me:

CREATE TABLE NoDupeTable LIKE DupeTable; 
INSERT NoDupeTable SELECT * FROM DupeTable group by CommonField1,CommonFieldN;

Which keeps the lowest ID on each of the dupes and the rest of the non-dupe records.

I've also taken to doing the following so that the dupe issue no longer occurs after the

removal:

CREATE TABLE NoDupeTable LIKE DupeTable; 
Alter table NoDupeTable Add Unique `Unique` (CommonField1,CommonField2);
INSERT IGNORE NoDupeTable SELECT * FROM DupeTable;

In other words, I create a duplicate of the first table, add a unique index on the fields I

don't want duplicates of, and then do an Insert IGNORE which has the advantage of not

failing as a normal Insert would the first time it tried to add a duplicate record based on

the two fields and rather ignores any such records.

Moving fwd it becomes impossible to create any duplicate records based on those two

fields.

After running into this issue myself, on a huge database, I wasn't completely impressed

with the performance of any of the other answers. I want to keep only the latest duplicate

row, and delete the rest.

In a one-query statement, without a temp table, this worked best for me,

DELETE e.*
FROM employee e
WHERE id IN
 (SELECT id
   FROM (SELECT MIN(id) as id
          FROM employee e2
          GROUP BY first_name, last_name
          HAVING COUNT(*) > 1) x);

The only caveat is that I have to run the query multiple times, but even with that,

I found it worked better for me than the other options.

This work for me to remove old records:

delete from table where id in 
(select min(e.id)
    from (select * from table) e 
    group by column1, column2
    having count(*) > 1
);

You can replace min(e.id) to max(e.id) to remove newest records.

The following works for all tables

CREATE TABLE `noDup` LIKE `Dup` ;
INSERT `noDup` SELECT DISTINCT * FROM `Dup` ;
DROP TABLE `Dup` ;
ALTER TABLE `noDup` RENAME `Dup` ;

delete from `table` where `table`.`SID` in

( select t.SID from table t join table t1 on t.title = t1.title where t.SID > t1.SID )

This works for large tables:

CREATE Temporary table duplicates AS select max(id) as id, url from links group by url

having count(*) > 1;

DELETE l from links l inner join duplicates ld on ld.id = l.id WHERE ld.id IS NOT NULL;

To delete oldest change max(id) to min(id)

Deleting duplicates on MySQL tables is a common issue, that usually comes with specific

needs. In case anyone is interested, here (Remove duplicate rows in MySQL) I explain how

to use a temporary table to delete MySQL duplicates in a reliable and fast way, also valid

to handle big data sources (with examples for different use cases).

Ali, in your case, you can run something like this:

-- create a new temporary table
CREATE TABLE tmp_table1 LIKE table1;

-- add a unique constraint    
ALTER TABLE tmp_table1 ADD UNIQUE(sid, title);

-- scan over the table to insert entries
INSERT IGNORE INTO tmp_table1 SELECT * FROM table1 ORDER BY sid;

-- rename tables
RENAME TABLE table1 TO backup_table1, tmp_table1 TO table1;

You could just use a DISTINCT clause to select the "cleaned up" list

There are just a few basic steps when removing duplicate data from your table:

Back up your table!
Find the duplicate rows
Remove the duplicate rows

Tech Blog

Friday, 16 November 2018