Friday, 16 November 2018

How to delete duplicates on a MySQL table?

I need to DELETE duplicated rows for specified sid on a MySQL table.
How can I do this with an SQL query?
DELETE (DUPLICATED TITLES) FROM table WHERE SID = "1"
Something like this, but I don't know how to do it.

 Answers


this removes duplicates in place, without making a new table
ALTER IGNORE TABLE `table_name` ADD UNIQUE (title, SID)
note: only works well if index fits in memory



Following remove duplicates for all SID-s, not only single one.
With temp table
CREATE TABLE table_temp AS
SELECT * FROM table GROUP BY title, SID;

DROP TABLE table;
RENAME TABLE table_temp TO table;
Since temp_table is freshly created it has no indexes. You'll need to recreate them 
after removing duplicates. You can check what indexes you have in the table with 
SHOW INDEXES IN table
Without temp table:
DELETE FROM `table` WHERE id IN (
  SELECT all_duplicates.id FROM (
    SELECT id FROM `table` WHERE (`title`, `SID`) IN (
      SELECT `title`, `SID` FROM `table` GROUP BY `title`, `SID` having count(*) > 1
    )
  ) AS all_duplicates 
  LEFT JOIN (
    SELECT id FROM `table` GROUP BY `title`, `SID` having count(*) > 1
  ) AS grouped_duplicates 
  ON all_duplicates.id = grouped_duplicates.id 
  WHERE grouped_duplicates.id IS NULL
)



This always seems to work for me:
CREATE TABLE NoDupeTable LIKE DupeTable; 
INSERT NoDupeTable SELECT * FROM DupeTable group by CommonField1,CommonFieldN;
Which keeps the lowest ID on each of the dupes and the rest of the non-dupe records.
I've also taken to doing the following so that the dupe issue no longer occurs after the 
removal:
CREATE TABLE NoDupeTable LIKE DupeTable; 
Alter table NoDupeTable Add Unique `Unique` (CommonField1,CommonField2);
INSERT IGNORE NoDupeTable SELECT * FROM DupeTable;
In other words, I create a duplicate of the first table, add a unique index on the fields I 
don't want duplicates of, and then do an Insert IGNORE which has the advantage of not 
failing as a normal Insert would the first time it tried to add a duplicate record based on
 the two fields and rather ignores any such records.
Moving fwd it becomes impossible to create any duplicate records based on those two
 fields.



After running into this issue myself, on a huge database, I wasn't completely impressed 
with the performance of any of the other answers. I want to keep only the latest duplicate 
row, and delete the rest.
In a one-query statement, without a temp table, this worked best for me,
DELETE e.*
FROM employee e
WHERE id IN
 (SELECT id
   FROM (SELECT MIN(id) as id
          FROM employee e2
          GROUP BY first_name, last_name
          HAVING COUNT(*) > 1) x);
The only caveat is that I have to run the query multiple times, but even with that, 
I found it worked better for me than the other options.



This work for me to remove old records:
delete from table where id in 
(select min(e.id)
    from (select * from table) e 
    group by column1, column2
    having count(*) > 1
); 
You can replace min(e.id) to max(e.id) to remove newest records.



The following works for all tables
CREATE TABLE `noDup` LIKE `Dup` ;
INSERT `noDup` SELECT DISTINCT * FROM `Dup` ;
DROP TABLE `Dup` ;
ALTER TABLE `noDup` RENAME `Dup` ;



delete from `table` where `table`.`SID` in
 ( select t.SID from table t join table t1 on t.title = t1.title where t.SID > t1.SID )


This works for large tables:
CREATE Temporary table duplicates AS select max(id) as id, url from links group by url 
having count(*) > 1;
 DELETE l from links l inner join duplicates ld on ld.id = l.id WHERE ld.id IS NOT NULL;
To delete oldest change max(id) to min(id)



Deleting duplicates on MySQL tables is a common issue, that usually comes with specific
 needs. In case anyone is interested, here (Remove duplicate rows in MySQL) I explain how 
to use a temporary table to delete MySQL duplicates in a reliable and fast way, also valid 
to handle big data sources (with examples for different use cases).
Ali, in your case, you can run something like this:
-- create a new temporary table
CREATE TABLE tmp_table1 LIKE table1;

-- add a unique constraint    
ALTER TABLE tmp_table1 ADD UNIQUE(sid, title);

-- scan over the table to insert entries
INSERT IGNORE INTO tmp_table1 SELECT * FROM table1 ORDER BY sid;

-- rename tables
RENAME TABLE table1 TO backup_table1, tmp_table1 TO table1;



You could just use a DISTINCT clause to select the "cleaned up" list 


There are just a few basic steps when removing duplicate data from your table:
  • Back up your table!
  • Find the duplicate rows
  • Remove the duplicate rows

0 comments:

Post a Comment