I need to
DELETE
duplicated rows for specified sid on a MySQL
table.
How can I do this with an SQL query?
DELETE (DUPLICATED TITLES) FROM table WHERE SID = "1"
Something like this, but I don't know how to do it.
Answers
this removes duplicates in place, without making a new table
ALTER IGNORE TABLE `table_name` ADD UNIQUE (title, SID)
note: only works well if index fits in memory
Following remove duplicates for all SID-s, not only single one.
With temp table
CREATE TABLE table_temp AS
SELECT * FROM table GROUP BY title, SID;
DROP TABLE table;
RENAME TABLE table_temp TO table;
Since
temp_table
is freshly created it has no indexes. You'll need to recreate them
after removing duplicates. You can check what indexes you have in the table with
SHOW INDEXES IN table
Without temp table:
DELETE FROM `table` WHERE id IN (
SELECT all_duplicates.id FROM (
SELECT id FROM `table` WHERE (`title`, `SID`) IN (
SELECT `title`, `SID` FROM `table` GROUP BY `title`, `SID` having count(*) > 1
)
) AS all_duplicates
LEFT JOIN (
SELECT id FROM `table` GROUP BY `title`, `SID` having count(*) > 1
) AS grouped_duplicates
ON all_duplicates.id = grouped_duplicates.id
WHERE grouped_duplicates.id IS NULL
)
This always seems to work for me:
CREATE TABLE NoDupeTable LIKE DupeTable;
INSERT NoDupeTable SELECT * FROM DupeTable group by CommonField1,CommonFieldN;
Which keeps the lowest ID on each of the dupes and the rest of the non-dupe records.
I've also taken to doing the following so that the dupe issue no longer occurs after the
removal:
CREATE TABLE NoDupeTable LIKE DupeTable;
Alter table NoDupeTable Add Unique `Unique` (CommonField1,CommonField2);
INSERT IGNORE NoDupeTable SELECT * FROM DupeTable;
In other words, I create a duplicate of the first table, add a unique index on the fields I
don't want duplicates of, and then do an
Insert IGNORE
which has the advantage of not
failing as a normal
Insert
would the first time it tried to add a duplicate record based on
the two fields and rather ignores any such records.
Moving fwd it becomes impossible to create any duplicate records based on those two
fields.
After running into this issue myself, on a huge database, I wasn't completely impressed
with the performance of any of the other answers. I want to keep only the latest duplicate
row, and delete the rest.
In a one-query statement, without a temp table, this worked best for me,
DELETE e.*
FROM employee e
WHERE id IN
(SELECT id
FROM (SELECT MIN(id) as id
FROM employee e2
GROUP BY first_name, last_name
HAVING COUNT(*) > 1) x);
The only caveat is that I have to run the query multiple times, but even with that,
I found it worked better for me than the other options.
This work for me to remove old records:
delete from table where id in
(select min(e.id)
from (select * from table) e
group by column1, column2
having count(*) > 1
);
You can replace min(e.id) to max(e.id) to remove newest records.
The following works for all tables
CREATE TABLE `noDup` LIKE `Dup` ;
INSERT `noDup` SELECT DISTINCT * FROM `Dup` ;
DROP TABLE `Dup` ;
ALTER TABLE `noDup` RENAME `Dup` ;
delete from `table` where `table`.`SID` in
( select t.SID from table t join table t1 on t.title = t1.title where t.SID > t1.SID )
having count(*) > 1; DELETE l from links l inner join duplicates ld on ld.id = l.id WHERE ld.id IS NOT NULL;
This works for large tables:
CREATE Temporary table duplicates AS select max(id) as id, url from links group by url
To delete oldest change
max(id)
to min(id)
Deleting duplicates on MySQL tables is a common issue, that usually comes with specific
needs. In case anyone is interested, here (Remove duplicate rows in MySQL) I explain how
to use a temporary table to delete MySQL duplicates in a reliable and fast way, also valid
to handle big data sources (with examples for different use cases).
Ali, in your case, you can run something like this:
-- create a new temporary table
CREATE TABLE tmp_table1 LIKE table1;
-- add a unique constraint
ALTER TABLE tmp_table1 ADD UNIQUE(sid, title);
-- scan over the table to insert entries
INSERT IGNORE INTO tmp_table1 SELECT * FROM table1 ORDER BY sid;
-- rename tables
RENAME TABLE table1 TO backup_table1, tmp_table1 TO table1;
You could just use a DISTINCT clause to select the "cleaned up" list
There are just a few basic steps when removing duplicate data from your table:
- Back up your table!
- Find the duplicate rows
- Remove the duplicate rows
0 comments:
Post a Comment