Finding duplicate values in MySQL ~ Tech Blog

Wednesday, 31 October 2018

Finding duplicate values in MySQL

I have a table with a varchar column, and I would like to find all the records that have duplicate values in this column. What is the best query I can use to find the duplicates?

Answers

Do a SELECT with a GROUP BY clause. Let's say name is the column you want to find duplicates in:

SELECT name, COUNT(*) c FROM table GROUP BY name HAVING c > 1;

This will return a result with the name value in the first column, and a count of how many times that value appears in the second.

SELECT  *
FROM    mytable mto
WHERE   EXISTS
        (
        SELECT  1
        FROM    mytable mti
        WHERE   mti.varchar_column = mto.varchar_column
        LIMIT 1, 1
        )

This query returns complete records, not just distinct varchar_column's.

This query doesn't use COUNT(*). If there are lots of duplicates, COUNT(*) is expensive, and you don't need the whole COUNT(*), you just need to know if there are two rows with same value.

Having an index on varchar_column will, of course, speed up this query greatly.

Assuming your table is named TableABC and the column which you want is Col and the primary key to T1 is Key.

SELECT a.Key, b.Key, a.Col 
FROM TableABC a, TableABC b
WHERE a.Col = b.Col 
AND a.Key <> b.Key

The advantage of this approach over the above answer is it gives the Key.

To find how many records are duplicates in name column in Employee, the query below is helpful;

Select name from employee group by name having count(*)>1;

SELECT t.*,(select count(*) from city as tt
  where tt.name=t.name) as count
  FROM `city` as t
  where (
     select count(*) from city as tt
     where tt.name=t.name
  ) > 1 order by count desc

Replace city with your Table. Replace name with your field name

My final query incorporated a few of the answers here that helped - combining group by, count & GROUP_CONCAT.

SELECT GROUP_CONCAT(id), `magento_simple`, COUNT(*) c 
FROM product_variant 
GROUP BY `magento_simple` HAVING c > 1;

This provides the id of both examples (comma separated), the barcode I needed, and how many duplicates.

Change table and columns accordingly.

SELECT DISTINCT a.email FROM `users` a LEFT JOIN `users` b ON a.email = b.email WHERE a.id != b.id;

For removing duplicate rows with multiple fields , first cancate them to the new unique key which is specified for the only distinct rows, then use "group by" command to removing duplicate rows with the same new unique key:

Create TEMPORARY table tmp select concat(f1,f2) as cfs,t1.* from mytable as t1;
Create index x_tmp_cfs on tmp(cfs);
Create table unduptable select f1,f2,... from tmp group by cfs;

Select column_name, column_name1,column_name2, count(1) as temp from table_name group by column_name having temp > 1

SELECT ColumnA, COUNT( * )
FROM Table
GROUP BY ColumnA
HAVING COUNT( * ) > 1

SELECT DISTINCT name, count(name) as times FROM yourtable GROUP BY name

Tech Blog

Wednesday, 31 October 2018