Wednesday 24 October 2018

Find duplicate records in MySQL

I want to pull out duplicate records in a MySQL Database. This can be done with:
SELECT address, count(id) as cnt FROM list
GROUP BY address HAVING cnt > 1
Which results in:
100 MAIN ST    2
I would like to pull it so that it shows each row that is a duplicate. Something like:
JIM    JONES    100 MAIN ST
JOHN   SMITH    100 MAIN ST
Any thoughts on how this can be done? I'm trying to avoid doing the first one then looking up the duplicates with a second query in the code.

Answers:
The key is to rewrite this query so that it can be used as a subquery.
SELECT firstname, 
   lastname, 
   list.address 
FROM list
   INNER JOIN (SELECT address
               FROM   list
               GROUP  BY address
               HAVING COUNT(id) > 1) dup
           ON list.address = dup.address;



Why not just INNER JOIN the table with itself?
SELECT a.firstname, a.lastname, a.address
FROM list a
INNER JOIN list b ON a.address = b.address
WHERE a.id <> b.id
A DISTINCT is needed if the address could exist more than two times.



select `cityname` from `codcities` group by `cityname` having count(*)>=2
This is the similar query you have asked for and its 200% working and easy too. Enjoy!!!



we can found the duplicates depends on more then one fields also.For those cases you can use below format.
SELECT COUNT(*), column1, column2 
FROM tablename
GROUP BY column1, column2
HAVING COUNT(*)>1;



Finding duplicate addresses is much more complex than it seems, especially if you require accuracy. A MySQL query is not enough in this case...
I work at SmartyStreets, where we do address validation and de-duplication and other stuff, and I've seen a lot of diverse challenges with similar problems.
There are several third-party services which will flag duplicates in a list for you. Doing this solely with a MySQL subquery will not account for differences in address formats and standards. The USPS (for US address) has certain guidelines to make these standard, but only a handful of vendors are certified to perform such operations.
So, I would recommend the best answer for you is to export the table into a CSV file, for instance, and submit it to a capable list processor. One such is LiveAddress which will have it done for you in a few seconds to a few minutes automatically. It will flag duplicate rows with a new field called "Duplicate" and a value of Y in it.



Not going to be very efficient, but it should work:
SELECT *
FROM list AS outer
WHERE (SELECT COUNT(*)
        FROM list AS inner
        WHERE inner.address = outer.address) > 1;



 SELECT firstname, lastname, address FROM list
 WHERE 
 Address in 
 (SELECT address FROM list
 GROUP BY address
 HAVING count(*) > 1)



Fastest duplicates removal queries procedure:
/* create temp table with one primary column id */
INSERT INTO temp(id) SELECT MIN(id) FROM list GROUP BY (isbn) HAVING COUNT(*)>1;
DELETE FROM list WHERE id IN (SELECT id FROM temp);
DELETE FROM temp;



SELECT t.*,(select count(*) from city as tt where tt.name=t.name) as count FROM `city` as t where (select count(*) from city as tt where tt.name=t.name) > 1 order by count desc
Replace city with your Table. Replace name with your field name



Isn't this easier :
SELECT *
FROM tc_tariff_groups
GROUP BY group_id
HAVING COUNT(group_id) >1
?



    Find duplicate Records:

    Suppose we have table : Student 
    student_id int
    student_name varchar
    Records:
    +------------+---------------------+
    | student_id | student_name        |
    +------------+---------------------+
    |        101 | usman               |
    |        101 | usman               |
    |        101 | usman               |
    |        102 | usmanyaqoob         |
    |        103 | muhammadusmanyaqoob |
    |        103 | muhammadusmanyaqoob |
    +------------+---------------------+

    Now we want to see duplicate records
    Use this query:


   select student_name,student_id ,count(*) c from student group by student_id,student_name having c>1;

+--------------------+------------+---+
| student_name        | student_id | c |
+---------------------+------------+---+
| usman               |        101 | 3 |
| muhammadusmanyaqoob |        103 | 2 |
+---------------------+------------+---+

0 comments:

Post a Comment