Wednesday, 14 November 2018

MySQL DISTINCT on a GROUP_CONCAT()

I am running SELECT GROUP_CONCAT(categories SEPARATOR ' ') FROM table.
Sample data below:
categories
----------
test1 test2 test3
test4
test1 test3
test1 test3
However, I am getting test1 test2 test3 test4 test1 test3 back, and
I would like to get test1 test2 test3 test4 back. Any ideas?

Answers


GROUP_CONCAT accepts a DISTINCT modifier:
SELECT GROUP_CONCAT(DISTINCT categories ORDER BY categories ASC 
SEPARATOR ' ') FROM table
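
As a minimal sketch of what this query does on the question's sample data, the script below uses a hypothetical table named demo_distinct (the literal name `table` is a reserved word in MySQL and would need backticks). It suggests that DISTINCT compares whole column values, so it only collapses the repeated 'test1 test3' row.

```sql
-- demo_distinct is a hypothetical table name; the rows are the question's sample data.
CREATE TABLE demo_distinct (categories VARCHAR(255));
INSERT INTO demo_distinct (categories) VALUES
  ('test1 test2 test3'),
  ('test4'),
  ('test1 test3'),
  ('test1 test3');

-- DISTINCT drops the second 'test1 test3' row but does not look inside the strings.
SELECT GROUP_CONCAT(DISTINCT categories ORDER BY categories ASC SEPARATOR ' ') AS result
FROM demo_distinct;
-- result: test1 test2 test3 test1 test3 test4
```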



Other answers to this question do not return what the OP needs. They return 
a string like:
test1 test2 test3 test1 test3 test4
(notice that test1 and test3 are duplicated), while the OP wants this string:
test1 test2 test3 test4
The problem is that DISTINCT compares whole column values. The string "test1 test3" appears 
twice and is kept only once, but all of the other strings are distinct from each other 
("test1 test2 test3" is distinct from "test1 test3", even though some of the items inside 
them repeat).
What we need to do is split each string into separate rows. For that we first need a 
numbers table with at least as many rows as the largest number of items in any single string:
CREATE TABLE numbers (n INT);
INSERT INTO numbers VALUES
(1),(2),(3),(4),(5),(6),(7),(8),(9),(10);
Then we can run this query:
SELECT
  SUBSTRING_INDEX(
    SUBSTRING_INDEX(tableName.categories, ' ', numbers.n),
    ' ',
    -1) category
FROM
  numbers INNER JOIN tableName
  ON
    LENGTH(tableName.categories)>=
    LENGTH(REPLACE(tableName.categories, ' ', ''))+numbers.n-1;
This gives one row per item, for example:
test1
test4
test1
test1
test2
test3
test3
test3
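
To see why the nested SUBSTRING_INDEX calls pick out the n-th item, and why the join condition stops at the right n, it may help to evaluate the same expressions on a literal string. This is only an illustration with n = 2; the column aliases are made up for readability.

```sql
SELECT
  -- everything up to (not including) the 2nd space
  SUBSTRING_INDEX('test1 test2 test3', ' ', 2) AS first_two,          -- 'test1 test2'
  -- the last item of that prefix, i.e. the 2nd item overall
  SUBSTRING_INDEX(
    SUBSTRING_INDEX('test1 test2 test3', ' ', 2), ' ', -1
  ) AS second_item,                                                   -- 'test2'
  -- number of spaces + 1 = number of items; the join keeps n up to this value
  1 + LENGTH('test1 test2 test3')
    - LENGTH(REPLACE('test1 test2 test3', ' ', '')) AS item_count;    -- 3
```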
Then we can apply the GROUP_CONCAT aggregate function with the DISTINCT clause:
SELECT
  GROUP_CONCAT(DISTINCT category ORDER BY category SEPARATOR ' ')
FROM (
  SELECT
    SUBSTRING_INDEX(SUBSTRING_INDEX(tableName.categories, ' ', numbers.n), ' ', -1) category
  FROM
    numbers INNER JOIN tableName
    ON LENGTH(tableName.categories)>=LENGTH(REPLACE(tableName.categories, ' ', ''))+numbers.n-1
  ) s;
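
On MySQL 8.0 or later, the same idea can be sketched without a permanent numbers table by generating the numbers with a recursive common table expression. This is a variation on the approach above, not part of the original answer. The CTE names (numbers, sample) and the upper bound of 10 items per row are assumptions, and the sample CTE merely stands in for the real table.

```sql
-- Requires MySQL 8.0+ (recursive CTEs).
WITH RECURSIVE
  numbers (n) AS (
    SELECT 1
    UNION ALL
    SELECT n + 1 FROM numbers WHERE n < 10   -- assumed maximum items per row
  ),
  sample (categories) AS (                   -- stand-in for the real table
    SELECT 'test1 test2 test3'
    UNION ALL SELECT 'test4'
    UNION ALL SELECT 'test1 test3'
    UNION ALL SELECT 'test1 test3'
  )
SELECT
  GROUP_CONCAT(DISTINCT category ORDER BY category SEPARATOR ' ') AS result
FROM (
  SELECT
    SUBSTRING_INDEX(SUBSTRING_INDEX(sample.categories, ' ', numbers.n), ' ', -1) AS category
  FROM
    numbers INNER JOIN sample
    -- keep n only up to the number of items in this row (spaces + 1)
    ON numbers.n <= 1 + LENGTH(sample.categories)
                      - LENGTH(REPLACE(sample.categories, ' ', ''))
  ) s;
-- result: test1 test2 test3 test4
```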

I realize this question is old, but I feel this should be mentioned: GROUP_CONCAT 
with DISTINCT can be a performance killer. On small databases you won't notice, 
but it does not scale well as the data grows.
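
One alternative that is sometimes suggested is to deduplicate in a derived table first and then concatenate the already-distinct rows. The sketch below uses inline sample rows (the aliases sample and d are made up) in place of a real table. Whether it is actually faster depends on the MySQL version, the data and the available indexes, so comparing both forms with EXPLAIN on real data is the safer check.

```sql
SELECT GROUP_CONCAT(categories ORDER BY categories SEPARATOR ' ') AS result
FROM (
  -- deduplicate whole values before aggregating
  SELECT DISTINCT categories
  FROM (
    SELECT 'test1 test2 test3' AS categories
    UNION ALL SELECT 'test4'
    UNION ALL SELECT 'test1 test3'
    UNION ALL SELECT 'test1 test3'
  ) AS sample
) AS d;
-- result: test1 test2 test3 test1 test3 test4
```

Separately, GROUP_CONCAT output is truncated at group_concat_max_len (1024 by default), which larger inputs can hit regardless of which form is used.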
