Tuesday, 7 August 2018

MySQL GROUP_CONCAT() Has Buggy Interaction With UNION Clauses

So, earlier this week, I blogged about the unfortunate way in which MySQL's GROUP_CONCAT() function will fail silently when you hit its size limit. Well, it turns out that GROUP_CONCAT() is even more unfortunate. After some testing, it looks like GROUP_CONCAT() fails even harder (and more mysteriously) when you use it in conjunction with a UNION or UNION ALL clause.
To see what I mean, I'm going to revamp my previous demo in which I am grouping transaction IDs by day of the week. This time, however, I'm going to run the report twice. The first time, I'm using the same code as earlier; the second time, I'm gathering each day of the week individually, as an aggregate, and then joining all the individual aggregates using UNION ALL clauses:
<!---
First, let's populate the audit-log for testing. Here, we're doing to store the day
of the week and the transaction ID that took place.
--->
<cfquery name="populate" datasource="testing">
TRUNCATE TABLE audit_log;
<cfset transactionIndex = 100000 />
<!--- For each day of the week, add 1,000 transactions. --->
<cfloop index="dayIndex" from="1" to="7" step="1">
<cfloop index="i" from="1" to="1000" step="1">
INSERT INTO audit_log
(
dayOfWeek,
transactionID
) VALUES (
#dayIndex#,
#( ++transactionIndex )#
);
</cfloop>
</cfloop>
</cfquery>
<!--- ----------------------------------------------------- --->
<!--- ----------------------------------------------------- --->
<!--- ----------------------------------------------------- --->
<!---
Now, let's report on the transactions that have taken place. Grouping the log by the
day of the week, we're going to concatenate the transactionIDs into a grouped-list.
--->
<cfquery name="report" datasource="testing">
SELECT
dayOfWeek,
<!--- Collapse the transaction IDs into a comma-delimited list. --->
GROUP_CONCAT( transactionID ) AS transactionList
FROM
audit_log
GROUP BY
dayOfWeek
ORDER BY
dayOfWeek ASC
</cfquery>
<!--- Output report aggregates. In "theory", each day should have 1,000 items. --->
<cfoutput query="report">
Day Of Week (#report.dayOfWeek#): #listLen( report.transactionList )#<br />
</cfoutput>
<!--- ----------------------------------------------------- --->
<!--- ----------------------------------------------------- --->
<br />
<!--- ----------------------------------------------------- --->
<!--- ----------------------------------------------------- --->
<!---
This time, rather than grouping the entire record-set by day, we're going to select
each aggregate individually by day of the week, then UNION these aggregates together.
In theory, this should be the exact same result since we're still groupging all
transactions by day of the week.
--->
<cfquery name="report" datasource="testing">
(
SELECT
dayOfWeek,
GROUP_CONCAT( transactionID ) AS transactionList
FROM
audit_log
WHERE
dayOfWeek = 1 <!--- Get only this day of the week. --->
GROUP BY
dayOfWeek
)
<!--- UNION the other 6 days of the week. --->
<cfloop index="i" from="2" to="7" step="1">
UNION ALL
(
SELECT
dayOfWeek,
GROUP_CONCAT( transactionID ) AS transactionList
FROM
audit_log
WHERE
dayOfWeek = #i# <!--- Get only this day of the week. --->
GROUP BY
dayOfWeek
)
</cfloop>
ORDER BY
dayOfWeek ASC
</cfquery>
<!--- Output report aggregates. --->
<cfoutput query="report">
Day Of Week (#report.dayOfWeek#): #listLen( report.transactionList )#<br />
</cfoutput>
<!--- ----------------------------------------------------- --->
<!--- ----------------------------------------------------- --->
<br />
<!--- ----------------------------------------------------- --->
<!--- ----------------------------------------------------- --->
<!---
As a final sanity check, let's look at just one of the groupings that we used in
the previous exammple, to see how it behaves when we don't use a UNION clause.
--->
<cfquery name="report" datasource="testing">
SELECT
dayOfWeek,
GROUP_CONCAT( transactionID ) AS transactionList
FROM
audit_log
WHERE
dayOfWeek = 1 <!--- Get only this day of the week. --->
GROUP BY
dayOfWeek
</cfquery>
<!--- Output report aggregates. --->
<cfoutput query="report">
Day Of Week (#report.dayOfWeek#): #listLen( report.transactionList )#<br />
</cfoutput>

In theory, these two reports should show the exact same result. Yet, when we run the code, we get the following output:
Day Of Week (1): 147
Day Of Week (2): 147
Day Of Week (3): 147
Day Of Week (4): 147
Day Of Week (5): 147
Day Of Week (6): 147
Day Of Week (7): 147

Day Of Week (1): 49
Day Of Week (2): 49
Day Of Week (3): 49
Day Of Week (4): 49
Day Of Week (5): 49
Day Of Week (6): 49
Day Of Week (7): 49

Day Of Week (1): 147
As you can see, when we gather each day of the week individually, in the context of a UNION ALL, GROUP_CONCAT() is returning an even shorter list (49 items). Of course, if we do the same thing without a UNION ALL (the final sanity check), we get the expected 147 item truncation.
What the heck is going on here? I feel like I'm taking crazy pills! This makes no sense at all. Not only does GROUP_CONCAT() fail silently, it also doesn't seem to fail consistently. This pretty much eliminates it as a viable option for me.
NOTE: I am running MySQL 5.6.15.

0 comments:

Post a Comment