Monday 3 September 2018

How to get advanced statistical data from MySQL?

I have a very large MySQL database with a table structured like the one below. In this example, DATE is a Unix timestamp, so it will need to be converted to the normal US date format. These are just a few records from my DB:
ID      DATE                REG_TYPE
--------------------------------------
1     1251917888                 0
2     1251917888                 1
3     1251917888                 1
4     1251917888                 0
5     1251917888                 2
6     1251917888                 3
7     1251917888                 2
8     1251917888                 4
9     1251917888                 0
10   1251917888                 0
The problem is that I want to get the count of every REG_TYPE, ordered by date. The output I want looks like this:
DATE    REGTYPE(0)  REGTYPE(1)  REGTYPE(2)  REGTYPE(3)  REGTYPE(4)
------------------------------------------------------------------
xxxx        4           2           2           1           1
xxxx        x           x           x           x           x
I want to get this for each and every date in the database, like a summary for each date.
Can anybody suggest a possible solution to this? I have to convert this output into an array in PHP after fetching the data from MySQL. The reason for not using loops in PHP for this is that the database is too big and looping would cause timeouts.
Best Regards

What you want to do is a pivot operation, which is not directly supported by SQL syntax. However, it's not too complicated, and conceptually involves 2 steps:
  1. "Blow up" the data into many columns, with one row per row in the original data set. This is usually done with CASE WHEN ... ELSE ... END or occasionally with functions (like DECODE() in Oracle). I'll use CASE WHEN in the example below, since it works equally well in most RDBMSes.
  2. Use GROUP BY and aggregate functions (SUM, MIN, MAX, etc) to collapse the many rows into the output row set you want.
I'm using this data set for the example:
mysql> select * from foo;
+----+------------+----------+
| id | thedate    | reg_type |
+----+------------+----------+
|  1 | 1251917888 |        0 |
|  2 | 1251917888 |        1 |
|  3 | 1251917888 |        1 |
|  4 | 1251917888 |        0 |
|  5 | 1251917888 |        2 |
|  6 | 1251917888 |        3 |
|  7 | 1251917888 |        2 |
|  8 | 1251917888 |        4 |
|  9 | 1251917888 |        0 |
| 10 | 1251917888 |        0 |
| 11 | 1251831488 |        1 |
| 12 | 1251831488 |        2 |
| 13 | 1251831488 |        2 |
| 14 | 1251831488 |        1 |
| 15 | 1251831488 |        3 |
| 16 | 1251831488 |        4 |
| 17 | 1251831488 |        3 |
| 18 | 1251831488 |        5 |
| 19 | 1251831488 |        1 |
| 20 | 1251831488 |        1 |
+----+------------+----------+
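To reproduce the example locally, the data set above can be loaded with something like the following. This is only a sketch: the table name and column names come from the output above, but the column types and the primary key are assumptions, since the original post does not show the table definition.

```sql
-- Hypothetical schema: the column types and the primary key are assumptions,
-- not taken from the original post.
create table foo (
    id       int unsigned     not null primary key
  , thedate  int unsigned     not null  -- Unix timestamp, in seconds
  , reg_type tinyint unsigned not null
);

-- The 20 sample rows shown above.
insert into foo (id, thedate, reg_type) values
    ( 1, 1251917888, 0), ( 2, 1251917888, 1), ( 3, 1251917888, 1)
  , ( 4, 1251917888, 0), ( 5, 1251917888, 2), ( 6, 1251917888, 3)
  , ( 7, 1251917888, 2), ( 8, 1251917888, 4), ( 9, 1251917888, 0)
  , (10, 1251917888, 0), (11, 1251831488, 1), (12, 1251831488, 2)
  , (13, 1251831488, 2), (14, 1251831488, 1), (15, 1251831488, 3)
  , (16, 1251831488, 4), (17, 1251831488, 3), (18, 1251831488, 5)
  , (19, 1251831488, 1), (20, 1251831488, 1);
```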

Step 1 is to "blow up" the data set:
select id
     , thedate
     , case when reg_type = 0 then 1 else 0 end as reg_type_0
     , case when reg_type = 1 then 1 else 0 end as reg_type_1
     , case when reg_type = 2 then 1 else 0 end as reg_type_2
     , case when reg_type = 3 then 1 else 0 end as reg_type_3
     , case when reg_type = 4 then 1 else 0 end as reg_type_4
     , case when reg_type = 5 then 1 else 0 end as reg_type_5
  from foo;

which gives:
+----+------------+------------+------------+------------+------------+------------+------------+
| id | thedate    | reg_type_0 | reg_type_1 | reg_type_2 | reg_type_3 | reg_type_4 | reg_type_5 |
+----+------------+------------+------------+------------+------------+------------+------------+
|  1 | 1251917888 |          1 |          0 |          0 |          0 |          0 |          0 |
|  2 | 1251917888 |          0 |          1 |          0 |          0 |          0 |          0 |
|  3 | 1251917888 |          0 |          1 |          0 |          0 |          0 |          0 |
|  4 | 1251917888 |          1 |          0 |          0 |          0 |          0 |          0 |
|  5 | 1251917888 |          0 |          0 |          1 |          0 |          0 |          0 |
|  6 | 1251917888 |          0 |          0 |          0 |          1 |          0 |          0 |
|  7 | 1251917888 |          0 |          0 |          1 |          0 |          0 |          0 |
|  8 | 1251917888 |          0 |          0 |          0 |          0 |          1 |          0 |
|  9 | 1251917888 |          1 |          0 |          0 |          0 |          0 |          0 |
| 10 | 1251917888 |          1 |          0 |          0 |          0 |          0 |          0 |
| 11 | 1251831488 |          0 |          1 |          0 |          0 |          0 |          0 |
| 12 | 1251831488 |          0 |          0 |          1 |          0 |          0 |          0 |
| 13 | 1251831488 |          0 |          0 |          1 |          0 |          0 |          0 |
| 14 | 1251831488 |          0 |          1 |          0 |          0 |          0 |          0 |
| 15 | 1251831488 |          0 |          0 |          0 |          1 |          0 |          0 |
| 16 | 1251831488 |          0 |          0 |          0 |          0 |          1 |          0 |
| 17 | 1251831488 |          0 |          0 |          0 |          1 |          0 |          0 |
| 18 | 1251831488 |          0 |          0 |          0 |          0 |          0 |          1 |
| 19 | 1251831488 |          0 |          1 |          0 |          0 |          0 |          0 |
| 20 | 1251831488 |          0 |          1 |          0 |          0 |          0 |          0 |
+----+------------+------------+------------+------------+------------+------------+------------+

Next we collapse to one row in the output per date, and sum each of the reg_type_* columns, using our initial query as an inline view (aka a "subquery"):
select thedate
     , sum(i.reg_type_0) as reg_type_0
     , sum(i.reg_type_1) as reg_type_1
     , sum(i.reg_type_2) as reg_type_2
     , sum(i.reg_type_3) as reg_type_3
     , sum(i.reg_type_4) as reg_type_4
     , sum(i.reg_type_5) as reg_type_5
  from (
         select id
              , thedate
              , case when reg_type = 0 then 1 else 0 end as reg_type_0
              , case when reg_type = 1 then 1 else 0 end as reg_type_1
              , case when reg_type = 2 then 1 else 0 end as reg_type_2
              , case when reg_type = 3 then 1 else 0 end as reg_type_3
              , case when reg_type = 4 then 1 else 0 end as reg_type_4
              , case when reg_type = 5 then 1 else 0 end as reg_type_5
           from foo
       ) i
 group by thedate
 order by thedate asc;

(Note also that you can collapse these two queries into one, though I've shown them separately here for clarity. In MySQL at least, the single-query form seems to result in a simpler execution plan, which often means faster execution. As always, test your SQL performance on realistic data sets; don't take my word for it!)
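For reference, a minimal sketch of that single-query form: the CASE expressions move inside SUM, so no inline view is needed. It uses the same foo table and column names as above.

```sql
-- Single-query pivot: each SUM counts the rows of one reg_type per date.
select thedate
     , sum(case when reg_type = 0 then 1 else 0 end) as reg_type_0
     , sum(case when reg_type = 1 then 1 else 0 end) as reg_type_1
     , sum(case when reg_type = 2 then 1 else 0 end) as reg_type_2
     , sum(case when reg_type = 3 then 1 else 0 end) as reg_type_3
     , sum(case when reg_type = 4 then 1 else 0 end) as reg_type_4
     , sum(case when reg_type = 5 then 1 else 0 end) as reg_type_5
  from foo
 group by thedate
 order by thedate asc;
```

It should return the same rows as the two-level query. In MySQL specifically, a comparison evaluates to 1 or 0, so sum(reg_type = 0) is a shorter equivalent, at the cost of portability to other RDBMSes.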
This gives us:
+------------+------------+------------+------------+------------+------------+------------+
| thedate    | reg_type_0 | reg_type_1 | reg_type_2 | reg_type_3 | reg_type_4 | reg_type_5 |
+------------+------------+------------+------------+------------+------------+------------+
| 1251831488 |          0 |          4 |          2 |          2 |          1 |          1 |
| 1251917888 |          4 |          2 |          2 |          1 |          1 |          0 |
+------------+------------+------------+------------+------------+------------+------------+

Which is the desired result. You can convert thedate to a DATE with the MySQL function FROM_UNIXTIME, and it will likely be most efficient to do this in the step 2 (GROUP BY) query: the function is evaluated fewer times, and the comparisons for GROUP BY are made on the integer rather than on the DATE (not sure if that's actually any different in MySQL).
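A sketch of the date conversion, under some assumptions: the '%m/%d/%Y' format string is my guess at what "normal US date format" means, the alias names are made up for illustration, and the converted value depends on the session time zone. The first query groups on the integer as described above; the second is a variant for tables where timestamps within one day differ in their time-of-day part, which the sample data does not show.

```sql
-- Group on the raw integer, format only the grouped value.
-- (thedate_us is a hypothetical alias; the result depends on the session time_zone.)
select from_unixtime(thedate, '%m/%d/%Y') as thedate_us
     , sum(case when reg_type = 0 then 1 else 0 end) as reg_type_0
     , sum(case when reg_type = 1 then 1 else 0 end) as reg_type_1
     , sum(case when reg_type = 2 then 1 else 0 end) as reg_type_2
     , sum(case when reg_type = 3 then 1 else 0 end) as reg_type_3
     , sum(case when reg_type = 4 then 1 else 0 end) as reg_type_4
     , sum(case when reg_type = 5 then 1 else 0 end) as reg_type_5
  from foo
 group by thedate
 order by thedate asc;

-- Variant: if rows on the same calendar day carry different timestamps,
-- group on the calendar date instead, so each day yields exactly one row.
select date(from_unixtime(thedate)) as the_day
     , sum(case when reg_type = 0 then 1 else 0 end) as reg_type_0
     , sum(case when reg_type = 1 then 1 else 0 end) as reg_type_1
     , sum(case when reg_type = 2 then 1 else 0 end) as reg_type_2
     , sum(case when reg_type = 3 then 1 else 0 end) as reg_type_3
     , sum(case when reg_type = 4 then 1 else 0 end) as reg_type_4
     , sum(case when reg_type = 5 then 1 else 0 end) as reg_type_5
  from foo
 group by date(from_unixtime(thedate))
 order by the_day asc;
```

The variant evaluates the conversion once per input row rather than once per group, so it is likely the slower of the two on a large table; as above, measure on realistic data.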
