Friday, 16 November 2018

How to create a MySQL hierarchical recursive query

I have a MySQL table which is as follows:
id | name        | parent_id
19 | category1   | 0
20 | category2   | 19
21 | category3   | 20
22 | category4   | 21
......
Now, I want to have a single MySQL query to which I simply supply the id [for instance say
 'id = 19'] then I should get all its child ids [i.e. result should have ids '20,21,22'].... Also,
 the hierarchy of the children is not known it can vary....
Also, I already have the solution using the for loop..... Let me know how to achieve the 
same using a single MySQL query if possible.

 Answers


For MySql versions that do not support Common Table Expressions (up to version 5.7), 
you would achieve this with the following query:
select  id,
        name,
        parent_id 
from    (select * from products
         order by parent_id, id) products_sorted,
        (select @pv := '19') initialisation
where   find_in_set(parent_id, @pv)
and     length(@pv := concat(@pv, ',', id))
The value specified in @pv := '19' should be set to the id of the parent you want to 
select all the descendants of.
This will work also if a parent has multiple children. However, it is required that each 
record fulfills the condition parent_id < id, otherwise the results will not be complete.
This query uses specific MySql syntax: variables are assigned and modified during its 
execution. Some assumptions are made about the order of execution:
  • The from clause is evaluated first. So that is where @pv gets initialised.
  • The where clause is evaluated for each record in the order of retrieval from the 
  • from aliases. So this is where a condition is put to only include records for which
    the parent was already identified as being in the descendant tree (all descendants of
    the primary parent are progressively added to @pv).
  • The conditions in this where clause are evaluated in order, and the evaluation is 
    interrupted once the total outcome is certain. Therefore the second condition must
    be in second place, as it adds the id to the parent list, and this should only happen
    if the id passes the first condition. The lengthfunction is only called to make sure
    this condition is always true, even if the pv string would for some reason yield a falsy value.
All in all, one may find these assumptions too risky to rely on -- there is no documented 
guarantee for them, and even though it works consistently, the evaluation order may in 
theory still change when you use this query as a view or sub-query in a larger query.
Also note that for very large data sets this solution might get slow, as the find_in_set 
operation is not the most ideal way to find a number in a list, certainly not in a list that 
reaches a size in the same order of magnitude as the number of records returned.
Alternative 1: WITH RECURSIVECONNECT BY
with recursive cte (id, name, parent_id) as
(
 select     id,
            name,
            parent_id
 from       products
 where      parent_id = 19
 union all
 select     p.id,
            p.name,
            p.parent_id
 from       products p
 inner join cte
         on p.parent_id = cte.id
)
select * from cte;
Some databases have an alternative, non-standard syntax for hierarchical look-ups, 
such as the CONNECT BY clause available on Oracle databases. DB2 also supports this
 alternative syntax.
MySql version 5.7 does not offer such a feature. When your database engine 
provides this syntax, then that is certainly the best option to go for. If not, then also 
consider the following alternatives.
Alternative 2: Path-style Identifiers
Things become a lot easier if you would assign id values that contain the hierarchical 
information: a path. For example, in your case this could look like this:
ID       | NAME
19       | category1   
19/1     | category2  
19/1/1   | category3  
19/1/1/1 | category4  
Then your select would look like this:
select  id,
        name 
from    products
where   id like '19/%'
Alternative 3: Repeated Self-joins
If you know an upper limit for how deep your hierarchy tree can become, you can use a 
standard sql like this:
select      p6.parent_id as parent6_id,
            p5.parent_id as parent5_id,
            p4.parent_id as parent4_id,
            p3.parent_id as parent3_id,
            p2.parent_id as parent2_id,
            p1.parent_id as parent_id,
            p1.id as product_id,
            p1.name
from        products p1
left join   products p2 on p2.id = p1.parent_id 
left join   products p3 on p3.id = p2.parent_id 
left join   products p4 on p4.id = p3.parent_id  
left join   products p5 on p5.id = p4.parent_id  
left join   products p6 on p6.id = p5.parent_id
where       19 in (p1.parent_id, 
                   p2.parent_id, 
                   p3.parent_id, 
                   p4.parent_id, 
                   p5.parent_id, 
                   p6.parent_id) 
order       by 1, 2, 3, 4, 5, 6, 7;
The where condition specifies which parent you want to retrieve the descendants of. 
You can extend this query with more levels as needed.



The best approiach I've came-up with is
  1. Use lineage to store\sort\trace trees. That's more than enough, and works thousands 
  2. times faster for reading than any other approach. It also allows to stay on that pattern
    even if DB will change(as ANY db will allow that pattern to be used)
  3. Use function that determines lineage for specific ID.
  4. Use it as you wish (in selects, or on CUD operations, or even by jobs).
Lineage approach descr. can be found wherever, for example 
 As of function - that is what enspired me.
In the end - got more-or-less simple, relatively fast, and SIMPLE solution.
Function's body
-- --------------------------------------------------------------------------------
-- Routine DDL
-- Note: comments before and after the routine body will not be stored by the server
-- --------------------------------------------------------------------------------
DELIMITER $$
CREATE DEFINER=`root`@`localhost` FUNCTION `get_lineage`(the_id INT) 
RETURNS text CHARSET utf8
READS SQL DATA
BEGIN
DECLARE v_rec INT DEFAULT 0;
DECLARE done INT DEFAULT FALSE;
DECLARE v_res text DEFAULT '';
DECLARE v_papa int;
DECLARE v_papa_papa int DEFAULT -1;
DECLARE csr CURSOR FOR
select _id,parent_id -- @n:=@n+1 as rownum,T1.*
from
(SELECT @r AS _id,
(SELECT @r := table_parent_id FROM table WHERE table_id = _id) AS parent_id,
@l := @l + 1 AS lvl
FROM
(SELECT @r := the_id, @l := 0,@n:=0) vars,
table m
WHERE @r <> 0
) T1
where T1.parent_id is not null
ORDER BY T1.lvl DESC;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;
open csr;
read_loop: LOOP
fetch csr into v_papa,v_papa_papa;
SET v_rec = v_rec+1;
IF done THEN
LEAVE read_loop;
END IF;
-- add first
IF v_rec = 1 THEN
SET v_res = v_papa_papa;
END IF;
SET v_res = CONCAT(v_res,'-',v_papa);
END LOOP;
close csr;
return v_res;
END
And then you just
select get_lineage(the_id)
Hope it helps somebody :)



Try these:
Table definition:
DROP TABLE IF EXISTS category;
CREATE TABLE category (
    id INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(20),
    parent_id INT,
    CONSTRAINT fk_category_parent FOREIGN KEY (parent_id)
    REFERENCES category (id)
) engine=innodb;
Experimental rows:
INSERT INTO category VALUES
(19, 'category1', NULL),
(20, 'category2', 19),
(21, 'category3', 20),
(22, 'category4', 21),
(23, 'categoryA', 19),
(24, 'categoryB', 23),
(25, 'categoryC', 23),
(26, 'categoryD', 24);
Recursive Stored procedure:
DROP PROCEDURE IF EXISTS getpath;
DELIMITER $$
CREATE PROCEDURE getpath(IN cat_id INT, OUT path TEXT)
BEGIN
    DECLARE catname VARCHAR(20);
    DECLARE temppath TEXT;
    DECLARE tempparent INT;
    SET max_sp_recursion_depth = 255;
    SELECT name, parent_id FROM category WHERE id=cat_id INTO catname, tempparent;
    IF tempparent IS NULL
    THEN
        SET path = catname;
    ELSE
        CALL getpath(tempparent, temppath);
        SET path = CONCAT(temppath, '/', catname);
    END IF;
END$$
DELIMITER ;
Wrapper function for the stored procedure:
DROP FUNCTION IF EXISTS getpath;
DELIMITER $$
CREATE FUNCTION getpath(cat_id INT) RETURNS TEXT DETERMINISTIC
BEGIN
    DECLARE res TEXT;
    CALL getpath(cat_id, res);
    RETURN res;
END$$
DELIMITER ;
Select example:
SELECT id, name, getpath(id) AS path FROM category;
Output:
+----+-----------+-----------------------------------------+
| id | name      | path                                    |
+----+-----------+-----------------------------------------+
| 19 | category1 | category1                               |
| 20 | category2 | category1/category2                     |
| 21 | category3 | category1/category2/category3           |
| 22 | category4 | category1/category2/category3/category4 |
| 23 | categoryA | category1/categoryA                     |
| 24 | categoryB | category1/categoryA/categoryB           |
| 25 | categoryC | category1/categoryA/categoryC           |
| 26 | categoryD | category1/categoryA/categoryB/categoryD |
+----+-----------+-----------------------------------------+
Filtering rows with certain path:
SELECT id, name, getpath(id) AS path FROM category 
HAVING path LIKE 'category1/category2%';
Output:
+----+-----------+-----------------------------------------+
| id | name      | path                                    |
+----+-----------+-----------------------------------------+
| 20 | category2 | category1/category2                     |
| 21 | category3 | category1/category2/category3           |
| 22 | category4 | category1/category2/category3/category4 |
+----+-----------+-----------------------------------------+



You can do it like this in other databases quite easily with a recursive query 
(YMMV on performance).
The other way to do it is to store two extra bits of data, a left and right value. The left
 and right value are derived from a pre-order traversal of the tree structure you're representing.
This is know as Modified Preorder Tree Traversal and lets you run a simple query to 
get all parent values at once. It also goes by the name "nested set".



Its a little tricky one, check this whether it is working for you
select a.id,if(a.parent = 0,@varw:=concat(a.id,','),@varw:=concat(a.id,',',@varw)) as list 
from (select * from recursivejoin order by if(parent=0,id,parent) asc) a 
left join recursivejoin b on (a.id = b.parent),(select @varw:='') as c having list like '%19,%';
Replace with your field and table name appropriately.



I found it more easily to :
1) create a function that will check if a item is anywhere in the parent hierarchy of another 
one. Something like this (I will not write the function, make it with WHILE DO) :
is_related(id, parent_id);
in your example
is_related(21, 19) == 1;
is_related(20, 19) == 1;
is_related(21, 18) == 0;
2) use a sub-select , something like this:
select ...
from table t
join table pt on pt.id in (select i.id from table i where is_related(t.id,i.id));



I have made a query for you. This will give you Recursive Category with a Single Query:
SELECT id,NAME,'' AS subName,'' AS subsubName,'' AS subsubsubName FROM Table1 WHERE
 prent is NULL
UNION
SELECT b.id,a.name,b.name AS subName,'' AS subsubName,'' AS subsubsubName FROM 
Table1 AS a LEFT JOIN Table1 AS b ON b.prent=a.id WHERE a.prent is NULL AND b.name 
IS NOT NULL
UNION
SELECT c.id,a.name,b.name AS subName,c.name AS subsubName,'' AS subsubsubName 
FROM Table1 AS a LEFT JOIN Table1 AS b ON b.prent=a.id LEFT JOIN Table1 AS c 
ON c.prent=b.id WHERE a.prent is NULL AND c.name IS NOT NULL
UNION
SELECT d.id,a.name,b.name AS subName,c.name AS subsubName,d.name AS subsubsubName
 FROM Table1 AS a LEFT JOIN Table1 AS b ON b.prent=a.id LEFT JOIN Table1 AS c 
ON c.prent=b.id LEFT JOIN Table1 AS d ON d.prent=c.id WHERE a.prent is NULL 
AND d.name IS NOT NULL
ORDER BY NAME,subName,subsubName,subsubsubName

0 comments:

Post a Comment