SQL & PL/SQL

Announcement

For appeals, questions and feedback about Oracle Forums, please email oracle-forums-moderators_us@oracle.com. Technical questions should be asked in the appropriate category. Thank you!

Filtered hierarchical data - previous available ancestor?

paul zipJul 10 2013 — edited Jul 10 2013

I'm trying to come up with a solution to a problem relating to hierarchical data and the concept of nearest available ancestor - my platform is 10gR2 and 11gR2. I am given some hierarchical data (in its simplest form it is in form ID, NAME, PARENT_ID, where PARENT_ID links back to ID).

For example:

with qryData as (

select 1 as ID, 'Bert' as NAME, to_number(null) as PARENT_ID from dual union

select 2 as ID, 'Mark' , 1 from dual union

select 3 as ID, 'Brenda' , 1 from dual union

select 4 as ID, 'Mike' , 3 from dual union

select 5 as ID, 'Steve' , 4 from dual union

select 6 as ID, 'John' , 2 from dual union

select 7 as ID, 'Jo' , 6 from dual union

select 8 as ID, 'Jim' , 2 from dual union

select 9 as ID, 'Jane' , 7 from dual

)

select q.*, sys_connect_by_path(ID, '/') ID_PATH

from qryData q

start with parent_id is null

connect by prior ID = PARENT_ID

ID NAME PARENT_ID ID_PATH

---------- ------ ---------- ----------------

1 Bert /1

2 Mark 1 /1/2

6 John 2 /1/2/6

7 Jo 6 /1/2/6/7

9 Jane 7 /1/2/6/7/9

8 Jim 2 /1/2/8

3 Brenda 1 /1/3

4 Mike 3 /1/3/4

5 Steve 4 /1/3/4/5

In reality this dataset can be several thousand rows with tens of nesting levels, multiple start nodes with no parent but most importantly often filtered, so certain arbitrary rows are missing. When this happens I need to find the nearest available ancestor that appears in the list. "Nearest available ancestor" is closest ancestor relation in your family tree that still exists in the filtered list found starting from parent, then contuniing upwards to grandparent, great-grandparent .... nth great grandparent until one ancestor is found.

For example:

ID NAME PARENT_ID ID_PATH AVAIL_ANCESTOR_ID

---------- ------ ---------- ------------- -----------------

1 Bert /1

6 John 2 /1/2/6 1

8 Jim 2 /1/2/8 1

9 Jane 7 /1/2/6/7/9 6

3 Brenda 1 /1/3 1

5 Steve 4 /1/3/4/5 3

For example. Who is Steve's nearest available ancestor in the filtered list?

Steve's family tree is : /Bert/Brenda/Mike/Steve

Mike (ID = 4) isn't in the filtered list, so we move onto Brenda, who is. Brenda, ID = 3, So ANCESTOR_ID = 3

I do have access to the original table, so can join the filtered list back to the full table, I can also ask for other columns (such as ROWID, ROWNUM, LEVEL, SYS_CONNECT_ROOT) to be included in the filtered set. I've tried various approaches to achieve this, but all seem to be quite poor in performance (ID and PARENT_ID columns are indexed where appropriate) or don't quite give the correct result, such as...

1. Parsing the ID_PATH and treewalking based on that and joining back to the list...

select distinct CUSTOMER_ID, regexp_substr( ID_PATH, '[^/]+', 1, level) as ANCESTOR_ID

from qryData

connect by regexp_substr( ID_PATH, '[^/]+', 1, level) is not null

2. Anti-joining back to full list to identify missing items , then treewalking for those.

3. Writing a function which treewalks back.

It's quite a challenging problem, so can anyone think of a solution which performs well? Ideally I'm trying to find a SQL based solution?

This post has been answered by Frank Kulash on Jul 10 2013

Jump to Answer

BrendanP

Looks like you haven't got a well-defined problem as what you're looking for, 'nearest available ancestor' doesn't mean anything until you specify it, and your example doesn't do that.

Solomon Yakobson

You just need to traverse the hierarchy from leaf to root:

with qryData as (

select 1 as ID, 'Bert' as NAME, to_number(null) as PARENT_ID from dual union

select 2 as ID, 'Mark' , 1 from dual union

select 3 as ID, 'Brenda' , 1 from dual union

select 4 as ID, 'Mike' , 3 from dual union

select 5 as ID, 'Steve' , 4 from dual union

select 6 as ID, 'John' , 2 from dual union

select 7 as ID, 'Jo' , 6 from dual union

select 8 as ID, 'Jim' , 2 from dual union

select 9 as ID, 'Jane' , 7 from dual

)

select connect_by_root id id,

id ancestor_id

from qryData

connect by ID = prior PARENT_ID

order by id,

ancestor_id

ID ANCESTOR_ID
--- -----------
1           1
2           1
2           2
3           1
3           3
4           1
4           3
4           4
5           1
5           3
5           4

ID ANCESTOR_ID
--- -----------
5           5
6           1
6           2
6           6
7           1
7           2
7           6
7           7
8           1
8           2
8           8

ID ANCESTOR_ID
--- -----------
9           1
9           2
9           6
9           7
9           9

27 rows selected.

SQL>

SY.

Frank Kulash

Hi,

So you need to preserve the tree structure (in things like id_path) after you remove some of the rows from the output, is that the problem?
Here's one way to do that:

WITH connect_by_results AS

(

SELECT q.*

, SYS_CONNECT_BY_PATH (id, '/') AS id_path

, SYS_CONNECT_BY_PATH ( CASE

WHEN id IN (1, 3, 5, 6, 8, 9) -- or whatever

THEN id

END

, '/'

) AS good_id_path

, ROWNUM AS r_num

FROM qryData q

START WITH parent_id IS NULL

CONNECT BY PRIOR id = parent_id

)

SELECT id, name, parent_id, id_path

, REGEXP_SUBSTR ( RTRIM ( SUBSTR ( good_id_path

, 1

, INSTR (good_id_path, '/', -1)

)

, '/'

)

, '[^/]+$'

) AS avail_ancestor_id

, good_id_path -- Only for debugging

FROM connect_by_results

WHERE SUBSTR (good_id_path, -1) != '/'

ORDER BY r_num

;

This doesn't involve a second CONNECT BY, a self-join, or any analytic functions, all of which can be slow.

Whatever criteia determine whether a row is in the result set or not are put in the CASE expression that produces good_id_path, and not in a WHERE clause, so good_id_path will include only ids from the final result set. If good_id_path does not end with a '/', then it will be included. Finding the available_ancestor_id is simply a matter of finding the next-to-last id (if any) in good_id_path.

Output:

AVAIL_

PARENT ANCESTOR

ID NAME _ID ID_PATH _ID GOOD_ID_PATH

--- ------ ------ ------------ ---------- ------------

1 Bert /1 /1

6 John 2 /1/2/6 1 /1//6

9 Jane 7 /1/2/6/7/9 6 /1//6//9

8 Jim 2 /1/2/8 1 /1//8

3 Brenda 1 /1/3 1 /1/3

5 Steve 4 /1/3/4/5 3 /1/3//5

1 person found this helpful

paul zip

"Nearest available ancestor" is closest ancestor relation in your family tree that exists in the list starting from parent, grandparent, great-grandparent .... nth great grandparent .

For example. Who is Steve's nearest available ancestor in the filtered list?

Steve's family tree is : /Bert/Brenda/Mike/Steve

Mike isn't in the list, so we move onto Brenda, who is. Brenda, ID = 3, So ANCESTOR_ID = 3

Solomon Yakobson

It looks like I didn't understand the requirements. Based on

ID NAME PARENT_ID ID_PATH AVAIL_ANCESTOR_ID

---------- ------ ---------- ------------- -----------------

1 Bert /1

6 John 2 /1/2/6 1

8 Jim 2 /1/2/8 1

9 Jane 7 /1/2/6/7/9 6

3 Brenda 1 /1/3 1

5 Steve 4 /1/3/4/5 3

You want a grandparent id, or parent id if there is no grandparent. If so:

with qryData as (

select 1 as ID, 'Bert' as NAME, to_number(null) as PARENT_ID from dual union

select 2 as ID, 'Mark' , 1 from dual union

select 3 as ID, 'Brenda' , 1 from dual union

select 4 as ID, 'Mike' , 3 from dual union

select 5 as ID, 'Steve' , 4 from dual union

select 6 as ID, 'John' , 2 from dual union

select 7 as ID, 'Jo' , 6 from dual union

select 8 as ID, 'Jim' , 2 from dual union

select 9 as ID, 'Jane' , 7 from dual

)

select q.*,

sys_connect_by_path(ID, '/') ID_PATH,

nvl(prior parent_id,parent_id) AVAIL_ANCESTOR_ID

from qryData q

where id IN (1,3,5,6,8,9)

start with parent_id is null

connect by prior ID = PARENT_ID

ID NAME    PARENT_ID ID_PATH               AVAIL_ANCESTOR_ID
--- ------ ---------- -------------------- ------------------
1 Bert              /1
6 John            2 /1/2/6                                1
9 Jane            7 /1/2/6/7/9                            6
8 Jim             2 /1/2/8                                1
3 Brenda          1 /1/3                                  1
5 Steve           4 /1/3/4/5                              3

6 rows selected.

SQL>

SY.

BrendanP

Well, perhaps I'm missing something obvious, but surely you don't have the family tree as such, only id, parent_id and have to deduce the tree, and as the parent of Steve doesn't exist, how do you know Brenda is the one? If you actually have the id_path - your qryData test subquery doesn't - then ok.

Message was edited by: BrendanP Looks like OP has updated the subquery so it now does have the 'missing' ids, but still seems underspecified - where are the missing ids identified as such?

paul zip

This is a clever solution! The only problem I have is I wouldn't be able to generate "good_id_path" as part of the filtered list, as filtering is not done by me and is done after the id_path is generated. I'd have to join filtered list back to the full list to generate it, unless there's a tweak to do it?

paul zip

This only works for missing parents, I need to continue from parent => grandparent => great grandparent....... => great (x n) grandparent

if I use "where id IN (1,3,5,8,9)" it says Jane's ANCESTOR_ID = 6 (John) but he isn't in the list, so it is incorrect.

Solomon Yakobson

OK. It looks like now I undestand your requirements. As I showed in my first reply, you can traverse hierarchy from leaf to root:

with qryData as (

select 1 as ID, 'Bert' as NAME, to_number(null) as PARENT_ID from dual union

select 2 as ID, 'Mark' , 1 from dual union

select 3 as ID, 'Brenda' , 1 from dual union

select 4 as ID, 'Mike' , 3 from dual union

select 5 as ID, 'Steve' , 4 from dual union

select 6 as ID, 'John' , 2 from dual union

select 7 as ID, 'Jo' , 6 from dual union

select 8 as ID, 'Jim' , 2 from dual union

select 9 as ID, 'Jane' , 7 from dual

ancestors as (

select level lvl,

connect_by_root id id,

connect_by_root name name,

connect_by_root parent_id parent_id,

parent_id ancestor_id

from qryData

start with id in (1,3,5,6,8,9)

connect by ID = prior PARENT_ID

)

select id,

name,

parent_id,

min(ancestor_id) keep(dense_rank first order by lvl) avail_ancestor_id

from ancestors

where ancestor_id in (1,3,5,6,8,9)

or ancestor_id is null

group by id,

name,

parent_id

order by id,

avail_ancestor_id

ID NAME    PARENT_ID AVAIL_ANCESTOR_ID
--- ------ ---------- ------------------
1 Bert
3 Brenda          1                  1
5 Steve           4                  3
6 John            2                  1
8 Jim             2                  1
9 Jane            7                  6

6 rows selected.

SQL>

SY.

Frank Kulash

Answer

Hi,

paulzip wrote:

I see; you're given connect_by_results (with id_path), and that's all you have to work with, besides the original table.

Here's one way:

SELECT d.id, d.name, d.parent_id, d.id_path

, MIN (a.id) KEEP (DENSE_RANK LAST ORDER BY a.id_path) AS avail_ancestor_id

FROM connect_by_results d

LEFT OUTER JOIN connect_by_results a ON d.id_path LIKE a.id_path || '/%'

GROUP BY d.id, d.name, d.parent_id, d.id_path

ORDER BY d.id_path

;

The fastest way depends on lots of things, some of which can't be known until run-time. The query above might be best if the number of rows in the result set is small, but you don't know anything else about them (e.g., whether they all have a common ancestor).

Marked as Answer by paul zip · Sep 27 2020

Solomon Yakobson

Actually, it can be optimized even more:

with qryData as (

select 1 as ID, 'Bert' as NAME, to_number(null) as PARENT_ID from dual union

select 2 as ID, 'Mark' , 1 from dual union

select 3 as ID, 'Brenda' , 1 from dual union

select 4 as ID, 'Mike' , 3 from dual union

select 5 as ID, 'Steve' , 4 from dual union

select 6 as ID, 'John' , 2 from dual union

select 7 as ID, 'Jo' , 6 from dual union

select 8 as ID, 'Jim' , 2 from dual union

select 9 as ID, 'Jane' , 7 from dual

ancestors as (

select level lvl,

connect_by_root id id,

connect_by_root name name,

connect_by_root parent_id parent_id,

parent_id ancestor_id

from qryData

start with id in (1,3,5,6,8,9)

connect by id = prior parent_id

and id not in (1,3,5,6,8,9)

)

select id,

name,

parent_id,

min(ancestor_id) keep(dense_rank first order by case when parent_id in (1,3,5,6,8,9) then lvl end nulls last) avail_ancestor_id

from ancestors

group by id,

name,

parent_id

order by id,

avail_ancestor_id

6 rows selected.

SQL>

SY.

paul zip

I was very impressed with this approach. Yes, it can see may be slow with large datasets, but with smaller sets it'll be fine. What impressed me is it was extremely clever in the way it only uses the filtered dataset to yield the answer, Quite genius.

paul zip

I can see what you are doing, but my list could contain 100s / 1000s of items, how would this transpose to your "in (1,3,5,6,8,9)" situations, without multiple table scans?

Solomon Yakobson

paulzip wrote:

I can see what you are doing, but my list could contain 100s / 1000s of items, how would this transpose to your "in (1,3,5,6,8,9)" situations, without multiple table scans?

So what prevents you from using nested table/varray or a driver table:

SQL> select *
2 from qryData
3 /

ID NAME    PARENT_ID
--- ------ ----------
1 Bert
2 Mark            1
3 Brenda          1
4 Mike            3
5 Steve           4
6 John            2
7 Jo              6
8 Jim             2
9 Jane            7

9 rows selected.

SQL> explain plan for
2 with ancestors as (
3                   select level lvl,
4                           connect_by_root id id,
5                           connect_by_root name name,
6                           connect_by_root parent_id parent_id,
7                           parent_id ancestor_id
8                     from qryData
9                     start with id in (select * from table(sys.OdciNumberList(1,3,5,6,8,9)))
10                     connect by id = prior parent_id
11                            and id not in (select * from table(sys.OdciNumberList(1,3,5,6,8,9)))
12                  )
13 select id,
14          name,
15          parent_id,
16          min(ancestor_id) keep(dense_rank first order by case when parent_id in (select * from table(sys.OdciNumberList(1,3,5,6,8,9))) then lvl end nulls last) avail_ancestor_id
17    from ancestors
18    group by id,
19             name,
20             parent_id
21    order by id,
22             avail_ancestor_id
23 /

Explained.

SQL> select *
2 from table(dbms_xplan.display)
3 /

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3643621826

------------------------------------------------------------------------------------------------------
| Id | Operation                                  | Name    | Rows | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                           |         |     9 |   513 |     5 (40)| 00:00:01 |
|* 1 | COLLECTION ITERATOR CONSTRUCTOR FETCH     |         |     1 |     2 |     2   (0)| 00:00:01 |
|   2 | SORT ORDER BY                             |         |     9 |   513 |     5 (40)| 00:00:01 |
|   3 |   SORT GROUP BY                            |         |     9 |   513 |     5 (40)| 00:00:01 |
|   4 |    VIEW                                    |         |     9 |   513 |    22 (87)| 00:00:01 |
|* 5 |     CONNECT BY NO FILTERING WITH START-WITH|         |       |       |            |          |

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
|   6 |      TABLE ACCESS FULL                     | QRYDATA |     9 |   279 |     3   (0)| 00:00:01 |
|* 7 |      COLLECTION ITERATOR CONSTRUCTOR FETCH |         |     2 |     4 |     2   (0)| 00:00:01 |
|* 8 |      COLLECTION ITERATOR CONSTRUCTOR FETCH |         |     1 |     2 |     2   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(VALUE(KOKBF$)=:B1)
   5 - access("ID"=PRIOR "PARENT_ID")
       filter( NOT EXISTS (SELECT 0 FROM TABLE() "KOKBF$3" WHERE LNNVL(VALUE(KOKBF$)<>:B1))

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
              AND EXISTS (SELECT 0 FROM TABLE() "KOKBF$0" WHERE VALUE(KOKBF$)=:B2))
   7 - filter(LNNVL(VALUE(KOKBF$)<>:B1))
   8 - filter(VALUE(KOKBF$)=:B1)

Note
-----
- dynamic sampling used for this statement (level=2)

29 rows selected.

SQL>

SY.

1 person found this helpful

1 - 14

Locked Post

New comments cannot be posted to this locked post.

Locked on Aug 7 2013

Added on Jul 10 2013

#pl, #plsql, #query, #select, #sql, #table

14 comments

2,584 views

SQL & PL/SQL

Filtered hierarchical data - previous available ancestor?

Comments

Post Details