I'm trying to come up with a solution to a problem relating to hierarchical data and the concept of nearest available ancestor - my platform is 10gR2 and 11gR2. I am given some hierarchical data (in its simplest form it is in form ID, NAME, PARENT_ID, where PARENT_ID links back to ID).
For example:
with qryData as (
select 1 as ID, 'Bert' as NAME, to_number(null) as PARENT_ID from dual union
select 2 as ID, 'Mark' , 1 from dual union
select 3 as ID, 'Brenda' , 1 from dual union
select 4 as ID, 'Mike' , 3 from dual union
select 5 as ID, 'Steve' , 4 from dual union
select 6 as ID, 'John' , 2 from dual union
select 7 as ID, 'Jo' , 6 from dual union
select 8 as ID, 'Jim' , 2 from dual union
select 9 as ID, 'Jane' , 7 from dual
)
select q.*, sys_connect_by_path(ID, '/') ID_PATH
from qryData q
start with parent_id is null
connect by prior ID = PARENT_ID
/
ID NAME PARENT_ID ID_PATH
---------- ------ ---------- ----------------
1 Bert /1
2 Mark 1 /1/2
6 John 2 /1/2/6
7 Jo 6 /1/2/6/7
9 Jane 7 /1/2/6/7/9
8 Jim 2 /1/2/8
3 Brenda 1 /1/3
4 Mike 3 /1/3/4
5 Steve 4 /1/3/4/5
In reality this dataset can be several thousand rows with tens of nesting levels, multiple start nodes with no parent but most importantly often filtered, so certain arbitrary rows are missing. When this happens I need to find the nearest available ancestor that appears in the list. "Nearest available ancestor" is closest ancestor relation in your family tree that still exists in the filtered list found starting from parent, then contuniing upwards to grandparent, great-grandparent .... nth great grandparent until one ancestor is found.
For example:
ID NAME PARENT_ID ID_PATH AVAIL_ANCESTOR_ID
---------- ------ ---------- ------------- -----------------
1 Bert /1
6 John 2 /1/2/6 1
8 Jim 2 /1/2/8 1
9 Jane 7 /1/2/6/7/9 6
3 Brenda 1 /1/3 1
5 Steve 4 /1/3/4/5 3
For example. Who is Steve's nearest available ancestor in the filtered list?
Steve's family tree is : /Bert/Brenda/Mike/Steve
Mike (ID = 4) isn't in the filtered list, so we move onto Brenda, who is. Brenda, ID = 3, So ANCESTOR_ID = 3
I do have access to the original table, so can join the filtered list back to the full table, I can also ask for other columns (such as ROWID, ROWNUM, LEVEL, SYS_CONNECT_ROOT) to be included in the filtered set. I've tried various approaches to achieve this, but all seem to be quite poor in performance (ID and PARENT_ID columns are indexed where appropriate) or don't quite give the correct result, such as...
1. Parsing the ID_PATH and treewalking based on that and joining back to the list...
select distinct CUSTOMER_ID, regexp_substr( ID_PATH, '[^/]+', 1, level) as ANCESTOR_ID
from qryData
connect by regexp_substr( ID_PATH, '[^/]+', 1, level) is not null
2. Anti-joining back to full list to identify missing items , then treewalking for those.
3. Writing a function which treewalks back.
It's quite a challenging problem, so can anyone think of a solution which performs well? Ideally I'm trying to find a SQL based solution?