 3,733,057 Users
 2,246,687 Discussions
 7,856,491 Comments
Forum Stats
Discussions
Howdy, Stranger!
Categories
 380.9K All Categories
 2.1K Data
 203 Big Data Appliance
 1.9K Data Science
 446.1K Databases
 220.4K General Database Discussions
 3.7K Java and JavaScript in the Database
 22 Multilingual Engine
 506 MySQL Community Space
 459 NoSQL Database
 7.7K Oracle Database Express Edition (XE)
 2.8K ORDS, SODA & JSON in the Database
 437 SQLcl
 3.9K SQL Developer Data Modeler
 185.4K SQL & PL/SQL
 20.7K SQL Developer
 291.2K Development
 6 Developer Projects
 116 Programming Languages
 288K Development Tools
 96 DevOps
 3K QA/Testing
 645.2K Java
 16 Java Learning Subscription
 36.9K Database Connectivity
 148 Java Community Process
 104 Java 25
 22.1K Java APIs
 137.7K Java Development Tools
 165.3K Java EE (Java Enterprise Edition)
 12 Java Essentials
 138 Java 8 Questions
 85.9K Java Programming
 79 Java Puzzle Ball
 65.1K New To Java
 1.7K Training / Learning / Certification
 13.8K Java HotSpot Virtual Machine
 94.2K Java SE
 13.8K Java Security
 195 Java User Groups
 22 JavaScript  Nashorn
 Programs
 175 LiveLabs
 33 Workshops
 10.2K Software
 6.7K Berkeley DB Family
 3.5K JHeadstart
 5.7K Other Languages
 2.3K Chinese
 165 Deutsche Oracle Community
 1.2K Español
 1.9K Japanese
 225 Portuguese
How to use Recursive Subquery Factoring (RSF) to Implement Dijkstra’s shortest path algorithm?
Can someone who knows Recursive Subquery Factoring (RSF) can tell me if my Dijkstra’s shortest path algorithm is implement correctly and efficiently? Below you will found my SQL query and a trial dataset.
Below my query, a trial dataset and table definition to contain the data:
/*Dijkstra Shortest path*/
WITH UsefulArcs AS
(SELECT SRC, DST, DISTANCE FROM ARCS
),
Paths (SrcArc, DestArc, SrcPath, DestPath, Path, OpDist, DistPath, DistPathMin) AS
(SELECT SRC AS SrcArc ,
DST AS DestArc ,
SRC AS SrcPath ,
DST AS DestPath ,
SRC
'>'
 DST AS Path ,
''
DISTANCE AS OpDist ,
DISTANCE AS DistPath,
DISTANCE AS DistPathMin
FROM UsefulArcs
WHERE SRC=:prmSource
UNION ALL
SELECT a.SRC AS SrcArc,
a.DST AS DestArc,
Paths.SrcPath ,
a.DST AS DestPath ,
Paths.Path
'>'
a.DST AS Path,
Paths.OpDist
'+'
a.DISTANCE AS OpDist ,
Paths.DistPath +a.DISTANCE AS DistPath,
MIN(Paths.DistPath+a.DISTANCE) OVER (PARTITION BY a.DST) AS DistPathMin
FROM UsefulArcs a
INNER JOIN Paths
ON (Paths.DestArc =a.SRC)
WHERE Paths.DistPathMin=Paths.DistPath
) SEARCH BREADTH FIRST BY DestPath
SEARCH DEPTH FIRST BY DestPath
SET Order01
SELECT *
FROM Paths
WHERE SrcPath =:prmSource
AND DestPath =:prmSink
AND DistPath =
(SELECT MIN(DistPath) FROM Paths WHERE SrcPath=:prmSource AND DestPath =:prmSink
)
ORDER BY DistPath;
Set prmSource=1 and prmSink=10.
Dijkstra’s algorithm definition: http://en.wikipedia.org/wiki/Dijkstra%27s_algorithm
Problems data: http://www.ime.unicamp.br/~andreani/MS515/capitulo7.pdf
CREATE TABLE "ARCS"
( "SRC" NUMBER(3,0),
"DST" NUMBER(3,0),
"DISTANCE" NUMBER(3,0)
) ;
CREATE TABLE "NODES"
( "NODES" NUMBER(3,0)
) ;
REM INSERTING into ARCS
SET DEFINE OFF;
Insert into ARCS (SRC,DST,DISTANCE) values ('1','2','2');
Insert into ARCS (SRC,DST,DISTANCE) values ('1','3','4');
Insert into ARCS (SRC,DST,DISTANCE) values ('1','4','3');
Insert into ARCS (SRC,DST,DISTANCE) values ('2','5','7');
Insert into ARCS (SRC,DST,DISTANCE) values ('3','5','3');
Insert into ARCS (SRC,DST,DISTANCE) values ('4','5','4');
Insert into ARCS (SRC,DST,DISTANCE) values ('2','6','4');
Insert into ARCS (SRC,DST,DISTANCE) values ('3','6','2');
Insert into ARCS (SRC,DST,DISTANCE) values ('4','6','1');
Insert into ARCS (SRC,DST,DISTANCE) values ('2','7','6');
Insert into ARCS (SRC,DST,DISTANCE) values ('3','7','4');
Insert into ARCS (SRC,DST,DISTANCE) values ('4','7','5');
Insert into ARCS (SRC,DST,DISTANCE) values ('5','8','1');
Insert into ARCS (SRC,DST,DISTANCE) values ('6','8','6');
Insert into ARCS (SRC,DST,DISTANCE) values ('7','8','3');
Insert into ARCS (SRC,DST,DISTANCE) values ('5','9','4');
Insert into ARCS (SRC,DST,DISTANCE) values ('6','9','3');
Insert into ARCS (SRC,DST,DISTANCE) values ('7','9','3');
Insert into ARCS (SRC,DST,DISTANCE) values ('8','10','3');
Insert into ARCS (SRC,DST,DISTANCE) values ('9','10','4');
REM INSERTING into NODES
SET DEFINE OFF;
Insert into NODES (NODES) values ('1');
Insert into NODES (NODES) values ('2');
Insert into NODES (NODES) values ('3');
Insert into NODES (NODES) values ('4');
Insert into NODES (NODES) values ('5');
Insert into NODES (NODES) values ('6');
Insert into NODES (NODES) values ('7');
Insert into NODES (NODES) values ('8');
Insert into NODES (NODES) values ('9');
Insert into NODES (NODES) values ('10');
CREATE UNIQUE INDEX "ARCS_PK1" ON "ARCS" ("SRC", "DST")
;
CREATE UNIQUE INDEX "NOEUDS_PK1" ON "NODES" ("NODES")
;
ALTER TABLE "ARCS" MODIFY ("SRC" NOT NULL ENABLE);
ALTER TABLE "ARCS" MODIFY ("DST" NOT NULL ENABLE);
ALTER TABLE "ARCS" MODIFY ("DISTANCE" NOT NULL ENABLE);
ALTER TABLE "ARCS" ADD CONSTRAINT "ARCS_PK" PRIMARY KEY ("SRC", "DST") ENABLE;
ALTER TABLE "NODES" MODIFY ("NODES" NOT NULL ENABLE);
ALTER TABLE "NODES" ADD CONSTRAINT "NOEUDS_PK" PRIMARY KEY ("NODES") ENABLE;
ALTER TABLE "ARCS" ADD CONSTRAINT "ARCS_FK1" FOREIGN KEY ("SRC")
REFERENCES "NODES" ("NODES") ENABLE;
ALTER TABLE "ARCS" ADD CONSTRAINT "ARCS_FK2" FOREIGN KEY ("DST")
REFERENCES "NODES" ("NODES") ENABLE;
Answers

There's one thing you can improve. If prmSink already exists in the path you can prune all the branches longer than that known path.

This is no implementation of /*Dijkstra Shortest path*/ imo, but a "simple" brute force algorithm, wich checks all possible ways from prmSource.

There's a restriction with RSF: in the recursive part you can only see the last level of data.
So the Paths.DistPathMin=Paths.DistPath condition only works for the paths having the same length.
If we add this row:
Insert into ARCS (SRC,DST,DISTANCE) values (1,5,4);
Now path >1>5>8>10> is the best, but the query wil still search along >1>3>5>8>10> and >1>4>5>8>10> because when it sees "5" at level 3, the "5" at level 2 is invisible and can't be included in the calculation of DistPathMin.
In order to see the level 2 "5" node we can add fake nodes to bring the data from all previous level:
WITH UsefulArcs AS
(SELECT SRC, DST, DISTANCE FROM ARCS
UNION SELECT SRC,SRC,0 FROM ARCS  add fake nodes
UNION SELECT DST,DST,0 FROM ARCS
),
Paths (SrcArc, SrcPath, DestPath, Path, OpDist, DistPath, DistPathMin,LVL,CNT,found_dist) AS
(SELECT SRC AS SrcArc,
SRC AS SrcPath ,
DST AS DestPath ,
CAST('>'SRC'>'DST'>' AS VARCHAR2(4000)) AS Path ,
''
DISTANCE AS OpDist ,
DISTANCE AS DistPath,
DISTANCE AS DistPathMin
,1 AS LVL
,COUNT(*) OVER() AS CNT
,MIN(CASE WHEN DST=:prmSink THEN DISTANCE END) OVER() AS found_dist  the best found distance
FROM UsefulArcs
WHERE SRC=:prmSource AND SRC<>DST
UNION ALL
SELECT a.SRC AS SrcArc,
Paths.SrcPath ,
a.DST AS DestPath ,
Paths.Path
CASE WHEN a.DST<>a.SRC THEN a.DST'>' END AS Path,
Paths.OpDist
CASE WHEN a.DST <>a.SRC THEN '+'a.DISTANCE END AS OpDist ,
Paths.DistPath +a.DISTANCE AS DistPath,
MIN(Paths.DistPath+a.DISTANCE) OVER (PARTITION BY a.DST) AS DistPathMin
,LVL+1
,COUNT(CASE WHEN a.DST <>a.SRC THEN a.src END) OVER() AS CNT
,MIN(CASE WHEN a.DST=:prmSink AND (paths.found_dist IS NULL OR Paths.DistPath+a.DISTANCE<paths.found_dist)
THEN Paths.DistPath+a.DISTANCE  if new path found and it's better than replace the old one
ELSE paths.found_dist
END) OVER() AS found_dist
FROM UsefulArcs a
JOIN Paths
ON (Paths.DestPath =a.SRC)
AND (Paths.SrcArc<>Paths.DestPath AND INSTR(Paths.Path,'>'a.DST'>')=0  add only new destinations
OR a.DST=a.SRC  or it's the end
)
WHERE Paths.DistPathMin=Paths.DistPath
AND cnt>0  if no more new nodes then it's the end
AND paths.DestPath<>:prmSink  if we reach prmSink then no need to continue on this path
AND (paths.found_dist IS NULL  not found yet
OR paths.found_dist>=Paths.DistPath+a.DISTANCE  if not better than the path found then stop
)
)
CYCLE path,lvl SET cycle_flag TO 'Y' DEFAULT 'N'  cycle may appear because of outer join may keep the same path from last level
SELECT DISTINCT path,OpDist'='DistPath as distance
FROM Paths
WHERE SrcPath =:prmSource
AND DestPath =:prmSink
AND DistPath = found_dist
;
But this raises another issue when nodes will be repeated at all levels.
Actually in the final select we only need the latest level of data, but unfortunately there's no option to discard the older levels during the recursive process.
If we rewrite this RSF into a PL/SQL loop then it's possible to keep only the latest level and implement this algorithm.

Something to play with
with
arcs as
(select '1' src,'2' dst,2 d from dual union all
select '1','3',4 from dual union all
select '1','4',3 from dual union all
select '2','5',7 from dual union all
select '3','5',3 from dual union all
select '4','5',4 from dual union all
select '2','6',4 from dual union all
select '3','6',2 from dual union all
select '4','6',1 from dual union all
select '2','7',6 from dual union all
select '3','7',4 from dual union all
select '4','7',5 from dual union all
select '5','8',1 from dual union all
select '6','8',6 from dual union all
select '7','8',3 from dual union all
select '5','9',4 from dual union all
select '6','9',3 from dual union all
select '7','9',3 from dual union all
select '8','10',3 from dual union all
select '9','10',4 from dual
),
shortest_path(src,dst,path,dist,stop,step) as
(select src,dst,path,dist,null,1
from (select src,:dst dst,src  '~'  dst path,d,min(d) over () dist
from arcs
where src = :src
)
where d = dist
union all
select a.src,
p.dst,p.path  '~'  a.dst  case when a.dst = p.dst then '+' end,
case when a.d = min(a.d) over ()
then p.dist + a.d
when a.dst = p.dst
then p.dist + a.d
end,
last_value(case when a.dst = p.dst then 'stop' end) ignore nulls over (order by null),
step + 1
from arcs a,
shortest_path p
where a.src = regexp_substr(p.path,'\d+$')
and p.dst != regexp_substr(p.path,'\d+$')
and p.dist is not null
and p.stop is null
)
select rtrim(path,'+') path,dist
from shortest_path
where stop is not null
and instr(path,'+') > 0
PATH DIST 1~2~6~9~10 13 Regards
Etbin

You pick the closest node each time, which does not necessarily lead to the shortest path. Actually it may not lead to the destination at all.

That's why something to play with stands for.
I just proposed a query (using recursive subquery factoring) that (in some cases) produces a solution, so I thought others might be interested in  I was playing too .
I'm here mainly to learn and to stay exercised as providing a solution of some kind is (for a nonnative speaker) much more simple than debating about how it might have been achieved and a forum IMHO is certainly not the place to exhibit proofs whether a certain algorithm is correct or not.
As I'm typing this already I can share my reasoning:
 at each step I pick the nodes reachable from the starting node or from the last node on the current shortest path.
 If the nodes include the destination node I pick the destination node and stop else I concatenate the nearest node to the shortest path incrementing the distance.
I don't claim it solves a general case leaving to others to find examples when it doesn't work.
Regards
Etbin
As there's not much traffic an easy couter example where my query would give an incorrect result
1
/ \
2 3
/ \
2___4___3
\ \
5 2
\ \
\________4
with
arcs as
(select '1' src,'2' dst,2 d from dual union all
select '1','3',3 from dual union all
select '2','3',4 from dual union all
select '2','4',5 from dual union all
select '3','4',2 from dual
)
PATH5 DIST 1~2~4 7 

This solution finds all minimum distance routes to each node. I stuck with the lettered nodes in the data set link.
NODE LEV PATH DST_PATH     B 1 A,B 2 C 1 A,C 4 D 1 A,D 3 E 2 A,D,E 7 2 A,C,E 7 F 2 A,D,F 4 G 2 A,D,G 8 2 A,B,G 8 2 A,C,G 8 H 3 A,D,E,H 8 3 A,C,E,H 8 I 3 A,D,F,I 7 J 4 A,C,E,H,J 11 4 A,D,F,I,J 11 4 A,D,E,H,J 11 15 rows selected. 1 WITH arcs_both_ways AS ( 2 SELECT src, dst, distance 3 FROM arcs 4 UNION ALL 5 SELECT dst, src, distance 6 FROM arcs 7 ), paths (node, path, dst_path, rnk, nodes_visited, lev) AS ( 8 SELECT a.dst, a.src  ','  a.dst, a.distance, 1, a.src  ','  Listagg (a.dst, ',') WITHIN GROUP (ORDER BY a.dst) OVER (), 1 9 FROM arcs_both_ways a 10 WHERE a.src = :prmSource 11 UNION ALL 12 SELECT a.dst, 13 p.path  ','  a.dst, 14 p.dst_path + a.distance, 15 Rank () OVER (PARTITION BY a.dst ORDER BY p.dst_path + a.distance), 16 p.nodes_visited  ','  Listagg (a.dst, ',') WITHIN GROUP (ORDER BY a.dst) OVER (), 17 p.lev + 1 18 FROM paths p 19 JOIN arcs_both_ways a 20 ON a.src = p.node 21 AND p.rnk = 1 22 WHERE Instr (','  p.path  ',', ','  a.dst  ',') = 0 23 ), paths_ranked AS ( 24 SELECT lev, node, path, dst_path, Rank () OVER (PARTITION BY node ORDER BY dst_path) rnk_tot 25 FROM paths 26 WHERE rnk = 1 27 ) 28 SELECT node, lev, path, dst_path 29 FROM paths_ranked 30 WHERE rnk_tot = 1 31* ORDER BY 1, 2
130415 0705:
Small tidyup, and use Oracle's cycle direction to replace my instr. Will try to add a comment later today, but suggest to OP to try reversing his source and sink.
WITH arcs_both_ways AS ( SELECT src, dst, distance FROM arcs UNION ALL SELECT dst, src, distance FROM arcs ), paths (node, path, dst_path, rnk, lev) AS ( SELECT a.dst, a.src  ','  a.dst, a.distance, 1, 1 FROM arcs_both_ways a WHERE a.src = :prmSource UNION ALL SELECT a.dst, p.path  ','  a.dst, p.dst_path + a.distance, Rank () OVER (PARTITION BY a.dst ORDER BY p.dst_path + a.distance), p.lev + 1 FROM paths p JOIN arcs_both_ways a ON a.src = p.node AND p.rnk = 1 ) CYCLE node SET IS_CYCLE TO '1' DEFAULT '0' , paths_ranked AS ( SELECT lev, node, path, dst_path, Rank () OVER (PARTITION BY node ORDER BY dst_path) rnk_tot FROM paths WHERE rnk = 1 ) SELECT node, lev, path, dst_path FROM paths_ranked WHERE rnk_tot = 1 ORDER BY 1, 2

Your query is very similar to OP's and the rank() has the same problem that it can't see the same dst which appears prior to the last level. This isn't as efficient as Dijkstra's algorithm but still gives the correct result.

It's a very interesting thread, I will spend a day to study it!

You are right. I will work to imagine how to do that using RSF. I will also implement the first insight you give me.

I know that this is not Dijkstra’s. It’s the reason of this post. Below is the "brute force" shortest path. This enumerates 107 paths. In comparison, the first query that I propose only enumerate 27 paths. It's less but it can be better as @James Su mention. We can add a global cut after the first solution is found. We also have to found how to compare branches from different level.
/*Shortest path*/
WITH UsefulArcs AS
(SELECT SRC, DST, DISTANCE FROM ARCS
),
Paths (SrcArc, DestArc, SrcPath, DestPath, Path, OpDist, DistPath, DistPathMin) AS
(SELECT SRC AS SrcArc ,
DST AS DestArc ,
SRC AS SrcPath ,
DST AS DestPath ,
SRC
'>'
 DST AS Path ,
''
DISTANCE AS OpDist ,
DISTANCE AS DistPath,
DISTANCE AS DistPathMin
FROM UsefulArcs
UNION ALL
SELECT a.SRC AS SrcArc,
a.DST AS DestArc,
Paths.SrcPath ,
a.DST AS DestPath ,
Paths.Path
'>'
a.DST AS Path,
Paths.OpDist
'+'
a.DISTANCE AS OpDist ,
Paths.DistPath +a.DISTANCE AS ,
MIN(Paths.DistPath+a.DISTANCE) OVER (PARTITION BY a.DST) AS DistPathMin
FROM UsefulArcs a
INNER JOIN Paths
ON (Paths.DestArc=a.SRC)
) SEARCH BREADTH FIRST BY DestPath
SEARCH DEPTH FIRST BY DestPath
SET Order01
SELECT *
FROM Paths
WHERE SrcPath=:prmSource
AND DestPath =:prmSink
AND DistPath =
(SELECT MIN(DistPath) FROM Paths WHERE SrcPath=:prmSource AND DestPath =:prmSink
)
ORDER BY DistPath;

Blais wrote: You are right. I will work to imagine how to do that using RSF. I will also implement the first insight you give me.
It had better to quickly find one path to destination and use its distance as reference to eliminate all other paths with longer distances for next level process. Otherwise one may take a trip around the would and then back to the destination . Before that it is hard to drop any path for further blind searching.
But for large collection of nodes and far away between start and destination points, at any intermediate point (M), you can pick the one with MIN distance in EACH SAME group and continue searching, eg for all rows in such group. Such aggregate function during recursive process will be expensive but it can certainly save PGA for longer process.
(S)>A>B>(M)
(S)>B>A>(M)

How do you pick the M point? If it's from the neighbours then this is already described in the algorithm.

You are right, it does look like a poem and it's very enjoyable reading experience. Thank you!

hmm, my understanding is at start point it can move to any neighbors in all directions but each node can be only in the path once, eg, only instr(path, next neighbor)=0 is valid path at next level.
So there are 4 paths from S to M (S>A>M, S>B>M, S>A>B>M, S>B>A>M). And the later 2 can be reduced to one for further searching. Such intermediate points are possibly existing at level 3 and beyond.
S  A 
  
_______B  M  ...........  D

Blais wrote: I know that this is not Dijkstra’s. It’s the reason of this post.
In your initial post you asked: "if my Dijkstra’s shortest path algorithm is implement correctly"?
So what is in fact your question if you know already the answer?
As far as i know the Dijkstra algorithm only leads to one shortest path and not to all shortest pathes.
In the thread i mentioned above i implemented Dijkstra by using model clause. The solution works for your example too (109641=11).
As far as i remember i came to the conclusion that it is not possible to implement the Dijkstra by using recursive subquery, but i may be wrong.
(That does not mean that it is not possible to solve shortest path problems at all with this tool!)

with agraph as ( select src source, dst destination, distance from arcs union all select 0, 1, 0 from dual ) ,m as (  eval the nodes from the arcs, not from the nodes table
select source n from agraph union select destination from agraph ) ,  calculates all shortest pathes from start node (#1) to all other nodes  reachable from the start node  if another start node should be choosen just renumerate them so that  the new one is #1 pathes as (select nn node, p predecessor, d dist from m model reference destinations on (select source, destination, distance from agraph) dimension by (source,destination) measures (source s, distance dist, destination dest) main mm dimension by(n) measures(n as nn, 0 as p, case n when 0 then 0 else 1000 end as d, 0 as v, 0 as it) UNIQUE SINGLE REFERENCE rules UPDATE iterate (2000) until d[0]=1 ( p[0]=min(nn) keep(dense_rank first order by v,d)[any] ,it[p[0]]=ITERATION_number+1 ,p[any] order by d,n= case when cv(n) = dest[p[0],cv(n)] and d[cv(n)] > dist[p[0],cv(n)] + d[p[0]] then s[p[0],cv(n)] else p[cv(n)] end ,d[any] order by d,n= case when cv(n) = dest[p[0],cv(n)] and d[cv(n)] > dist[p[0],cv(n)] + d[p[0]] then dist[p[0],cv(n)] + d[p[0]] else d[cv(n)] end ,v[p[0]]=1 ,d[0]=min(v)[any] )) select ltrim(min(sys_connect_by_path (node, '<')) keep (dense_rank first order by dist), '<') path ,max(dist) dist from pathes connect by nocycle node = prior predecessor start with node = 10 PATHDIST 10<9<6<4<1 11 
This algorithm is for directed graph. And it will only keep the shortest path and never needs to merge.

Aggregation is not allowed in recursive subquery, but analytic function is supported so we can achieve the same goal. Now the only issue is we might end up with too much intermediate data in the final result set. Otherwise it already implements this algorithm.

It seems I have different understanding. To me it is like puzzle palace game, one starts at entrance and try to exit. So one may either reach to a dead end or luckily get out in different paths sooner or later.
But if one gets some (good) directions at certain points/nodes then it is not puzzle palace game.

Remember reading document about using aggregate and/or analytic function in recursive sql. And based on some posts here it seems the case for 11.2.0.3+. Do you know if it is the case for 12c?

Aggregation is never supported; analytic function has been supported in document since RSF was first introduced in 11gR2, but it was not working in 11.2.0.1. The first time I saw it working was in 11.2.0.3. It's also working in 12c.

I see. Thanks. Have it long in mind to test the aggregate function in scalar query using tmp table in recursive sql and just keep slipping so far. But it is time to install 12c.

Having had a closer look at the query in Blais' original post, I agree with James Su that mine above is very similar.
If anyone is still interested I wrote more about this approach here "SQL for Shortest Path Problems  A Programmer Writes… (Brendan's Blog)"
Update, 4 May 2015: I posted this followup: SQL for Shortest Path Problems 2: A Branch and Bound Approach