This discussion is archived
8 Replies Latest reply: May 1, 2013 6:06 AM by ronnie-m RSS

Creating connectivity data from spatial data

ronnie-m Newbie
Currently Being Moderated
Hi,

I have some spatial data, of which I will try to describe the relevant aspects:

- A LineSegment table, which contains information on cables, including Area ID (approx 4 million rows)
- A Location table, which contains a LineSegmentID (one-to-one), and a Geometry (spatial) column (approx 4 million rows).

What I need to do is create a new table, containing conceptual "nodes", containing the following columns:

- NodeID (number)
- LineSegmentID (number)
- LineSegmentEnd (1 or 2)

So I need to work out, for each cable, which other cables it connects to, by comparing its endpoints with the endpoints of other cables in the same area. An area contains up to 464 cables. There are a total of 160 thousand areas.

I am trying to work out the most efficient way of achieving this, ideally by performing a batch job that will take less than half an hour. Oracle is relatively new to me, but I am guessing the correct approach would be to use a series of intermediate (staging) tables, as I believe nested cursors would be way too slow (I have performed a simple test to confirm this).

I guess I will need to extract, into a temp table, the start point and end point of each cable using SDO_LRS.GEOM_SEGMENT_START_PT and SDO_LRS.GEOM_SEGMENT_END_PT, along with area ID. Then join the table with itself and use SDO_GEOM.SDO_DISTANCE to work out which points are close to each other (e.g. less than one metre). However I am struggling to outline a step by step process.

Does anyone have any thoughts which may help?

(Oracle 11g)

Sample data to illustrate problem:

create table line_location (lineid number,
geometry sdo_geometry);
create table line (id number, areaid number);

-- one cable in area 1, 2 in area 4, etc.
insert into line values (1, 1);
insert into line values (2, 4);
insert into line values (3, 4);
insert into line values (4, 3);
insert into line values (5, 3);
insert into line values (6, 3);
insert into line values (7, 2);
insert into line values (8, 2);
insert into line values (9, 2);
insert into line values (10, 2);

-- in reality the lines aren't necessarily straight and simple like these...
insert into line_location values (1, MDSYS.SDO_GEOMETRY(2002,3785,NULL,MDSYS.SDO_ELEM_INFO_ARRAY(1,2,1),MDSYS.SDO_ORDINATE_ARRAY(10,1,10,4))); -- area 1
insert into line_location values (2, MDSYS.SDO_GEOMETRY(2002,3785,NULL,MDSYS.SDO_ELEM_INFO_ARRAY(1,2,1),MDSYS.SDO_ORDINATE_ARRAY(3,9,5,9))); -- area 4
insert into line_location values (3, MDSYS.SDO_GEOMETRY(2002,3785,NULL,MDSYS.SDO_ELEM_INFO_ARRAY(1,2,1),MDSYS.SDO_ORDINATE_ARRAY(5,9,5,10))); -- area 4
insert into line_location values (4, MDSYS.SDO_GEOMETRY(2002,3785,NULL,MDSYS.SDO_ELEM_INFO_ARRAY(1,2,1),MDSYS.SDO_ORDINATE_ARRAY(1,1,2,1))); -- area 3
insert into line_location values (5, MDSYS.SDO_GEOMETRY(2002,3785,NULL,MDSYS.SDO_ELEM_INFO_ARRAY(1,2,1),MDSYS.SDO_ORDINATE_ARRAY(2,3,2,1))); -- area 3
insert into line_location values (6, MDSYS.SDO_GEOMETRY(2002,3785,NULL,MDSYS.SDO_ELEM_INFO_ARRAY(1,2,1),MDSYS.SDO_ORDINATE_ARRAY(2,3,3,3))); -- area 3
insert into line_location values (7, MDSYS.SDO_GEOMETRY(2002,3785,NULL,MDSYS.SDO_ELEM_INFO_ARRAY(1,2,1),MDSYS.SDO_ORDINATE_ARRAY(5,3,5,4))); -- area 2
insert into line_location values (8, MDSYS.SDO_GEOMETRY(2002,3785,NULL,MDSYS.SDO_ELEM_INFO_ARRAY(1,2,1),MDSYS.SDO_ORDINATE_ARRAY(4,4,5,4))); -- area 2
insert into line_location values (9, MDSYS.SDO_GEOMETRY(2002,3785,NULL,MDSYS.SDO_ELEM_INFO_ARRAY(1,2,1),MDSYS.SDO_ORDINATE_ARRAY(5,9,6,4))); -- area 2
insert into line_location values (10, MDSYS.SDO_GEOMETRY(2002,3785,NULL,MDSYS.SDO_ELEM_INFO_ARRAY(1,2,1),MDSYS.SDO_ORDINATE_ARRAY(5,7,5,9))); -- area 2

create table node_line (node_id number,
line_id number,
endpoint_id number, -- 1 for startpoint, 2 for endpoint,
area_id number
);

-- expected output here. if two lines are less than 0.5 from each other, then they should share a node.
insert into node_line values (1, 1, 1, 1); -- isolated cable in area 1, start point node
insert into node_line values (2, 1, 2, 1); -- isolated cable in area 1, end point node
insert into node_line values (3, 2, 1, 4); -- area 4, cable 2, start point node
insert into node_line values (4, 2, 2, 4); -- area 4, cable 2, end point node
insert into node_line values (4, 3, 1, 4); -- cable 2's end point node = cable 3's start point node, etc
insert into node_line values (5, 3, 2, 4);
insert into node_line values (6, 4, 1, 3); -- node at (1,1)
insert into node_line values (7, 4, 2, 3); -- node at (2,1)
insert into node_line values (8, 5, 1, 3); -- node at (2,3)
insert into node_line values (7, 5, 2, 3); -- node at (2,1)
insert into node_line values (8, 6, 1, 3); -- node at (2,3)
insert into node_line values (9, 6, 2, 3); -- node at (3,3)
insert into node_line values (10, 7, 1, 2); -- node at (5,3)
insert into node_line values (11, 7, 2, 2); -- node at (5,4)
insert into node_line values (12, 7, 1, 2); -- node at (4,4)
insert into node_line values (11, 7, 2, 2); -- node at (5,4)
insert into node_line values (13, 7, 1, 2); -- node at (5,9)
insert into node_line values (14, 7, 2, 2); -- node at (6,4)
insert into node_line values (15, 7, 1, 2); -- node at (5,7)
insert into node_line values (13, 7, 2, 2); -- node at (5,9)

Thanks
  • 1. Re: Creating connectivity data from spatial data
    Luc Van Linden Pro
    Currently Being Moderated
    Hi Ronnie

    Have a look at this thread. Re: Spatial Network constraints

    This should get you started.

    I think you just have to refine the filtering on the area id.

    Let us know how you get along or get in touch.

    Regards

    Luc
  • 2. Re: Creating connectivity data from spatial data
    Stefan Jager Journeyer
    Currently Being Moderated
    Hello Ronnie,

    For what it's worth, here's my 2 cents:

    1. Validate all the geometries
    2. Get the start and endpoints off the geometry using SDO_UTIL.GETNUMVERTICES and SDO_UTIL.GETVERTICES, and stuff them into a temp table
    3. set a spatial index on the endpoints and startpoints, and on the area id in the temp table.
    4. Select all start and endpoints using the area id, since you only need to connect them per area
    5. Compare them and register as needed. If you create point-geometries of them, you'll get some performance benefit - no need to fill the sdo_ordinate_array. SDO_EQUAL should then do what you want, using one-metre tolerance. Or SDO_NN.

    Will this take less then half an hour? Don't know, but let's look at a worst case scenario: 464 cables times two = 928 points you have to compare to eachother, and you have to do that 160.000 times. If you can get something to do these checks rather fast, in let's say 0.1 millisecond, it will still take you 14848000 milliseconds (unless you can do it parallel).

    And of course you can look into letting oracle do the work for you by generating an LRS network (that'll give you Node and Segment tables, which you might be able to query to generate the information you need.

    HTH,
    Stefan
  • 3. Re: Creating connectivity data from spatial data
    ronnie-m Newbie
    Currently Being Moderated
    Thanks Stefan.

    I made some similar calculations earlier and realised this is a mammoth task.

    Consequently I have decided, for now, to just attempt to populate the data for one area, and from there I may be able to calculate the connectivity for each area on demand.

    Have worked out how to do this using a cursor (I think), but ideally would like to figure out how to do it without using a cursor (for probable performance impact).
  • 4. Re: Creating connectivity data from spatial data
    Luc Van Linden Pro
    Currently Being Moderated
    Hi Ronnie

    Did you had a look at the script on the older thread?
    This performs in a slightly different result structure what you are after.
    I took the time this morning to see it a bit optimized.

    Find below the result.



    With a couple on indexes and the use of SDO_JOIN rather the sdo_relate this is speeding up dramatically.
    I had it tested on a 600k line objects (which is not 4 million I know) and is reasonable ok on my (not optimized) test environment.

    On the "1 meter" close to each other, I would tend to have taken care of by Spatial itself by setting the tolerance appropriately, so
    there should be no reason to perform a within distance check. Obviously only if the resolution of your data allows this.


    Have a look at it.

    Note that the final table is different in structure, but this should be easily adjusted in the script if your node_line table needs to be exactly as you defined it.

    Luc



    -- drop existing SEQ_TOPO

    drop sequence SEQ_TOPO;



    -- create sequence with caching

    CREATE SEQUENCE SEQ_TOPO CACHE 2000;



    commit;



    --drop temp table

    drop table temp_nodes cascade constraints;





    -- create temp table and populate with startponts, and a field saying these are starts, also X,Y as we use it later to remove duplicates in a non spatial manner

    create table temp_nodes

    as

    select a.lineid, a.areaid , sdo_geometry(2001, a.node.sdo_srid, sdo_point_type(t.X, t.Y, null), null, null) as node, SEQ_TOPO.nextval node_id, 'S' AS STEND, t.x, t.y

    from

    (select lineid, areaid , sdo_lrs.geom_segment_start_pt(geometry) node from line_location, LINE where line_location.lineid = line.line_id) a, TABLE(SDO_UTIL.GETVERTICES(a.node)) t ;



    commit;



    -- insert the end points into the temp table

    insert into temp_nodes

    select a.lineid, a.areaid , sdo_geometry(2001, a.node.sdo_srid, sdo_point_type(t.X, t.Y, null), null, null) as node, SEQ_TOPO.nextval node_id, 'E' AS STEND, t.x, t.y

    from

    (select lineid, areaid , sdo_lrs.geom_segment_end_pt(geometry) node from line_location, LINE where line_location.lineid = line.line_id) a, TABLE(SDO_UTIL.GETVERTICES(a.node)) t ;



    commit;



    -- insert user_sdo_geom_metadata and have index created for temp_nodes

    -- adjust with appropriate metadata values for srid, upper and lowebounds and tolerance of your dataset

    -- Here the tolerance is set to 1 meter, this way we do not need to use a within distance, just let the tolerance help us out here

    -- obviously this can only work if the tolerance is smaller then the distance between start and end from the link itself.



    delete from user_sdo_geom_metadata where table_name = 'TEMP_NODES';

    INSERT INTO user_sdo_geom_metadata VALUES ('TEMP_NODES','NODE',SDO_DIM_ARRAY(SDO_DIM_ELEMENT('X', 0, 1000000, 1),SDO_DIM_ELEMENT('Y', 0, 100000, 1)), 3785);



    -- create spatial index with gtype=POINT to use internal optimisation

    drop index node_sx;

    CREATE INDEX node_sx ON temp_nodes(node) INDEXTYPE IS MDSYS.SPATIAL_INDEX PARAMETERS('sdo_indx_dims=2, layer_gtype=POINT');



    -- create index on X,Y combination to speed up "removing duplicates" (with the group by) is actually a "select unique" rather then remove duplicates

    CREATE INDEX INDEX1 ON TEMP_NODES (X,Y);

    CREATE INDEX INDEX2 ON TEMP_NODES (node_id);





    -- create the final node table with unique nodes from the temp table, x,y could be ommitted

    create table node_topo

    as

    select a.nodeid, t.node, t.x, t.y

    from

    (

    select min(node_id) as nodeid

    from

    temp_nodes

    group by X, Y

    ) a inner join

    temp_nodes t

    on (a.nodeid = t.node_id)

    ;



    commit;



    -- insert metadata info



    delete from user_sdo_geom_metadata where table_name = 'NODE_TOPO';

    INSERT INTO user_sdo_geom_metadata VALUES ('NODE_TOPO','NODE',SDO_DIM_ARRAY(SDO_DIM_ELEMENT('X', 0, 1000000, 1),SDO_DIM_ELEMENT('Y', 0, 100000, 1)), 3785);



    -- create spatial index on final node table with gtype=POINT (internal optimization)



    drop index node_topo_sx;

    CREATE INDEX node_topo_sx ON NODE_TOPO(node) INDEXTYPE IS MDSYS.SPATIAL_INDEX PARAMETERS('sdo_indx_dims=2, layer_gtype=POINT');



    -- create final node_link table using SDO_JOIN between final node table and temp table

    -- the NAYINTERACT should take care of the "snapping" as the tolerance will be applied



    create table node_line

    as

    select lineid, max(st_ID) START_NODE_ID, max(en_ID) END_NODE_ID, max(areaid) AREAID

    from

    (

    SELECT b.lineid, case when b.stend = 'S' THEN a.nodeid else 0 end st_ID,

    case when b.stend = 'E' THEN a.nodeid else 0 end en_ID, areaid

    FROM TABLE(SDO_JOIN('NODE_TOPO', 'NODE',

    'TEMP_NODES', 'NODE',

    'mask=ANYINTERACT')) c,

    node_topo a,

    temp_nodes b

    WHERE c.rowid1 = a.rowid AND c.rowid2 = b.rowid

    )

    group by lineid;



    commit;



    -- drop temp objects



    drop table temp_nodes cascade constraints;

    delete from user_sdo_geom_metadata where table_name = 'TEMP_NODES';



    commit;



    drop sequence seq_topo;



    commit;
  • 5. Re: Creating connectivity data from spatial data
    ronnie-m Newbie
    Currently Being Moderated
    Wow, that looks really promising Luc, thank you.

    Unfortunately I am away until next week, but I will try this approach and let you know how I get on.
  • 6. Re: Creating connectivity data from spatial data
    ronnie-m Newbie
    Currently Being Moderated
    This works, thank you.

    Will test it on the large dataset later this week and post performance results.
  • 7. Re: Creating connectivity data from spatial data
    Luc Van Linden Pro
    Currently Being Moderated
    Hi Ronnie

    Glad to hear this. Looking forward to your performance results.

    Luc

    Edited by: Luc Van Linden on Apr 22, 2013 2:15 PM
  • 8. Re: Creating connectivity data from spatial data
    ronnie-m Newbie
    Currently Being Moderated
    Have today ran query on large dataset and performance was great, minutes rather than hours!

    Thanks again.

Legend

  • Correct Answers - 10 points
  • Helpful Answers - 5 points