3 Replies Latest reply on Aug 26, 2008 11:49 AM by dgeringe-Oracle

    ArcGIS > Oracle

      I downloaded the US state borders from the US Census Bureau (1) and loaded the shapefile into Oracle 11.1. I ran sdo_migrate.to_current on the data and created a spatial index. I then tried to take a union of the entire dataset to create a US border like this:

      select sdo_aggr_union(SDOAGGRTYPE(geom, 0.5)) from states

      This operation completed in 25 min on a Core 2 Duo 2.4GHz laptop.

      Next I loaded the same shapefile into ArcMap 9.2 and did the same union operation. This completed in less than a second. I then loaded the new shapefile into Oracle and had my result in a few minutes.

      Why is Oracle 1000x slower than ArcMap at doing unions?

      (1) http://www.census.gov/geo/cob/bdy/st/st00shp/st99_d00_shp.zip
        • 1. Re: ArcGIS > Oracle
          Hi Mike,

          When you put the data into ArcMap 9.2, are you putting it into a topology or a coverage? If you are, then SDO_AGGR_UNION is not an apples to apples comparison.

          Oracle Spatial also has a topology data model.

          I loaded the shapefile, fixed the invalid geomtries, cleaned the data so it can get put into a topology, and put the data into an Oracle Spatial Topology. If you use the Java Shapefile converter, there is no need to run SDO_MIGRATE.TO_CURRENT.

          Once in a topology, I was able to union the 273 polyons to generate the US country boundary in 30 seconds.

          Below is a summary of my analysis and the steps I took:

          The states shape file has 273 polygons, ranging from 5 vertices to 7387 vertices.
          SELECT state, name, sdo_util.getnumvertices (geom)
          FROM census_states
          ORDER BY 3;

          3 have more than 5000 vertices, with a max of 7387.
          2 between 4001 and 5000 vertices
          11 between 3001 and 4000 vertices
          19 between 2001 and 3000 vertices
          7 between 1001 and 2000 vertices
          231 between 0 and 1000 vertices

          There were invalid geometries in the shapefile, and also the geometry edges did not line up perfectly. To fix the geometries and align the edges it took 3 minutes.

          To import the states into a topology (SDO_GEOMETRY to SDO_TOPO_GEOMETRY) took 1.5 minutes.

          Once in the topology, unioning all the states to create a country polygon took 30 seconds.
          • 2. Re: ArcGIS > Oracle
            I really haven't played with topologies in Oracle and the book I have (Pro Oracle Spatial) only has a short appendix on the subject. In the end, it still looks like it still took 5 minutes to do the union, which is much slower than anything else I have tried. Here are some results I have compiled for the aggregate union of the US Census state boundaries:

            Oracle 11.1:     25 min
            PostGIS 1.3.3:     42 sec
            Java2D 1.5:     4 sec
            ArcMap 9.2:     1 sec

            FYI: there is currenty a discussion of these results on the PostGIS mailing list. An upcoming version of PostGIS should be able to do the union in about 4 sec.

            Until just recently, I had assumed that both PostGIS and ArcGIS took into account that the earth is round when doing CAG operations, but apparently that is not true of either. Therefore, my initial comparison was not exactly apples to apples. So today I did another experiment to add to these results. I copied the data into a table with a null SRID and did the aggregate union there. It completed in 5 min. So it seems Oracle is much faster if you tell it to do the union using planar geometry. But even at 5 minutes Oracle is still much slower than everything else.

            I have an unrelated question, can the Oracle topology stuff be used to find and locate gaps between adjacent polygons? I learned recently that the topology stuff in ArcGIS can do this. I successfully found and fixed some gaps in the Digital Chart of the World state boundaries (http://www.maproom.psu.edu/dcw) using ArcMap. It would be interesting to know if I could do the same thing with Oracle Spatial.
            • 3. Re: ArcGIS > Oracle

              Hi Mike,

              Topology is much more than a way to union geometries.

              It is a way to store shared edges and nodes across features just once so moving a shared edge implicitly changes the shape of all features associated with that edge.

              Also, topological relationships are maintained during edits, for example, when editing the shared edge of two adjacent political boundaries, a topology ensures a point (ie city) remains inside the correct boundary.

              Oracle Spatial's topology model leverages enterprise database features, such as high availability, manageability, security, and concurrent user access. These are just a few reasons why customers like the US Census Bureau have chosen the Oracle Spatial topology data model for projects like the TIGER 2010 modernization.

              Comparison with a non-database storage ArcGIS solution that compromises high availability, manageability, security, and concurrent user access, in my opinion, is not an apples to apples comparison.

              Once the features are loaded in Oracle Spatial's topology data model, unioning is one of many things you can do.

              Any combination of unions can be made, ie. two states, ten states, all 273 polygons, etc, without repeating the data cleansing/load processes. The union of all 273 polygons took 30 seconds from the topology.

              I would not count the data cleansing or topology load times as part of the union time.

              That said, we are currently investigating making SDO_AGGR_UNION faster.

              Once again, SDO_AGGR_UNION is only one feature of many Oracle Spatial features.

              Most spatial operations are much faster on Oracle Spatial/Locator than PostGIS.
              This has been confirmed by several customers testing on their own data sets.

              Here are some excerpts from the PostGIS documentation, with my comments in parenthesis:

              * Functions such as distance() cannot use the index to optimize their operation. \\      For example, the following query would be quite slow on a large table: \\      SELECT the_geom \\      FROM geom_table \\      WHERE distance( the_geom, GeomFromText( 'POINT(100000 200000)', -1 ) ) < 100 \\ \\      We can avoid this by using the && operator to reduce the number of \\      distance calculations required: \\ \\      SELECT the_geom \\      FROM geom_table \\      WHERE the_geom && 'BOX3D(90900 190900, 100100 200100)'::box3d \\      AND distance( the_geom, GeomFromText( 'POINT(100000 200000)', -1 ) ) < 100 \\ \\      (My comment - NOTE all the candidates that return from && must get evaluated by distance. This is done much more optimally with Oracle Spatial's SDO_WITHIN_DISTANCE).

              * SELECT m.name, sum(length(r.the_geom))/1000 as roads_km \\      FROM bc_roads AS r,bc_municipality AS m \\      WHERE r.the_geom && m.the_geom \\      AND contains(m.the_geom,r.the_geom) \\      GROUP BY m.name ORDER BY roads_km; \\ \\      This query takes a while, because every road in the table is summarized into the final \\      result (about 250K roads for our particular example table). \\ \\      (My comment - NOTE all the candidates that return from && must get evaluated by contains. This is done much more optimally with Oracle Spatial's SDO_ANYINTERACT or SDO_CONTAINS).
              As mentioned above in the PostGIS documentation excerpts, bounding box overlap (specified as && in PostGIS) and secondary filters are always separate.

              In PostGIS, all the candidates that return from && get evaluated in a PostGIS secondary filter,
              like distance() or contains().

              Unlike PostGIS, Oracle Locator/Spatial operators like SDO_WITHIN_DISTANCE and SDO_ANYINTERACT are optimized
              to scale and run faster with larger query windows that return more candidates in the result set.

              As you pointed out, neither of the other solutions, PostGIS or ArcGIS take into account the Earth's curvature for CAG or Spatial query opeations.

              Oracle Locator and Oracle Spatial include spatial indexes and spatial operators that consider the Earth's curvature. Neither PostGIS or ArcGIS (without Oracle Locator/Oracle Spatial as the underlying spatial data store) do so.