This discussion is archived
1 2 Previous Next 20 Replies Latest reply: Apr 24, 2012 7:45 AM by Pleiadian RSS

SDO_AGGR_UNION very slow

Waldecir P. Junior Newbie
Currently Being Moderated
Hello friends,

I'm having problems with SDO_AGGR_UNION function in Oracle 11.2.0.3. The performance is very bad.

I need to aggregate several polygons (645) using SDO_AGGR_UNION.

My table is:

SQL> DESC GC_MUNICIPALITY;
Nome Nulo Tipo
-------------- --------------- ------
STATE_ID NOT NULL NUMBER(5)
GEOLOC SDO_GEOMETRY()

First I ran the following query:

SELECT SDO_AGGR_UNION(SDOAGGRTYPE(geoloc, 0.5)) geoloc
FROM GC_MUNICIPALITY
WHERE STATE_ID = 35

Execution time: 57 minutes

Then, I ran then aggregate union with groupings:

SELECT SDO_AGGR_UNION(SDOAGGRTYPE(geoloc,0.5)) geoloc
FROM
(SELECT SDO_AGGR_UNION(SDOAGGRTYPE(geoloc,0.5)) geoloc
FROM
(SELECT SDO_AGGR_UNION(SDOAGGRTYPE(geoloc,0.5)) geoloc
FROM
(SELECT SDO_AGGR_UNION(SDOAGGRTYPE(geoloc, 0.5)) geoloc
FROM GC_MUNICIPALITY
WHERE STATE_ID = 35
GROUP BY MOD(ROWNUM, 15))
GROUP BY MOD (ROWNUM, 7))
GROUP BY MOD (ROWNUM, 2)
);

Execution time: 15 minutes

The second execution was faster than the first, but still very slow.

If I use ArcGis, PostGis or JTS, the same aggregation is executed in few seconds.

Why is this function so slow in Oracle?

Is there another way to accomplish this aggregation with better performance?

I read in another forum that if I convert from SDO_GEOMETRY to SDO_TOPO_GEOMETRY the aggregation is faster. However, the SDO_AGGR_UNION works only with SDO_GEOMETRY type. Is there any way to aggregate SDO_TOPO_GEOMETRY?

Thanks!

Edited by: user12000327 on 03/04/2012 09:30

Edited by: user12000327 on 03/04/2012 09:33
  • 1. Re: SDO_AGGR_UNION very slow
    Paul Dziemiela Journeyer
    Currently Being Moderated
    Hello unnamed user12000327,

    What is the SRID of the data you are aggregating? If its a geodetic SRID then make sure your comparisons also do their aggregations on the ellipsoid and aren't just projecting the data.

    What is the average vertice count of one of your polygons? Are these things enormous?

    That being said SDO_AGGR_UNION is pretty pokey due to the overhead of the ODCIAggregate system. I assume that is what led the spatial team to create SDO_AGGR_SET_UNION for 11g. Have you tried your processing using the new function?

    There are a lot of things you can do to optimize your aggregation process. If you try it in several passes like your original query, the trick is to first aggregation geometries in proximity to each other. So if you have some kind of grouping key things will do much better. For example if the polygons are postal codes and you know the postal codes with similar prefixes are grouped spatially together then aggregating in passes on those prefixes will help a lot.

    The topology solution is rather extreme but as I think its my solution you are referring to, I will meekly defend it as a possibility if you cannot get anything else to provide the performance you need.

    Cheers,

    Paul
  • 2. Re: SDO_AGGR_UNION very slow
    Simon Greener Journeyer
    Currently Being Moderated
    The poor performance of Oracle's SDO_AGGR_UNION has been known for years.

    It is a pity because there must be ways around the sort context problem that would allow for more scalable passing of geometries to a union algorithm.

    PostGIS fixed this quite a few years ago - in version 1.4 to be precise - as can be seen from this posting by Paul Ramsey (http://blog.cleverelephant.ca/2009/01/must-faster-unions-in-postgis-14.html).

    I made an unpopular comment at the 2011 Oracle Spatial User Conference in Washington about this.

    Unpopular because I asked a question/made a comment in the question time of an excellent paper given by Dan Geringer on Spatial and Exadata. My comment was seen as being unhelpful and irrelevant.

    For those of us who cannot afford Exadata machines, we are left to wonder why it is that everyone else - in their base/free products - can produce a fast union but not Oracle. (Even SQL Server 2012's union is pretty good.)

    I don't like putting myself in the firing line, but the performance of SDO_AGGR_UNION is embarrassing - more people need to demand a solution.

    I played around with a custom aggregate over this year's Christmas period. I tried to use Java as that is what I know but the same sort context problem stopped my attempts to use the same JTS STR-Tree based cascading union to implement something faster for Oracle users. I failed but I believe - if all comments out there are to be believed - that the sort context problem can be solved in pure C. If this is the case, writing a C extension to use GEOS (the JTS port for C/C++ users) is definitely possible.

    With regards SDO_AGGR_SET_UNION, it is a bit faster but the fact that it will throw sort context errors when the collection gets too big indicates that it cannot be relied upon and so should be ignored until a decent - behind the scenes - replacement algorithm is implemented for SDO_AGGR_UNION.

    For all those others who have had the same experience with SDO_AGGR_UNION raise your voices via the this forum, via Oracle Support or at Oracle Spatial User Conferences.

    regards
    Simon
  • 3. Re: SDO_AGGR_UNION very slow
    John O'Toole Journeyer
    Currently Being Moderated
    Agreed. The performance of SDO_AGGR_UNION really is abysmal. Search this forum for "SDO_AGGR_UNION" and most of the results relate to performance issue.
    It'd be good to get some feedback from Oracle as to why they decided to introduce the very cumbersome (and not always working) SDO_AGGR_SET_UNION, rather than optimise SDO_AGGR_UNION to use a cascading union approach.

    John
  • 4. Re: SDO_AGGR_UNION very slow
    Waldecir P. Junior Newbie
    Currently Being Moderated
    Hello Paul and Simon,

    The SRID is 8307.

    The query returns 645 polygons, ranging from 66 vertices to 8030 vertices.
    SELECT sdo_util.getnumvertices (geoloc)
    FROM gc_municipality
    where state_id = 35
    ORDER BY 1;

    1 have 8330
    2 between 5001 and 6000 vertices
    5 between 4001 and 5000 vertices
    8 between 3001 and 4000 vertices
    33 BETWEEN 2001 AND 3000 vertices
    163 BETWEEN 1001 AND 2000 vertices
    433 between 1 and 1000 vertices

    I did some tests with SDO_AGGR_SET_UNION and it was a bit faster (10 minutes), but the performance was still very bad compared to other approaches (PostGis, ArcGis, JTS, ...) (few seconds)
    The performance difference is huge.

    I'm very disappointed with the performance of the SDO_AGGR_UNION. Even though most spatial operations are much faster on Oracle Spatial than on others approaches, the SDO_AGGR_UNION is a very useful function than in my opnion needs to be treated more carefully by the development team.
    I would not like to replace the SDO_AGGR_UNION by another non-Oracle tool. I really wished that the union algorithm were improved.

    I agree with Simon, more people should demand a solution to this problem.

    Regards!

    Waldecir
  • 5. Re: SDO_AGGR_UNION very slow
    yhu Journeyer
    Currently Being Moderated
    Hi Waldecir,

    Please contact oracle support and file a performance bug/enhancement.

    Thanks,
    Ying
  • 6. Re: SDO_AGGR_UNION very slow
    Waldecir P. Junior Newbie
    Currently Being Moderated
    Hi Ying,

    Ok, I will do it.

    Thanks,
    Waldecir
  • 7. Re: SDO_AGGR_UNION very slow
    Simon Greener Journeyer
    Currently Being Moderated
    I find it interesting that the 2012 User Conference - announced on LinkedIn and not on this Forum - has this:
    Subject: Executive Speakers Announced for Oracle Spatial User Conference
    ....
    ** New Performance Enhancements in Oracle Spatial — Siva Ravada
    The issues of performance are always openly discussed on the PostGIS, JTS etc discussion lists. I have even had honest discussion of performance with the SQL Server Spatial team.

    But with Oracle one always seems to come up against a communications black hole in which public discussion of performance by Oracle staff eg SDO_AGGR_UNION is not allowed. Perhaps
    there is a corporate firewall stopping the discussion of issues that may be deemed to affect sales and marketing. If that is the case, fair enough, but the current, very formal methods (eg metalink)
    of registering performance concerns don't the engender the open discussion that users expect nowadays when interacting with software vendors on the internet.

    So, Siva, will there be any posting on this forum with a hint of what might be in what will probably be a very good talk? (PS If your talk discusses Exadata for more than 10 minutes I for one will turn off.)

    regards
    Simon
  • 8. Re: SDO_AGGR_UNION very slow
    Siva Ravada Expert
    Currently Being Moderated
    Simon,

    This year there will be several tracks of talks and my talk won't mention Exadata.

    As for the sdo_aggr_union issue, as we talked about in the last conference, the frame work we have in Oracle for these kinds of aggregate operations
    is very severely restrictive against using a lot of memory. So within that framework, we don't have a way of collecting a set of geometries and doing
    the union operation once on the whole set. So we have to do it iteratively which leads to the performance cost.
    There is a reason for this restriction. Since Oracle process memory is shared by many users, it is not a good idea to allow one user to monopolize the memory
    causing problems to other users.

    That is the reason we introduced the sdo_aggr_set_union operation as an alternative way to do these union operations. Since the aggregate framework
    is not used in this case, we have more flexibility in how we collect all the geometries and do the union operation once.

    Even with this, in the 11gR2 release, a lot of time is spent in collecting the geometries and getting them into memory. So we are working on
    improving that part of the process now.

    siva
  • 9. Re: SDO_AGGR_UNION very slow
    Siva Ravada Expert
    Currently Being Moderated
    Simon,

    If you have cases where the sdo_aggr_set_union is throwing errors, we would like to know about them.

    siva
  • 10. Re: SDO_AGGR_UNION very slow
    Simon Greener Journeyer
    Currently Being Moderated
    Siva,

    Wrt examples for sdo_aggr_set_union, surely you have some reasonably large dataset you use for scalability testing?

    Any, here is an example from real data (names changed to protect owner):
    select sdo_aggr_set_union(CAST(COLLECT(a.geometry) AS mdsys.SDO_Geometry_Array),0.005) as aggrGeom
     from largedataset  a
     where a.filter_code = 'WA'
     group by a.group_code;
    -- Results 
    ORA-22813: operand value exceeds system limits
    22813. 00000 -  "operand value exceeds system limits"
    *Cause:    Object or Collection value was too large. The size of the value
               might have exceeded 30k in a SORT context, or the size might be
               too big for available memory.
    *Action:   Choose another value and retry the operation.
    If I remove the sdo_aggr_set_union to give you some numbers....
    select count(*) as geomCount, round(avg(sdo_util.getNumVertices(a.geometry)),1) as avgVertices
     from largedataset  a
     where a.filter_code = 'ME'
     group by a.group_code;
     -- Results
    GEOMCOUNT AVGVERTICES
    --------- -----------
    212       96.2
    22        27.1
    68        159
    191       116.4
    374       56.7
    58        135.4
    275       126.3
    65        112.2
    110       87
    206       143.3
    83        173.7
    32        142.1
    136       186.7
    94        151.3
    812       136.4
    184       119.6
    268       112.2
    51        137.2
    159       173.8
    198       160.5
    264       118.6
    114       103.5
    138       93.8
    96        86.6
    23        188.7
    136       199.9
    238       134.7
    58        142.5
    53        142.1
    131       179.4
    122       96.4
    42        133.3
    489       162.8
    
     33 rows selected
    Unless, of course, I am doing something wrong?

    Simon
  • 11. Re: SDO_AGGR_UNION very slow
    Simon Greener Journeyer
    Currently Being Moderated
    Siva,

    I would LOVE to be able to use EXADATA but the sort of customers using Spatial in Australia can't afford it.

    For me, as you know, there are more important and immediate issues that affect customers.
    Since Oracle process memory is shared by many users, it is not a good idea to allow one user to monopolize the memory causing problems to other users.
    Since when does any software company tell users what is a good idea or not? This sends a bad message to users about their communications with (all) their customers (not just the "big end of town").

    Oracle DB is highly configurable.

    That is one of its great technical strengths.

    So why is it that the sort context is fixed at 30K? Why is it not a parameter that the DBA can change to reflect the circumstances that the database finds itself in. If one needs to conduct aggregations using large geometries surely one should be able to change the 30k SORT context value? Or is this hard-coded into the kernel and can only be changed with difficulty?

    I have played around with native and Java based custom aggregates but I constantly run into the context problem. I have read that a custom aggregate written in C (extproc) might get around the 30k sort context problem. I am tempted to try myself but I would have assumed that this is how you developed your sdo_aggr_union in the first place such that after a lot of effort the aggregate would fail just like everything else (though ESRI seem to be able to do this OK with their SDE.ST_GEOMETRY's ST_Aggr_Union operator).

    Is there any light at the end of this very long tunnel, Siva?

    I hope your paper goes well - I can't wait to read about the new performance improvements your paper will talk on.

    regards
    Simon
  • 12. Re: SDO_AGGR_UNION very slow
    Siva Ravada Expert
    Currently Being Moderated
    Simon,

    This group by won't work as you hit that internal limit in the server.
    So try the other approach described in the user guide using a function that creates a geometry array and passes it to the union function.

    I am not an expert on this issue of that system limit of 30k sort area. When we talk to internal groups about it, they always
    ask us to show the customer filed ER.
    So it will really help if customers like you file an official ER for this.

    Having said that, the main problem with this approach is still the multiple copies of the data which tends to take up
    most of the time. So we are coming up with a better way to call sdo_aggr_set_union that avoid this multiple copies of the data.

    siva
  • 13. Re: SDO_AGGR_UNION very slow
    Simon Greener Journeyer
    Currently Being Moderated
    Siva,
    This group by won't work as you hit that internal limit in the server.
    Patently! I provided this example because this is what people want in an aggregate and it is the way most people think of it.
    So try the other approach described in the user guide using a function that creates a geometry array and passes it to the union function.
    OK, two more assertions:

    1. Working with users over the years I have discovered they can use SQL reasonably well but I am surprised how many simply don't want to go down a PL/SQL function route!
    2.The example given is very opaque as to how to do a SELECT ... GROUP BY aggregation using sdo_aggr_set_union.

    Personally, I would prefer to expose the grouping value....
    CREATE OR REPLACE FUNCTION Set_Geometry(p_value in varchar2)
    RETURN SDO_GEOMETRY_ARRAY 
    deterministic 
    AS
      c_query   SYS_REFCURSOR;
      v_g       sdo_geometry;
      v_GeomArr sdo_geometry_array;
    BEGIN
      v_GeomArr := SDO_GEOMETRY_ARRAY();
      OPEN c_query FOR 'select a.geometry from admin a where a.blockcode = :1'
                 USING p_value;
       LOOP
        FETCH c_query into v_g;
         EXIT when c_query%NOTFOUND ;
         v_GeomArr.extend;
         v_GeomArr(v_GeomArr.count) := v_g;
       END LOOP;
       RETURN v_GeomArr;
    END;
    /
    -- Which is called like so:
    --
    select a.group_code, count(*) as aggrCount, sdo_aggr_set_union(set_geometry(a.group_code),0.005) as geoms
     from admin a
     where a.hierarchy_code = 'WA'
     group by a.group_code;
    But this is quite inflexible because it really isn't a generic function but rather a function for a specific piece of SQL.

    Because of this I prefer something more generic:
    CREATE OR REPLACE FUNCTION Set_Geometry(p_cursor in SYS_REFCURSOR)
    RETURN SDO_GEOMETRY_ARRAY 
    DETERMINISTIC 
    AS
       v_geom    sdo_geometry;
       v_GeomArr sdo_geometry_array;
    BEGIN
       v_GeomArr := SDO_GEOMETRY_ARRAY();
       LOOP
        FETCH p_cursor INTO v_geom;
         EXIT when p_cursor%NOTFOUND ;
         v_GeomArr.extend;
         v_GeomArr(v_GeomArr.count) := v_geom;
       END LOOP;
       RETURN v_GeomArr;
    END;
    /
    -- Results
    FUNCTION Set_Geometry compiled
    --
    -- Now here's how to use it in a SELECT .... GROUP BY ...
    --
    select group_code, 
           count(*) as aggrCount, 
           sdo_aggr_set_union(set_geometry(CURSOR(SELECT b.geometry FROM test_table b WHERE b.group_code= a.group_code)),0.005) as geoms
     from test_table a
     where a.hierarchy_code = 'WA'
     group by a.group_code;
    Now I agree SDO_AGGR_SET_UNION is much faster and, with my function, more flexible but it is still ugly.
    I am not an expert on this issue of that system limit of 30k sort area. When we talk to internal groups about it, they always
    ask us to show the customer filed ER. So it will really help if customers like you file an official ER for this.
    Surely someone in the spatial team is an expert!

    I can't file an ER because my Oracle use is under an OTN license for development purposes. I have asked complaining customers to do so over the years but for some reason they don't so I am left to make a fool of myself on their behalf.
    Having said that, the main problem with this approach is still the multiple copies of the data which tends to take up
    most of the time. So we are coming up with a better way to call sdo_aggr_set_union that avoid this multiple copies of the data.
    In your UC presentation? It is good for the public to know that the OS team is always trying to make things better and faster.

    regards
    Simon
  • 14. Re: SDO_AGGR_UNION very slow
    Luc Van Linden Pro
    Currently Being Moderated
    Hi Siva

    I want to support Simon and the others on this.

    This topic is on the forum's agenda for years, without being properly addressed (performance and simplicity).
    We understand you have processes to follow, but on the other hand, one of the reasons why we contribute constructively here is also that we (maybe naively) hope it is picked by you and your team.

    That the aggregating of spatial data is mostly related to the 30k sort issue is also known for years.
    (I was surprised it is also more then 5 years ago I tried this : Re: Does anyone have an Aggregate APPEND function

    Before 2007 I was in a position to file ER theoretically, although I must say I never triggered it as it had to run through a couple of internal and external departments first.
    When you're in a time-pressure task, one will take it off-line (if feasible) and process it using other tools.
    At the same time, let's be honest, union-ing is one of the most basic and common task for dealing with spatial data. Also one of the reasons why at least I hoped and expected it will be tackled by Oracle in its next release.

    Looking at results Simon's time and effort there seems back some light in the tunnel for a workable approach on the unioning. Time to test that again.

    Constructively.

    Luc
1 2 Previous Next

Legend

  • Correct Answers - 10 points
  • Helpful Answers - 5 points