This content has been marked as final. Show 20 replies
Hello unnamed user12000327,
What is the SRID of the data you are aggregating? If its a geodetic SRID then make sure your comparisons also do their aggregations on the ellipsoid and aren't just projecting the data.
What is the average vertice count of one of your polygons? Are these things enormous?
That being said SDO_AGGR_UNION is pretty pokey due to the overhead of the ODCIAggregate system. I assume that is what led the spatial team to create SDO_AGGR_SET_UNION for 11g. Have you tried your processing using the new function?
There are a lot of things you can do to optimize your aggregation process. If you try it in several passes like your original query, the trick is to first aggregation geometries in proximity to each other. So if you have some kind of grouping key things will do much better. For example if the polygons are postal codes and you know the postal codes with similar prefixes are grouped spatially together then aggregating in passes on those prefixes will help a lot.
The topology solution is rather extreme but as I think its my solution you are referring to, I will meekly defend it as a possibility if you cannot get anything else to provide the performance you need.
The poor performance of Oracle's SDO_AGGR_UNION has been known for years.
It is a pity because there must be ways around the sort context problem that would allow for more scalable passing of geometries to a union algorithm.
PostGIS fixed this quite a few years ago - in version 1.4 to be precise - as can be seen from this posting by Paul Ramsey (http://blog.cleverelephant.ca/2009/01/must-faster-unions-in-postgis-14.html).
I made an unpopular comment at the 2011 Oracle Spatial User Conference in Washington about this.
Unpopular because I asked a question/made a comment in the question time of an excellent paper given by Dan Geringer on Spatial and Exadata. My comment was seen as being unhelpful and irrelevant.
For those of us who cannot afford Exadata machines, we are left to wonder why it is that everyone else - in their base/free products - can produce a fast union but not Oracle. (Even SQL Server 2012's union is pretty good.)
I don't like putting myself in the firing line, but the performance of SDO_AGGR_UNION is embarrassing - more people need to demand a solution.
I played around with a custom aggregate over this year's Christmas period. I tried to use Java as that is what I know but the same sort context problem stopped my attempts to use the same JTS STR-Tree based cascading union to implement something faster for Oracle users. I failed but I believe - if all comments out there are to be believed - that the sort context problem can be solved in pure C. If this is the case, writing a C extension to use GEOS (the JTS port for C/C++ users) is definitely possible.
With regards SDO_AGGR_SET_UNION, it is a bit faster but the fact that it will throw sort context errors when the collection gets too big indicates that it cannot be relied upon and so should be ignored until a decent - behind the scenes - replacement algorithm is implemented for SDO_AGGR_UNION.
For all those others who have had the same experience with SDO_AGGR_UNION raise your voices via the this forum, via Oracle Support or at Oracle Spatial User Conferences.
Agreed. The performance of SDO_AGGR_UNION really is abysmal. Search this forum for "SDO_AGGR_UNION" and most of the results relate to performance issue.
It'd be good to get some feedback from Oracle as to why they decided to introduce the very cumbersome (and not always working) SDO_AGGR_SET_UNION, rather than optimise SDO_AGGR_UNION to use a cascading union approach.
Hello Paul and Simon,
The SRID is 8307.
The query returns 645 polygons, ranging from 66 vertices to 8030 vertices.
SELECT sdo_util.getnumvertices (geoloc)
where state_id = 35
ORDER BY 1;
1 have 8330
2 between 5001 and 6000 vertices
5 between 4001 and 5000 vertices
8 between 3001 and 4000 vertices
33 BETWEEN 2001 AND 3000 vertices
163 BETWEEN 1001 AND 2000 vertices
433 between 1 and 1000 vertices
I did some tests with SDO_AGGR_SET_UNION and it was a bit faster (10 minutes), but the performance was still very bad compared to other approaches (PostGis, ArcGis, JTS, ...) (few seconds)
The performance difference is huge.
I'm very disappointed with the performance of the SDO_AGGR_UNION. Even though most spatial operations are much faster on Oracle Spatial than on others approaches, the SDO_AGGR_UNION is a very useful function than in my opnion needs to be treated more carefully by the development team.
I would not like to replace the SDO_AGGR_UNION by another non-Oracle tool. I really wished that the union algorithm were improved.
I agree with Simon, more people should demand a solution to this problem.
I find it interesting that the 2012 User Conference - announced on LinkedIn and not on this Forum - has this:
Subject: Executive Speakers Announced for Oracle Spatial User ConferenceThe issues of performance are always openly discussed on the PostGIS, JTS etc discussion lists. I have even had honest discussion of performance with the SQL Server Spatial team.
** New Performance Enhancements in Oracle Spatial — Siva Ravada
But with Oracle one always seems to come up against a communications black hole in which public discussion of performance by Oracle staff eg SDO_AGGR_UNION is not allowed. Perhaps
there is a corporate firewall stopping the discussion of issues that may be deemed to affect sales and marketing. If that is the case, fair enough, but the current, very formal methods (eg metalink)
of registering performance concerns don't the engender the open discussion that users expect nowadays when interacting with software vendors on the internet.
So, Siva, will there be any posting on this forum with a hint of what might be in what will probably be a very good talk? (PS If your talk discusses Exadata for more than 10 minutes I for one will turn off.)
This year there will be several tracks of talks and my talk won't mention Exadata.
As for the sdo_aggr_union issue, as we talked about in the last conference, the frame work we have in Oracle for these kinds of aggregate operations
is very severely restrictive against using a lot of memory. So within that framework, we don't have a way of collecting a set of geometries and doing
the union operation once on the whole set. So we have to do it iteratively which leads to the performance cost.
There is a reason for this restriction. Since Oracle process memory is shared by many users, it is not a good idea to allow one user to monopolize the memory
causing problems to other users.
That is the reason we introduced the sdo_aggr_set_union operation as an alternative way to do these union operations. Since the aggregate framework
is not used in this case, we have more flexibility in how we collect all the geometries and do the union operation once.
Even with this, in the 11gR2 release, a lot of time is spent in collecting the geometries and getting them into memory. So we are working on
improving that part of the process now.
Wrt examples for sdo_aggr_set_union, surely you have some reasonably large dataset you use for scalability testing?
Any, here is an example from real data (names changed to protect owner):
If I remove the sdo_aggr_set_union to give you some numbers....
select sdo_aggr_set_union(CAST(COLLECT(a.geometry) AS mdsys.SDO_Geometry_Array),0.005) as aggrGeom from largedataset a where a.filter_code = 'WA' group by a.group_code; -- Results ORA-22813: operand value exceeds system limits 22813. 00000 - "operand value exceeds system limits" *Cause: Object or Collection value was too large. The size of the value might have exceeded 30k in a SORT context, or the size might be too big for available memory. *Action: Choose another value and retry the operation.
Unless, of course, I am doing something wrong?
select count(*) as geomCount, round(avg(sdo_util.getNumVertices(a.geometry)),1) as avgVertices from largedataset a where a.filter_code = 'ME' group by a.group_code; -- Results GEOMCOUNT AVGVERTICES --------- ----------- 212 96.2 22 27.1 68 159 191 116.4 374 56.7 58 135.4 275 126.3 65 112.2 110 87 206 143.3 83 173.7 32 142.1 136 186.7 94 151.3 812 136.4 184 119.6 268 112.2 51 137.2 159 173.8 198 160.5 264 118.6 114 103.5 138 93.8 96 86.6 23 188.7 136 199.9 238 134.7 58 142.5 53 142.1 131 179.4 122 96.4 42 133.3 489 162.8 33 rows selected
I would LOVE to be able to use EXADATA but the sort of customers using Spatial in Australia can't afford it.
For me, as you know, there are more important and immediate issues that affect customers.
Since Oracle process memory is shared by many users, it is not a good idea to allow one user to monopolize the memory causing problems to other users.Since when does any software company tell users what is a good idea or not? This sends a bad message to users about their communications with (all) their customers (not just the "big end of town").
Oracle DB is highly configurable.
That is one of its great technical strengths.
So why is it that the sort context is fixed at 30K? Why is it not a parameter that the DBA can change to reflect the circumstances that the database finds itself in. If one needs to conduct aggregations using large geometries surely one should be able to change the 30k SORT context value? Or is this hard-coded into the kernel and can only be changed with difficulty?
I have played around with native and Java based custom aggregates but I constantly run into the context problem. I have read that a custom aggregate written in C (extproc) might get around the 30k sort context problem. I am tempted to try myself but I would have assumed that this is how you developed your sdo_aggr_union in the first place such that after a lot of effort the aggregate would fail just like everything else (though ESRI seem to be able to do this OK with their SDE.ST_GEOMETRY's ST_Aggr_Union operator).
Is there any light at the end of this very long tunnel, Siva?
I hope your paper goes well - I can't wait to read about the new performance improvements your paper will talk on.
This group by won't work as you hit that internal limit in the server.
So try the other approach described in the user guide using a function that creates a geometry array and passes it to the union function.
I am not an expert on this issue of that system limit of 30k sort area. When we talk to internal groups about it, they always
ask us to show the customer filed ER.
So it will really help if customers like you file an official ER for this.
Having said that, the main problem with this approach is still the multiple copies of the data which tends to take up
most of the time. So we are coming up with a better way to call sdo_aggr_set_union that avoid this multiple copies of the data.
This group by won't work as you hit that internal limit in the server.Patently! I provided this example because this is what people want in an aggregate and it is the way most people think of it.
So try the other approach described in the user guide using a function that creates a geometry array and passes it to the union function.OK, two more assertions:
1. Working with users over the years I have discovered they can use SQL reasonably well but I am surprised how many simply don't want to go down a PL/SQL function route!
2.The example given is very opaque as to how to do a SELECT ... GROUP BY aggregation using sdo_aggr_set_union.
Personally, I would prefer to expose the grouping value....
But this is quite inflexible because it really isn't a generic function but rather a function for a specific piece of SQL.
CREATE OR REPLACE FUNCTION Set_Geometry(p_value in varchar2) RETURN SDO_GEOMETRY_ARRAY deterministic AS c_query SYS_REFCURSOR; v_g sdo_geometry; v_GeomArr sdo_geometry_array; BEGIN v_GeomArr := SDO_GEOMETRY_ARRAY(); OPEN c_query FOR 'select a.geometry from admin a where a.blockcode = :1' USING p_value; LOOP FETCH c_query into v_g; EXIT when c_query%NOTFOUND ; v_GeomArr.extend; v_GeomArr(v_GeomArr.count) := v_g; END LOOP; RETURN v_GeomArr; END; / -- Which is called like so: -- select a.group_code, count(*) as aggrCount, sdo_aggr_set_union(set_geometry(a.group_code),0.005) as geoms from admin a where a.hierarchy_code = 'WA' group by a.group_code;
Because of this I prefer something more generic:
Now I agree SDO_AGGR_SET_UNION is much faster and, with my function, more flexible but it is still ugly.
CREATE OR REPLACE FUNCTION Set_Geometry(p_cursor in SYS_REFCURSOR) RETURN SDO_GEOMETRY_ARRAY DETERMINISTIC AS v_geom sdo_geometry; v_GeomArr sdo_geometry_array; BEGIN v_GeomArr := SDO_GEOMETRY_ARRAY(); LOOP FETCH p_cursor INTO v_geom; EXIT when p_cursor%NOTFOUND ; v_GeomArr.extend; v_GeomArr(v_GeomArr.count) := v_geom; END LOOP; RETURN v_GeomArr; END; / -- Results FUNCTION Set_Geometry compiled -- -- Now here's how to use it in a SELECT .... GROUP BY ... -- select group_code, count(*) as aggrCount, sdo_aggr_set_union(set_geometry(CURSOR(SELECT b.geometry FROM test_table b WHERE b.group_code= a.group_code)),0.005) as geoms from test_table a where a.hierarchy_code = 'WA' group by a.group_code;
I am not an expert on this issue of that system limit of 30k sort area. When we talk to internal groups about it, they alwaysSurely someone in the spatial team is an expert!
ask us to show the customer filed ER. So it will really help if customers like you file an official ER for this.
I can't file an ER because my Oracle use is under an OTN license for development purposes. I have asked complaining customers to do so over the years but for some reason they don't so I am left to make a fool of myself on their behalf.
Having said that, the main problem with this approach is still the multiple copies of the data which tends to take upIn your UC presentation? It is good for the public to know that the OS team is always trying to make things better and faster.
most of the time. So we are coming up with a better way to call sdo_aggr_set_union that avoid this multiple copies of the data.
I want to support Simon and the others on this.
This topic is on the forum's agenda for years, without being properly addressed (performance and simplicity).
We understand you have processes to follow, but on the other hand, one of the reasons why we contribute constructively here is also that we (maybe naively) hope it is picked by you and your team.
That the aggregating of spatial data is mostly related to the 30k sort issue is also known for years.
(I was surprised it is also more then 5 years ago I tried this : Re: Does anyone have an Aggregate APPEND function
Before 2007 I was in a position to file ER theoretically, although I must say I never triggered it as it had to run through a couple of internal and external departments first.
When you're in a time-pressure task, one will take it off-line (if feasible) and process it using other tools.
At the same time, let's be honest, union-ing is one of the most basic and common task for dealing with spatial data. Also one of the reasons why at least I hoped and expected it will be tackled by Oracle in its next release.
Looking at results Simon's time and effort there seems back some light in the tunnel for a workable approach on the unioning. Time to test that again.