This content has been marked as final. Show 8 replies
You are AND-ing everything, so use the least expensive expressions first. Secondly, you are using a table column as second geometry, and then it is recommended to use the ORDERED hint and specify that table first in your FROM-clause.
SELECT /*+ ORDERED */ a.rid, b.rid FROM totalrecords b, totalrecords a WHERE a.ida='CORD' AND b.idat='CORD' AND a.rid !=b.rid AND sdo_equal(a.geometry, b.geometry)='TRUE' ORDER BY 1,2;
Don't know if it will help, but give it a try. If it doesn't help, post your EXPLAIN PLAN's so we can see where the bottleneck is. I suspect you can't avoid a FTS here, which will happen twice because you're comparing the table with itself. Using indexes might actually slow things down in this case.
When you created the spatial index, did you set 'layer_gtype=point‘ in the parameters?
Have you tried using the SDO_JOIN? http://docs.oracle.com/cd/B28359_01/appdev.111/b28400/sdo_operat.htm#BGEDJIBF
SELECT a.rid, b.rid FROM table(sdo_join('TOTALRECORDS', 'GEOMETRY', 'TOTALRECORDS', 'GEOMETRY')) c, TOTALRECORDS a, TOTALRECORDS b WHERE c.rowid1 = a.rowid AND c.rowid2 = b.rowid AND a.rowid < b.rowid AND a.ida='CORD' and b.idat='CORD'; AND SDO_GEOM.RELATE (a.geometry, 'ANYINTERACT', b.geometry, .05) = 'TRUE';
Stefan / Luc
The SELECT query response is improved as I have removed some AND conditions and I have set layer_gtype=POINT.
Now i am facing the problem to DELETE duplicate point geometry. The following query is taking too much time.
SQL> delete from totalrecords a where rowid > (select /*+ ordered */ min(rowid) from totalrecords b where sdo_equal(b.geometry, a.geometry)='TRUE');
I have removed some AND conditionsThat was not necessary. It's just that Oracle can see for example that
if a.ida does not equal 'CORD', the whole set of conditions evaluates to FALSE, so Oracle will not bother evaluating the rest of the conditions because it's all AND'ed together, and TRUE AND FALSE = FALSE.
a.ida='CORD' AND b.idat='CORD' AND a.rid !=b.rid AND sdo_equal(a.geometry, b.geometry)='TRUE' ORDER BY 1,2;
So if you place your least expensive conditions first (even though the optimizer can and will reorder conditions) this will give you a small performance benefit. Too small to notice, but on 5.4 million records it should be noticable.
and I have set layer_gtype=POINT.Good, that will help. I forgot about that one (Thanks Luc!).
Now i am facing the problem to DELETE duplicate point geometry. The following query is taking too much time.What is too much time? Do you need to delete these duplicate points on a daily or hourly basis? Or is this a one-time cleanup action? If it's a one-time cleanup operation, does it really matter if it takes half an hour?
And if this is a daily or even hourly operation, then why don't you prevent the duplicates from entering the table in the first place? That will save you from having to clean up afterwards. Of course, this might not be possible with your business requirements.
Lastly: can you post an explain plan for your queries? Those might give us an idea of what is taking so much time. Please enclose the results of the explain plan with
[ c o d e ]
[ / c o d e ]
that way the original formatting is kept and it makes things much easier to read.
Have you read the part of the selfjoins of the SDO_JOIN documentation?
Additionally the part
will already return the redundant set of rows (it will return you a=b, but not a=a, and not b=a)
AND a.rowid < b.rowid
Also the SDO_JOIN here could be used without the mask as you are dealing with points here, although that might already be optimized by the index parameter layer_gtype =POINT'.
So you might want to try something like:
From your last example you are not using the
DELETE FROM TOTALRECORDS where rowid in( SELECT /*+ ordered use_nl (c,a) use_nl (c,b) */ a.rowid FROM table(sdo_join('TOTALRECORDS', 'GEOMETRY', 'TOTALRECORDS', 'GEOMETRY')) c, TOTALRECORDS a, TOTALRECORDS b WHERE c.rowid1 = a.rowid AND c.rowid2 = b.rowid AND a.rowid < b.rowid AND a.ida='CORD' and b.idat='CORD' );
anymore, but only checking for redundant point geometries a such.
This would mean that the above example could be simplified again:
This will delete all redundant points as opposed to redundant pairs of idat=CORD.
DELETE FROM TOTALRECORDS where rowid in( SELECT c.rowid1 FROM table(sdo_join('TOTALRECORDS', 'GEOMETRY', 'TOTALRECORDS', 'GEOMETRY')) c WHERE rowid1 < rowid2 );
Thanks for the great help. Your query is performing excellent. I red documentation on sdo_join suggested by you.
Still I am not clear how it selects duplicate points by using rowid1 and rowid2, can you explain this ?
Glad it helped, just do not forget to award points ;-).
The SDO_JOIN will use the spatial indexes to perform a spatial comparison.
First it will use the index to check which geometries MBR's do interact, this is the primary filter: http://docs.oracle.com/cd/B28359_01/appdev.111/b28400/sdo_intro.htm#g1000087
In a second step, should you apply a spatial MASK, the pair of geometries coming out of the primary filter are then compared according the MASK, for example MASK=EQUAL, will check those pair of geoms to be equal.
Now, because you have points, the MBRs are the points themself, so if their MBR interact, the points do interact, which means they are equal.
This means that the outcome of the SDO_JOIN on points, in this case even a self join, will give you all points that are equal.
But as points are also equal to them self, the self-join sdo join, will give you: a=b & a=a but also b=a.
Therefor you set the a.rowid < b.rowid, to avoid a=a but also b=a.
I hope this explains this a bit, again reading and reading over the great documentation (from the start!) will get you a better understanding.