This discussion is archived
8 Replies Latest reply: May 15, 2013 11:03 PM by Luc Van Linden RSS

query is taking too much time

don123 Newbie
Currently Being Moderated
hi

The following query is taking too much time (more than 30 minutes), working with 11g.
The table has three columns rid, ida, geometry and index has been created on all columns.
The table has around 5,40,000 records of point geometries.

Please help me with your suggestions. I want to select duplicate point geometry where ida=CORD.


SQL> select a.rid, b.rid from totalrecords a, totalrecords b where a.ida='CORD' and b.idat='CORD' and
sdo_equal(a.geometry, b.geometry)='TRUE' and a.rid !=b.rid order by 1,2;

regards
  • 1. Re: query is taking too much time
    Stefan Jager Journeyer
    Currently Being Moderated
    Try this:
    SELECT /*+ ORDERED */
      a.rid, 
      b.rid 
    FROM
      totalrecords b,
      totalrecords a
    WHERE 
      a.ida='CORD' AND
      b.idat='CORD' AND
      a.rid !=b.rid AND
      sdo_equal(a.geometry, b.geometry)='TRUE' 
      ORDER BY 1,2;
    You are AND-ing everything, so use the least expensive expressions first. Secondly, you are using a table column as second geometry, and then it is recommended to use the ORDERED hint and specify that table first in your FROM-clause.

    Don't know if it will help, but give it a try. If it doesn't help, post your EXPLAIN PLAN's so we can see where the bottleneck is. I suspect you can't avoid a FTS here, which will happen twice because you're comparing the table with itself. Using indexes might actually slow things down in this case.

    HTH,
    Stefan
  • 2. Re: query is taking too much time
    Luc Van Linden Pro
    Currently Being Moderated
    Hi


    When you created the spatial index, did you set 'layer_gtype=point‘ in the parameters?

    Have you tried using the SDO_JOIN? http://docs.oracle.com/cd/B28359_01/appdev.111/b28400/sdo_operat.htm#BGEDJIBF
    SELECT a.rid, b.rid FROM
    table(sdo_join('TOTALRECORDS', 'GEOMETRY', 'TOTALRECORDS', 'GEOMETRY')) c,
    TOTALRECORDS a, TOTALRECORDS b
    WHERE c.rowid1 = a.rowid AND c.rowid2 = b.rowid
    AND a.rowid < b.rowid
    AND a.ida='CORD' and b.idat='CORD';
    AND SDO_GEOM.RELATE (a.geometry, 'ANYINTERACT', b.geometry, .05) = 'TRUE';
    Luc
  • 3. Re: query is taking too much time
    don123 Newbie
    Currently Being Moderated
    Stefan / Luc

    Thanks.

    The SELECT query response is improved as I have removed some AND conditions and I have set layer_gtype=POINT.

    Now i am facing the problem to DELETE duplicate point geometry. The following query is taking too much time.

    SQL> delete from totalrecords a where rowid > (select /*+ ordered */ min(rowid) from totalrecords b where sdo_equal(b.geometry, a.geometry)='TRUE');


    regards
  • 4. Re: query is taking too much time
    Stefan Jager Journeyer
    Currently Being Moderated
    I have removed some AND conditions
    That was not necessary. It's just that Oracle can see for example that
    a.ida='CORD' AND
    b.idat='CORD' AND
    a.rid !=b.rid AND
    sdo_equal(a.geometry, b.geometry)='TRUE' 
    ORDER BY 1,2;
    if a.ida does not equal 'CORD', the whole set of conditions evaluates to FALSE, so Oracle will not bother evaluating the rest of the conditions because it's all AND'ed together, and TRUE AND FALSE = FALSE.
    So if you place your least expensive conditions first (even though the optimizer can and will reorder conditions) this will give you a small performance benefit. Too small to notice, but on 5.4 million records it should be noticable.
    and I have set layer_gtype=POINT.
    Good, that will help. I forgot about that one (Thanks Luc!).
    Now i am facing the problem to DELETE duplicate point geometry. The following query is taking too much time.
    What is too much time? Do you need to delete these duplicate points on a daily or hourly basis? Or is this a one-time cleanup action? If it's a one-time cleanup operation, does it really matter if it takes half an hour?
    And if this is a daily or even hourly operation, then why don't you prevent the duplicates from entering the table in the first place? That will save you from having to clean up afterwards. Of course, this might not be possible with your business requirements.

    Lastly: can you post an explain plan for your queries? Those might give us an idea of what is taking so much time. Please enclose the results of the explain plan with
    [ c o d e ]
    <code/results here>
    [ / c o d e ]
    that way the original formatting is kept and it makes things much easier to read.

    Regards,
    Stefan
  • 5. Re: query is taking too much time
    Luc Van Linden Pro
    Currently Being Moderated
    Hi

    Have you read the part of the selfjoins of the SDO_JOIN documentation?

    Additionally the part
    AND a.rowid < b.rowid
    will already return the redundant set of rows (it will return you a=b, but not a=a, and not b=a)

    Also the SDO_JOIN here could be used without the mask as you are dealing with points here, although that might already be optimized by the index parameter layer_gtype =POINT'.

    So you might want to try something like:
    DELETE FROM TOTALRECORDS 
    where rowid in(
    SELECT /*+ ordered use_nl (c,a) use_nl (c,b) */ a.rowid FROM
    table(sdo_join('TOTALRECORDS', 'GEOMETRY', 'TOTALRECORDS', 'GEOMETRY')) c,
    TOTALRECORDS a, TOTALRECORDS b
    WHERE c.rowid1 = a.rowid AND c.rowid2 = b.rowid
    AND a.rowid < b.rowid
    AND a.ida='CORD' and b.idat='CORD'
    );
    From your last example you are not using the
      idat='CORD' 
    anymore, but only checking for redundant point geometries a such.
    This would mean that the above example could be simplified again:
    DELETE FROM TOTALRECORDS 
    where rowid in(
    SELECT c.rowid1 FROM
    table(sdo_join('TOTALRECORDS', 'GEOMETRY', 'TOTALRECORDS', 'GEOMETRY')) c
    WHERE rowid1 < rowid2
    );
    This will delete all redundant points as opposed to redundant pairs of idat=CORD.

    Luc
  • 6. Re: query is taking too much time
    don123 Newbie
    Currently Being Moderated
    Luc

    Thanks for the great help. Your query is performing excellent. I red documentation on sdo_join suggested by you.

    Still I am not clear how it selects duplicate points by using rowid1 and rowid2, can you explain this ?

    regards
  • 7. Re: query is taking too much time
    Luc Van Linden Pro
    Currently Being Moderated
    Hi

    Glad it helped, just do not forget to award points ;-).

    The SDO_JOIN will use the spatial indexes to perform a spatial comparison.
    First it will use the index to check which geometries MBR's do interact, this is the primary filter: http://docs.oracle.com/cd/B28359_01/appdev.111/b28400/sdo_intro.htm#g1000087

    In a second step, should you apply a spatial MASK, the pair of geometries coming out of the primary filter are then compared according the MASK, for example MASK=EQUAL, will check those pair of geoms to be equal.

    Now, because you have points, the MBRs are the points themself, so if their MBR interact, the points do interact, which means they are equal.
    This means that the outcome of the SDO_JOIN on points, in this case even a self join, will give you all points that are equal.

    But as points are also equal to them self, the self-join sdo join, will give you: a=b & a=a but also b=a.
    Therefor you set the a.rowid < b.rowid, to avoid a=a but also b=a.

    I hope this explains this a bit, again reading and reading over the great documentation (from the start!) will get you a better understanding.

    Good luck

    Luc
  • 8. Re: query is taking too much time
    Luc Van Linden Pro
    Currently Being Moderated
    Also

    Read this post: how to find dangles

    and search the 10th message in there, that explains also a bit on the rowids.

    Luc

Legend

  • Correct Answers - 10 points
  • Helpful Answers - 5 points