8 Replies Latest reply: May 16, 2013 1:03 AM by Luc Van Linden RSS

    query is taking too much time

    don123
      hi

      The following query is taking too much time (more than 30 minutes), working with 11g.
      The table has three columns rid, ida, geometry and index has been created on all columns.
      The table has around 5,40,000 records of point geometries.

      Please help me with your suggestions. I want to select duplicate point geometry where ida=CORD.


      SQL> select a.rid, b.rid from totalrecords a, totalrecords b where a.ida='CORD' and b.idat='CORD' and
      sdo_equal(a.geometry, b.geometry)='TRUE' and a.rid !=b.rid order by 1,2;

      regards
        • 1. Re: query is taking too much time
          Stefan Jager
          Try this:
          SELECT /*+ ORDERED */
            a.rid, 
            b.rid 
          FROM
            totalrecords b,
            totalrecords a
          WHERE 
            a.ida='CORD' AND
            b.idat='CORD' AND
            a.rid !=b.rid AND
            sdo_equal(a.geometry, b.geometry)='TRUE' 
            ORDER BY 1,2;
          You are AND-ing everything, so use the least expensive expressions first. Secondly, you are using a table column as second geometry, and then it is recommended to use the ORDERED hint and specify that table first in your FROM-clause.

          Don't know if it will help, but give it a try. If it doesn't help, post your EXPLAIN PLAN's so we can see where the bottleneck is. I suspect you can't avoid a FTS here, which will happen twice because you're comparing the table with itself. Using indexes might actually slow things down in this case.

          HTH,
          Stefan
          • 2. Re: query is taking too much time
            Luc Van Linden
            Hi


            When you created the spatial index, did you set 'layer_gtype=point‘ in the parameters?

            Have you tried using the SDO_JOIN? http://docs.oracle.com/cd/B28359_01/appdev.111/b28400/sdo_operat.htm#BGEDJIBF
            SELECT a.rid, b.rid FROM
            table(sdo_join('TOTALRECORDS', 'GEOMETRY', 'TOTALRECORDS', 'GEOMETRY')) c,
            TOTALRECORDS a, TOTALRECORDS b
            WHERE c.rowid1 = a.rowid AND c.rowid2 = b.rowid
            AND a.rowid < b.rowid
            AND a.ida='CORD' and b.idat='CORD';
            AND SDO_GEOM.RELATE (a.geometry, 'ANYINTERACT', b.geometry, .05) = 'TRUE';
            Luc
            • 3. Re: query is taking too much time
              don123
              Stefan / Luc

              Thanks.

              The SELECT query response is improved as I have removed some AND conditions and I have set layer_gtype=POINT.

              Now i am facing the problem to DELETE duplicate point geometry. The following query is taking too much time.

              SQL> delete from totalrecords a where rowid > (select /*+ ordered */ min(rowid) from totalrecords b where sdo_equal(b.geometry, a.geometry)='TRUE');


              regards
              • 4. Re: query is taking too much time
                Stefan Jager
                I have removed some AND conditions
                That was not necessary. It's just that Oracle can see for example that
                a.ida='CORD' AND
                b.idat='CORD' AND
                a.rid !=b.rid AND
                sdo_equal(a.geometry, b.geometry)='TRUE' 
                ORDER BY 1,2;
                if a.ida does not equal 'CORD', the whole set of conditions evaluates to FALSE, so Oracle will not bother evaluating the rest of the conditions because it's all AND'ed together, and TRUE AND FALSE = FALSE.
                So if you place your least expensive conditions first (even though the optimizer can and will reorder conditions) this will give you a small performance benefit. Too small to notice, but on 5.4 million records it should be noticable.
                and I have set layer_gtype=POINT.
                Good, that will help. I forgot about that one (Thanks Luc!).
                Now i am facing the problem to DELETE duplicate point geometry. The following query is taking too much time.
                What is too much time? Do you need to delete these duplicate points on a daily or hourly basis? Or is this a one-time cleanup action? If it's a one-time cleanup operation, does it really matter if it takes half an hour?
                And if this is a daily or even hourly operation, then why don't you prevent the duplicates from entering the table in the first place? That will save you from having to clean up afterwards. Of course, this might not be possible with your business requirements.

                Lastly: can you post an explain plan for your queries? Those might give us an idea of what is taking so much time. Please enclose the results of the explain plan with
                [ c o d e ]
                <code/results here>
                [ / c o d e ]
                that way the original formatting is kept and it makes things much easier to read.

                Regards,
                Stefan
                • 5. Re: query is taking too much time
                  Luc Van Linden
                  Hi

                  Have you read the part of the selfjoins of the SDO_JOIN documentation?

                  Additionally the part
                  AND a.rowid < b.rowid
                  will already return the redundant set of rows (it will return you a=b, but not a=a, and not b=a)

                  Also the SDO_JOIN here could be used without the mask as you are dealing with points here, although that might already be optimized by the index parameter layer_gtype =POINT'.

                  So you might want to try something like:
                  DELETE FROM TOTALRECORDS 
                  where rowid in(
                  SELECT /*+ ordered use_nl (c,a) use_nl (c,b) */ a.rowid FROM
                  table(sdo_join('TOTALRECORDS', 'GEOMETRY', 'TOTALRECORDS', 'GEOMETRY')) c,
                  TOTALRECORDS a, TOTALRECORDS b
                  WHERE c.rowid1 = a.rowid AND c.rowid2 = b.rowid
                  AND a.rowid < b.rowid
                  AND a.ida='CORD' and b.idat='CORD'
                  );
                  From your last example you are not using the
                    idat='CORD' 
                  anymore, but only checking for redundant point geometries a such.
                  This would mean that the above example could be simplified again:
                  DELETE FROM TOTALRECORDS 
                  where rowid in(
                  SELECT c.rowid1 FROM
                  table(sdo_join('TOTALRECORDS', 'GEOMETRY', 'TOTALRECORDS', 'GEOMETRY')) c
                  WHERE rowid1 < rowid2
                  );
                  This will delete all redundant points as opposed to redundant pairs of idat=CORD.

                  Luc
                  • 6. Re: query is taking too much time
                    don123
                    Luc

                    Thanks for the great help. Your query is performing excellent. I red documentation on sdo_join suggested by you.

                    Still I am not clear how it selects duplicate points by using rowid1 and rowid2, can you explain this ?

                    regards
                    • 7. Re: query is taking too much time
                      Luc Van Linden
                      Hi

                      Glad it helped, just do not forget to award points ;-).

                      The SDO_JOIN will use the spatial indexes to perform a spatial comparison.
                      First it will use the index to check which geometries MBR's do interact, this is the primary filter: http://docs.oracle.com/cd/B28359_01/appdev.111/b28400/sdo_intro.htm#g1000087

                      In a second step, should you apply a spatial MASK, the pair of geometries coming out of the primary filter are then compared according the MASK, for example MASK=EQUAL, will check those pair of geoms to be equal.

                      Now, because you have points, the MBRs are the points themself, so if their MBR interact, the points do interact, which means they are equal.
                      This means that the outcome of the SDO_JOIN on points, in this case even a self join, will give you all points that are equal.

                      But as points are also equal to them self, the self-join sdo join, will give you: a=b & a=a but also b=a.
                      Therefor you set the a.rowid < b.rowid, to avoid a=a but also b=a.

                      I hope this explains this a bit, again reading and reading over the great documentation (from the start!) will get you a better understanding.

                      Good luck

                      Luc
                      • 8. Re: query is taking too much time
                        Luc Van Linden
                        Also

                        Read this post: how to find dangles

                        and search the 10th message in there, that explains also a bit on the rowids.

                        Luc