## Forum Stats

• 3,836,878 Users
• 2,262,201 Discussions

Discussions

# Performance of querying two lat-long positions

Member Posts: 10 Green Ribbon

Hi,

I have a table with 800k records that has one lat-long positions per record . I am looking for the best approach to find related records that meet the following conditions:

• within 28 days of each other (start new <= end old + 28 days)
• within the same category (there are 110 categories)
• within 20 meters distance of each other

So far I've tried spatial joins but Oracle wants to do the spatial join before the other conditions are checked, resulting in a runawy query.

My best guess is using the lat-long positions and what's called the haversin formula. But it still takes too long but maybe I'm missing something. My query and explain below:

Any tips, solutions?

Regards,

Martin

WITH X AS

(  SELECT M.MELDINGSNUMMER

,      M.CATEGORIE_VID

FROM   MORA_MELDING       M

JOIN   MORA_DIM_LOCATIE   LOC

WHERE  1=1

)

SELECT A.MELDINGSNUMMER       MELDINGSNUMMER_VERLEDEN

,      B.MELDINGSNUMMER       MELDINGSNUMMER_HEDEN

,      DECODE(A.MELDINGSNUMMER, NULL, 0, 1) IND_HERHAAL

,      ROUND( 1000 * 6367 * 2 *

ATAN2(

SQRT(

)

,  SQRT(1-

)

)

)                                  AFSTAND

FROM   X      A      -- VERLEDEN

JOIN   X      B      -- HEDEN

ON   A.CATEGORIE_VID = B.CATEGORIE_VID

AND   A.MELDINGSNUMMER < B.MELDINGSNUMMER

AND   ROUND( 1000 * 6367 * 2 *

ATAN2(

SQRT(

)

,  SQRT(1-

)

)

) <= 20

WHERE  1=1

;

This query results in the following explain plan:

Plan hash value: 477534478

----------------------------------------------------------------------------------------------------------

| Id | Operation                 | Name                       | Rows | Bytes | Cost (%CPU)| Time    |

----------------------------------------------------------------------------------------------------------

|  0 | SELECT STATEMENT          |                            |    1 |  122 | 36730  (1)| 00:00:02 |

|  1 | TEMP TABLE TRANSFORMATION |                            |      |      |           |         |

|  2 |  LOAD AS SELECT          | SYS_TEMP_0FD9D6651_28B8758F |      |      |           |         |

|* 3 |   HASH JOIN              |                            | 36416 | 3947K| 36503  (1)| 00:00:02 |

|* 4 |    TABLE ACCESS FULL     | MORA_MELDING               | 36416 | 1244K| 21840  (1)| 00:00:01 |

|* 5 |    TABLE ACCESS FULL     | MORA_DIM_LOCATIE           |  879K|   63M| 14660  (1)| 00:00:01 |

|* 6 |  HASH JOIN               |                            |    1 |  122 |  227  (1)| 00:00:01 |

|  7 |   VIEW                   |                            | 27275 | 1624K|  113  (0)| 00:00:01 |

|  8 |    TABLE ACCESS FULL     | SYS_TEMP_0FD9D6651_28B8758F | 27275 | 1864K|  113  (0)| 00:00:01 |

|  9 |   VIEW                   |                            | 27275 | 1624K|  113  (0)| 00:00:01 |

| 10 |    TABLE ACCESS FULL     | SYS_TEMP_0FD9D6651_28B8758F | 27275 | 1864K|  113  (0)| 00:00:01 |

----------------------------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):

-------------------------------------------------------------

1 - SEL\$5BF935F8

2 - SEL\$F1D6E378

4 - SEL\$F1D6E378 / [email protected]\$1

5 - SEL\$F1D6E378 / [email protected]\$1

7 - SEL\$ECCE8CA4 / [email protected]\$2

8 - SEL\$ECCE8CA4 / [email protected]\$ECCE8CA4

9 - SEL\$ECCE8CA5 / [email protected]\$2

10 - SEL\$ECCE8CA5 / [email protected]\$ECCE8CA5

Outline Data

-------------

/*+

BEGIN_OUTLINE_DATA

USE_HASH(@"SEL\$F1D6E378" "LOC"@"SEL\$1")

FULL(@"SEL\$F1D6E378" "LOC"@"SEL\$1")

FULL(@"SEL\$F1D6E378" "M"@"SEL\$1")

FULL(@"SEL\$ECCE8CA4" "T1"@"SEL\$ECCE8CA4")

FULL(@"SEL\$ECCE8CA5" "T1"@"SEL\$ECCE8CA5")

USE_HASH(@"SEL\$5BF935F8" "B"@"SEL\$2")

NO_ACCESS(@"SEL\$5BF935F8" "B"@"SEL\$2")

NO_ACCESS(@"SEL\$5BF935F8" "A"@"SEL\$2")

OUTLINE(@"SEL\$2")

OUTLINE(@"SEL\$4")

MERGE(@"SEL\$1")

OUTLINE(@"SEL\$F1D6E378")

OUTLINE(@"SEL\$1")

OUTLINE(@"SEL\$3")

MERGE(@"SEL\$2")

OUTLINE_LEAF(@"SEL\$5BF935F8")

OUTLINE_LEAF(@"SEL\$ECCE8CA5")

OUTLINE_LEAF(@"SEL\$ECCE8CA4")

MERGE(@"SEL\$1")

OUTLINE_LEAF(@"SEL\$F1D6E378")

ALL_ROWS

DB_VERSION('12.1.0.2')

OPTIMIZER_FEATURES_ENABLE('12.1.0.2')

IGNORE_OPTIM_EMBEDDED_HINTS

END_OUTLINE_DATA

*/

Predicate Information (identified by operation id):

---------------------------------------------------

4 - filter("M"."ADWH_DT_MELDING_GEMAAKT">=TO_DATE(' 2020-12-01 00:00:00', 'syyyy-mm-dd

hh24:mi:ss') AND "M"."ADWH_DT_MELDING_GEMAAKT"<=TO_DATE(' 2020-12-31 23:59:59', 'syyyy-mm-dd

hh24:mi:ss'))

5 - filter("LOC"."LATITUDE" IS NOT NULL AND "LOC"."LONGITUDE" IS NOT NULL)

6 - access("A"."CATEGORIE_VID"="B"."CATEGORIE_VID")

Column Projection Information (identified by operation id):

-----------------------------------------------------------

1 - "A"."CATEGORIE_VID"[NUMBER,22], "B"."CATEGORIE_VID"[NUMBER,22],

2 - SYSDEF[4], SYSDEF[0], SYSDEF[1], SYSDEF[112], SYSDEF[0]

"M"."MELDINGSNUMMER"[NUMBER,22], "M"."CATEGORIE_VID"[NUMBER,22],

4 - (rowset=200) "M"."MELDINGSNUMMER"[NUMBER,22], "M"."CATEGORIE_VID"[NUMBER,22],

"M"."LOCATIE_VID"[NUMBER,22]

6 - (#keys=1) "A"."CATEGORIE_VID"[NUMBER,22], "B"."CATEGORIE_VID"[NUMBER,22],

7 - "A"."MELDINGSNUMMER"[NUMBER,22], "A"."CATEGORIE_VID"[NUMBER,22],

8 - "C0"[NUMBER,22], "C1"[NUMBER,22], "C2"[DATE,7], "C3"[DATE,7], "C4"[VARCHAR2,64],

"C5"[NUMBER,22], "C6"[NUMBER,22], "C7"[NUMBER,22]

9 - "B"."MELDINGSNUMMER"[NUMBER,22], "B"."CATEGORIE_VID"[NUMBER,22],

10 - "C0"[NUMBER,22], "C1"[NUMBER,22], "C2"[DATE,7], "C3"[DATE,7], "C4"[VARCHAR2,64],

"C5"[NUMBER,22], "C6"[NUMBER,22], "C7"[NUMBER,22]

Note

-----

- dynamic statistics used: dynamic sampling (level=2)

- 1 Sql Plan Directive used for this statement

Tagged:

• Member Posts: 9,995 Blue Diamond

The plan isn't completely consistent with the query - there's a date restriction in the plan (predicate on operation 4) and a pair of not null restrictions (predicates on operation 5) that aren't in the query. Without those predicates I assume both tables have similar numbers of rows and the 800K in your description is for MORE_MELDING.

The key threat in the plan is the HASH JOIN which joins the materialized CTE on category_id. On average a ctegory has about = 7,000 rows (800K / 110). If you join this to itself you get 49M rows per category where you have to check the date restriction before calling all the trig functions. I'd check the time it takes to do just that and count the size of the result set (before doing the 20m check) to see what proportion of your current time is due to the data growh and date comparison.

You can't logically avoid the 7K -> 49M expansion, but you may be able to minimise the physical expansion and cost of elimination if (a) you get Oracle to use a MERGE JOIN, and (b) make sure that Oracle is using the "band join" strategy that appeared in 12c. You may need to modify your join predicates conditions slightly to do this because you have an equality and two inequalities (ignoring the trig predicates) - but at present the predicates are running in the right order, so the optimizer use the preidcates in the right order as soon as you force the merge join.

Regards

Jonathan Lewis

P,.S. Work on optimizing the category/date join first - then worry about the distance. Once you've got the category/date sorted you may find that you can use a spatial predicate the way you originally wanted if you use the "associate statistics" mechanism to say "this function is very expensive".

• Member Posts: 8,717 Blue Diamond
edited Apr 21, 2021 10:07PM

On top of Jonathan's advice, if you refactor the trig, sqrt and power calls, you might see some improvements. I'm thinking something like this...

```WITH X AS
( SELECT M.MELDINGSNUMMER
,   M.CATEGORIE_VID
FROM  MORA_MELDING    M
JOIN  MORA_DIM_LOCATIE  LOC
WHERE 1=1
)
, Y as (
SELECT A.MELDINGSNUMMER    MELDINGSNUMMER_VERLEDEN
,   B.MELDINGSNUMMER    MELDINGSNUMMER_HEDEN
,   DECODE(A.MELDINGSNUMMER, NULL, 0, 1) IND_HERHAAL
FROM  X   A   -- VERLEDEN
JOIN  X   B   -- HEDEN
ON  A.CATEGORIE_VID = B.CATEGORIE_VID
AND  A.MELDINGSNUMMER < B.MELDINGSNUMMER
)
, Z as (
select Y.*,
ROUND( 1000 * 6367 * 2 *
ATAN2(