Discussions
Categories
- 196.9K All Categories
- 2.2K Data
- 239 Big Data Appliance
- 1.9K Data Science
- 450.3K Databases
- 221.7K General Database Discussions
- 3.8K Java and JavaScript in the Database
- 31 Multilingual Engine
- 550 MySQL Community Space
- 478 NoSQL Database
- 7.9K Oracle Database Express Edition (XE)
- 3K ORDS, SODA & JSON in the Database
- 545 SQLcl
- 4K SQL Developer Data Modeler
- 187K SQL & PL/SQL
- 21.3K SQL Developer
- 295.9K Development
- 17 Developer Projects
- 138 Programming Languages
- 292.6K Development Tools
- 107 DevOps
- 3.1K QA/Testing
- 646K Java
- 28 Java Learning Subscription
- 37K Database Connectivity
- 155 Java Community Process
- 105 Java 25
- 22.1K Java APIs
- 138.1K Java Development Tools
- 165.3K Java EE (Java Enterprise Edition)
- 18 Java Essentials
- 160 Java 8 Questions
- 86K Java Programming
- 80 Java Puzzle Ball
- 65.1K New To Java
- 1.7K Training / Learning / Certification
- 13.8K Java HotSpot Virtual Machine
- 94.3K Java SE
- 13.8K Java Security
- 204 Java User Groups
- 24 JavaScript - Nashorn
- Programs
- 440 LiveLabs
- 38 Workshops
- 10.2K Software
- 6.7K Berkeley DB Family
- 3.5K JHeadstart
- 5.7K Other Languages
- 2.3K Chinese
- 171 Deutsche Oracle Community
- 1.1K Español
- 1.9K Japanese
- 232 Portuguese
Performance of querying two lat-long positions

Hi,
I have a table with 800k records that has one lat-long positions per record . I am looking for the best approach to find related records that meet the following conditions:
- within 28 days of each other (start new <= end old + 28 days)
- within the same category (there are 110 categories)
- within 20 meters distance of each other
So far I've tried spatial joins but Oracle wants to do the spatial join before the other conditions are checked, resulting in a runawy query.
My best guess is using the lat-long positions and what's called the haversin formula. But it still takes too long but maybe I'm missing something. My query and explain below:
Any tips, solutions?
Regards,
Martin
WITH X AS
( SELECT M.MELDINGSNUMMER
, M.CATEGORIE_VID
, M.ADWH_DT_MELDING_GEMAAKT
, M.ADWH_DT_1E_AFGEHANDELD
, LOC.LATITUDE_RADIALEN LATRAD
, LOC.LONGITUDE_RADIALEN LONRAD
FROM MORA_MELDING M
JOIN MORA_DIM_LOCATIE LOC
ON M.LOCATIE_VID = LOC.ADWH_VERSIE_ID
WHERE 1=1
)
SELECT A.MELDINGSNUMMER MELDINGSNUMMER_VERLEDEN
, B.MELDINGSNUMMER MELDINGSNUMMER_HEDEN
, DECODE(A.MELDINGSNUMMER, NULL, 0, 1) IND_HERHAAL
, ROUND( 1000 * 6367 * 2 *
ATAN2(
SQRT(
POWER(SIN( ( B.LATRAD - A.LATRAD ) /2) ,2) +
COS( A.LATRAD ) * COS( B.LATRAD ) * POWER(SIN( ( B.LONRAD - A.LONRAD ) /2) ,2)
)
, SQRT(1-
POWER(SIN( ( B.LATRAD - A.LATRAD ) /2) ,2) +
COS( A.LATRAD ) * COS( B.LATRAD ) * POWER(SIN( ( B.LONRAD - A.LONRAD ) /2) ,2)
)
)
) AFSTAND
FROM X A -- VERLEDEN
JOIN X B -- HEDEN
ON A.CATEGORIE_VID = B.CATEGORIE_VID
AND B.ADWH_DT_MELDING_GEMAAKT BETWEEN A.ADWH_DT_1E_AFGEHANDELD AND (A.ADWH_DT_1E_AFGEHANDELD + 28)
AND A.MELDINGSNUMMER < B.MELDINGSNUMMER
AND ROUND( 1000 * 6367 * 2 *
ATAN2(
SQRT(
POWER(SIN( ( B.LATRAD - A.LATRAD ) /2) ,2) +
COS( A.LATRAD ) * COS( B.LATRAD ) * POWER(SIN( ( B.LONRAD - A.LONRAD ) /2) ,2)
)
, SQRT(1-
POWER(SIN( ( B.LATRAD - A.LATRAD ) /2) ,2) +
COS( A.LATRAD ) * COS( B.LATRAD ) * POWER(SIN( ( B.LONRAD - A.LONRAD ) /2) ,2)
)
)
) <= 20
WHERE 1=1
;
This query results in the following explain plan:
Plan hash value: 477534478
----------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 122 | 36730 (1)| 00:00:02 |
| 1 | TEMP TABLE TRANSFORMATION | | | | | |
| 2 | LOAD AS SELECT | SYS_TEMP_0FD9D6651_28B8758F | | | | |
|* 3 | HASH JOIN | | 36416 | 3947K| 36503 (1)| 00:00:02 |
|* 4 | TABLE ACCESS FULL | MORA_MELDING | 36416 | 1244K| 21840 (1)| 00:00:01 |
|* 5 | TABLE ACCESS FULL | MORA_DIM_LOCATIE | 879K| 63M| 14660 (1)| 00:00:01 |
|* 6 | HASH JOIN | | 1 | 122 | 227 (1)| 00:00:01 |
| 7 | VIEW | | 27275 | 1624K| 113 (0)| 00:00:01 |
| 8 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6651_28B8758F | 27275 | 1864K| 113 (0)| 00:00:01 |
| 9 | VIEW | | 27275 | 1624K| 113 (0)| 00:00:01 |
| 10 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6651_28B8758F | 27275 | 1864K| 113 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
1 - SEL$5BF935F8
2 - SEL$F1D6E378
4 - SEL$F1D6E378 / [email protected]$1
5 - SEL$F1D6E378 / [email protected]$1
7 - SEL$ECCE8CA4 / [email protected]$2
8 - SEL$ECCE8CA4 / [email protected]$ECCE8CA4
9 - SEL$ECCE8CA5 / [email protected]$2
10 - SEL$ECCE8CA5 / [email protected]$ECCE8CA5
Outline Data
-------------
/*+
BEGIN_OUTLINE_DATA
USE_HASH(@"SEL$F1D6E378" "LOC"@"SEL$1")
LEADING(@"SEL$F1D6E378" "M"@"SEL$1" "LOC"@"SEL$1")
FULL(@"SEL$F1D6E378" "LOC"@"SEL$1")
FULL(@"SEL$F1D6E378" "M"@"SEL$1")
FULL(@"SEL$ECCE8CA4" "T1"@"SEL$ECCE8CA4")
FULL(@"SEL$ECCE8CA5" "T1"@"SEL$ECCE8CA5")
USE_HASH(@"SEL$5BF935F8" "B"@"SEL$2")
LEADING(@"SEL$5BF935F8" "A"@"SEL$2" "B"@"SEL$2")
NO_ACCESS(@"SEL$5BF935F8" "B"@"SEL$2")
NO_ACCESS(@"SEL$5BF935F8" "A"@"SEL$2")
OUTLINE(@"SEL$2")
OUTLINE(@"SEL$4")
MERGE(@"SEL$1")
OUTLINE(@"SEL$F1D6E378")
OUTLINE(@"SEL$1")
OUTLINE(@"SEL$3")
MERGE(@"SEL$2")
OUTLINE_LEAF(@"SEL$5BF935F8")
OUTLINE_LEAF(@"SEL$ECCE8CA5")
OUTLINE_LEAF(@"SEL$ECCE8CA4")
MERGE(@"SEL$1")
OUTLINE_LEAF(@"SEL$F1D6E378")
ALL_ROWS
DB_VERSION('12.1.0.2')
OPTIMIZER_FEATURES_ENABLE('12.1.0.2')
IGNORE_OPTIM_EMBEDDED_HINTS
END_OUTLINE_DATA
*/
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("M"."LOCATIE_VID"="LOC"."ADWH_VERSIE_ID")
4 - filter("M"."ADWH_DT_MELDING_GEMAAKT">=TO_DATE(' 2020-12-01 00:00:00', 'syyyy-mm-dd
hh24:mi:ss') AND "M"."ADWH_DT_MELDING_GEMAAKT"<=TO_DATE(' 2020-12-31 23:59:59', 'syyyy-mm-dd
hh24:mi:ss'))
5 - filter("LOC"."LATITUDE" IS NOT NULL AND "LOC"."LONGITUDE" IS NOT NULL)
6 - access("A"."CATEGORIE_VID"="B"."CATEGORIE_VID")
filter("B"."ADWH_DT_MELDING_GEMAAKT">="A"."ADWH_DT_1E_AFGEHANDELD" AND
"B"."ADWH_DT_MELDING_GEMAAKT"<=INTERNAL_FUNCTION("A"."ADWH_DT_1E_AFGEHANDELD")+28 AND
"A"."MELDINGSNUMMER"<"B"."MELDINGSNUMMER" AND ROUND(12734000*ATAN2(SQRT(POWER(SIN(("B"."LATRAD"-"A
"."LATRAD")/2),2)+COS("A"."LATRAD")*COS("B"."LATRAD")*POWER(SIN(("B"."LONRAD"-"A"."LONRAD")/2),2))
,SQRT(1-POWER(SIN(("B"."LATRAD"-"A"."LATRAD")/2),2)+COS("A"."LATRAD")*COS("B"."LATRAD")*POWER(SIN(
("B"."LONRAD"-"A"."LONRAD")/2),2))))<=20)
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - "A"."CATEGORIE_VID"[NUMBER,22], "B"."CATEGORIE_VID"[NUMBER,22],
"A"."MELDINGSNUMMER"[NUMBER,22], "A"."LONRAD"[NUMBER,22], "A"."ADWH_DT_1E_AFGEHANDELD"[DATE,7],
"A"."LATRAD"[NUMBER,22], "B"."MELDINGSNUMMER"[NUMBER,22], "B"."LONRAD"[NUMBER,22],
"B"."ADWH_DT_MELDING_GEMAAKT"[DATE,7], "B"."LATRAD"[NUMBER,22]
2 - SYSDEF[4], SYSDEF[0], SYSDEF[1], SYSDEF[112], SYSDEF[0]
3 - (#keys=1) "M"."LOCATIE_VID"[NUMBER,22], "LOC"."ADWH_VERSIE_ID"[NUMBER,22],
"M"."MELDINGSNUMMER"[NUMBER,22], "M"."CATEGORIE_VID"[NUMBER,22],
"M"."ADWH_DT_MELDING_GEMAAKT"[DATE,7], "M"."ADWH_DT_1E_AFGEHANDELD"[DATE,7],
"LOC"."LONGITUDE_RADIALEN"[NUMBER,22], "LOC"."LATITUDE"[VARCHAR2,32],
"LOC"."LONGITUDE"[VARCHAR2,32], "LOC"."ADWH_STRAAT"[VARCHAR2,64],
"LOC"."ADWH_HUISNUMMER"[NUMBER,22], "LOC"."LATITUDE_RADIALEN"[NUMBER,22]
4 - (rowset=200) "M"."MELDINGSNUMMER"[NUMBER,22], "M"."CATEGORIE_VID"[NUMBER,22],
"M"."ADWH_DT_MELDING_GEMAAKT"[DATE,7], "M"."ADWH_DT_1E_AFGEHANDELD"[DATE,7],
"M"."LOCATIE_VID"[NUMBER,22]
5 - (rowset=200) "LOC"."ADWH_VERSIE_ID"[NUMBER,22], "LOC"."LATITUDE"[VARCHAR2,32],
"LOC"."LONGITUDE"[VARCHAR2,32], "LOC"."ADWH_STRAAT"[VARCHAR2,64],
"LOC"."ADWH_HUISNUMMER"[NUMBER,22], "LOC"."LATITUDE_RADIALEN"[NUMBER,22],
"LOC"."LONGITUDE_RADIALEN"[NUMBER,22]
6 - (#keys=1) "A"."CATEGORIE_VID"[NUMBER,22], "B"."CATEGORIE_VID"[NUMBER,22],
"A"."MELDINGSNUMMER"[NUMBER,22], "A"."LONRAD"[NUMBER,22], "A"."ADWH_DT_1E_AFGEHANDELD"[DATE,7],
"A"."LATRAD"[NUMBER,22], "B"."MELDINGSNUMMER"[NUMBER,22], "B"."LONRAD"[NUMBER,22],
"B"."ADWH_DT_MELDING_GEMAAKT"[DATE,7], "B"."LATRAD"[NUMBER,22]
7 - "A"."MELDINGSNUMMER"[NUMBER,22], "A"."CATEGORIE_VID"[NUMBER,22],
"A"."ADWH_DT_1E_AFGEHANDELD"[DATE,7], "A"."LATRAD"[NUMBER,22], "A"."LONRAD"[NUMBER,22]
8 - "C0"[NUMBER,22], "C1"[NUMBER,22], "C2"[DATE,7], "C3"[DATE,7], "C4"[VARCHAR2,64],
"C5"[NUMBER,22], "C6"[NUMBER,22], "C7"[NUMBER,22]
9 - "B"."MELDINGSNUMMER"[NUMBER,22], "B"."CATEGORIE_VID"[NUMBER,22],
"B"."ADWH_DT_MELDING_GEMAAKT"[DATE,7], "B"."LATRAD"[NUMBER,22], "B"."LONRAD"[NUMBER,22]
10 - "C0"[NUMBER,22], "C1"[NUMBER,22], "C2"[DATE,7], "C3"[DATE,7], "C4"[VARCHAR2,64],
"C5"[NUMBER,22], "C6"[NUMBER,22], "C7"[NUMBER,22]
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
- 1 Sql Plan Directive used for this statement
Answers
-
The plan isn't completely consistent with the query - there's a date restriction in the plan (predicate on operation 4) and a pair of not null restrictions (predicates on operation 5) that aren't in the query. Without those predicates I assume both tables have similar numbers of rows and the 800K in your description is for MORE_MELDING.
The key threat in the plan is the HASH JOIN which joins the materialized CTE on category_id. On average a ctegory has about = 7,000 rows (800K / 110). If you join this to itself you get 49M rows per category where you have to check the date restriction before calling all the trig functions. I'd check the time it takes to do just that and count the size of the result set (before doing the 20m check) to see what proportion of your current time is due to the data growh and date comparison.
You can't logically avoid the 7K -> 49M expansion, but you may be able to minimise the physical expansion and cost of elimination if (a) you get Oracle to use a MERGE JOIN, and (b) make sure that Oracle is using the "band join" strategy that appeared in 12c. You may need to modify your join predicates conditions slightly to do this because you have an equality and two inequalities (ignoring the trig predicates) - but at present the predicates are running in the right order, so the optimizer use the preidcates in the right order as soon as you force the merge join.
Regards
Jonathan Lewis
P,.S. Work on optimizing the category/date join first - then worry about the distance. Once you've got the category/date sorted you may find that you can use a spatial predicate the way you originally wanted if you use the "associate statistics" mechanism to say "this function is very expensive".
-
On top of Jonathan's advice, if you refactor the trig, sqrt and power calls, you might see some improvements. I'm thinking something like this...
WITH X AS ( SELECT M.MELDINGSNUMMER , M.CATEGORIE_VID , M.ADWH_DT_MELDING_GEMAAKT , M.ADWH_DT_1E_AFGEHANDELD , LOC.LATITUDE_RADIALEN LATRAD , LOC.LONGITUDE_RADIALEN LONRAD FROM MORA_MELDING M JOIN MORA_DIM_LOCATIE LOC ON M.LOCATIE_VID = LOC.ADWH_VERSIE_ID WHERE 1=1 ) , Y as ( SELECT A.MELDINGSNUMMER MELDINGSNUMMER_VERLEDEN , B.MELDINGSNUMMER MELDINGSNUMMER_HEDEN , DECODE(A.MELDINGSNUMMER, NULL, 0, 1) IND_HERHAAL , POWER(SIN( ( B.LATRAD - A.LATRAD ) /2) ,2) LATPOWSIN2 , POWER(SIN( ( B.LONRAD - A.LONRAD ) /2) ,2) LONPOWSIN2 , COS( A.LATRAD ) * COS( B.LATRAD ) COSLATRAD FROM X A -- VERLEDEN JOIN X B -- HEDEN ON A.CATEGORIE_VID = B.CATEGORIE_VID AND B.ADWH_DT_MELDING_GEMAAKT BETWEEN A.ADWH_DT_1E_AFGEHANDELD AND (A.ADWH_DT_1E_AFGEHANDELD + 28) AND A.MELDINGSNUMMER < B.MELDINGSNUMMER ) , Z as ( select Y.*, ROUND( 1000 * 6367 * 2 * ATAN2( SQRT(LATPOWSIN2 + COSLATRAD * LONPOWSIN2) , SQRT(1-LATPOWSIN2 + COSLATRAD * LONPOWSIN2) ) ) TRIGF from Y ) SELECT MELDINGSNUMMER_VERLEDEN, MELDINGSNUMMER_HEDEN, IND_HERHAAL, TRIGF FROM Z WHERE 1=1 and TRIGF <= 20 ;