This discussion is archived
5 Replies Latest reply: Nov 1, 2012 8:18 AM by mburazor RSS

Load millions of records into spatial table

mburazor Newbie
Currently Being Moderated
Hello,
I would appreciate some help on how to organize data loading process for a really big spatial table (sdo_point).
The table gets 10 million new records every morning. The table is partitioned by a date column, so it is one partition for one month of data. Spatial index is also partitioned the same way. Oracle 11.2 database and sqlldr from 10.2 and 11.2 oracle clients.
I'm looking for a good way to use SQL*Loader to load these daily records in an optimal way.
I tried to set up the process this way:
1) Drop spatial index.
2) Run sqlldr to load text file data into the spatial table. It takes about 3 minutes. It is not "direct path".
3) Re-create spatial index that takes about 45-60 minutes.
I'm looking for a way to make the whole loading process even faster. Maybe someone can suggest a better way?

Thank you,
Milan
  • 1. Re: Load millions of records into spatial table
    AdamMartin Pro
    Currently Being Moderated
    Don't drop the index. Use direct path inserts (append or parallel).
  • 2. Re: Load millions of records into spatial table
    John O'Toole Journeyer
    Currently Being Moderated
    Hi Milan,

    RTree spatial index updates are pretty quick, so I would first try inserting the 10 million points with the spatial index in place - experiment with higher sdo_dml_batch_size values to see if it helps.

    If that proves too slow, then I would try the following:
    1) Create a staging table with the contents of the last partition
    2) Do a SQL*Loader direct path insert of the 10 million points into that staging table
    3) Build a spatial index on that table
    4) Use Partition Exchange to swap the last partition for your staging table, exchanging the spatial index too

    Have a look at using External Tables rather than SQL*Loader - one advantage is that it makes it easier to script the whole thing in pl/sql.
    Also, make sure you are using the "layer_gtype=point" parameter on your RTree spatial indexes.

    Regards,
    John
  • 3. Re: Load millions of records into spatial table
    mburazor Newbie
    Currently Being Moderated
    Thank you Adam and John. I tried to use direct path and didn't drop spatial index before the load. I got the following error:

    ORA-39776: fatal Direct Path API error loading table CISDWSYS.VEHICLE_LOCATION_FACT
    ORA-29875: failed in the execution of the ODCIINDEXINSERT routine
    ORA-22054: underflow error
    ORA-06512: at "MDSYS.SDO_INDEX_METHOD_10I", line 720
    ORA-06512: at "MDSYS.SDO_INDEX_METHOD_10I", line 225
    SQL*Loader-2026: the load was aborted because SQL Loader cannot continue.

    Before the load, I modified the spatial index to have sdo_dml_batch_size=20000.
    Below is the output .log file that I got:
    SQL*Loader: Release 10.2.0.1.0 - Production on Wed Jul 18 10:09:34 2012
    
    Copyright (c) 1982, 2005, Oracle.  All rights reserved.
    
    Control File:   Load_Vehicle_Locations_2.ctl
    Data File:      CIS_VehLoc_L05120602gps.txt
      Bad File:     CIS_VehLoc_L05120602gps.bad
      Discard File:  none specified
     
     (Allow all discards)
    
    Number to load: ALL
    Number to skip: 0
    Errors allowed: 50
    Continuation:    none specified
    Path used:      Direct
    
    Table VEHICLE_LOCATION_FACT, loaded from every logical record.
    Insert option in effect for this table: APPEND
    
       Column Name                  Position   Len  Term Encl Datatype
    ------------------------------ ---------- ----- ---- ---- ---------------------
    VEHICLE_LOCATION_MESSAGE_ID                               CONSTANT
        Value is '111'
    MESSAGE_DATETIME                    FIRST     *   ,       DATE YYYY-MM-DD HH24:MI:SS
    ROUTE_NUMBER                         NEXT     *   ,       CHARACTER            
    RUN_NUMBER                           NEXT     *   ,       CHARACTER            
    VEHICLE_NUMBER                       NEXT     *   ,       CHARACTER            
    BADGE_NUMBER                         NEXT     *   ,       CHARACTER            
    TRIP_DIRECTION                       NEXT     *   ,       CHARACTER            
    GPS_LONGITUDE_NUMBER                 NEXT     *   ,       CHARACTER            
    GPS_LATITUDE_NUMBER                  NEXT     *   ,       CHARACTER            
    DIVISION_NUMBER                      NEXT     *   ,       CHARACTER            
    VEHICLE_LOCATION                  DERIVED     *           COLUMN OBJECT        
    
    *** Fields in VEHICLE_LOCATION
    SDO_GTYPE                            NEXT     *   ,       CHARACTER            
    SDO_SRID                                                  CONSTANT
        Value is '8265'
    SDO_POINT                         DERIVED     *           COLUMN OBJECT        
    
    *** Fields in VEHICLE_LOCATION.SDO_POINT
    X                                    NEXT     *   ,       CHARACTER            
    Y                                    NEXT     *   ,       CHARACTER            
    *** End of fields in VEHICLE_LOCATION.SDO_POINT
    
    *** End of fields in VEHICLE_LOCATION
    
    
    ORA-39776: fatal Direct Path API error loading table CISDWSYS.VEHICLE_LOCATION_FACT
    ORA-29875: failed in the execution of the ODCIINDEXINSERT routine
    ORA-22054: underflow error
    ORA-06512: at "MDSYS.SDO_INDEX_METHOD_10I", line 720
    ORA-06512: at "MDSYS.SDO_INDEX_METHOD_10I", line 225
    
    SQL*Loader-2026: the load was aborted because SQL Loader cannot continue.
    
    Table VEHICLE_LOCATION_FACT:
      0 Rows successfully loaded.
      0 Rows not loaded due to data errors.
      0 Rows not loaded because all WHEN clauses were failed.
      0 Rows not loaded because all fields were null.
    
      Date cache:
       Max Size:      1000
       Entries :        23
       Hits    :      4977
       Misses  :         0
    
    Bind array size not used in direct path.
    Column array  rows :    5000
    Stream buffer bytes:  256000
    Read   buffer bytes:20971520
    
    Total logical records skipped:          0
    Total logical records rejected:         0
    Total logical records discarded:        0
    Total stream buffers loaded by SQL*Loader main thread:        0
    Total stream buffers loaded by SQL*Loader load thread:        1
    
    Run began on Wed Jul 18 10:09:34 2012
    Run ended on Wed Jul 18 10:09:49 2012
    
    Elapsed time was:     00:00:15.52
    CPU time was:         00:00:00.15
    And here is sample data from the data input file:
    2012-06-02 3:31:2, 0,0,1125,0, 99, -79.323532, 43.663567, 05, 2001, -79.323532, 43.663567
    2012-06-02 3:31:2, 0,0,1116,0, 99, -79.447197, 43.639065, 05, 2001, -79.447197, 43.639065
    2012-06-02 3:31:2, 0,0,1132,0, 99, -79.447365, 43.639065, 05, 2001, -79.447365, 43.639065
    2012-06-02 3:31:2, 0,0,1129,0, 99, -79.323753, 43.664185, 05, 2001, -79.323753, 43.664185
    2012-06-02 3:31:2, 0,0,1139,0, 99, -79.323631, 43.663483, 05, 2001, -79.323631, 43.663483
    Thank you,
    Milan
  • 4. Re: Load millions of records into spatial table
    mburazor Newbie
    Currently Being Moderated
    I couldn't find more information on this "underflow error" with spatial index that I got in previous load.
    After I got an error when loading with spatial index present, I tried to do another load without index and then recreate index. The process was completed without any errors and the spatial index was recreated without any errors, so I'm not sure what kind of "underflow error" happened in the previous load (with index present) because all data records have valid values.
    The steps were:
    1. Drop the spatial index
    2. Load 5 million records, Direct_path=yes, Append mode. It took 3 minutes.
    3. Recreate the spatial index. It took about 70 minutes and the statement is here:
    CREATE INDEX "CISDWSYS"."VEHICLELOCATIONFACT_SP1" ON "CISDWSYS"."VEHICLE_LOCATION_FACT"
    ("VEHICLE_LOCATION")
    INDEXTYPE IS "MDSYS"."SPATIAL_INDEX" PARAMETERS ('sdo_indx_dims=2, layer_gtype=point, sdo_dml_batch_size=20000');

    Can you please help with explaining the reason for "underflow error" with spatial index that I reported in previous comment here? I don't understand what is wrong when all data records have correct values.

    Thank you,
    Milan
  • 5. Re: Load millions of records into spatial table
    mburazor Newbie
    Currently Being Moderated
    An Oracle document says that SQL*Loader direct path loading can't be used on a table with a spatial index (domain index), because spatial index is defined on a column with spatial data type and that is LOB type of column.

    The document is "Oracle® Database Data Cartridge Developer's Guide 11g Release 2 (11.2)" and the link is:

    http://docs.oracle.com/cd/E11882_01/appdev.112/e10765/dom_idx.htm#ADDCI290

    Paragraph from this book about this data loading limitation is:

    >
    Domain Indexes and SQL Loader

    SQL*Loader conventional path loads and direct path loads are supported for tables on which domain indexes are defined, with two limitations:

    •The table must be heap-organized.

    •The domain index cannot be defined on a LOB column.

    To do a direct path load on a domain index defined on an IOT or on a LOB column, perform these tasks:

    1.Drop the domain index

    2.Do the direct path load in SQL*Loader.

    3.Re-create the domain indexes.
    >
    A solution in my case is to create daily partitions for this huge table with spatial (points) locations, since re-create spatial index partition in this case takes about 15 minutes and this is ok. That's much better than re-creating spatial index monthly partition which takes hours, or loading millions of records without direct path method (and no need to drop spatial index partition) which takes even more hours. Our process is based on daily data loading anyway, so it is ok for us.

Legend

  • Correct Answers - 10 points
  • Helpful Answers - 5 points