5 Replies Latest reply: Nov 1, 2012 10:18 AM by mburazor RSS

    Load millions of records into spatial table

    mburazor
      Hello,
      I would appreciate some help on how to organize data loading process for a really big spatial table (sdo_point).
      The table gets 10 million new records every morning. The table is partitioned by a date column, so it is one partition for one month of data. Spatial index is also partitioned the same way. Oracle 11.2 database and sqlldr from 10.2 and 11.2 oracle clients.
      I'm looking for a good way to use SQL*Loader to load these daily records in an optimal way.
      I tried to set up the process this way:
      1) Drop spatial index.
      2) Run sqlldr to load text file data into the spatial table. It takes about 3 minutes. It is not "direct path".
      3) Re-create spatial index that takes about 45-60 minutes.
      I'm looking for a way to make the whole loading process even faster. Maybe someone can suggest a better way?

      Thank you,
      Milan
        • 1. Re: Load millions of records into spatial table
          AdamMartin
          Don't drop the index. Use direct path inserts (append or parallel).
          • 2. Re: Load millions of records into spatial table
            John O'Toole
            Hi Milan,

            RTree spatial index updates are pretty quick, so I would first try inserting the 10 million points with the spatial index in place - experiment with higher sdo_dml_batch_size values to see if it helps.

            If that proves too slow, then I would try the following:
            1) Create a staging table with the contents of the last partition
            2) Do a SQL*Loader direct path insert of the 10 million points into that staging table
            3) Build a spatial index on that table
            4) Use Partition Exchange to swap the last partition for your staging table, exchanging the spatial index too

            Have a look at using External Tables rather than SQL*Loader - one advantage is that it makes it easier to script the whole thing in pl/sql.
            Also, make sure you are using the "layer_gtype=point" parameter on your RTree spatial indexes.

            Regards,
            John
            • 3. Re: Load millions of records into spatial table
              mburazor
              Thank you Adam and John. I tried to use direct path and didn't drop spatial index before the load. I got the following error:

              ORA-39776: fatal Direct Path API error loading table CISDWSYS.VEHICLE_LOCATION_FACT
              ORA-29875: failed in the execution of the ODCIINDEXINSERT routine
              ORA-22054: underflow error
              ORA-06512: at "MDSYS.SDO_INDEX_METHOD_10I", line 720
              ORA-06512: at "MDSYS.SDO_INDEX_METHOD_10I", line 225
              SQL*Loader-2026: the load was aborted because SQL Loader cannot continue.

              Before the load, I modified the spatial index to have sdo_dml_batch_size=20000.
              Below is the output .log file that I got:
              SQL*Loader: Release 10.2.0.1.0 - Production on Wed Jul 18 10:09:34 2012
              
              Copyright (c) 1982, 2005, Oracle.  All rights reserved.
              
              Control File:   Load_Vehicle_Locations_2.ctl
              Data File:      CIS_VehLoc_L05120602gps.txt
                Bad File:     CIS_VehLoc_L05120602gps.bad
                Discard File:  none specified
               
               (Allow all discards)
              
              Number to load: ALL
              Number to skip: 0
              Errors allowed: 50
              Continuation:    none specified
              Path used:      Direct
              
              Table VEHICLE_LOCATION_FACT, loaded from every logical record.
              Insert option in effect for this table: APPEND
              
                 Column Name                  Position   Len  Term Encl Datatype
              ------------------------------ ---------- ----- ---- ---- ---------------------
              VEHICLE_LOCATION_MESSAGE_ID                               CONSTANT
                  Value is '111'
              MESSAGE_DATETIME                    FIRST     *   ,       DATE YYYY-MM-DD HH24:MI:SS
              ROUTE_NUMBER                         NEXT     *   ,       CHARACTER            
              RUN_NUMBER                           NEXT     *   ,       CHARACTER            
              VEHICLE_NUMBER                       NEXT     *   ,       CHARACTER            
              BADGE_NUMBER                         NEXT     *   ,       CHARACTER            
              TRIP_DIRECTION                       NEXT     *   ,       CHARACTER            
              GPS_LONGITUDE_NUMBER                 NEXT     *   ,       CHARACTER            
              GPS_LATITUDE_NUMBER                  NEXT     *   ,       CHARACTER            
              DIVISION_NUMBER                      NEXT     *   ,       CHARACTER            
              VEHICLE_LOCATION                  DERIVED     *           COLUMN OBJECT        
              
              *** Fields in VEHICLE_LOCATION
              SDO_GTYPE                            NEXT     *   ,       CHARACTER            
              SDO_SRID                                                  CONSTANT
                  Value is '8265'
              SDO_POINT                         DERIVED     *           COLUMN OBJECT        
              
              *** Fields in VEHICLE_LOCATION.SDO_POINT
              X                                    NEXT     *   ,       CHARACTER            
              Y                                    NEXT     *   ,       CHARACTER            
              *** End of fields in VEHICLE_LOCATION.SDO_POINT
              
              *** End of fields in VEHICLE_LOCATION
              
              
              ORA-39776: fatal Direct Path API error loading table CISDWSYS.VEHICLE_LOCATION_FACT
              ORA-29875: failed in the execution of the ODCIINDEXINSERT routine
              ORA-22054: underflow error
              ORA-06512: at "MDSYS.SDO_INDEX_METHOD_10I", line 720
              ORA-06512: at "MDSYS.SDO_INDEX_METHOD_10I", line 225
              
              SQL*Loader-2026: the load was aborted because SQL Loader cannot continue.
              
              Table VEHICLE_LOCATION_FACT:
                0 Rows successfully loaded.
                0 Rows not loaded due to data errors.
                0 Rows not loaded because all WHEN clauses were failed.
                0 Rows not loaded because all fields were null.
              
                Date cache:
                 Max Size:      1000
                 Entries :        23
                 Hits    :      4977
                 Misses  :         0
              
              Bind array size not used in direct path.
              Column array  rows :    5000
              Stream buffer bytes:  256000
              Read   buffer bytes:20971520
              
              Total logical records skipped:          0
              Total logical records rejected:         0
              Total logical records discarded:        0
              Total stream buffers loaded by SQL*Loader main thread:        0
              Total stream buffers loaded by SQL*Loader load thread:        1
              
              Run began on Wed Jul 18 10:09:34 2012
              Run ended on Wed Jul 18 10:09:49 2012
              
              Elapsed time was:     00:00:15.52
              CPU time was:         00:00:00.15
              And here is sample data from the data input file:
              2012-06-02 3:31:2, 0,0,1125,0, 99, -79.323532, 43.663567, 05, 2001, -79.323532, 43.663567
              2012-06-02 3:31:2, 0,0,1116,0, 99, -79.447197, 43.639065, 05, 2001, -79.447197, 43.639065
              2012-06-02 3:31:2, 0,0,1132,0, 99, -79.447365, 43.639065, 05, 2001, -79.447365, 43.639065
              2012-06-02 3:31:2, 0,0,1129,0, 99, -79.323753, 43.664185, 05, 2001, -79.323753, 43.664185
              2012-06-02 3:31:2, 0,0,1139,0, 99, -79.323631, 43.663483, 05, 2001, -79.323631, 43.663483
              Thank you,
              Milan
              • 4. Re: Load millions of records into spatial table
                mburazor
                I couldn't find more information on this "underflow error" with spatial index that I got in previous load.
                After I got an error when loading with spatial index present, I tried to do another load without index and then recreate index. The process was completed without any errors and the spatial index was recreated without any errors, so I'm not sure what kind of "underflow error" happened in the previous load (with index present) because all data records have valid values.
                The steps were:
                1. Drop the spatial index
                2. Load 5 million records, Direct_path=yes, Append mode. It took 3 minutes.
                3. Recreate the spatial index. It took about 70 minutes and the statement is here:
                CREATE INDEX "CISDWSYS"."VEHICLELOCATIONFACT_SP1" ON "CISDWSYS"."VEHICLE_LOCATION_FACT"
                ("VEHICLE_LOCATION")
                INDEXTYPE IS "MDSYS"."SPATIAL_INDEX" PARAMETERS ('sdo_indx_dims=2, layer_gtype=point, sdo_dml_batch_size=20000');

                Can you please help with explaining the reason for "underflow error" with spatial index that I reported in previous comment here? I don't understand what is wrong when all data records have correct values.

                Thank you,
                Milan
                • 5. Re: Load millions of records into spatial table
                  mburazor
                  An Oracle document says that SQL*Loader direct path loading can't be used on a table with a spatial index (domain index), because spatial index is defined on a column with spatial data type and that is LOB type of column.

                  The document is "Oracle® Database Data Cartridge Developer's Guide 11g Release 2 (11.2)" and the link is:

                  http://docs.oracle.com/cd/E11882_01/appdev.112/e10765/dom_idx.htm#ADDCI290

                  Paragraph from this book about this data loading limitation is:

                  >
                  Domain Indexes and SQL Loader

                  SQL*Loader conventional path loads and direct path loads are supported for tables on which domain indexes are defined, with two limitations:

                  •The table must be heap-organized.

                  •The domain index cannot be defined on a LOB column.

                  To do a direct path load on a domain index defined on an IOT or on a LOB column, perform these tasks:

                  1.Drop the domain index

                  2.Do the direct path load in SQL*Loader.

                  3.Re-create the domain indexes.
                  >
                  A solution in my case is to create daily partitions for this huge table with spatial (points) locations, since re-create spatial index partition in this case takes about 15 minutes and this is ok. That's much better than re-creating spatial index monthly partition which takes hours, or loading millions of records without direct path method (and no need to drop spatial index partition) which takes even more hours. Our process is based on daily data loading anyway, so it is ok for us.