    Extremely poor load performance into RDF store

      We are seeing unacceptably poor performance loading our RDF models. We have tried loading with the Jena Adaptor as well as with the BatchLoader class. Either way, the absolute best rate we have seen is about 20,000 triples/minute. Here are the specs:

      Loading a triples file from disk which contains 5.08 million triples
      Oracle 11gR2 running on Enterprise Linux (64-bit) with 16 GB RAM
      Oracle 11gR2 running on Solaris 10 (64-bit) with all the latest patches and packages, with 8 GB RAM

      The Sun box is a bit slower than the Linux box, but both are quite poor. I can load the identical triples file into Allegro running on my desktop PC and consistently get load rates of 1 million triples per minute.

      Can anyone suggest what could be causing this performance issue? Oracle is a very recent install on both the Sun and Linux boxes, nothing else whatsoever is running on either one, and I am the only Oracle user. All install and parameter settings are the Oracle defaults.
          First of all, what is your database configuration? (SGA, PGA, etc.)

          Second, the BatchLoader class is doing batch loading. You said you tried Jena Adaptor.
          Did you try the bulk load function in Jena Adaptor? If you like, you can cut & paste the code snippet you used to do the data loading.


          Zhe Wu
            An update on the stats. I changed the way models are created, creating them manually from sqlplus first (instead of letting the Jena code create the model on the fly), and that made things run about 3 times faster. Even so, the best load rate I am seeing is 79K triples/minute.
            Here is the Java that runs the load:

            String model = "GMI";

            oracleGraph = oracleTools.makeGraph(model);
            oracleModel = new ModelOracleSem(oracleGraph);
            InputStream is = FileManager.get().open(props.getProperty(model +".NTRIPLES_IND_FILE"));
            StopWatch timer = new StopWatch();
            // read file contents into oracleModel
            oracleModel.read(is, "", "N-TRIPLE");
            System.out.println("\nLoad time into Oracle: " +timer.getElapsedTimeSecs());
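The snippet above prints only the elapsed seconds, while the rates discussed in this thread are in triples per minute. Below is a minimal sketch of how the conversion could be done with the standard library alone. `LoadTimer` and its methods are hypothetical helper names, not part of Jena, the Jena Adaptor, or the `StopWatch` class used above, and the empty lambda merely stands in for the real load call.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper (not part of Jena or the Oracle adaptor): times a load
// step and converts the elapsed time into a triples-per-minute rate.
class LoadTimer {

    // Triples per minute for `triples` loaded in `elapsedNanos` nanoseconds.
    static double triplesPerMinute(long triples, long elapsedNanos) {
        double minutes = elapsedNanos / (double) TimeUnit.MINUTES.toNanos(1);
        return triples / minutes;
    }

    // Runs the given load step and returns the elapsed wall-clock time in nanoseconds.
    static long timeNanos(Runnable loadStep) {
        long start = System.nanoTime();
        loadStep.run();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        // Stand-in for the real load, e.g. oracleModel.read(is, "", "N-TRIPLE").
        long elapsed = timeNanos(() -> { /* the load would run here */ });
        System.out.println("Elapsed ns: " + elapsed);

        // Rate for the 5.08 million triple file if the load took 254 minutes.
        long nanos = TimeUnit.MINUTES.toNanos(254);
        System.out.println("Triples/minute: " + triplesPerMinute(5_080_000L, nanos));
    }
}
```

Using `System.nanoTime()` rather than `System.currentTimeMillis()` keeps the measurement independent of wall-clock adjustments during a long load.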

              You may want to change these two things.

              1) Your SGA setting is too low. Is there a particular reason for using only 512M?

              sga_max_size big integer 512M
              sga_target big integer 512M

              2) For the data loading, you are actually using the incremental API. Can you please try
              the bulk load API instead?

              Please search for addInBulk on page 6 of the following document (Jena Adaptor v2.0).

              Also, you can look at Test7 in the following document (Jena Adapter for Release 11.2).

              You can also try adjusting the filesystemio_options parameter setting.


              Zhe Wu
                Thanks for the advice and quick reply. I implemented the changes you suggested. Switching to the bulk update methods made the load run quite a bit faster: my load rate is now 215K triples/minute, versus 70K-80K/minute with the incremental load methods. However, this is still orders of magnitude short of the LUBM benchmark numbers. I also bumped both sga_max_size and sga_target up to 4G (from 512M), but that actually made the load run slower (down to 177K triples/minute).

                Can you think of anything else I can change to make these loads run faster? I believe I should see load rates at least 4x faster than I am seeing right now.

                  215k triples/minute => 12.9 Million triples per hour.
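The conversion above, and what each rate quoted in this thread means for the 5.08 million triple file, can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch: `LoadRateMath` is a hypothetical class name, the rates are the figures reported in the posts, and it assumes each rate is sustained for the whole load.

```java
// Back-of-the-envelope load-rate arithmetic for the figures quoted in this thread.
class LoadRateMath {

    // Size of the triples file being loaded (5.08 million, from the original post).
    static final long TOTAL_TRIPLES = 5_080_000L;

    // Converts a per-minute rate into a per-hour rate.
    static long triplesPerHour(long triplesPerMinute) {
        return triplesPerMinute * 60L;
    }

    // Minutes needed to load `triples` at a sustained per-minute rate.
    static double minutesToLoad(long triples, long triplesPerMinute) {
        return (double) triples / triplesPerMinute;
    }

    public static void main(String[] args) {
        // Rates reported in the thread: initial, manual model creation,
        // bulk load, and the desktop comparison.
        long[] ratesPerMinute = {20_000L, 79_000L, 215_000L, 1_000_000L};
        for (long rate : ratesPerMinute) {
            System.out.println(rate + " triples/min = "
                    + triplesPerHour(rate) + " triples/hour, full load in "
                    + minutesToLoad(TOTAL_TRIPLES, rate) + " min");
        }
    }
}
```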

                  What kind of disk storage do you have for this system? Do you have a single disk, or multiple disks with ASM, or multiple disks with RAID (and ASM)?


                    We are running on a single disk. No RAID or ASM.
                      It is very likely that your system is I/O bound. For obvious reasons, data loading is very I/O intensive (Oracle persists data).
                      Oracle Database semantic technologies is not an in-memory solution. Having a good I/O subsystem is therefore critical to the overall
                      performance including loading, query, and inference.

                      Is it possible to add a few (say two) more physical disks to your computer and start using ASM? Alternatively, a single SSD
                      (solid state disk) can help too.

                      Right now you have 4 cores but just a single disk in your computer. Increasing the I/O capacity will give you a more balanced hardware configuration.


                      Zhe Wu