
    Segmentation fault during bulk loading LUBM(8000) data.

    649337
      Hello, everybody.
      I was loading LUBM(8000) data into Oracle 11g R1 using bulk load; the .nt file is over 80GB. A segmentation fault occurs while SQL*Loader is running. Could anybody kindly tell me why it happens and how to overcome it?

      Many thanks in advance.

      PS.
      I had configured Oracle following the "Best Practices with RDFS/OWL" document.

      Platform information:
      OS: Fedora 6
      CPU: 2 x Intel Xeon (4-core), 2 GHz
      Memory: 4GB
      Hard Disk: 250GB + 75GB (two disks), 7,200 rpm, SATA
        • 1. Re: Segmentation fault during bulk loading LUBM(8000) data.
          Sdas-Oracle
          We have not seen such errors during sqlldr (SQL*Loader) execution. Not sure if this could be related to disk space availability.

          You can re-try with the following settings (also see http://www.oracle.com/technology/tech/semantic_technologies/htdocs/performance.html):

          - to reduce the space requirement for the staging table, create it with the COMPRESS option
          - to reduce the space requirement for the application table, create it with the COMPRESS option
          - use the 75G disk for storing the staging table and the application table (it is a bit tight)
          - use the 250G disk for the temp tablespace and for the Semantic Network (a DDL sketch follows this list)
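
          A minimal DDL sketch of the first two points is below. The table names (stable, lubm8000_rdf_data), the tablespace name stage_tbs, and the column sizes are placeholders only; the staging-table column names follow the RDF$STC_* convention used in the control file further down.

          example DDL (sketch only):
          =========================================
          -- staging table on the 75G disk, compressed to reduce space
          CREATE TABLE stable (
            RDF$STC_sub  VARCHAR2(4000) NOT NULL,
            RDF$STC_pred VARCHAR2(4000) NOT NULL,
            RDF$STC_obj  VARCHAR2(4000) NOT NULL
          ) TABLESPACE stage_tbs NOLOGGING COMPRESS;

          -- application table holding the triples, also compressed
          CREATE TABLE lubm8000_rdf_data (
            id     NUMBER,
            triple SDO_RDF_TRIPLE_S
          ) TABLESPACE stage_tbs NOLOGGING COMPRESS;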

          - to make sqlldr faster, bypass strict parsing by using the simpler .ctl file shown below
          - you may want to use some of the following parameter settings to get faster loading (a sample spfile update is sketched after this list):
          o sga_target=1800M
          o pga_aggregate_target=1800M
          o db_file_multiblock_read_count=128
          o filesystemio_options='SETALL'
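
          If you set these from SQL*Plus rather than editing the pfile by hand, something like the following should work (run as SYSDBA; filesystemio_options is a static parameter, so it only takes effect after an instance restart):

          ALTER SYSTEM SET sga_target = 1800M SCOPE = SPFILE;
          ALTER SYSTEM SET pga_aggregate_target = 1800M SCOPE = SPFILE;
          ALTER SYSTEM SET db_file_multiblock_read_count = 128 SCOPE = SPFILE;
          ALTER SYSTEM SET filesystemio_options = 'SETALL' SCOPE = SPFILE;
          -- restart the instance so the static parameter takes effect
          SHUTDOWN IMMEDIATE
          STARTUP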

          If you are starting with an empty network, or if the size of RDF_VALUE$ is much smaller than the load you are attempting, you could pass the optional "flags" parameter when invoking the bulk-load API, flags => ' VALUES_TABLE_INDEX_REBUILD ', to reduce the time needed for indexing the RDF_VALUE$ table during the bulk load.
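
          For example, the call could look roughly like this (a sketch only: the model name and the staging-table owner/name are placeholders, and you should double-check the exact SEM_APIS.BULK_LOAD_FROM_STAGING_TABLE parameter names against the 11g documentation):

          BEGIN
            SEM_APIS.BULK_LOAD_FROM_STAGING_TABLE(
              model_name  => 'lubm8000',   -- placeholder model name
              table_owner => 'RDFUSER',    -- placeholder owner of the staging table
              table_name  => 'STABLE',
              flags       => ' VALUES_TABLE_INDEX_REBUILD ');
          END;
          /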

          Please note that
          o the temp tablespace will grow to about 100GB-120GB
          o separating the temp tablespace from others probably would make loading faster
          o but that is not possible here due to disk sizes being 250GB and 75GB
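
          If you want to pre-create a dedicated temp tablespace of that size on the 250G disk, a bigfile temporary tablespace is one way to do it (sketch only; the file path and sizes are placeholders):

          CREATE BIGFILE TEMPORARY TABLESPACE load_temp
            TEMPFILE '/u01/oradata/orcl/load_temp01.dbf' SIZE 20G
            AUTOEXTEND ON NEXT 1G MAXSIZE 130G;

          ALTER DATABASE DEFAULT TEMPORARY TABLESPACE load_temp;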

          simplified control file (checks only length of RDF values):
          =========================================
          UNRECOVERABLE
          LOAD DATA
          TRUNCATE
          into table stable
          when (1) <> '#'
          (
          RDF$STC_sub CHAR(4000) terminated by whitespace,
          RDF$STC_pred CHAR(4000) terminated by whitespace,
          RDF$STC_obj CHAR(5000) "rtrim(:RDF$STC_obj,'. '||CHR(9)||CHR(10)||CHR(13))"
          )

          Hope this helps.
          • 2. Re: Segmentation fault during bulk loading LUBM(8000) data.
            649337
            Hello, thanks for your response. I'm sorry for the mistake in my hardware configuration.
            The hard disks are 250GB + 750GB, not 250GB + 75GB, so disk space may not be the main cause. But I will definitely try the approach you specified.

            Another thing I would like you to know: because we are benchmarking, we need to capture CPU, memory, and hard disk I/O information during the bulk load. So I use a few shell scripts (for example, cpu.csh, memory.csh, and diskio.csh, one per metric) to do this. Let me show you one of them (cpu.csh); the others are similar:

            #!/bin/csh
            # cpu.csh - sample CPU utilization (mpstat %user) while the bulk load runs,
            # then append the average to result.log once the load has finished.
            set count=1
            set sum=0
            set temp=0
            while ($count)
                # the bulk load leaves files under ./bulkload/bul*; when only one
                # entry remains, stop sampling and report the average
                set a=`find ./bulkload/bul* | wc -l`
                if ($a == 1) then
                    set count=`expr $count - 1`
                    if ($count == 0) then
                        break
                    endif
                    # accurate to 0.01
                    set average=`echo "scale=2;$sum / $count" | bc`
                    echo "cpu: sum="$sum,"count="$count,"average="$average >> result.log
                    break
                endif
                # take the %user column from row 4 of the mpstat output
                set temp=`mpstat | sed -n '4p' | awk '{print $4}'`
                set sum=`echo "scale=2;$sum + 0.00 + $temp" | bc`
                set count=`expr $count + 1`
                echo $temp >> cpu.log
                sleep 0.001
            end
            exit


            So far we have two hypotheses about what may cause the "Segmentation fault" error, but neither is confirmed yet:
            1. It is caused by the shell scripts mentioned above. I'm not sure whether there is a better way to do this. :(
            2. It is a limitation of the 32-bit platform. However, the page you linked to does not mention which platform was used.

            If you have any opinions on that, could you please let me know?
            Thanks for your attention!