6 Replies Latest reply on Jun 30, 2017 9:17 AM by 3274437

    Duplicates on table RDF_DATA

    3274437

      Dear all,

       

      I would like to remove the duplicates from my RDF_DATA table.

       

      If I do a count like the following:

       

      select count(*) from rdf_data;
      

       

      I get 215,743,205 rows.

       

      But if I count like:

       

      SELECT num FROM TABLE(SEM_MATCH(
          'SELECT (count(*) AS ?num) 
           WHERE { ?s ?p ?o }',
      SEM_Models('imdb'),null, null, null));
      

       

      I get 172,009,124 triples

       

      There is only model "imdb" on table RDF_DATA.

       

      So, I guess that are a lot of duplicated triples in RDF_DATA table. I tried to execute SEM_APIS.REMOVE_DUPLICATES but it does not remove the duplicates from RDF_DATA. And I also think that these duplicates are consuming extra unnecessary space. Is this right?

       

      So my question is, can I remove the duplicates from RDF_DATA?

       

      Thanks in advance,

      Elisa.

        • 1. Re: Duplicates on table RDF_DATA
          Matperry-Oracle

          Elisa,

           

          The default threshold of 0.3 is probably too high for your data. Please re-try with a lower threshold.

           

          alter session force parallel query parallel n;

          alter session force parallel dml parallel n;

          alter session force parallel ddl parallel n;

          begin

            sem_apis.remove_duplicates(

              model_name=>'MY_MODEL',

              threshold=>0.05,

              rebuild_apptab_index=>TRUE);

          end;

          /

           

          Thanks,
          Matt

          • 2. Re: Duplicates on table RDF_DATA
            3274437

            Hi Matt,

             

            Thanks for the quick reply. I got this error when changing the threshold:

             

            begin

              sem_apis.remove_duplicates(

                model_name=>'imdb',

                threshold=>0.05,

                rebuild_apptab_index=>TRUE);

            end;

            Relatório de erros -

            ORA-06502: PL/SQL: erro: character to number conversion error numérico ou de valor

            ORA-06512: em line 2

            06502. 00000 -  "PL/SQL: numeric or value error%s"

            *Cause:    An arithmetic, numeric, string, conversion, or constraint error

                       occurred. For example, this error occurs if an attempt is made to

                       assign the value NULL to a variable declared NOT NULL, or if an

                       attempt is made to assign an integer larger than 99 to a variable

                       declared NUMBER(2).

            *Action:   Change the data, how it is manipulated, or how it is declared so

                       that values do not violate constraints.

            • 3. Re: Duplicates on table RDF_DATA
              Matperry-Oracle

              Please try alter session set nls_numeric_characters='.,'; before the call to remove_duplicates.

              1 person found this helpful
              • 4. Re: Duplicates on table RDF_DATA
                3274437

                Hi Matt,

                 

                I got the following error now, I guess I don't have enough space to delete the duplicates. Is there a work around on that?

                 

                SQL> begin
                  sem_apis.remove_duplicates(
                    model_name=>'imdb',
                    threshold=> 0.05,
                    rebuild_apptab_index=>TRUE);
                end;
                /  2    3    4    5    6    7 
                begin
                *
                ERROR at line 1:
                ORA-01652: unable to extend temp segment by 8192 in tablespace SIG_RI
                ORA-06512: at "MDSYS.SDO_RDF", line 4930
                ORA-06512: at "MDSYS.RDF_APIS", line 1363
                ORA-06512: at line 2
                

                 

                Thanks in advance,

                Elisa.

                • 5. Re: Duplicates on table RDF_DATA
                  Matperry-Oracle

                  Elisa,

                   

                  remove_duplicates uses quite a bit of temporary space to make the operation fast. If you can, it's best to add some more space to your temporary tablespace and try again.

                   

                  I think an in-place incremental delete of 40 million rows will be too much. It would be better to just live with the duplicates in the application table. There are no duplicates in RDF_LINK$. There will just be a single row per triple with a count value equal to the number of duplicates. You could check the size of the application table to get an estimate of the space you are wasting on duplicates. It may not be a huge amount.

                   

                  Thanks,
                  Matt

                  • 6. Re: Duplicates on table RDF_DATA
                    3274437

                    Hi Matt,

                     

                    Okay, thanks anyway!

                     

                    Elisa.