3 Replies Latest reply on Mar 9, 2015 7:56 PM by alwu-Oracle

    model1.difference(model2) very slow in Oracle vs Jena

    arnaudz

             I have two models and want to find the difference/intersection between the two. When I use Jena in memory and do:

                  Model model1 = ModelFactory.createDefaultModel();

                  Model model = ModelFactory.createDefaultModel();

                  InputStream in = FileManager.get().open(path);

                  model.read(in, null, "N-TRIPLE");

                  System.out.println(model.size());

                  InputStream in2 = FileManager.get().open(path2);

                  model1.read(in2, null, "N-TRIPLE");

                  System.out.println(model1.size());

                  Model supp = model.difference(model1);

                  Model add = model1.difference(model);

        everything works well, but as soon as I load my models into Oracle and try:

                  ModelOracleSem model = ModelOracleSem.createOracleSemModel(oracle, modelName1);

                  ModelOracleSem model1 = ModelOracleSem.createOracleSemModel(oracle, modelName2);

                  Model supp = model.difference(model1);

                  Model add = model1.difference(model);

      it's very, very slow and I don't get any results. Each model has nearly 1,000,000 triples.

      What should I do to get the same result with Oracle?

        • 1. Re: model1.difference(model2) very slow in Oracle vs Jena
          alwu-Oracle

          Hi,

           

          ModelOracleSem is a proxy to a semantic model in Oracle Database. For scalability, we don't cache triple/quad data in ModelOracleSem.

          This difference() method will cause many, many round trips against the Oracle Database, and that is not efficient.

           

          To achieve what you want, you can create an empty in-memory Jena model, add the contents of that ModelOracleSem instance to it,

          and then run the difference() method on the in-memory models. Basically, the comparison then runs in memory.

           

          If there are no blank nodes, then an efficient alternative is to perform a SQL "diff", for example:

           

          select  start_node_id, p_value_id, canon_end_node_id from mdsys.rdfm_model1

          minus

          select  start_node_id, p_value_id, canon_end_node_id from mdsys.rdfm_model2;

           

          To get the difference in the other direction, you will need to run model2 minus model1 as well.
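The two-directional diff can also be pictured in plain Java, independent of Jena and Oracle. This is only a rough sketch: the `Triple` record, the class name, and the numeric IDs are hypothetical stand-ins for rows of (start_node_id, p_value_id, canon_end_node_id), and it assumes there are no blank nodes, as the SQL version does.

```java
import java.util.HashSet;
import java.util.Set;

public class TripleDiff {
    // Hypothetical stand-in for one (start_node_id, p_value_id, canon_end_node_id) row.
    record Triple(long s, long p, long o) {}

    // Equivalent of "a MINUS b": triples that are in a but not in b.
    static Set<Triple> minus(Set<Triple> a, Set<Triple> b) {
        Set<Triple> result = new HashSet<>(a);
        result.removeAll(b);
        return result;
    }

    // Triples present in both sets (the intersection from the original question).
    static Set<Triple> intersect(Set<Triple> a, Set<Triple> b) {
        Set<Triple> result = new HashSet<>(a);
        result.retainAll(b);
        return result;
    }

    public static void main(String[] args) {
        // Made-up IDs: one shared triple and one triple unique to each model.
        Set<Triple> model1 = Set.of(new Triple(1, 10, 100), new Triple(2, 20, 200));
        Set<Triple> model2 = Set.of(new Triple(1, 10, 100), new Triple(3, 30, 300));

        System.out.println("model1 minus model2: " + minus(model1, model2));
        System.out.println("model2 minus model1: " + minus(model2, model1));
        System.out.println("intersection:        " + intersect(model1, model2));
    }
}
```

The difference is not symmetric, so each direction is its own query, while the intersection needs only one.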

           

          Hope it helps,

           

          Zhe Wu

          • 2. Re: model1.difference(model2) very slow in Oracle vs Jena
            arnaudz

            Hi

            Thanks for the answer. I have two more questions:

            Can I perform the diff/intersection using sem_match?

            How should I proceed to store the result of the SQL "diff" in a new semantic model?

             

            thanks

            • 3. Re: model1.difference(model2) very slow in Oracle vs Jena
              alwu-Oracle

              Yes, you can. SEM_MATCH returns a "table", which can be used to perform a diff, an intersection, or any other SQL operation.
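As a rough illustration of combining two SEM_MATCH "tables" with a SQL set operator, the sketch below only builds the query text in Java. The class and method names are hypothetical, the 7-argument SEM_MATCH call mirrors the one used in the example in this reply, and the generated SQL has not been run against a database. Selecting only s, p and o is an assumption that may not be enough to tell apart literals that differ in datatype or language.

```java
import java.util.regex.Pattern;

public class SemMatchSetOps {
    // Conservative whitelist so a model name can be spliced into the SQL text.
    private static final Pattern MODEL_NAME = Pattern.compile("[A-Za-z][A-Za-z0-9_]*");

    // One sub-select returning every triple of a single model via SEM_MATCH.
    static String allTriples(String modelName) {
        if (!MODEL_NAME.matcher(modelName).matches()) {
            throw new IllegalArgumentException("unexpected model name: " + modelName);
        }
        return "select s, p, o from table(sem_match('{?s ?p ?o}', sem_models('"
                + modelName + "'), null, null, null, null, null))";
    }

    // Joins two sub-selects with "minus" (diff) or "intersect".
    static String setOperation(String leftModel, String operator, String rightModel) {
        if (!operator.equals("minus") && !operator.equals("intersect")) {
            throw new IllegalArgumentException("unsupported operator: " + operator);
        }
        return allTriples(leftModel) + "\n" + operator + "\n" + allTriples(rightModel);
    }

    public static void main(String[] args) {
        System.out.println(setOperation("b1", "minus", "b2"));
        System.out.println();
        System.out.println(setOperation("b1", "intersect", "b2"));
    }
}
```

The printed statements can be pasted into SQL*Plus or executed through JDBC; swapping the two model names gives the diff in the other direction.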

               

              Here is an example of storing a SQL "diff" in a new model. (Note that this assumes there are no

              blank nodes in either model.)

               

              -- First create three models

              --

              create table b1_tpl (triple sdo_rdf_triple_s) compress;

              exec sem_apis.create_sem_model('b1', 'b1_tpl', 'triple');

               

              create table b2_tpl (triple sdo_rdf_triple_s) compress;

              exec sem_apis.create_sem_model('b2', 'b2_tpl', 'triple');

               

              create table b3_tpl (triple sdo_rdf_triple_s) compress;

              exec sem_apis.create_sem_model('b3', 'b3_tpl', 'triple');

               

               

              --

              -- Get the model ID of B3; you may see a different value

              --

              select model_id from mdsys.rdf_model$ where model_name='B3';

              ==>

              55

               

               

              -- Add two triples to b1

              insert into b1_tpl(triple) values(sdo_rdf_triple_s('b1','<urn:s1>','<urn:p1>','<urn:o1>'));

              insert into b1_tpl(triple) values(sdo_rdf_triple_s('b1','<urn:s2>','<urn:p2>','<urn:o2>'));

               

              -- Add two triples to b2

              insert into b2_tpl(triple) values(sdo_rdf_triple_s('b2','<urn:s1>','<urn:p1>','<urn:o1>'));

              insert into b2_tpl(triple) values(sdo_rdf_triple_s('b2','<urn:s3>','<urn:p3>','<urn:o3>'));

              commit;

               

               

              --

              -- Now add the diff (B1 - B2) into a new model B3. Note that 55 is the model ID of B3,

              -- and that cid (CANON_END_NODE_ID) is passed as both the canonical ID and the object ID.

              --

              insert into b3_tpl(triple) select sdo_rdf_triple_s(cid, mid, sid, pid, oid) from (

                select cid, 55 mid, sid, pid, cid oid

                 from (

                    select START_NODE_ID sid, p_value_id pid, CANON_END_NODE_ID cid

                      from mdsys.rdfm_b1

                    minus

                    select START_NODE_ID sid, p_value_id pid, CANON_END_NODE_ID cid

                      from mdsys.rdfm_b2

                 )

              )

              ;

               

              commit;

              ==>

              1 row created.

               

               

              select s || ' ' || p || ' ' || o

              from table(sem_match('{?s ?p ?o}', sem_models('b3'), null, null, null, null, null));

              ==>

              S||''||P||''||O

              --------------------------------------------------------------------------------

              urn:s2 urn:p2 urn:o2

               

              You can add B2-B1 into B3 as well.

               

              If the "difference" is very big, then we can dump the diff into a staging table and use the bulk loader API.

               

              Hope it helps,

               

              Zhe Wu