8 Replies Latest reply on Jun 3, 2008 1:51 PM by alwu-Oracle

    Creating indexes

    610533
      I've got a database of 100 million triples and I have decided that it is about time I created some indexes.
      Based on the docs I have tried to create indexes similar (except for name) to the following.
      -- Create indexes on the subjects, properties, and objects
      -- in the FAMILY_RDF_DATA table.
      CREATE INDEX family_sub_idx ON family_rdf_data (triple.GET_SUBJECT());
      CREATE INDEX family_prop_idx ON family_rdf_data (triple.GET_PROPERTY());
      CREATE INDEX family_obj_idx ON family_rdf_data (TO_CHAR(triple.GET_OBJECT()));

      The first one ran for two days before giving an error about rollback not big enough.
      Firstly Can you tell me.
      How long you would expect the creation of an index to take?

      Secondly what sort of size do I need the rollback segment to be able do this?

      Thirdly. Is it better to create the indexes before adding data?
      If so what impact will it have on the loading of data?

      Fourthly
      Is it good practice to create a compound index on multiple things as well?

      Cheers

      Phil
        • 1. Re: Creating indexes
          alwu-Oracle
          Let us take a step back and ask one question first "What is the purpose of creating all these indexes on application table?"

          Note that none of these indexes you created on application table will be used when SEM_MATCH (or SDO_RDF_MATCH) is executed. Unless you want to combine SEM_MATCH results with some other data in your application table.
          • 2. Re: Creating indexes
            610533
            I use Topbraid composer as my interface.

            So if I click on a class I want the info about it displayed as fast as possible.
            If I click on an instance I want to see the data as fast as possible.

            So creating these indexes helps.

            I did this with another database with 7 million triples and it speeded things up considerably.
            Therfore I want to do the same now but, my 100 million triples datastore is having problems.

            If this is the wrong appraoch to speeding things up then please tell me the correct approach., because at the moment waiting 45 seconds for the information about a resources to be retrieved is just too long and painful.


            Cheers

            Phil
            • 3. Re: Creating indexes
              alwu-Oracle
              I don't think TBC uses indexes on application table for queries. TBC uses Oracle Jena adatpor to talk to 11g. And Oracle Jena Adaptor uses sem_match to answer queries that are posed against GraphOracleSem or ModelOracleSem.

              I am a bit unsure about "speeded up things considerably." What kind of queries about we talking about?

              From what you described, it seems that you are sending a query about a subject. I will debug this case and let you know the results.
              • 4. Re: Creating indexes
                alwu-Oracle
                BTW, which version of TBC are you using?
                • 5. Re: Creating indexes
                  610533
                  Hi there
                  I'm using version 2.5.4 the latest version.
                  I have no idea what sparql queries are issued from TBC, but I guess you are right it is likely to be using a sem match query.
                  I see I misread the docs and that it says that these indexes will NOT effect semmatch operations. I read it as it will effect it... so my fault.

                  But I guess in TBC if you click on a class you are issuing a sparql select for P and O where the S is that class type.
                  When I click on the instances tab you are issuing a select s,p,o where the S is of the class type.
                  And when clicking on an instance to display in the form I guess you are issuing a select P, O where s is that instance.


                  Ok then so what is the best practice for speeding up queries.

                  The triple store I am creating is an rdf replica of a traditional oracle rdbms db.

                  By creating indexes in the normal db I can get queries that involve many joins to return in just a few milliseconds. In the
                  If there is no way to do the same with an oracle RDF database I'm not sure I can move forward with this approach using oracle, which would be a great shame.
                  Speed is everything and waiting 45 seconds for data to come back is way to long.

                  Is there a doc or webpage that discusses optimising query speed?

                  I am currently loading another 50 million triples but when it has finished I will put together some simple queries to imitate retrieving the data that would be displayed in the TBC forms tab, and let you know what they are and how long they take.

                  Any info gratefully received in the meantime.

                  Cheers

                  Phil
                  • 6. Re: Creating indexes
                    610533
                    Hi There

                    Here is a list of queries I have put togther.
                    I have added text at the top to describe what the query is asking.
                    At the bottom of each query I have put the rows returned and the timings for 3 sequential searches using the query.

                    At the beginning I retsarted the instance and then ran all the queries.

                    Any help to improve performance would be gratefully recieved.

                    Cheers

                    Phil

                    -- Tell me what classes are in the db
                    SELECT s
                    FROM TABLE(SEM_MATCH(
                    '(?s rdf:type <http://www.w3.org/2002/07/owl#Class>)',
                    SEM_Models('INCHIKEY'),
                    SEM_Rulebases(NULL),
                    SEM_ALIASES(SEM_ALIAS(NULL,NULL)),
                    null
                    ));
                    -- 3 rows 5secs, 1 millisecs, 1 millisecs

                    -- tell me what datatype properties are in the db
                    SELECT s
                    FROM TABLE(SEM_MATCH(
                    '(?s rdf:type <http://www.w3.org/2002/07/owl#DatatypeProperty>)',
                    SEM_Models('INCHIKEY'),
                    SEM_Rulebases(NULL),
                    SEM_ALIASES(SEM_ALIAS(NULL,NULL)),
                    null
                    ));
                    -- 6 rows 203 millisecs, 15 mililsecs, 16 millisecs

                    -- Tell me what properties exist for each datatypeproperty
                    SELECT s, p, o
                    FROM TABLE(SEM_MATCH(
                    '(?s rdf:type <http://www.w3.org/2002/07/owl#DatatypeProperty>)
                    (?s ?p ?o)',
                    SEM_Models('INCHIKEY'),
                    SEM_Rulebases(NULL),
                    SEM_ALIASES(SEM_ALIAS(NULL,NULL)),
                    null
                    ));
                    -- 12 rows 52secs, 48 secs, 53 secs


                    -- Tell me what objectproperties exist
                    SELECT s
                    FROM TABLE(SEM_MATCH(
                    '(?s rdf:type <http://www.w3.org/2002/07/owl#ObjectProperty>)',
                    SEM_Models('INCHIKEY'),
                    SEM_Rulebases(NULL),
                    SEM_ALIASES(SEM_ALIAS(NULL,NULL)),
                    null
                    ));

                    -- 2 rows 141 millisecs, 1 mililsecs, 1 millisecs


                    -- Tell me what properties exist for each objectroperty
                    SELECT s, p, o
                    FROM TABLE(SEM_MATCH(
                    '(?s rdf:type <http://www.w3.org/2002/07/owl#ObjectProperty>) (?s ?p ?o)',
                    SEM_Models('INCHIKEY'),
                    SEM_Rulebases(NULL),
                    SEM_ALIASES(SEM_ALIAS(NULL,NULL)),
                    null
                    ));

                    -- 2 rows 59 secs, 51 secs, 50 secs

                    -- tell me all of the properties for a particular class
                    SELECT p, o
                    FROM TABLE(SEM_MATCH(
                    '(<http://www.ucb-group.com/idac/supplier#Supplier> ?p ?o)',
                    SEM_Models('INCHIKEY'),
                    SEM_Rulebases(NULL),
                    SEM_ALIASES(SEM_ALIAS(NULL,NULL)),
                    null
                    ));

                    -- 1 row 49 secs, 48 secs. 48 secs

                    -- tell me all the entries for a particular class
                    SELECT s
                    FROM TABLE(SEM_MATCH(
                    '(?s rdf:type <http://www.ucb-group.com/idac/supplier#Supplier>)',
                    SEM_Models('INCHIKEY'),
                    SEM_Rulebases(NULL),
                    SEM_ALIASES(SEM_ALIAS(NULL,NULL)),
                    null
                    ));

                    -- 63 rows 16 secs, 1 millisec, 16 millisecs,

                    -- tel me all the properties for an entry
                    SELECT p, o
                    FROM TABLE(SEM_MATCH(
                    '(<http://www.ucb-group.com/idac/supplier#61> ?p ?o)',
                    SEM_Models('INCHIKEY'),
                    SEM_Rulebases(NULL),
                    SEM_ALIASES(SEM_ALIAS(NULL,NULL)),
                    null
                    ));
                    -- 7 rows 49 secs, 48 secs, 51 secs


                    -- tel me all the properties for an entry with property info included
                    SELECT p, o, pp, oo
                    FROM TABLE(SEM_MATCH(
                    '(<http://www.ucb-group.com/idac/supplier#61> ?p ?o) (?p ?pp ?oo)',
                    SEM_Models('INCHIKEY'),
                    SEM_Rulebases(NULL),
                    SEM_ALIASES(SEM_ALIAS(NULL,NULL)),
                    null
                    ));
                    -- 9 rows 1 min 44secs, 1 min 40 secs, 1 min 43 secs
                    • 7. Re: Creating indexes
                      610533
                      HI There
                      I ran the SEM_PERF.GATHER_STATS(); procedure and then redid the timings.
                      There is some improvement, but I would be grateful if you could suggest how to optimise the speeds further.
                      I have included the new timings below.

                      Cheers

                      Phil

                      -- Tell me what classes are in the db
                      SELECT s
                      FROM TABLE(SEM_MATCH(
                      '(?s rdf:type <http://www.w3.org/2002/07/owl#Class>)',
                      SEM_Models('INCHIKEY'),
                      SEM_Rulebases(NULL),
                      SEM_ALIASES(SEM_ALIAS(NULL,NULL)),
                      null
                      ));
                      -- 3 rows 5 secs, 1 millisecs, 1 millisecs
                      -- 15 secs, 1 millisecs, 16 millisecs

                      -- tell me what datatype properties are in the db
                      SELECT s
                      FROM TABLE(SEM_MATCH(
                      '(?s rdf:type <http://www.w3.org/2002/07/owl#DatatypeProperty>)',
                      SEM_Models('INCHIKEY'),
                      SEM_Rulebases(NULL),
                      SEM_ALIASES(SEM_ALIAS(NULL,NULL)),
                      null
                      ));
                      -- 6 rows 203 millisecs, 15 mililsecs, 16 millisecs
                      -- 243 millisecs, 16 millisecs, 1 millisecs

                      -- Tell me what properties exist for each datatypeproperty
                      SELECT s, p, o
                      FROM TABLE(SEM_MATCH(
                      '(?s rdf:type <http://www.w3.org/2002/07/owl#DatatypeProperty>)
                      (?s ?p ?o)',
                      SEM_Models('INCHIKEY'),
                      SEM_Rulebases(NULL),
                      SEM_ALIASES(SEM_ALIAS(NULL,NULL)),
                      null
                      ));
                      -- 12 rows 52 secs, 48 secs, 53 secs
                      -- 13 secs, 13 secs, 13 secs


                      -- Tell me all the object properties int he database
                      SELECT s
                      FROM TABLE(SEM_MATCH(
                      '(?s rdf:type <http://www.w3.org/2002/07/owl#ObjectProperty>)',
                      SEM_Models('INCHIKEY'),
                      SEM_Rulebases(NULL),
                      SEM_ALIASES(SEM_ALIAS(NULL,NULL)),
                      null
                      ));

                      -- 2 rows 141 millisecs, 1 mililsecs, 1 millisecs
                      -- 141 millisecs, 1 mililsecs, 1 millisecs


                      -- Tell me what properties exist for each objectroperty
                      SELECT s, p, o
                      FROM TABLE(SEM_MATCH(
                      '(?s rdf:type <http://www.w3.org/2002/07/owl#ObjectProperty>) (?s ?p ?o)',
                      SEM_Models('INCHIKEY'),
                      SEM_Rulebases(NULL),
                      SEM_ALIASES(SEM_ALIAS(NULL,NULL)),
                      null
                      ));

                      -- 2 rows 59 secs, 51 secs, 50 secs
                      -- 13 secs, 13 secs, 13 secs

                      -- tell me all of the properties for a particular class
                      SELECT p, o
                      FROM TABLE(SEM_MATCH(
                      '(<http://www.ucb-group.com/idac/supplier#Supplier> ?p ?o)',
                      SEM_Models('INCHIKEY'),
                      SEM_Rulebases(NULL),
                      SEM_ALIASES(SEM_ALIAS(NULL,NULL)),
                      null
                      ));

                      -- 1 row 49 secs, 48 secs. 48 secs
                      -- 12 secs, 12 secs. 12 secs

                      -- tell me all the entries for a particular class
                      SELECT s
                      FROM TABLE(SEM_MATCH(
                      '(?s rdf:type <http://www.ucb-group.com/idac/supplier#Supplier>)',
                      SEM_Models('INCHIKEY'),
                      SEM_Rulebases(NULL),
                      SEM_ALIASES(SEM_ALIAS(NULL,NULL)),
                      null
                      ));

                      -- 63 rows 16 secs, 1 millisec, 16 millisecs,
                      -- 1 secs, 1 millisec, 16 millisecs,

                      -- tel me all the properties for an entry
                      SELECT p, o
                      FROM TABLE(SEM_MATCH(
                      '(<http://www.ucb-group.com/idac/supplier#61> ?p ?o)',
                      SEM_Models('INCHIKEY'),
                      SEM_Rulebases(NULL),
                      SEM_ALIASES(SEM_ALIAS(NULL,NULL)),
                      null
                      ));
                      -- 7 rows 49 secs, 48 secs, 51 secs
                      -- 13 secs, 12 secs, 12 secs

                      -- tel me all the properties for an entry with property info included
                      SELECT p, o, pp, oo
                      FROM TABLE(SEM_MATCH(
                      '(<http://www.ucb-group.com/idac/supplier#61> ?p ?o) (?p ?pp ?oo)',
                      SEM_Models('INCHIKEY'),
                      SEM_Rulebases(NULL),
                      SEM_ALIASES(SEM_ALIAS(NULL,NULL)),
                      null
                      ));
                      -- 9 rows 1 min 44 secs, 1 min 40 secs, 1 min 43 secs
                      -- 25 secs, 28 secs, 25 secs
                      • 8. Re: Creating indexes
                        alwu-Oracle
                        Please send an email to alan dot wu at oracle dot com. I will help you offline with these queries. We can put a summary back here later.