Forum Stats

  • 3,781,161 Users
  • 2,254,485 Discussions
  • 7,879,597 Comments

Discussions

Minor performance improvement after indexing a container

673401
673401 Member Posts: 19
edited Dec 3, 2008 8:37AM in Berkeley DB XML
I'm using a container with 100 documents (300 Mb) with a deepth XML structure and I'm having troubles with the performance of simple queries, since indexing the container only means a sligh improvement on the overall performance.

The structure of the documents is similar to this (xxx represents text):
<pre>
<DOC>
<S>
<LU>
<STR>xxx</STR>
<MORF>
<CAT>xxx</CAT>
<LEM>xxx</CAT>
</MORF>
</LU>
<LU>
....
</LU>
</S>
<S>
...
</S>
</DOC>
</pre>
I have tried simple queries like "collection("container")/DOC/S/LU and using the command lind and without indexes it takes 200.182 seconds and found 2992 objects.

I've created the following index - addindex "" STR edge-element-equality-string - since i want to make comparisions between the STR node and a given string and the path is long.

The same querie with this index takes 166.016 seconds!!

I have also tried node-element-equality-string for STR node but the difference is minimal. This is a scary result since I'm expecting to use a container with 1,5 million documents and about 125Gb.



Any ideas what might be the problem? I'm using the latest dbxml version under Linux Ubuntu 7.10 Gutsy Gibbon (32bits) and a intel quadcore Q9300 @ 2.5Ghz and 4Gb RAM.

Edited by: user10669738 on 02-Dec-2008 02:08

Edited by: user10669738 on 02-Dec-2008 02:10

Best Answer

  • 597600
    597600 Member Posts: 250
    Accepted Answer
    "I cannot use the method set_lk_max_locks to se a higher number of locks since I'm using Perl. How can I solve this problem?"

    You can specify that parameter in your DB_CONFIG before creating the environment.

    See [Berkeley DB: DB_ENV->set_lk_max_locks|http://www.oracle.com/technology/documentation/berkeley-db/xml/api_c/env_set_lk_max_locks.html].

    Michael Ludwig
«1

Answers

  • 522770
    522770 Member Posts: 728
    Hi,
    user10669738 wrote:
    I have tried simple queries like "collection("container")/DOC/S/LU and using the command lind and without indexes it takes 200.182 seconds and found 2992 objects.

    I've created the following index - addindex "" STR edge-element-equality-string - since i want to make comparisions between the STR node and a given string and the path is long.

    The same querie with this index takes 166.016 seconds!!
    That index won't make any difference to your query, since your query doesn't access any elements named "STR". Try looking at the query plans for your queries to check if your indexes are being used (the "queryPlan" command in the DB XML shell).

    For the query you gave, I suggest that you use a "node-element-presence" index on "LU".

    John
  • 673401
    673401 Member Posts: 19
    Hi,

    I've created the index you suggested, but the query time increased comparing to the query time without indexes...

    the queryplan follows bellow:

    dbxml&gt; time queryPlan '
    collection("cont")/DOC/S/LU[STR="muito"]'
    &lt;XQuery&gt;
    &lt;QueryPlanToAST&gt;
    &lt;NodePredicateFilterQP uri="" name="#tmp9"&gt;
    &lt;StepQP axis="child" name="LU" nodeType="element"&gt;
    &lt;StepQP axis="child" name="S" nodeType="element"&gt;
    &lt;StepQP axis="child" name="DOC" nodeType="element"&gt;
    &lt;SequentialScanQP container="/big/tmp/test_container" nodeType="document"/&gt;
    &lt;/StepQP&gt;
    &lt;/StepQP&gt;
    &lt;/StepQP&gt;
    &lt;ValueFilterQP comparison="eq" general="true"&gt;
    &lt;StepQP axis="child" name="STR" nodeType="element"&gt;
    &lt;VariableQP name="#tmp9"/&gt;
    &lt;/StepQP&gt;
    &lt;Sequence&gt;
    &lt;AnyAtomicTypeConstructor value="muito" typeuri="http://www.w3.org/2001/XMLSchema" typename="string"/&gt;
    &lt;/Sequence&gt;
    &lt;/ValueFilterQP&gt;
    &lt;/NodePredicateFilterQP&gt;
    &lt;/QueryPlanToAST&gt;
    &lt;/XQuery&gt;
    Time in seconds for command 'queryPlan': 0.004972
  • 673401
    673401 Member Posts: 19
    I've noticed that the index is not being used!

    I'm using the command line in order to avoid any other problems, and I simply type:

    addindex "LU" LU node-element-presence

    and then I execute the same query using the queryplan but the index wasn't used at all...
  • 522770
    522770 Member Posts: 728
    Hi,

    1) Correct - it's not using the index, because you added the index incorrectly. You need to use:
    addindex "" LU node-element-presence
    2) That's not the query plan for the query you appeared to have posted. Has the forum software eaten some of your query? You need to backslash ("\") escape your open square brackets ("\["), apparently. From the query plan, I think your query was:
    collection("cont")/DOC/S/LU\[STR = "muito"]
    Can I see the output from the "listIndexes" command, to see what indexes DB XML thinks you have on your container?

    John
  • 673401
    673401 Member Posts: 19
    Hi,

    The list of indexes is:

    listindexes
    Index: node-element-presence-none for node {}:LU
    Index: unique-node-metadata-equality-string for node {[http://www.sleepycat.com/2002/dbxml]}:name
    2 indexes found.

    as expected. The query plan is:

    time queryplan '
    collection("cont")/DOC/S/LU\[STR="muito"\]'
    &lt;XQuery&gt;
    &lt;QueryPlanToAST&gt;
    &lt;NodePredicateFilterQP uri="" name="#tmp9"&gt;
    &lt;StepQP axis="child" name="LU" nodeType="element"&gt;
    &lt;StepQP axis="child" name="S" nodeType="element"&gt;
    &lt;StepQP axis="child" name="DOC" nodeType="element"&gt;
    &lt;DocumentPresenceQP container="/big/tmp/test_container" index="node-element-presence-none" operation="eq" child="LU"/&gt;
    &lt;/StepQP&gt;
    &lt;/StepQP&gt;
    &lt;/StepQP&gt;
    &lt;ValueFilterQP comparison="eq" general="true"&gt;
    &lt;StepQP axis="child" name="STR" nodeType="element"&gt;
    &lt;VariableQP name="#tmp9"/&gt;
    &lt;/StepQP&gt;
    &lt;Sequence&gt;
    &lt;AnyAtomicTypeConstructor value="muito" typeuri="http://www.w3.org/2001/XMLSchema" typename="string"/&gt;
    &lt;/Sequence&gt;
    &lt;/ValueFilterQP&gt;
    &lt;/NodePredicateFilterQP&gt;
    &lt;/QueryPlanToAST&gt;
    &lt;/XQuery&gt;
    Time in seconds for command 'queryplan': 0.004827

    and now the index is working!

    However the performance is very low (query time = 192.576 seconds)...
    the container type is set to nodecontainer.
    Are there any extra configurations I need to do?
  • 522770
    522770 Member Posts: 728
    Hi,

    The query plan shows me that you are using document indexes - which is almost certainly the reason that your query is slow. Node indexes are much faster (they're the default), and you can change to use them with this DB XML shell command:
    reindex /big/tmp/test_container n
    My previous advice was based on the forum-eaten query - for your actual query I suggest that you add an index like this:
    addIndex "" STR node-element-equality-string
    I hope that works better for you.

    John
    522770
  • 673401
    673401 Member Posts: 19
    Actually it was my fault... i was defining the container as wholedoccontainer and i wanted NodeContainer!

    Now the performance is as I was expecting.

    Thanks for the support.

    ---
    Jorge Filipe Teixeira
  • Lucas Vogel
    Lucas Vogel Member Posts: 87
    Jorge,

    Out of curiosity,what kind of performance are you getting now after your changes?

    Thanks!
  • 673401
    673401 Member Posts: 19
    For the same query, miliseconds! :)

    I'm not sure if I can continue this thread, however I faced another problem.

    When I try to load several documents, I get an error message saying I'm out of locks. (This never happens before because I was wrongly using WholeDocContainer instead of NodeContainer).

    I've read the FAQs but I cannot use the method set_lk_max_locks to se a higher number of locks since I'm using Perl. How can I solve this problem?


    ---
    Jorge Filipe Teixeira
  • 673401
    673401 Member Posts: 19
    I posted other question that is connected with this problem, so I think there is no need to create another thread.
This discussion has been closed.