4 Replies Latest reply: Sep 4, 2012 8:44 AM by Jim Song

issue with partial update in linux

User387251 Newbie
Hi,

Each time a partial update runs, the size of the dgraph process in memory increases by around 200 MB. Right after the baseline update the dgraph was around 1 GB, but after a few partial updates its memory had gone up to 2.5 GB. Not many records are inserted or updated during a partial run, though; each update might bring in about 50 records. If the dgraph is restarted, the process size comes back down to 1 GB. Query response time also degrades after each partial update.

Please let me know if anyone has come across this issue and what solution you would recommend.

Thanks,
RP
  • 1. Re: issue with partial update in linux
    959178 Newbie
    We had a similar issue with our indexes, though the numbers were a bit higher. Our baselines would deliver indexes of around 20 GB, which would grow to over 50 GB each over a week of partial updates.

    We opened a ticket with Oracle support about this, and we were told that slow growth in memory utilization over time is expected. As a Dgraph runs throughout the week (we do weekly baselines), it loads different pieces of the index into memory. In addition, each partial update contributes to the memory of the process as new generation files are added and loaded. These, combined with any sorts specified for the data set and the cache, cause the memory to grow over time.

    We were also told that, if the memory usage of the dgraph is predictable, then make sure to have enough physical RAM on the machine to accommodate the entire dgraph process (or processes). We have found this to be really important. If you don't have sufficient RAM to hold the dgraph(s), then they will begin to swap to disk when they hit the RAM limit, and your search performance will go through the floor.

    HTH, MP
  • 2. Re: issue with partial update in linux
    Jim Song Newbie
    I just wanted to add that the query performance degradation is partly due to the dgraph cache being cleared after a partial update. You will want to run a dgraph warming script after each partial update to get your engine to a steady state quickly.

    The memory spike during a partial update is partly due to a new generation index being created and then merged. The memory usage should come down after the generation merge completes. Assuming no other processes are running on your machine, the 'used' value on the second line of the 'free' output (-/+ buffers/cache) tells you the RAM consumption (RSS) of your dgraphs, and the 'cached' column on the first line tells you your file system cache consumption. Make sure your total physical RAM is big enough to maintain a healthy balance between the two.
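
To track those two numbers over a week of partial updates, something like the following rough Python sketch could read them out of the `free` output. It assumes the older `free` layout that prints a `-/+ buffers/cache` line (newer versions of `free` use a different layout and would need different parsing). The `SAMPLE` text and the function name `parse_free` are made up for illustration, not taken from any real machine.

```python
# Illustrative sample of old-style `free` output (values in KB, invented).
SAMPLE = """\
             total       used       free     shared    buffers     cached
Mem:      16331712   15000000    1331712          0     200000    9000000
-/+ buffers/cache:    5800000   10531712
Swap:      2097148          0    2097148
"""


def parse_free(output):
    """Extract application memory and file system cache from `free` output.

    Assumes the older layout with a '-/+ buffers/cache' line.
    """
    result = {}
    for line in output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "Mem:":
            # Columns: total, used, free, shared, buffers, cached
            result["total_kb"] = int(fields[1])
            result["cached_kb"] = int(fields[6])
        elif fields[0] == "-/+":
            # Fields: '-/+', 'buffers/cache:', used, free
            result["app_used_kb"] = int(fields[2])
    return result


stats = parse_free(SAMPLE)
print("dgraph (approx.) KB:", stats["app_used_kb"])
print("file system cache KB:", stats["cached_kb"])
```

On a real box you would feed it the text captured from running `free` (for example via `subprocess.run(["free"], capture_output=True, text=True).stdout`) and log the two values after each partial update.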
  • 3. Re: issue with partial update in linux
    User387251 Newbie
    Thanks MP and Jim. We will use this information while sourcing the RAM. We run a partial update every 20 minutes. What sort of dgraph warming script should I use?

    Thanks once again,
    RP

    Edited by: 924798 on 02-Sep-2012 17:15
  • 4. Re: issue with partial update in linux
    Jim Song Newbie
    The warming script is basically a script that replays your commonly used queries against your dgraph to warm the cache. One way of getting the commonly used queries is to parse your production dgraph request logs. You will want to experiment with the number of queries in your warming script so that it finishes within your 20-minute partial window.
