Each time a partial update runs, the size of the dgraph process in memory increases by around 200 MB. Right after the baseline update the dgraph was around 1 GB, but after a few partial updates its memory usage had gone up to 2.5 GB. However, not many records are inserted or updated during a partial run; each update might bring in about 50 records. If the dgraph is restarted, the size of the process comes back down to 1 GB. Query response time also degrades after a partial update.
Please let me know if anyone has come across this issue, and what solution you would recommend.
We had a similar issue with our indexes, though the numbers were a bit higher. Our baselines would deliver indexes of around 20 GB, which would grow to over 50 GB each over the course of a week of partial updates.
We opened a ticket with Oracle support about this, and we were told that a slow growth in memory utilization over time is expected. As a dgraph runs throughout the week (we do weekly baselines), it loads different pieces of the index into memory. In addition, each partial update that is applied contributes to the memory of the process as new generation files are added and loaded. These, combined with any sorts that are specified for the data set, and the cache, will cause the memory to grow over time.
We were also told that, if the memory usage of the dgraph is predictable, then make sure to have enough physical RAM on the machine to accommodate the entire dgraph process (or processes). We have found this to be really important. If you don't have sufficient RAM to hold the dgraph(s), then they will begin to swap to disk when they hit the RAM limit, and your search performance will go through the floor.
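As a rough way to check the RAM headroom described above, here is a minimal sketch that estimates how many partial updates fit before the dgraph outgrows physical memory. It assumes growth is linear per update, which is a simplification: it ignores memory released after generation merges and the RAM you need to leave for the file system cache. The function name and the 4000 MB machine size are made up for illustration; the 1000 MB and 200 MB figures are the approximate ones from the original question.

```python
def updates_before_swap(baseline_mb, growth_per_update_mb, physical_ram_mb):
    """Estimate how many partial updates fit before the process exceeds RAM.

    Assumes linear growth per partial update (a simplification).
    All sizes are whole megabytes.
    """
    if growth_per_update_mb <= 0:
        raise ValueError("growth_per_update_mb must be positive")
    headroom_mb = physical_ram_mb - baseline_mb
    if headroom_mb <= 0:
        # The baseline alone already fills the machine.
        return 0
    return headroom_mb // growth_per_update_mb


# Roughly 1 GB after baseline, 200 MB per partial update,
# on a hypothetical machine with 4000 MB of RAM.
print(updates_before_swap(1000, 200, 4000))
```

If the estimate is smaller than the number of partial updates you run between baselines, that suggests either adding RAM or scheduling a restart or baseline sooner.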
I just wanted to add that the query performance degradation is partly due to the dgraph cache being cleared after a partial update. You will want to run a dgraph warming script after each partial update to get your engine to a steady state quickly.
The memory spike during a partial update is partly due to a new generation of the index being created and then merged. The memory usage should come down after the generation merge completes. Assuming no other processes are running on your machine, the second line of the 'free' command output (the 'used' value on the '-/+ buffers/cache' row) tells you the RAM consumption (RSS) of your dgraphs, and the 'cached' value on the first line tells you your file system cache consumption. Make sure your total physical RAM is big enough to maintain a healthy balance between the two.
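If you want to track those two numbers over time, a small parser for the 'free' output is enough. This is only a sketch: it assumes the older procps layout that still prints the '-/+ buffers/cache' row (newer versions of 'free' drop that row), and the sample output below uses made-up numbers purely to show the layout.

```python
# Illustrative `free -m` output in the older procps layout (made-up numbers).
SAMPLE = """\
             total       used       free     shared    buffers     cached
Mem:         64000      62000       2000          0        500      35000
-/+ buffers/cache:      26500      37500
Swap:         8000          0       8000
"""


def parse_free(output):
    """Extract process memory and file system cache from old-style `free` output.

    Values are in whatever unit `free` was run with (MB for `free -m`).
    Either value is None if its row is missing from the output.
    """
    process_used = None
    fs_cached = None
    for line in output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "Mem:":
            # Columns: total, used, free, shared, buffers, cached
            fs_cached = int(fields[6])
        elif fields[0] == "-/+":
            # Row reads: "-/+ buffers/cache: <used> <free>"
            process_used = int(fields[2])
    return {"process_used": process_used, "fs_cached": fs_cached}


print(parse_free(SAMPLE))
```

Logging both values before and after each partial update makes it easier to see whether memory really does come back down once the generation merge finishes.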
The warming script is basically a script that replays your commonly used queries against your dgraph to warm the cache. One way of getting the commonly used queries is to parse your production dgraph request logs. You will want to experiment with the number of queries you use in your warming script so that they fit in your 20-minute partial window.
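A warming script along those lines could look roughly like the sketch below. It is not an Endeca-supplied tool. It assumes the query URL is the last whitespace-separated field on each request log line and starts with "/", so check that against your own log layout before relying on it. The function names and the time budget parameter are my own.

```python
from collections import Counter
import time
import urllib.request


def top_queries(log_lines, limit):
    """Return the `limit` most frequent query URLs found in the log lines.

    Assumption: the query URL is the last whitespace-separated field of each
    line and starts with "/". Lines that do not match are skipped.
    """
    counts = Counter()
    for line in log_lines:
        fields = line.split()
        if fields and fields[-1].startswith("/"):
            counts[fields[-1]] += 1
    return [url for url, _ in counts.most_common(limit)]


def fetch(url):
    """Send one GET request and discard the response body."""
    with urllib.request.urlopen(url, timeout=30) as response:
        response.read()


def warm(base_url, queries, budget_seconds, send=fetch, clock=time.monotonic):
    """Replay queries until the list or the time budget runs out.

    Returns the number of queries attempted.
    """
    deadline = clock() + budget_seconds
    attempted = 0
    for query in queries:
        if clock() >= deadline:
            break
        try:
            send(base_url + query)
        except OSError:
            # A failed warming query is not fatal; move on to the next one.
            pass
        attempted += 1
    return attempted
```

To use it, you would read your request log, pass the lines to top_queries with a limit such as 500, and hand the result to warm together with your dgraph's host and port and a time budget somewhat shorter than the partial window. The limit of 500 is only a starting point to tune.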