We have to update data frequently, so we use partial updates on production. However, after some (not all) partial updates, the MDEX loses performance. Can anyone help us understand what could be causing this performance degradation?
The cache is cleared after a partial update (the data has changed, so the cached results are no longer valid). If you are using 6.3.0, you can avoid the resulting slowdown with cache warming via the --warmupseconds flag (see here for details: http://www.oracle.com/ocom/groups/public/@otn/documents/digitalasset/1704701.pdf ). This pre-loads the cache before any live requests are sent.
Alternatively, if you are on an older version, you can write your own cache warming by extending the BeanShell scripting. Look at the pre-shutdown-script and post-startup-script attributes for <dgraph /> documented in the Deployment Template Guide, and write something that:
1. Removes each dgraph (or cluster) from the load balancer
2. Updates the dgraph (or cluster)
3. Invokes dgraph queries via wget (or curl, or similar) to warm the cache
4. Adds each dgraph (or cluster) back into the load balancer
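The four steps above can be sketched as a small rolling-update loop. This is only an illustration of the ordering, not the BeanShell scripting itself: the four hook functions (`remove_from_lb`, `apply_update`, `warm_cache`, `add_to_lb`) are hypothetical placeholders you would wire to your own load balancer API, update scripts, and wget/curl-style warm-up queries.

```python
from typing import Callable, Iterable, List

# Hypothetical hook type: each hook receives a Dgraph (or cluster) name.
Hook = Callable[[str], None]


def rolling_update(
    dgraphs: Iterable[str],
    remove_from_lb: Hook,
    apply_update: Hook,
    warm_cache: Hook,
    add_to_lb: Hook,
) -> List[str]:
    """Update one Dgraph (or cluster) at a time so the others keep serving.

    If any hook raises, the exception propagates: the current Dgraph stays
    out of the load balancer and the remaining ones are not touched.
    Returns the names in the order they were put back into the LB.
    """
    restored: List[str] = []
    for dgraph in dgraphs:
        remove_from_lb(dgraph)  # 1. stop sending it live traffic
        apply_update(dgraph)    # 2. apply the partial update (this clears its cache)
        warm_cache(dgraph)      # 3. replay representative queries to refill the cache
        add_to_lb(dgraph)       # 4. return it to the pool with a warm cache
        restored.append(dgraph)
    return restored
```

Letting a failure stop the rollout (instead of always re-adding the Dgraph) is a deliberate choice in this sketch: a Dgraph whose update failed is left out of rotation for investigation rather than serving possibly inconsistent data.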
A complementary comment (we are also looking into performance issues following partial updates):
- If you have installed the 6.3.0 release: I could not find how many sample queries are re-run against the Dgraphs during warm-up (see Michael's link to the documentation).
- If you are using an earlier MDEX version:
* For F5 LB users, there is a toggleF5 script that helps you toggle (i.e. add/remove) a pool member from the LB and run offline Eneperf queries against it. The queries come from a file you prepare by parsing a recent request log (ReqLog), for example with ReqLogParser (a Perl script). The same question about sample size applies here as for the 6.3.0 release, i.e. how many queries to include in the file. Run Eneperf specifying the file, the number of threads (I read an official Endeca document that recommends a 25% overhead on threads, but I am not sure this is appropriate), and the number of times the file is read, i.e. how many times the queries are played against the pool member (MDEX).
* A thought about the F5 algorithm: Endeca recommends the "Least connection" method, which means the pool member with the fewest connections receives the LB's requests. A member that has just rejoined the pool has no open connections, so it is preferred over the others until it catches up. Once you have warmed the cache with Eneperf using a recent query file, check your LB's ramp-up time setting (the time between the pool member rejoining the LB pool and the LB starting to send queries to it).
If it is too short, your freshly warmed Dgraph will receive too many queries (depending on traffic, of course) and suffer a little. You can extend the ramp-up time to get around this, but doing so also extends the overall update time for your Dgraphs, so you will have to experiment to find values that suit your setup.
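The Eneperf replay described above (a query file, a thread count, and a number of passes over the file) can be approximated in Python for experimenting with sample sizes. This is a swapped-in analogue, not Eneperf itself: `replay_queries` and `make_http_sender` are hypothetical names, and the base URL (e.g. `http://dgraph-host:15000`) and the format of each query line are assumptions you would match to what your log parser actually emits.

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence


def make_http_sender(base_url: str, timeout: float = 30.0) -> Callable[[str], None]:
    """Return a function that GETs base_url + query and discards the body.

    Assumes each query line is a URL path plus query string that can be
    appended directly to base_url (an assumption, not a documented format).
    """
    def send(query: str) -> None:
        with urllib.request.urlopen(base_url + query, timeout=timeout) as resp:
            resp.read()  # drain the response before the connection is closed
    return send


def replay_queries(
    queries: Sequence[str],
    send_query: Callable[[str], None],
    num_threads: int,
    num_passes: int,
) -> int:
    """Play every query `num_passes` times using `num_threads` worker threads.

    Returns the total number of queries sent. Order across threads is not
    guaranteed, much like concurrent load from real users.
    """
    if num_threads < 1 or num_passes < 1:
        raise ValueError("num_threads and num_passes must be >= 1")
    workload = [q for _ in range(num_passes) for q in queries]
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # list() consumes the results so any exception from send_query surfaces here
        list(pool.map(send_query, workload))
    return len(workload)
```

Usage would look like `replay_queries(lines, make_http_sender("http://dgraph-host:15000"), num_threads=4, num_passes=2)`, with the host, port, and thread count chosen for your own environment.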
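To illustrate why a rejoining Dgraph can be flooded under "Least connection", here is a toy simulation that models only the selection rule and no ramp-up at all, assuming no connections close during the burst. It also includes a back-of-the-envelope estimate of total update time, assuming Dgraphs are updated strictly one after another and the next one is not removed until the previous one is back taking traffic. All function names and the example numbers are hypothetical.

```python
from typing import Dict, List


def least_connections(active: Dict[str, int]) -> str:
    """Pick the member with the fewest active connections (ties: first listed)."""
    return min(active, key=active.get)


def simulate_burst(active: Dict[str, int], new_requests: int) -> List[str]:
    """Route `new_requests` arrivals, assuming no connection closes meanwhile."""
    active = dict(active)  # copy so the caller's dict is not modified
    routed: List[str] = []
    for _ in range(new_requests):
        member = least_connections(active)
        active[member] += 1
        routed.append(member)
    return routed


def rolling_update_seconds(num_dgraphs: int, update_s: float,
                           warmup_s: float, ramp_up_s: float) -> float:
    """Total wall-clock time for a strictly sequential rolling update."""
    return num_dgraphs * (update_s + warmup_s + ramp_up_s)


# A Dgraph rejoining with 0 connections next to two busy ones (40 each)
# receives every one of the next 40 new requests.
burst = simulate_burst({"dg1": 40, "dg2": 40, "dg3": 0}, 40)
```

With four Dgraphs, 60 s per update and 120 s of warm-up, raising the ramp-up from 30 s to 90 s takes the whole rollout from 840 s to 1080 s: every extra second of ramp-up is paid once per Dgraph, which is the trade-off described above.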