At first glance, the number of threads assigned to each Dgraph is too high. Since your MDEX servers use a quad-core processor, each server can run at most four threads at once. You have two Dgraph instances on each MDEX server with 8 threads each, so the Dgraphs will try to run 16 threads on only four cores. This will cause CPU contention, and requests will queue up.
Also, please make sure that there are no other processes trying to utilize the CPU.
I would suggest running one Dgraph per MDEX server with 4 threads. If other processes on the MDEX server are likely to compete for the CPU, reduce the number of threads to 3.
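To make the arithmetic explicit, here is a minimal Python sketch that compares the configured thread count with the number of cores. The function name `oversubscription` is just an illustrative helper, not part of any Endeca tooling, and the inputs are the figures mentioned in this thread.

```python
def oversubscription(cores: int, dgraphs: int, threads_per_dgraph: int) -> float:
    """Ratio of configured Dgraph threads to CPU cores (1.0 means one thread per core)."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return (dgraphs * threads_per_dgraph) / cores


# Current setup described above: 2 Dgraphs x 8 threads on a quad-core server.
current = oversubscription(cores=4, dgraphs=2, threads_per_dgraph=8)
# Suggested setup: 1 Dgraph x 4 threads on the same server.
suggested = oversubscription(cores=4, dgraphs=1, threads_per_dgraph=4)

print(f"current: {current:.1f}x, suggested: {suggested:.1f}x")
# prints "current: 4.0x, suggested: 1.0x"
```

A ratio above 1.0 means threads have to share cores, which is where the contention and queuing come from.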
When you run the performance tests, please keep an eye on the CPU utilization and load average.
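If you want something quick to run on the MDEX server during the test, the sketch below samples the load average with the Python standard library and normalises it by the core count. It assumes a Unix-like host (`os.getloadavg()` is not available on Windows), and the helper names are my own. Tools such as top or vmstat give you the same information.

```python
import os


def load_per_core(load_avg: float, cores: int) -> float:
    """Normalise a load average by core count; sustained values above 1.0 suggest CPU queuing."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return load_avg / cores


def sample_load():
    """Return the 1/5/15-minute load per core, or None if the platform cannot report it."""
    cores = os.cpu_count() or 1  # cpu_count() can return None
    try:
        one, five, fifteen = os.getloadavg()
    except (AttributeError, OSError):
        return None
    return {
        "cores": cores,
        "1min": load_per_core(one, cores),
        "5min": load_per_core(five, cores),
        "15min": load_per_core(fifteen, cores),
    }


if __name__ == "__main__":
    print(sample_load())
```

Run it a few times while the load test is in progress. A load per core that stays well above 1.0 is consistent with the Dgraph threads competing for CPU.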
How much cache memory have you assigned to each Dgraph? And what is the amount of RAM on the MDEX servers?
AdityaKar's point on threading is valid, although it is hard to imagine a server build with 24 GB of RAM but only a single quad-core processor.
I'd suggest reading the Performance Tuning Guide, then setting up a load test and running the MDEX Request Analyzer afterwards - that will give you a clearer picture with min/max/average engine time, min/max/average response time, details on any queuing, request profiling etc.
I performed a load test on the MDEX engines using the eneperf tool. The input was extracted from the Dgraph logs with the reqlogparser. I noticed an improvement in overall throughput after changing the number of threads from 4 to 8, so I decided to go with 8, but I will change it as you have suggested. Before that, however, let me explain the scenario in more detail.
These MDEX engines communicate with an ATG server via a load balancer (F5), which balances the load between the two MDEX engines.
The reported issue is that the HTTP response time under the Server subsection of the Details tab shows a maximum response time of around 14,000-16,000 ms, but the average is very low. My assumption is that the MDEX engine is not responsible for the slow HTTP responses, since the load test with eneperf gave positive results; in fact, the throughput was very good.
Could the issue be due to network problems or issues with the MDEX <-> F5 Load Balancer <-> ATG communication?
Please correct me if I'm wrong.
There are a couple of tests that you can run to identify if the root cause is the MDEX server's capacity or network latency.
Before you start, please verify the number of cores on your MDEX server. Use only one Dgraph on each MDEX server, and set it to use no more threads than the number of cores available on the server.
Use the request log analyzer (available with Platform Services) to analyse the Dgraph logs.
- Clear out the Dgraph logs. You can use "admin?op=logroll" to quickly do this.
- Run the load test as before from the front-end.
- Capture the Dgraph1.reqlog from Dgraph1.
- Run the reqloganalyzer on this log file.
- Compare the MDEX Engine Only Processing time against the Total Round Trip Response time in the log analysis results. If the difference between the two metrics is very high, then there may be a network issue.
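The comparison in the last step can be sketched as a small calculation. This is only an illustration: the function name is hypothetical, and the two input values are placeholders that you would replace with the averages from your own analyzer report.

```python
def time_outside_engine(engine_ms: float, round_trip_ms: float) -> tuple[float, float]:
    """Return (gap in ms, gap as a fraction of the round trip) between the two metrics."""
    if engine_ms < 0 or round_trip_ms <= 0:
        raise ValueError("timings must be positive")
    gap = round_trip_ms - engine_ms
    return gap, gap / round_trip_ms


# Placeholder numbers, NOT measurements from this thread.
gap_ms, share = time_outside_engine(engine_ms=120.0, round_trip_ms=400.0)
print(f"{gap_ms:.0f} ms ({share:.0%}) of the round trip is spent outside the engine")
# prints "280 ms (70%) of the round trip is spent outside the engine"
```

The larger the share spent outside the engine, the more reason there is to look at what sits between the Dgraph and the client rather than at the Dgraph itself.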
Next, you can take out network latency from the equation and run the test again.
- Parse the reqlog captured in the previous test using the Request Log Parser.
- Clear out the Dgraph1 logs.
- Run Eneperf on the MDEX server itself, using the parsed log file, to load test Dgraph1.
- Capture the Dgraph1.reqlog and run the reqloganalyzer on this log file.
- Check the MDEX Engine Only Processing time and the Total Round Trip Response time again in the log analysis results.
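One way to read the two runs side by side is sketched below. It is a rough heuristic rather than a rule from the Endeca documentation: the function name, the labels, and the 50% threshold are all assumptions you should adjust to your environment.

```python
def likely_bottleneck(frontend_rt_ms: float, local_rt_ms: float, threshold: float = 0.5) -> str:
    """Guess where the slowness comes from by comparing round-trip times of the two tests.

    frontend_rt_ms: round-trip time from the test driven through the front-end.
    local_rt_ms:    round-trip time from the Eneperf test run on the MDEX server.
    threshold:      assumed fraction of the front-end time that must disappear
                    in the local run before the network path is suspected.
    """
    if frontend_rt_ms <= 0 or local_rt_ms < 0:
        raise ValueError("timings must be positive")
    improvement = (frontend_rt_ms - local_rt_ms) / frontend_rt_ms
    return "network path" if improvement >= threshold else "mdex"


# Placeholder values, NOT measurements from this thread.
print(likely_bottleneck(frontend_rt_ms=400.0, local_rt_ms=130.0))  # prints "network path"
print(likely_bottleneck(frontend_rt_ms=400.0, local_rt_ms=380.0))  # prints "mdex"
```

If the round-trip time stays high even when Eneperf runs locally, the network is unlikely to be the main cause, and the Dgraph configuration or server capacity deserves a closer look.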
I hope this helps identify the root cause. Let us know how it goes.