We keep hitting this error at random points when going from our coded API to the MDEX Engine:
com.endeca.navigation.ENEConnectionException: Connection unable to determine response code from navigation engine.
Now, people have attempted to answer this question in the past on the Eden site, and luckily, we pulled information before it was shut down.
We have ruled out the following:
1. The version of the API libraries used by the application code does not match the version of the MDEX - generates this error: com.endeca.navigation.ENEException: Navigation Engine not able to process request.
2. Querying an engine that isn't running or doesn't exist - generates this error: com.endeca.navigation.ENEConnectionException: Error establishing connection to retrieve Navigation Engine request
Does anyone have any idea why we might be receiving this response? It seems to hang up connections from our API and is causing extreme slowdowns. Any information would be fantastic!
Are the errors random, or are they consistent for specific requests? Are you able to recreate the issue using a different client, like the reference application? Are you logging the HTTP GET that results in the error?
No - the errors are random and not consistently reproducible. We are NEVER able to repro it with the reference application, and it is a normal GET request that usually works. The only difference between the Ref UI and the website API we've written is that we do not go directly to the MDEX Engine - we go through another box that passes the request on. We can check whether the request is getting corrupted in that central box, but I think it's unlikely.
There was one gentleman that said he fixed the issue, but his response was indecipherable:
*"I got the solution for this. I was using the wrong api to get the response.*
*Now I got the Navigation object from ENEQueryResults. ERecList from Navigation. ERec by iterating through ERecList .ERec.getProperties() gives me the response dataset."*
But our application essentially does what the above does.
You say that the only difference between the ref UI and your app is that your app sends its requests through some sort of pass-through program. Can't you point the ref UI at that same pass-through program?
Set up, or enable logging, in that pass-through program.
Include the HTTP headers in the logging. Put in more logging.
Capture all the logs when the issue does happen.
Figure out how to reproduce the issue. Have someone sit down and bang away at the app.
Read the "Endeca MDEX Engine: Performance Tuning Guide", to look for more hints.
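For the logging part, here is a minimal sketch of a wrapper that records the query string, the elapsed time, and any exception for each call. The class name `QueryLogger`, the logger name, and the label values are made up for illustration. The comment showing where the Endeca call would go is an assumption about how your application invokes the engine, not something taken from this thread.

```java
import java.util.concurrent.Callable;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper: wraps any call so that the query string, duration,
// and failure (if any) end up in the log next to each other.
class QueryLogger {
    private static final Logger LOG = Logger.getLogger("endeca.query");

    static <T> T timed(String label, String queryString, Callable<T> call) throws Exception {
        long start = System.nanoTime();
        try {
            // In the real app this lambda would wrap the engine call,
            // e.g. something like () -> connection.query(query) (assumed usage).
            T result = call.call();
            LOG.info(String.format("%s OK %d ms query=%s", label, elapsedMs(start), queryString));
            return result;
        } catch (Exception e) {
            // Log the full stack trace together with the query that triggered it.
            LOG.log(Level.WARNING,
                    String.format("%s FAILED %d ms query=%s", label, elapsedMs(start), queryString), e);
            throw e;
        }
    }

    private static long elapsedMs(long startNanos) {
        return (System.nanoTime() - startNanos) / 1_000_000L;
    }
}
```

With something like this in place, the failing requests can be matched by timestamp against the pass-through program's log and the dgraph's own request log.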
I realize this is a long while later, EndecaJoe - but we are encountering the issue still. Here is our setup. We have created a VIP and are trying to access the boxes / ports through that VIP.
So essentially, we have a box called EndecaTierBox port 8010 that lives behind the Big IP. Our commerce setup makes an ENEConnection through the Big IP which is like ENDECABIGIP port 9000.
The Big IP and redirect are set up correctly, because sometimes, I can access the data through the reference app using the host as ENDECABIGIP / 9000. But most times, it throws me an error about Unable to determine Response Code from Navigation Engine.
When I type in the direct box (EndecaTierBox / 8010) that the dgraphs live on, the data resolves 100% of the time.
I've read in other places that there may be an issue when connecting through a load balancer with HTTP/1.1, but I really have no other information about this. Would anyone happen to know more about this?
Can you provide more details on your VIP setup? Is it a pool, how many nodes? Does it have healthcheck/monitoring via an active or passive check, if so, what is the check? Could the service be flapping - have the logs been checked for the VIP, does it think it's 100% up?
If you aren't making any forward progress in identifying the issue with current testing/logging, it might be time to reach for a protocol analyzer, such as Wireshark, and sniff/analyze your Endeca API requests. You'll then be able to compare the traffic for direct/successful, indirect-BIGIP/LTM/successful and indirect-BIGIP/LTM/failed request/responses. Hopefully this shows up a more-or-less obvious difference in the traffic going through the F5.
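If a packet capture is hard to arrange, a rough alternative is to send the same GET over a raw socket to both endpoints and look at the first line that comes back. The sketch below is plain JDK code and does not use the Endeca API. It assumes (this is a guess about the client internals) that the "unable to determine response code" error corresponds to a missing or unparseable HTTP status line. The class name and the command-line arguments are made up, and the path and query string should be copied from a real request in your own logs.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical probe: repeats one GET and reports how often no usable status line came back.
class StatusLineProbe {

    /** Returns the status code from an HTTP status line, or -1 if it cannot be determined. */
    static int parseStatusCode(String statusLine) {
        if (statusLine == null) return -1; // connection closed before any data arrived
        String[] parts = statusLine.trim().split("\\s+");
        if (parts.length < 2 || !parts[0].startsWith("HTTP/")) return -1;
        try {
            int code = Integer.parseInt(parts[1]);
            return (code >= 100 && code <= 599) ? code : -1;
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    /** Sends one GET over a raw socket and returns the first response line (null if none). */
    static String fetchStatusLine(String host, int port, String pathAndQuery, int timeoutMs)
            throws IOException {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            socket.setSoTimeout(timeoutMs);
            String request = "GET " + pathAndQuery + " HTTP/1.1\r\n"
                    + "Host: " + host + ":" + port + "\r\n"
                    + "Connection: close\r\n\r\n";
            OutputStream out = socket.getOutputStream();
            out.write(request.getBytes(StandardCharsets.US_ASCII));
            out.flush();
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(socket.getInputStream(), StandardCharsets.ISO_8859_1));
            return in.readLine();
        }
    }

    public static void main(String[] args) {
        if (args.length < 3) {
            System.err.println("usage: java StatusLineProbe <host> <port> <pathAndQuery> [attempts]");
            return;
        }
        String host = args[0];
        int port = Integer.parseInt(args[1]);
        String path = args[2];
        int attempts = args.length > 3 ? Integer.parseInt(args[3]) : 20;
        int failures = 0;
        for (int i = 1; i <= attempts; i++) {
            String line = null;
            try {
                line = fetchStatusLine(host, port, path, 5000);
            } catch (IOException e) {
                System.out.println(i + ": I/O error: " + e);
            }
            int code = parseStatusCode(line);
            if (code < 0) failures++;
            System.out.println(i + ": code=" + code + " raw=" + line);
        }
        System.out.println(failures + " of " + attempts + " attempts had no usable status line");
    }
}
```

Run it once against EndecaTierBox on 8010 and once against ENDECABIGIP on 9000 with the same path. If only the VIP run shows failures, that narrows things down to the load balancer before you ever open Wireshark.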
The HTTP port can be fussy about some aspects of the protocol. Many folks have run into such problems when setting up load-balancer checks for their VIPs, especially with things such as extra CRLFs at the end of an HTTP header block. The settings for these have varied from version to version of LTM, so it's worth checking whether any part of your VIP is doing header manipulation (such as sticky session headers etc.). Check that you don't have any passwords or other authentication enabled.
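Stray CRLFs are easy to miss in a normal log viewer. As a small sketch, the helper below prints a captured header block with control characters made visible and counts the CRLF pairs at the end. The class and method names are made up for illustration, and the bytes would come from your own capture or pass-through log.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for eyeballing raw HTTP header bytes.
class HeaderDump {

    /** Renders bytes with CR, LF and other non-printable characters escaped. */
    static String visualize(byte[] raw) {
        StringBuilder sb = new StringBuilder();
        for (byte b : raw) {
            int c = b & 0xFF;
            if (c == '\r') {
                sb.append("\\r");
            } else if (c == '\n') {
                sb.append("\\n\n"); // show the escape, then break the line for readability
            } else if (c < 0x20 || c > 0x7E) {
                sb.append(String.format("\\x%02X", c));
            } else {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }

    /** Counts consecutive CRLF pairs at the very end of the byte array. */
    static int trailingCrlfCount(byte[] raw) {
        int count = 0;
        int i = raw.length;
        while (i >= 2 && raw[i - 2] == '\r' && raw[i - 1] == '\n') {
            count++;
            i -= 2;
        }
        return count;
    }

    public static void main(String[] args) {
        byte[] sample = "GET / HTTP/1.1\r\nHost: example\r\n\r\n\r\n"
                .getBytes(StandardCharsets.US_ASCII);
        System.out.print(visualize(sample));
        System.out.println("trailing CRLF pairs: " + trailingCrlfCount(sample));
    }
}
```

A request with no body normally ends in exactly two CRLF pairs (one closing the last header line, one for the blank line), so a count of three or more on the F5 side but not on the direct side would be worth chasing.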
Finally, it's worthwhile setting up another clean-room environment, to see if you can reproduce the problem. This can usually be done fairly cheaply with VMs for your webapp server and Endeca server, and F5 have a Virtual Edition of their F5 LTM, which has proved useful in the past for setting up, testing and troubleshooting VIPs for Endeca and other services. That way you can crank up logging and try all sorts of tests without impacting what may be a production router.