I have been given the task of identifying clients that make heavy queries on a directory server, but I don't know where to start. Could you help me identify or list the clients that connect to my directory server (SunOne DS 5.2)? Is it possible to list them using the access logs and etime (elapsed time) values? Please suggest a way. Many thanks!
A good start would be parsing the access log files with the 'logconv.pl' utility, which with 5.2 was bundled in the DSRK (Directory Server Resource Kit) as a separate package; in the latest releases (11gR1PS1) the DSRK is bundled with the product itself.
Basically, once you identify a line in your log file that has a high 'etime' (maybe with an awk script), you should be able to trace back which "connection" (conn=X) generated that "operation" (op=Y).
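As a rough sketch of that first step, the following Python snippet pulls the conn, op and etime values out of RESULT lines and keeps those at or above a threshold. The sample lines, the regular expression and the names `slow_results` and `SAMPLE_LOG` are assumptions made for illustration; check them against your own access log before relying on the output.

```python
import re

# Assumed shape of a RESULT line (illustrative, verify against your own log):
#   [...] conn=11 op=1 RESULT err=0 tag=101 nentries=3 etime=12
RESULT_RE = re.compile(
    r"conn=(\d+) op=(-?\d+) .*\bRESULT\b.*\betime=(\d+(?:\.\d+)?)"
)


def slow_results(lines, min_etime=5.0):
    """Return (conn, op, etime) for each RESULT line with etime >= min_etime."""
    hits = []
    for line in lines:
        m = RESULT_RE.search(line)
        if m and float(m.group(3)) >= min_etime:
            hits.append((int(m.group(1)), int(m.group(2)), float(m.group(3))))
    return hits


# Made-up sample lines, not taken from a real server.
SAMPLE_LOG = [
    '[21/Apr/2009:11:39:51 -0700] conn=11 op=0 RESULT err=0 tag=97 nentries=0 etime=0',
    '[21/Apr/2009:11:39:52 -0700] conn=11 op=1 SRCH base="dc=example,dc=com" scope=2 filter="(cn=*smith*)"',
    '[21/Apr/2009:11:40:04 -0700] conn=11 op=1 RESULT err=0 tag=101 nentries=3 etime=12',
]

for conn, op, etime in slow_results(SAMPLE_LOG, min_etime=5):
    print(f"conn={conn} op={op} etime={etime}")
```

The (conn, op) pairs it reports are the keys you then use to look up the client and the request elsewhere in the log.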
Thanks for your answer. I can see that every line that contains an etime value also contains "conn=X" and "op=Y" values. But on that basis, how can we identify the client? I mean, how can we find the hostname or IP of the client and what query it made?
Hello everyone! Thanks for your responses, and I am really sorry for the late reply.
Well, I have written an awk script to extract the conn=, op= and etime= entries from the access file. But now, on the basis of the conn= or op= value, how can I identify the client (i.e. hostname or IP)? Is there a list available anywhere that explains the meanings of the conn= and op= code values?
If you look through the access log you will see entries for a connection, followed shortly (usually) by a BIND.
The CONNECT line contains the IP address of the incoming connection, and the BIND the DN used to bind.
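A minimal sketch of that lookup, assuming the connection line has the form "conn=N ... connection from <client> to <server>" and the bind line has the form "conn=N op=M BIND dn="..."". The exact layout varies between releases, so the regular expressions, the sample lines and the name `client_identities` are illustrative assumptions.

```python
import re

# Assumed line shapes (illustrative, verify against your own access log):
#   [...] conn=11 fd=608 slot=608 connection from 10.0.0.5 to 10.0.0.1
#   [...] conn=11 op=0 BIND dn="uid=app1,ou=apps,dc=example,dc=com" method=128 version=3
CONNECT_RE = re.compile(r"conn=(\d+) .*connection from (\S+) to (\S+)")
BIND_RE = re.compile(r'conn=(\d+) op=(\d+) .*\bBIND dn="([^"]*)"')


def client_identities(lines):
    """Map each conn number to the client address and the bind DNs seen on it."""
    clients = {}
    for line in lines:
        m = CONNECT_RE.search(line)
        if m:
            entry = clients.setdefault(int(m.group(1)), {"ip": None, "bind_dns": []})
            entry["ip"] = m.group(2)
            continue
        m = BIND_RE.search(line)
        if m:
            entry = clients.setdefault(int(m.group(1)), {"ip": None, "bind_dns": []})
            if m.group(3) not in entry["bind_dns"]:
                entry["bind_dns"].append(m.group(3))
    return clients


# Made-up sample lines, not taken from a real server.
CONNECTION_SAMPLE = [
    '[21/Apr/2009:11:39:51 -0700] conn=11 fd=608 slot=608 connection from 10.0.0.5 to 10.0.0.1',
    '[21/Apr/2009:11:39:51 -0700] conn=11 op=0 BIND dn="uid=app1,ou=apps,dc=example,dc=com" method=128 version=3',
    '[21/Apr/2009:11:39:51 -0700] conn=11 op=0 RESULT err=0 tag=97 nentries=0 etime=0',
]

for conn, info in client_identities(CONNECTION_SAMPLE).items():
    print(conn, info["ip"], info["bind_dns"])
```

A connection whose opening line has already rolled out of the logs will show up with "ip" set to None, which is the first of the problems listed next.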
There are several potential problems you will have to face:
* If the connections are long-lived (days/weeks) the logs will have rolled, and unless you keep your logs for a long time, finding the original connection and bind might not be possible. Even if it is possible, you may have to look through several GB of data to find it.
* If you have a load balancer/proxy, you may see the address of the load balancer/proxy rather than that of the client system.
* There is often a tendency to re-use BIND credentials, so multiple applications may use the same bind DN making it impossible to identify the application by its bind DN.
* Application servers often use a common connection pool for multiple applications, so even knowing the IP address of the system may not pinpoint the application.
If it is an option (that is, there is no proxy or load balancer hiding the source IP), it may be easier to snoop the network and look at the addresses on the packets.
Also look at what sorts of requests are being made; that can help to identify the application.
There is no easy way to do this unless some thought went into the original configuration: choosing load balancers that pass the source IP, using unique bind credentials for each application, preserving log files, etc.
All that said, with perseverance it is usually possible to track down the culprit.
Whether it is possible to fix the culprit is usually a political question, and often the end result is having to work on the LDAP systems so that they support the heavy clients (using load-balancing rules to send these requests/applications to dedicated servers, etc.).
It's usually more interesting getting at the nature of what's breaking the server than exactly who is doing it. Root cause may be something easy to fix on the server like an unindexed attribute, or something more elusive. It's quite possible that the "heavy" queries are something the server should be able to handle, and not an unreasonable client behavior at all.
Once root cause is understood on the server side, any number of options may open up for resolving the issue. As a last resort, the problematic operation/s can even be throttled or disallowed, slowing or breaking whatever client is making them. If that doesn't flush the responsible clients out of the woodwork, nothing will. That last option might have non-technical ramifications that would warrant consideration.
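For the unindexed-attribute case mentioned above, the access log itself can help: as far as I recall, the server appends notes=U to the RESULT line of a search that could not use an index. The sketch below lists the (conn, op) pairs of such results; the sample lines and the name `unindexed_results` are made up for illustration, and you should confirm the notes=U convention against the documentation for your release.

```python
import re

# Assumed marker for an unindexed search on a RESULT line (verify for your release):
#   [...] conn=11 op=1 RESULT err=0 tag=101 nentries=3 etime=12 notes=U
UNINDEXED_RE = re.compile(r"conn=(\d+) op=(\d+) .*\bRESULT\b.*\bnotes=U\b")


def unindexed_results(lines):
    """Return (conn, op) for every RESULT line flagged with notes=U."""
    found = []
    for line in lines:
        m = UNINDEXED_RE.search(line)
        if m:
            found.append((int(m.group(1)), int(m.group(2))))
    return found


# Made-up sample lines, not taken from a real server.
NOTES_SAMPLE = [
    '[21/Apr/2009:11:40:04 -0700] conn=11 op=1 RESULT err=0 tag=101 nentries=3 etime=12 notes=U',
    '[21/Apr/2009:11:40:05 -0700] conn=12 op=4 RESULT err=0 tag=101 nentries=1 etime=0',
]

print(unindexed_results(NOTES_SAMPLE))
```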
Thanks everyone for your responses!
I can now fetch a list of clients connecting to the server and the DNs of applications/clients with heavy elapsed times. On the basis of those DNs I will find out the applications or their owners, but one more thing:
After listing the applications/clients that have high etime values, on the basis of their DNs, can I find out what query they ran? Can the op= values help identify the type of query? I read some articles on the Internet, but couldn't understand the meaning of the op= codes. Could anyone suggest how to interpret such op= codes and identify the queries the applications/clients are running?
Thanks for your time!
You could also do some pre-filtering using lgrep to select the high etimes you are interested in. Often, after I've got the output from lgrep, I'll use grep, sed, awk, sort, and/or uniq to get a tidy little report of what I'm really looking for.
Hi Chris, Thanks for your response.
I have already written an awk script to fetch data and I am now able to fetch any type of data. But I want to know how to identify the operation/query type, i.e. what query or operation was run by the client/application on my LDAP server.
Is it possible to find this using op=<code>? Is there any list available that explains the operation for a given op=<code>?
Your etime numbers come off a RESULT line. You use the conn and op numbers to correlate that RESULT with another line that exists, earlier in the log, with the details of the request. You need both lines to make sense of the operation. Perhaps you can share what your script is showing you and we can work through some examples.
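To make that correlation concrete, here is a small sketch. As far as I know, op= is simply a per-connection sequence number rather than a code for the operation type; the type is the keyword on the request line (SRCH, MOD, ADD and so on). The script remembers each request line by its (conn, op) pair and joins it to the matching RESULT line when the etime is high. The sample lines, regular expressions and the name `slow_operations` are assumptions for illustration only.

```python
import re

# Assumed line shapes (illustrative, verify against your own access log):
#   [...] conn=11 op=1 SRCH base="dc=example,dc=com" scope=2 filter="(cn=*smith*)"
#   [...] conn=11 op=1 RESULT err=0 tag=101 nentries=3 etime=12
OP_LINE_RE = re.compile(r"conn=(\d+) op=(\d+) (\w+)\s*(.*)")
ETIME_RE = re.compile(r"\betime=(\d+(?:\.\d+)?)")


def slow_operations(lines, min_etime=5.0):
    """Join each slow RESULT line to the earlier request with the same conn and op."""
    requests = {}  # (conn, op) -> (operation keyword, request details)
    report = []
    for line in lines:
        m = OP_LINE_RE.search(line)
        if not m:
            continue
        key = (int(m.group(1)), int(m.group(2)))
        keyword, detail = m.group(3), m.group(4)
        if keyword != "RESULT":
            requests.setdefault(key, (keyword, detail))
            continue
        e = ETIME_RE.search(detail)
        if e and float(e.group(1)) >= min_etime and key in requests:
            op_type, req_detail = requests[key]
            report.append((key[0], key[1], float(e.group(1)), op_type, req_detail))
    return report


# Made-up sample lines, not taken from a real server.
CORRELATE_SAMPLE = [
    '[21/Apr/2009:11:39:51 -0700] conn=11 op=0 BIND dn="uid=app1,ou=apps,dc=example,dc=com" method=128 version=3',
    '[21/Apr/2009:11:39:51 -0700] conn=11 op=0 RESULT err=0 tag=97 nentries=0 etime=0',
    '[21/Apr/2009:11:39:52 -0700] conn=11 op=1 SRCH base="dc=example,dc=com" scope=2 filter="(cn=*smith*)"',
    '[21/Apr/2009:11:40:04 -0700] conn=11 op=1 RESULT err=0 tag=101 nentries=3 etime=12',
]

for conn, op, etime, op_type, detail in slow_operations(CORRELATE_SAMPLE):
    print(f"conn={conn} op={op} etime={etime} {op_type} {detail}")
```

If the request line sits in an older, rolled log file, the RESULT will have nothing to join to and is silently skipped here, so feed the script the log files in chronological order.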