The simplest way is to use the TMTRACE environment variable (e.g. export
TMTRACE="*:ulog:dye:"). Shut down Tuxedo, set TMTRACE, and boot Tuxedo again. The servers will pick up the environment variable at boot and add trace messages to the ULOG.
You can play around with the boot sequence to make tracing more selective if you wish. Otherwise you can try the tmadmin
changetrace command to turn tracing on and off selectively.
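As a rough sketch, the two approaches might look like this (the machine name in the tmadmin example is a placeholder, and you should adjust the trace spec to your needs):

```shell
# Boot-time approach: servers inherit TMTRACE when they start.
tmshutdown -y
export TMTRACE="atmi:ulog:dye:"   # category:receiver:trigger; "*" traces all categories
tmboot -y

# Running-domain alternative: toggle tracing selectively with tmadmin's
# changetrace command (-m limits the change to one machine; "mach1" is hypothetical).
echo 'changetrace -m mach1 "atmi:ulog:dye"' | tmadmin
```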
If you come across situations where you get TPESYSTEM errors and there is no additional information in the ULOG by default, please let us know. We're trying to clean those up, as tracking down TPESYSTEM errors can be difficult. Our goal is that anywhere TPESYSTEM is generated, a ULOG entry is written explaining why TPESYSTEM was returned.
So are you asking which trace categories to enable for TMTRACE? Use "atmi" for the trace category, "ulog" for the receiver, and request dyeing in order to correlate requests. Or were you asking about something else?
How often are you getting TPESYSTEM errors and from what call are they being reported?
Oracle Tuxedo Chief Architect
PS Congratulations on surviving the polar plunge!
Another option is an environment variable, TM_LOG_ESYS, that was added in Tuxedo 9.1.
This environment variable adds detailed information to the ULOG when
a TPESYSTEM error is returned. TM_LOG_ESYS is limited to ATMI calls and can be set as follows:
(other settings are mentioned on the link if you are interested)
It covers most of Tuxedo (except for JRAD and JRLY I think)
Bob and Todd
Thanks for your follow up. Your details have been helpful.
(Bob) I will look further into this last reference you gave me, although I am confident this is not in the ATMI logging. Concerning the TRACE setting, I have seen many references to GWWS_TRACE_LEVEL but no explanation. If there is a link that tells me how to use it, that would be great.
(Todd) These errors are from tpcall(), and we have not seen anything in our logs about them. But as you will see in my notes below, we have often had to filter out TRACE messages, so we may have thrown away the evidence ourselves. We are hoping to give you more complete info after we complete the plans described below.
While trying to start a discussion on this topic, I was also working with our OPS team to find testing alternatives that would give us better detail. We came up with a partial solution, so that we can debug more effectively.
Firstly, our greatest issue has been that we have been trying to reduce TRACE messages due to the large number of messages we are already capturing. The overhead is significant when we cannot replicate the issue at will and are forced to sit and wait. Therefore, we often had indicators but not a complete ULOG excerpt.
To solve this, we have decided to work with another bulletin board (GWWS server only) dedicated to this issue, spilling out the TRACE messages separately until we catch this error. Because we are changing variables, we may no longer see these errors, but we are hopeful. "Information drives good business," so before I trouble you further, let me get you more specific info.
With great regards.
"The water is fine. Come on in!"
OK, so let me understand what you are doing and where you are getting these errors. If I understood you correctly, you are making outbound web service calls via the GWWS and in some cases you are getting TPESYSTEM error returns. Is that correct? If so, you might want to eliminate any TMTRACE tracing that you are currently doing and enable the environment variable Bob mentioned to get more details about why TPESYSTEM is being returned.
As for GWWS_TRACE_LEVEL, from a quick look at the code, values of 1, 10, and 100 seem to make sense. The higher the value, the more tracing detail you will get from the SALT gateway.
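Assuming the variable is picked up from the GWWS server's environment (placement is an assumption; set it wherever your GWWS processes source their environment before booting), enabling the most verbose level would look like:

```shell
# Higher value => more tracing detail from the SALT gateway (values 1, 10, 100)
export GWWS_TRACE_LEVEL=100
```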
Oracle Tuxedo Chief Architect
PS The water looked pretty cold!
I finally have some results from the ULOGs, but they still baffle me. The cause of the TPESYSTEM seems to be an OBFail error. There is not much immediate detail that I can see for this particular call, but I see that the response to the preceding outgoing web service call was odd in each case. While the preceding call completed and returned successfully, the logs do not show a complete response. It appears as if memory may have been corrupted. This happens in each of the three cases we have found, and it appears to be related to the TPESYSTEM error.
As before, this may be isolated to our usage, and I am searching for leaks; however, it seems odd that we did not get an error back when the SOAP response "appears" bad.
A modified snippet of what the logs show for the previous response is below:
<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" >
PS - next time, maybe we will have to see what Lake Michigan has to offer.
Bob and Todd:
I have now identified the problem. It is memory related, and it is also clear to me that there was no issue with SALT's handling of the preceding message. You have both helped me greatly.
Thanks to our Tuxedo and SALT heroes.
Just curious, could you expand on what sort of memory related problems you were encountering? It might help someone else in the future in a similar situation.
Oracle Tuxedo Chief Architect
PS At the moment it has lots of ice to offer!
I have been reading a lot of logs and XML to find patterns, and I have found a few curious things. First of all, I misspoke earlier when I said we had not received any response back; I misinterpreted the flow of the logs. However, I had also referred to bad XML in a response, and that is still the case.
I have determined that our processing has followed this basic pattern:
1) the basic SALT request and response for webservice request_1 - completion is successful
2) subsequent requests and responses - also all successful
3) one more request is made for webservice request_x
4) the response to webservice request_x looks mostly good; however, the </Envelope> tag is corrupted, so it seems that ParseBodyElement is failing
5) upon follow-up, I have found that a piece of the garbage which corrupted this last response came from the response to request_1
- Please see the brief example (above) of the kind of XML we are seeing.
Whether this is coming from our web service or internally, we are digging into it now. My reference to memory was based upon the fact that I identified the exact locations in preceding XML that this garbage came from, so it seems as if something is not being properly initialized "somewhere".
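For anyone reproducing this analysis offline, a small Python sketch (no Tuxedo APIs; the envelope content and the corruption are invented for illustration) shows how a tail overwritten with garbage from an earlier response makes a standard XML parser fail, which is consistent with ParseBodyElement rejecting the message:

```python
import xml.etree.ElementTree as ET

GOOD = (
    '<SOAP-ENV:Envelope '
    'xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">'
    '<SOAP-ENV:Body><Result>ok</Result></SOAP-ENV:Body>'
    '</SOAP-ENV:Envelope>'
)
# Simulate the reported failure: the closing </SOAP-ENV:Envelope> tag (20 chars)
# is overwritten by a fragment from an earlier response.
BAD = GOOD[:-20] + '<Result>ok</Resul'

def parses(xml_text: str) -> bool:
    """Return True if the document is well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

print(parses(GOOD))  # True
print(parses(BAD))   # False - the corrupted tail breaks parsing
```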
Thanks for keeping me honest and I hope this clears things up about our status.
PS. If you know of a polar event in your area, I may have to see about driving up. I have a bad ankle which could use a little ice
To further help us in our analysis, I have a new flavor of this same question.
When a web service response gives us a ULOG trace message of
"HTTP Parsing: Buffer length 367 --"
is the 367 a calculated length, or is it a length received with the XML, in a wrapper from somewhere? I ask because of the following.
Previously, I described what appear to be corrupted web service responses. These corrupted responses fall into three different categories (based upon the XML string logged in the ULOG TRACE):
1 - where there is something which appears to have overwritten the end of the response xml;
2 - where there has been a truncation of the response xml; or
3 - where there has been something appended after the response xml
However, when I look at the "Buffer length", this length does not match the length of the string that was logged. In each case, it appears that the Buffer length is actually the correct length for what would have been the correct XML. Therefore, if this is being calculated, I am trying to find out where it is calculated, since that would be the last place where the correct XML was handled.
(side note - categories 1 and 2 fail with OBFail due to bad parsing; category 3 seems to work, ignoring the garbage because it is beyond the Buffer length of the actual XML)
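Assuming the logged "Buffer length" really is the size of the correct payload (which is exactly what the question above is trying to confirm), a quick Python check can classify a logged response into the three categories; `classify` and the sample envelope are invented helpers, not Tuxedo code:

```python
def classify(logged_xml: str,
             buffer_length: int,
             expected_tail: str = "</SOAP-ENV:Envelope>") -> str:
    """Compare a logged response against the reported buffer length,
    assumed to be the length of the correct XML."""
    if len(logged_xml) < buffer_length:
        return "category 2: truncated"
    if len(logged_xml) > buffer_length:
        return "category 3: garbage appended after the payload"
    if not logged_xml.endswith(expected_tail):
        return "category 1: tail overwritten"
    return "length and tail look consistent"

good = ('<SOAP-ENV:Envelope '
        'xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">'
        '</SOAP-ENV:Envelope>')
print(classify(good, len(good)))                   # consistent
print(classify(good[:50], len(good)))              # category 2
print(classify(good + "garbage", len(good)))       # category 3
print(classify(good[:-7] + "XXXXXXX", len(good)))  # category 1
```

This also mirrors the side note: only category 3 keeps the length consistent with a parseable payload, which would explain why it "seems to work".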
We are still trying to trace this data but this detail can help greatly.
- an additional curiosity: I am seeing extra references to
FSM State Transition: --WAKEUP-->EnTuxedoQ
FSM State Transition: --WAKEUP-->ReadyDeletion
FSM State Transition: --OK-->Delete
These appear to be outside of the usual activity. Is this some form of garbage collection, similar to Java's?
We have verified that the incoming stream is complete and does not match what the ULOG is showing us. Do you have any advice for a next step?
A question from one of our leads (Babu Khan): Is there an API we can use to obtain the SOAP message being returned?
Thank you for your help.
Faith R. And Babu K.
I don't think there is an API to do what you ask. What I can suggest, and would ask that you do, is to capture a wire trace using a tool such as Wireshark or something similar. If you could capture the SOAP request and SOAP response and send them to us, that would help immensely. Or if you have any kind of simple reproducer, that would also help.
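If tcpdump is available on the host, a capture along these lines can then be opened in Wireshark with an `http` display filter (the port is an assumption; use whatever port the GWWS outbound calls actually target):

```shell
# Capture full packets (-s 0) to a file Wireshark can open
tcpdump -i any -s 0 -w soap_capture.pcap 'tcp port 80'
```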
Oracle Tuxedo Chief Architect
We could upload the response by opening a service request on the Oracle support site. Do you have access to that if we provide a service request number?