You did not say if you are only changing hardware,
or if you are changing versions of Solaris and/or the compilers.
> when the whole application is migrated to the new chipset with SPARC IV and above running Solaris 10,
> the application blows up with CPU utilization peaking at 100% occupancy
> and threads queuing to the point where the whole application crashes.
So, maybe it is a thread synchronization or a timing issue.
You could try using some tools, like pstack or dbx,
to see where in the code
the threads are queuing or where the crash happens.
> The application overall was design to run in a non thread safe way, and certainly on on a CMT chipset.
That seems to be garbled.
> The application is written in C and each component is started within the C Application Container.
I'm not sure what "C Application Container" is.
Is some native C code called from a Java application?
> 1. Has anyone seen this type of behaviour before?
No, not that I remember.
> 2. If so, was it possible to migrate the application to newer hardware, and with what fixes?
Most applications written in C will port to new hardware with no changes.
The clues above suggest looking for timing loops or timing routines
and checking how thread creation and synchronization are done.
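As a rough illustration of what to look for: a thread that spins on a shared flag until another thread sets it will burn a whole CPU and is also a data race. The sketch below shows the usual POSIX threads replacement, a mutex plus a condition variable. The names (handoff_t, handoff_publish, handoff_wait, run_handoff_demo) are made up for this example and have nothing to do with the actual application.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical shared state: a worker sets 'ready' once 'result' is valid. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  ready_cv;
    bool            ready;
    int             result;
} handoff_t;

void handoff_init(handoff_t *h)
{
    int rc = pthread_mutex_init(&h->lock, NULL);
    assert(rc == 0);
    rc = pthread_cond_init(&h->ready_cv, NULL);
    assert(rc == 0);
    (void)rc;
    h->ready = false;
    h->result = 0;
}

/*
 * Fragile pattern to search for in the old code (shown only as a comment):
 *     while (!h->ready) { }
 * It keeps one hardware thread at 100% and reads 'ready' without a lock.
 */

/* Producer side: publish the value under the lock and wake the waiter. */
void handoff_publish(handoff_t *h, int value)
{
    pthread_mutex_lock(&h->lock);
    h->result = value;
    h->ready = true;
    pthread_cond_signal(&h->ready_cv);
    pthread_mutex_unlock(&h->lock);
}

/* Consumer side: sleep (no spinning) until the value has been published. */
int handoff_wait(handoff_t *h)
{
    pthread_mutex_lock(&h->lock);
    while (!h->ready)                      /* loop guards against spurious wakeups */
        pthread_cond_wait(&h->ready_cv, &h->lock);
    int value = h->result;
    pthread_mutex_unlock(&h->lock);
    return value;
}

static void *worker(void *arg)
{
    handoff_publish((handoff_t *)arg, 42);
    return NULL;
}

/* Starts one worker thread and waits for its value; returns -1 on failure. */
int run_handoff_demo(void)
{
    handoff_t h;
    pthread_t tid;

    handoff_init(&h);
    if (pthread_create(&tid, NULL, worker, &h) != 0)
        return -1;
    int value = handoff_wait(&h);
    pthread_join(tid, NULL);
    pthread_cond_destroy(&h.ready_cv);
    pthread_mutex_destroy(&h.lock);
    return value;
}
```

Because the waiter re-checks the flag under the lock, the result does not depend on which thread happens to run first, which is the property that tends to break when code tuned for slower hardware moves to a faster or wider machine.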
> 3. If someone has successfully done this, can the application run in a virtualized environment?
Once you get it running, it should also run in a virtualized environment.
If you have source for the application, it would be worth recompiling with the latest version of Studio, then using the Thread Analyzer to check the code for possible data races and Discover to check it for memory errors.
There are two possible explanations that spring to mind. One is that the code was written with some idea of how long each action would take, and newer hardware completes the work faster, and this causes the weird behaviour. Alternatively, the code scales according to the number of hardware threads, and newer systems with greater numbers of virtual CPUs cause the code to do strange things.
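To make the second explanation concrete, here is a minimal sketch of how code might size a worker pool from the CPU count, and how a cap keeps that sane on a CMT system that reports a very large number of virtual CPUs. The helper names (choose_worker_count, default_worker_count) and the idea of a fixed cap are assumptions for illustration only; I don't know how the real application decides how many threads to start.

```c
#include <assert.h>
#include <unistd.h>

/*
 * Hypothetical helper: pick a worker count from the number of online CPUs,
 * but never exceed max_workers and never go below one.
 */
long choose_worker_count(long online_cpus, long max_workers)
{
    assert(max_workers >= 1);              /* precondition for this sketch */

    if (online_cpus < 1)                   /* sysconf() failed or reported nonsense */
        return 1;
    if (online_cpus > max_workers)         /* many virtual CPUs on newer hardware */
        return max_workers;
    return online_cpus;
}

/* Same policy, using the CPU count the operating system reports right now. */
long default_worker_count(long max_workers)
{
    return choose_worker_count(sysconf(_SC_NPROCESSORS_ONLN), max_workers);
}
```

Code that uses the raw sysconf() value directly, with no cap, is the sort of thing that behaves on an 8-CPU box and then starts far too many threads on a machine with many more hardware threads.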
If it is purely on the CPU count, then running the app in a zone might be sufficient to fix the problem. If the code has a timing issue it is going to be much harder to fix.
To investigate the 100% CPU issue I'd use the Performance Analyzer (the version from the latest Studio will be fine). That will tell you where the problem is, and perhaps give the developers sufficient information to root-cause it.
The basic point is that an application should work unmodified on new hardware. The fact that this application has problems implies that there is an issue with the application.
I did find out yesterday that the code was compiled outside of Studio, and just ported across from an older version of Solaris. I am first going to get the developer to recompile using Studio and will then perform traces on the execution. This does seem like a more standard path, and from what I have researched, it appears that unless one is very careful compiling outside of Studio, it is possible to compile and miss some specific libraries related to threading in Solaris 10 and above.
Thanks to you and Peter for your replies.