In our production environment we observed some unhealthy long-running transactions which were eventually crashing the entire server. After some investigation we identified a filer (a remote NFS server) as the root cause of the long-running transactions. We have replaced the filer and things are now working fine.
The problem I have noticed is that each transaction takes much longer than in the lower environments (performance, testing, development, etc.). So I tried to find out whether our production server is slow by any chance. I ran the program below in the production, performance, testing and development environments and observed that the production server takes 3 times as long as the other environments.
a=0
for ((ii=0; ii<=2000; ii++)); do
    a=`expr $a + 1`
done
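For what it is worth, a quick way to separate raw CPU speed from process-creation cost is to time the `expr` version against the shell's builtin arithmetic, which never forks. A rough sketch, assuming bash is available on all four boxes (the iteration count is arbitrary):

```shell
#!/bin/bash
# Fork-heavy version: each iteration fork/execs /usr/bin/expr,
# so its runtime is dominated by process-creation cost.
a=0
time for ((ii = 0; ii < 2000; ii++)); do
    a=`expr $a + 1`
done

# Builtin version: the same arithmetic inside the shell process,
# no fork at all, so it measures raw single-thread CPU speed.
b=0
time for ((ii = 0; ii < 2000; ii++)); do
    b=$((b + 1))
done

echo "a=$a b=$b"
```

If only the `expr` loop is slow on the production box, the difference is in fork/exec, not in the CPU itself.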
Comparing the environments:
1) Production: Solaris 10 on a SPARC Enterprise T5220. One physical processor with 8 cores and 8 threads per core, 64 hardware threads in total. 64 GB RAM.
2) Performance environment: Solaris 9 on a SUNW,Sun-Fire-V440. 4 physical (single-core) processors and 16 GB memory.
3) Development box: GNU/Linux, kernel 188.8.131.52-4-smp. 2 Intel Pentium III processors with 2 GB memory.
4) Testing environment: GNU/Linux 64-bit, kernel 2.6.18-128.el5. Intel(R) Xeon(R) CPU with 2 cores at 2.53 GHz. 4 GB memory.
From the system details above, the production server looks hugely capable of processing any number of threads very fast, but in practice it is not able to do that. Even a simple ls command on an empty file runs 4 times slower in production compared to the performance environment.
Can anybody help me tune our production server, or at least point me to some documents that might help?
This appears to be related to single- versus multi-threaded processing performance. Check out item #2 on this Oracle blog:
I believe the other systems you have listed perform single-threaded processing.
I think that may be one of the issues. But looking at the ldd output, many more libraries get loaded for a simple command on Solaris 10 (production env) than on Solaris 9 (test env).
Prompt> ldd /usr/bin/ls
libsec.so.1 => /lib/libsec.so.1
libc.so.1 => /lib/libc.so.1
libavl.so.1 => /lib/libavl.so.1
libm.so.2 => /lib/libm.so.2
prompt> ldd /usr/bin/ls
libc.so.1 => /usr/lib/libc.so.1
libdl.so.1 => /usr/lib/libdl.so.1
On Solaris 10, I can see two extra libraries being loaded just to run the ls command.
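A quick way to compare this across boxes without eyeballing the output (ldd is standard on both Solaris and Linux):

```shell
#!/bin/sh
# Count the direct shared-library dependencies ls pulls in on this box.
# Run the same line on each environment and compare the numbers.
count=$(ldd "$(command -v ls)" | wc -l)
echo "ls dependencies: $count"
```

The count itself is only circumstantial, though; a few extra library loads should cost microseconds, not a 3x slowdown.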
I have run truss on the program (in my original post) and observed that the time is spent after the fork system call, between the call and its return. The same test on the test environment does not take that long.
Does this mean Solaris 10 (production env) is trying to do something extra compared to the test environment while forking the child process?
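One way to pin that down is truss's timing flags; -c, -f and -D are documented Solaris truss options, though the script path below is just a placeholder for the loop from the first post:

```shell
#!/bin/sh
# Trace the loop script and summarize where the kernel time goes.
#   -c : count syscalls and report the time spent in each
#   -f : follow children created by fork
#   -D : prefix each trace line with the delta time since the previous
#        line, which makes a slow fork return stand out immediately.
TARGET=/tmp/loop_test.sh     # placeholder path to the loop script

if command -v truss >/dev/null 2>&1; then
    truss -c -f sh "$TARGET" 2>/tmp/truss_counts.txt
    truss -D -f sh "$TARGET" 2>/tmp/truss_deltas.txt
    grep -i fork /tmp/truss_counts.txt
fi
```

If the -c summary shows fork (or fork1/vfork) dominating on production but not on the test box, that confirms where the gap is.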
Is there a reason you are comparing metrics between 4 different environments? They are all running different hardware with single- and multi-threaded processing, so you will see different results. Regarding the Solaris 9 versus Solaris 10 command library calls, command functionality can change between releases. Solaris 10 might have a more robust "ls" command which in turn loads more libraries (just a guess). I believe the commands you are using for your performance testing (incrementing an arithmetic variable in a loop, ls, etc.) are single-threaded.
I agree that these machines have different hardware as well as different OS versions. But these basic operations (expr etc.) should have similar response times, shouldn't they? I mean, I am trying to understand whether by any chance the production environment has a glitch that causes this huge response-time difference.
I can understand that SPARC T2 CMT technology would surely improve throughput in a multi-threaded environment (which is what we really want), but the response time of these basic simple commands should not change because of that.
Or can you suggest any other way to find out whether our production box is well tuned for a multi-threaded, multi-JVM application?
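As a crude throughput probe (just a sketch; the worker count and iteration count are arbitrary), you could run several copies of the single-threaded loop concurrently and compare the wall-clock time against one copy. On a CMT box like the T5220 the aggregate rate should scale much better with the worker count than the poor single-copy latency suggests:

```shell
#!/bin/bash
# Run N copies of a CPU-bound builtin-arithmetic loop in parallel.
N=4        # concurrent workers (arbitrary; try up to 64 on the T5220)
ITERS=500  # iterations per worker (arbitrary)

work() {
    local a=0
    for ((i = 0; i < ITERS; i++)); do
        a=$((a + 1))
    done
    echo "$a"
}

# Capture one result line per worker; the command substitution stays
# open until every background worker has finished and wait returns.
results=$(for ((w = 0; w < N; w++)); do work & done; wait)
echo "$results"
```

Timing this with N=1 versus N=64 would show whether the box is genuinely sick or simply trading single-thread latency for parallel throughput.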
The v440 has 4 physical processors each running at 1593 MHz.
The t5220 has 1 physical processor (with up to 8 cores and 8 threads/core) running at 1165 MHz.
I have no experience with application performance tuning. Checking prstat output may be a good start to see what is being utilized on each box.
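A sketch of the prstat invocations worth starting with on the Solaris boxes (-m, -L and -Z are documented prstat flags; top/vmstat are rough equivalents on the Linux boxes):

```shell
#!/bin/sh
# Solaris-only commands; guarded so this is a no-op elsewhere.
if command -v prstat >/dev/null 2>&1; then
    # Per-LWP microstate accounting, 5-second interval, 3 samples.
    # The LAT (waiting for CPU) and SLP (sleeping) columns show whether
    # threads are starved or blocked rather than computing.
    prstat -mL 5 3

    # Aggregate by zone, to spot a resource cap if one is configured.
    prstat -Z 5 3
fi
status=done
```

High LAT with low CPU would point at scheduling or a resource cap; high SLP during the fork-heavy loop would fit the slow-fork observation from the truss output.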