I'll do my best to respond, but as you say, I'm not exactly an Informix guy. Let me get this straight: the system boot sequence launches 16 Informix processes during the init phase. Unless the Informix code specifically binds those processes to a particular CPU via processor_bind(2), the execution context of any of those 16 processes can block and be rescheduled on a different physical CPU (a context switch). Threads have an affinity toward the CPU they last ran on, but they can be moved over to the second CPU if necessary. Ideally, you would pin them yourself using the shell command pbind(1M). Note that context switching between cores within the same physical CPU has minimal performance impact.
Do you know how many threads these Informix processes spawn? If they each execute via a single thread, then you're starving yourself - to resolve this you would need to initialize 128 Informix processes and, to minimize context switching, bind the first 64 to processor #1 and the remaining 64 to processor #2. You could also split the current 16 processes between the two CPUs, since I'd bet all 16 are running on CPU #1. You could also (though I don't recommend it) carve up resources and assign each Informix process an even fraction of the system memory (say 64 GB / 128 processes).
The next item you may want to look into is the process page size. Solaris defaults to 8 Kbytes (8192 bytes). This sucks for a database system. Change it to a reasonably large value. For example:

(tbrown@bullshark) [tbrown]: pagesize -a
8192
65536
4194304     <-- (I usually use this one)
268435456
(tbrown@bullshark) [tbrown]: ppgsz -o heap=4M,stack=512K -p <Informix process pid>

Back to your question (paraphrased): "can processes swap threads?" The answer is no. Threads share their parent process's execution context. The only ways for threads in different processes to communicate are named pipes, shared memory, doors, memory-mapped files, sockets, and so on. What you describe sounds like process-level (main thread) context switching, either across cores or processors, or both. It could also be serialized execution, where threads execute in serial FIFO order (yikes).
I'm battling database performance myself on our T5240 (same config, max memory, different db). Hope I answered at least some aspect of your question.
Tracy S. Brown
Thanks for the reply...
We do bind our oninit processes to the Sun virtual CPUs when the database server starts up. I'm not sure which system call Informix uses to perform the bind, but from looking at 'top' output I never see an oninit process run on a CPU other than the one it was "bound" to at startup.
Informix will spawn any number of threads that execute on the oninit processes. The primary threads that run on these oninit processes are "sqlexec" threads, which, as the name implies, perform SQL work. The other main threads that run on the oninits are kaio threads, which perform all I/O requests. The number of kaio threads equals the number of oninit processes we configure to start when the engine starts, while the number of sqlexec threads is determined entirely by user activity. It is not unusual to see several hundred sqlexec threads on a busy Informix server. While Informix binds its oninit processes to physical or virtual Sun CPUs, it does not bind its sqlexec or kaio threads to particular oninit processes. An sqlexec thread that "comes to life" on one oninit process can migrate and run on another available oninit process if it is switched off by the internal Informix scheduler. The context for all of the Informix threads is stored in shared memory that all of the oninit processes can access, so the migration of an sqlexec or kaio thread between oninits is seamless.
I wonder how the kernel/CPUs handle this scenario:

An sqlexec thread is running, executing its code on Oninit #1. The Oninit #1 process is bound to virtual/logical CPU #1. The sqlexec thread reaches a point where it needs to yield to another waiting thread, so it is switched off Oninit #1 and goes to sleep. When the thread comes back to life, it begins running on a different oninit process, Oninit #7, which is bound to CPU #7. How would that CPU know anything about the context of an sqlexec thread that was running on a different CPU?
I know this scenario occurs in situations where there are, for example, 8 physical CPUs - not logical but real CPUs. When the Informix threads migrate between oninit processes, there is a cost involved:
sqlexec1 ----> oninit#1 ----> CPU1 (this CPU has some context information for the sqlexec thread)
sqlexec1 yields, then wakes up...
sqlexec1 ----> oninit#7 ----> CPU7 (this CPU has context for whatever was running in this process prior to the yield-switch)
I hope that makes some sense...
I suspect that your Oninit LWPs are being starved of resources because of thread priorities. As demand increases for these LWPs to accept a socket connection and spawn an sqlexec thread, the Oninit LWPs will be forced to yield processor time. It's worse than that, because the default thread class (timeshare: TS) uses a sliding-scale quantum that maximizes each thread's execution interval (the maximum amount of time the thread can execute before blocking). I'd be willing to bet this is my problem. I've seen upwards of 640 threads per second on my database at peak load, with only 128 virtual processors for the scheduler to choose from. And it performs like a drunk dog skydiving - not pretty. The following example shows how to view the thread priority levels and their associated quantum intervals:

[tbrown]: dispadmin -g -c TS
# Time Sharing Dispatcher Configuration
RES=1000

# ts_quantum  ts_tqexp  ts_slpret  ts_maxwait  ts_lwait  PRIORITY LEVEL
       200         0        50           0        50     #     0
<... snip ...>
       160         0        51           0        51     #    10
<... snip ...>
       120        10        52           0        52     #    20
<... snip ...>
        80        20        53           0        53     #    30
<... snip ...>
        40        30        55           0        55     #    40
<... snip ...>
        40        48        58           0        59     #    58
        20        49        59       32000        59     #    59

The TS thread priorities range from -60 to +60, with 0 in the middle of the road. I'm guessing that the default priority is 0. Each time a thread is forced off the processor, its priority gets bumped up by 1. This is all fine, but your Oninit processes shouldn't have to compete against however many hundreds or thousands of threads eager for execution time. I'd try setting the 16 Oninit LWPs to priority 60, if your Oninit LWPs are actually running as TS threads. For example:

[tbrown]: priocntl -s -c TS -i pid -p 60 <Oninit pid number>

However, I'm running my db in a zone that is not scheduling my database threads as TS threads. Instead these threads are scheduled as fair-share (FSS) threads: round-robin execution intervals of up to the quantum interval before blocking (it's more complicated than that, due to projects, tasks, and other resource-allocation admin stuff). I need to investigate the commands for changing thread priorities in zones. You can find out which class a thread is being scheduled in using ps. For example:

[uid=0(root)@sevengill] [tbrown]: ps -Lce | grep ksh
15808   1 FSS  59 pts/1    0:00 ksh
16269   1 FSS  59 pts/1    0:00 ksh
16142   1 FSS  59 pts/1    0:00 ksh

As you can see, my shell is running as a fair-share thread at priority 59 (equivalent to 60, since 0 is counted as number 1). If for some reason you cannot change the Oninit LWP priorities, then you can at least give them more chances at handling an inbound request by changing the thread quantum. The quantum interval is based on something called the resolution time (reported as RES). The command to find this out is:

[uid=0(root)@sevengill] [tbrown]: dispadmin -g -c FSS
#
# Fair Share Scheduler Configuration
#
RES=1000
#
# Time Quantum
#
QUANTUM=110

Convoluted explanation skipped - the quantum for FSS threads here is 110 milliseconds. That means my FSS threads can have a maximum execution interval of 110 milliseconds, which is a fairly large value when cycles are computed in nanoseconds. You might try reducing this interval so your Oninit LWPs get more turns on the processor. For example:

(tbrown@nickle) [tbrown]: dispadmin -g -c FSS -r 100
#
# Fair Share Scheduler Configuration
#
RES=100
#
# Time Quantum
#
QUANTUM=11

More convoluted stuff: don't use a resolution of 100, because that's a special number for Solaris. Here's why: at RES=100 the quantum is expressed in clock ticks, and since a tick happens every 10 milliseconds the reported 11 equals 11 * 10 = 110 milliseconds - the same value reported when RES is 1000. You might try 500 to start; at least that's where I'm going to begin my testing.
I haven't experimented with any of this yet on the T5240 - just my workstation. I'm just now getting up to speed on thread classes and class priorities (not to mention the convoluted quantum interval).
(not sure how much this actually helps)
Tracy S Brown
Edited by: pthread_mutex_impl on Jul 14, 2009 7:15 PM