Let's say I want to find out which processes are taking most of the TSB misses. Any ideas on how to attribute a TSB miss to the process whose access triggered it? Specifically, I want a per-process count or rate, so I can see which processes are highest and only have to use ppgsz or mpss.so.1 on those few. I think finding out what triggers a call to sfmmu_tsbmiss_exception() may be what I want, although I have no idea whether that can be tied back to a process. But if anything short of an exotic DIY loadable kernel hack can figure that out, I'd suppose it would be DTrace.
(I may also cut tsb_rss_factor to 256 from the default, but that's another story.)
This is for Solaris 10 on SPARC (a 24-CPU 6800 with 192GB RAM). I've seen trapstat -t report overhead as high as 20% on some CPUs, and that wasn't when the system was at its worst; I wasn't yet running trapstat from cron the last time that happened. That time, the system was nearly frozen for some minutes, but there were no I/O errors or anything like that. An xterm displayed back from that machine would have its scroll bar work fine (no context switch), but even tty-driver-level echoing was incredibly slow for the shell within that xterm, suggesting that even with very little swap in use, SOMETHING was bogging the system down to the point that a context switch was exceptionally onerous.
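To make the sfmmu_tsbmiss_exception() idea concrete, here is a rough, untested sketch of the kind of DTrace script I have in mind. It rests on assumptions I can't confirm: that the fbt provider exposes sfmmu_tsbmiss_exception on this kernel, and that the thread on CPU when the probe fires is the one whose access caused the miss (so pid and execname point at the right process). As far as I can tell it would also count only the misses that actually reach that function, not ones the low-level trap handler resolves on its own, so it may badly undercount compared to what trapstat -t reports. The 10-second window and top-10 cutoff are arbitrary choices of mine.

```d
#!/usr/sbin/dtrace -s

#pragma D option quiet

/* Assumption: fbt can instrument this function on this kernel. */
fbt::sfmmu_tsbmiss_exception:entry
{
	/* Count calls per process; pid/execname come from the current thread. */
	@misses[pid, execname] = count();
}

/* Sample for 10 seconds, then report the ten busiest processes and exit. */
tick-10s
{
	trunc(@misses, 10);
	printf("%6s %-16s %8s\n", "PID", "EXECNAME", "COUNT");
	printa("%6d %-16s %@8d\n", @misses);
	exit(0);
}
```

If something like this gives believable numbers, the processes at the top of the list would be the ones I'd try ppgsz or mpss.so.1 on first.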