1 Reply Latest reply: Sep 1, 2009 7:18 AM by 807567

Understanding open syscall on FIFO with dtrace

807567 Newbie
Hi all,

I'm trying to understand the behaviour of the open syscall on FIFOs using dtrace. We have a very intermittent failure with FIFOs in which the writer process fails to open the pipe but the reader succeeds. I tried debugging with dtrace and found something odd in the failing case. I'd appreciate it if you could shed some light on this.


Program description:
READER opens the FIFO in blocking mode (O_RDONLY = 0)
WRITER opens the FIFO in non-blocking mode (O_WRONLY | O_NONBLOCK = 1 + 128 = 129)

In the failure case, open() returns -1 to the writer (the non-blocking call), but the reader gets a valid file descriptor.
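To make the two open modes concrete, here is a minimal single-process sketch in Python (not our actual reader/writer code). The function names and the temporary FIFO path are hypothetical, and the numeric flag values differ by platform, so the sketch uses the symbolic constants rather than 129. It assumes the usual POSIX behaviour: a non-blocking write-only open fails when no read end is open and succeeds once one is.

```python
import errno
import os
import tempfile


def open_writer_nonblocking(path):
    """Try a non-blocking write-only open; return (fd, errno), fd is -1 on failure."""
    try:
        return os.open(path, os.O_WRONLY | os.O_NONBLOCK), 0
    except OSError as exc:
        return -1, exc.errno


def demo():
    """Return the writer's result without a reader, then with one."""
    workdir = tempfile.mkdtemp()
    path = os.path.join(workdir, "fifo_demo")  # hypothetical path
    os.mkfifo(path)
    try:
        # No process has the FIFO open for reading yet.
        no_reader = open_writer_nonblocking(path)

        # Hold a read end open. O_NONBLOCK is used here only so that this
        # single process does not block; the real reader uses a blocking open.
        rfd = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
        wfd, err = open_writer_nonblocking(path)
        with_reader = (wfd >= 0, err)
        if wfd >= 0:
            os.close(wfd)
        os.close(rfd)
        return no_reader, with_reader
    finally:
        os.unlink(path)
        os.rmdir(workdir)


if __name__ == "__main__":
    print(demo())
```

The first tuple corresponds to what our writer sees in the failure case; the second is what we expect whenever the reader is already in open().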

Please find below the filtered dtrace output (sorted by timestamp; I have stripped some trivial information).

timestamp walltimestamp pid tid exe func probe fifo flag/return-value errno
===========================================================
Failure:
1103747969678386 2009 Aug 12 06:22:56 20656 1182 reader open entry /data/fifo1 0 0
1103747970243629 2009 Aug 12 06:22:56 6702 14 writer open entry /data/fifo1 129 0
1103747970401069 2009 Aug 12 06:22:56 20656 1182 reader open return /data/fifo1 48 0
1103747970413489 2009 Aug 12 06:22:56 6702 14 writer open return /data/fifo1 -1 6



In a successful open, the dtrace output looks like this:
1103747860071143 2009 Aug 12 06:22:56 20656 1235 reader open entry /data/fifo2 0 0
1103747880222083 2009 Aug 12 06:22:56 6702 14 writer open entry /data/fifo2 129 0
1103747880376943 2009 Aug 12 06:22:56 6702 14 writer open return /data/fifo2 23 0
1103747880394596 2009 Aug 12 06:22:56 20656 1235 reader open return /data/fifo2 49 0
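To compare the two traces above more easily, the lines can be paired by (pid, tid) and the time spent in each open() computed. The following is only a sketch. The helper names are hypothetical, and it assumes the whitespace-separated field order shown in the trace lines, with the four-token walltimestamp in positions 1 to 4.

```python
def parse_line(line):
    """Split one trace line into the fields used below."""
    f = line.split()
    # f[1:5] is the walltimestamp (year, month, day, time) and is ignored.
    return {
        "ts": int(f[0]), "pid": int(f[5]), "tid": int(f[6]), "exe": f[7],
        "func": f[8], "probe": f[9], "path": f[10],
        "value": int(f[11]), "errno": int(f[12]),
    }


def open_durations(lines):
    """Pair entry/return lines by (pid, tid), ordered by return time."""
    pending = {}
    results = []
    for rec in sorted(map(parse_line, lines), key=lambda r: r["ts"]):
        key = (rec["pid"], rec["tid"])
        if rec["probe"] == "entry":
            pending[key] = rec
        elif key in pending:
            start = pending.pop(key)
            results.append({
                "exe": rec["exe"], "flags": start["value"],
                "ret": rec["value"], "errno": rec["errno"],
                "ns": rec["ts"] - start["ts"],  # time spent inside open()
            })
    return results


FAILURE = [
    "1103747969678386 2009 Aug 12 06:22:56 20656 1182 reader open entry /data/fifo1 0 0",
    "1103747970243629 2009 Aug 12 06:22:56 6702 14 writer open entry /data/fifo1 129 0",
    "1103747970401069 2009 Aug 12 06:22:56 20656 1182 reader open return /data/fifo1 48 0",
    "1103747970413489 2009 Aug 12 06:22:56 6702 14 writer open return /data/fifo1 -1 6",
]

SUCCESS = [
    "1103747860071143 2009 Aug 12 06:22:56 20656 1235 reader open entry /data/fifo2 0 0",
    "1103747880222083 2009 Aug 12 06:22:56 6702 14 writer open entry /data/fifo2 129 0",
    "1103747880376943 2009 Aug 12 06:22:56 6702 14 writer open return /data/fifo2 23 0",
    "1103747880394596 2009 Aug 12 06:22:56 20656 1235 reader open return /data/fifo2 49 0",
]

if __name__ == "__main__":
    for name, trace in (("failure", FAILURE), ("success", SUCCESS)):
        for r in open_durations(trace):
            print(name, r)
```

In the failure trace the reader's open() returns first, after 722683 ns, and the writer's open() then returns -1 after 169860 ns. In the successful trace the writer returns first, after 154860 ns, and the reader follows.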



As far as I can tell, errno=6 (ENXIO) is the expected error from a non-blocking, write-only open of a FIFO when no process has the FIFO open for reading.
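For illustration, a writer can treat ENXIO as a transient "no reader yet" condition and retry the open. The sketch below is not our production code. The function name, timeout, and retry interval are hypothetical, and it assumes that retrying is acceptable for the application.

```python
import errno
import os
import time


def open_writer_with_retry(path, timeout_s=2.0, interval_s=0.01):
    """Retry a non-blocking write-only open while it fails with ENXIO.

    Returns (fd, attempts). Re-raises any error other than ENXIO at once,
    and re-raises ENXIO itself once timeout_s has elapsed.
    """
    deadline = time.monotonic() + timeout_s
    attempts = 0
    while True:
        attempts += 1
        try:
            return os.open(path, os.O_WRONLY | os.O_NONBLOCK), attempts
        except OSError as exc:
            if exc.errno != errno.ENXIO or time.monotonic() >= deadline:
                raise
            time.sleep(interval_s)  # no reader yet; wait and try again
```

A retry loop like this only hides the symptom. It does not explain why the reader's open() can return a valid descriptor while the writer's open() on the same FIFO still reports ENXIO.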



Our assumption is that synchronization between the reader's and the writer's open() calls on a FIFO is handled by the OS, which implies that when the writer gets a valid fd, the reader must also get a valid fd. Based on this assumption, our guess is that there is an intermittent problem in the open() implementation.

I'd appreciate it if someone could confirm this, tell us if our assumption is wrong, or give any pointers for further debugging.



Thanks,


KR




Other details:


uname -a:
SunOS xxxx 5.10 Generic_125100-09 sun4v sparc SUNW,Sun-Fire-T200

D script:


/* Entry: print one line and remember the pathname pointer for the return probe. */
syscall::open:entry, syscall::creat:entry,
syscall::open64:entry, syscall::creat64:entry,
syscall::unlink:entry, syscall::rename:entry
{
    /* arg0 = pathname; arg1 = flags for open/open64. errno is not meaningful at entry. */
    printf("%u %Y %5d %5d %-12s %-10s %-10s %25s %d %d\n",
        timestamp, walltimestamp, pid, tid, execname, probefunc, probename,
        copyinstr(arg0), arg1, errno);
    self->file = arg0;
}

/* Return: only fire when the matching entry was seen by this thread. */
syscall::open:return, syscall::creat:return,
syscall::open64:return, syscall::creat64:return,
syscall::unlink:return, syscall::rename:return
/self->file/
{
    /* arg1 = return value (fd, or -1 on failure); errno is valid when it is -1. */
    printf("%u %Y %5d %5d %-12s %-10s %-10s %25s %d %d\n",
        timestamp, walltimestamp, pid, tid, execname, probefunc, probename,
        copyinstr(self->file), arg1, errno);
    self->file = 0;
}