2 Replies Latest reply: Sep 11, 2012 8:50 AM by 877841 RSS

    CRS-4639 when "crsctl" with a different user

    877841
      Hi,
      I'm deploying OMS 12c agents on several two-node RAC database, 11.2.0.2 on RedHat x86_64.
      The agent will run with a different user then "root" or "oracle" (owner of GRID) but the agent user ("agentusr") has the secondary group "oinstall".

      All works well but on a RAC installation one node agent see cluster resources, the other not.
      After a deep analysis we find that "agentusr" on one node can run "crsctl check cluster" without problem but on the other node it got:

      CRS-4639: Could not contact Oracle High Availability Services
      CRS-4000: Command Check failed, or completed with errors.

      The "crsctl check cluster" works well with "oracle" user.

      We checked some file permission but we can't find any issue:

      $ ll /var/tmp/.oracle/sCRSD*
      srwxrwxrwx 1 root root 0 Mar 23 11:39 /var/tmp/.oracle/sCRSD_IPC_SOCKET_11
      srwxrwxrwx 1 root root 0 Mar 23 11:39 /var/tmp/.oracle/sCRSD_UI_SOCKET

      We also run the "crsctl check cluster" with "strace" but there isn't any problem on file access.

      Which are the permission (file or groups) needed to run "crsctl" comand?
      Thanks.
      P.
        • 1. Re: CRS-4639 when "crsctl" with a different user
          758358
          Hi,

          Did you solve this? When you took the strace from crsctl did you save it to a file? If so could you check the following?

          egrep 'access|exec' <strace_output_file>

          Any messages in your crs log files on this node? e.g. $CRS_HOME/log/<host_name/alert<host_name>.log

          If this were a pure permissions issue then I'd expect a more relevant error to be returned, something else looks wrong here.

          I did a quick test case on a cluster I have and just creating a brand new user with no relevant groups I can still get a clean execution of that command:
          [agentusr@rac1~]$ groups
          oemagent
          [agentusr@rac1~]$ /u01/app/11.2.0.3/grid/bin/crsctl check cluster
          CRS-4537: Cluster Ready Services is online
          CRS-4529: Cluster Synchronization Services is online
          CRS-4533: Event Manager is online
          [agentusr@rac1~]$ ls -l /u01/app/11.2.0.3/grid/bin/crsctl
          -rwxr-xr-x 1 root oinstall 8279 Jun  3 14:08 /u01/app/11.2.0.3/grid/bin/crsctl
          Thanks
          Paul

          Edited by: PaulDTill on Sep 10, 2012 5:20 PM
          • 2. Re: CRS-4639 when "crsctl" with a different user
            877841
            Hi.
            We solved using agent user on one node and oracle user for the other (I know it's not perfect, but works now).
            No, there isn't any message in alert log file for this issue.

            I suspect a permission problem, I'm not sure however. Running strace I notice that "bad" node hung on polling from a socket [fd=5 => /var/tmp/.oracle/sOHASD_UI_SOCKET] (see below "bad"), while good node runs fine.

            I checked that file "/var/tmp/.oracle/sOHASD_UI_SOCKET" but permission are the same.
            Thank you for the answer.



            Good:
            [pid 26962] socket(PF_FILE, SOCK_STREAM, 0) = 5
            [pid 26962] access("/var/tmp/.oracle/sOHASD_UI_SOCKET", F_OK) = 0
            [pid 26962] connect(5, {sa_family=AF_FILE, path="/var/tmp/.oracle/sOHASD_UI_SOCKET"...}, 110) = 0
            [pid 26962] fcntl(5, F_SETFD, FD_CLOEXEC) = 0
            [pid 26962] rt_sigaction(SIGPIPE, {0x1, ~[ILL ABRT BUS FPE SEGV USR2 XCPU XFSZ SYS RTMIN RT_1], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0x320880ebe0}, {SIG_DFL, [], 0}, 8) = 0
            [pid 26962] fcntl(5, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
            [pid 26962] write(5, "0\0\0\0\1\0\1\0\1\0\1\0\1\0\0\0\0\0\0\0\1\0\0\0\354\240\4\206\0\0\0\0"..., 48) = 48
            [pid 26962] open("/etc/localtime", O_RDONLY) = 6
            [pid 26962] fstat(6, {st_mode=S_IFREG|0644, st_size=2652, ...}) = 0
            [pid 26962] fstat(6, {st_mode=S_IFREG|0644, st_size=2652, ...}) = 0
            [pid 26962] mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2aaaaac77000
            [pid 26962] read(6, "TZif2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\6\0\0\0\6\0\0\0\0"..., 4096) = 2652
            [pid 26962] lseek(6, -1701, SEEK_CUR)   = 951
            [pid 26962] read(6, "TZif2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\10\0\0\0\10\0\0\0\0"..., 4096) = 1701
            [pid 26962] close(6)                    = 0
            [pid 26962] munmap(0x2aaaaac77000, 4096) = 0
            [pid 26962] times({tms_utime=1, tms_stime=0, tms_cutime=0, tms_cstime=0}) = 654341758
            [pid 26962] poll([{fd=4, events=POLLIN|POLLRDNORM}, {fd=5, events=POLLIN|POLLOUT|POLLRDNORM}], 2, -1) = 1 ([{fd=5, revents=POLLIN|POLLOUT|POLLRDNORM}])
            [pid 26962] times({tms_utime=1, tms_stime=0, tms_cutime=0, tms_cstime=0}) = 654341758
            [pid 26962] times({tms_utime=1, tms_stime=0, tms_cutime=0, tms_cstime=0}) = 654341758
            [pid 26962] read(5, "0\0\0\0\1\0\1\0\1\0\1\0\1\0\0\0\0\0\0\0\1\0\0\0\354\240\4\206\0\0\0\0"..., 32768) = 48
            [pid 26962] times({tms_utime=1, tms_stime=0, tms_cutime=0, tms_cstime=0}) = 654341758
            [pid 26962] times({tms_utime=1, tms_stime=0, tms_cutime=0, tms_cstime=0}) = 654341758
            [pid 26962] write(5, "4\0\0\0\1\0\2\0\1\1\1\1\2\0\0\0\0\0\0\0\1\0\0\0\354\240\4\206\0\0\0\0"..., 52) = 52
            [pid 26962] times({tms_utime=1, tms_stime=0, tms_cutime=0, tms_cstime=0}) = 654341758
            [pid 26962] times({tms_utime=1, tms_stime=0, tms_cutime=0, tms_cstime=0}) = 654341758
            [pid 26962] poll([{fd=4, events=POLLIN|POLLRDNORM}, {fd=5, events=POLLIN|POLLRDNORM}], 2, -1) = 1 ([{fd=5, revents=POLLIN|POLLRDNORM}])
            [pid 26962] times({tms_utime=1, tms_stime=0, tms_cutime=0, tms_cstime=0}) = 654341759
            [pid 26962] times({tms_utime=1, tms_stime=0, tms_cutime=0, tms_cstime=0}) = 654341759
            [pid 26962] read(5, "r\0\0\0\1\0\2\0\1\1\1\1\2\0\0\0\0\0\0\0\1\0\0\0\354\240\4\206\0\0\0\0"..., 32768) = 114
            [pid 26962] open("/crsexe/GRID/11202/auth/ohasd/itromdb102/A7867194/2bba18a", O_WRONLY|O_CREAT|O_EXCL, 0644) = 6
            [pid 26962] fchmod(6, 0644)             = 0
            [pid 26962] write(6, "oUXc", 4)         = 4
            [pid 26962] close(6)                    = 0
            [pid 26962] write(5, "4\0\0\0\2\0\2\0\1\1\1\1\3\0\0\0\0\0\0\0\1\0\0\0\354\240\4\206\0\0\0\0"..., 52) = 52
            [pid 26962] write(5, "8\0\0\0\3\0\2\0\1\1\1\1\4\0\0\0\0\0\0\0\1\0\0\0\354\240\4\206\0\0\0\0"..., 56) = 56
            [pid 26962] times({tms_utime=1, tms_stime=0, tms_cutime=0, tms_cstime=0}) = 654341759
            [pid 26962] times({tms_utime=1, tms_stime=0, tms_cutime=0, tms_cstime=0}) = 654341759
            [pid 26962] poll([{fd=4, events=POLLIN|POLLRDNORM}, {fd=5, events=POLLIN|POLLRDNORM}], 2, -1) = 1 ([{fd=5, revents=POLLIN|POLLRDNORM}])
            [pid 26962] times({tms_utime=1, tms_stime=0, tms_cutime=0, tms_cstime=0}) = 654341759
            [pid 26962] times({tms_utime=1, tms_stime=0, tms_cutime=0, tms_cstime=0}) = 654341759
            [pid 26962] read(5, "8\0\0\0\3\0\2\0\1\1\1\1\3\0\0\0\0\0\0\0\1\0\0\0\354\240\4\206\0\0\0\0"..., 32768) = 56
            [pid 26962] write(5, "+\2\0\0PC\0\0\v\2\2\0\5\0\0\0\0\0\0\0\1\0\0\0\354\240\4\206\0\0\0\0"..., 555) = 555
            [pid 26962] times({tms_utime=1, tms_stime=0, tms_cutime=0, tms_cstime=0}) = 654341759
            [pid 26962] poll([{fd=4, events=POLLIN|POLLRDNORM}, {fd=5, events=POLLIN|POLLRDNORM}], 2, -1) = 1 ([{fd=5, revents=POLLIN|POLLRDNORM}])
            [pid 26962] times({tms_utime=1, tms_stime=0, tms_cutime=0, tms_cstime=0}) = 654341762
            [pid 26962] times({tms_utime=1, tms_stime=0, tms_cutime=0, tms_cstime=0}) = 654341762
            [pid 26962] read(5, "\32\4\0\0PC\0\0\v\2\2\0\4\0\0\0\0\0\0\0\1\0\0\0\354\240\4\206\0\0\0\0"..., 32768) = 1050
            [pid 26962] brk(0x1f4eb000)             = 0x1f4eb000
            [pid 26962] brk(0x1f513000)             = 0x1f513000
            [pid 26962] times({tms_utime=1, tms_stime=0, tms_cutime=0, tms_cstime=0}) = 654341762
            [pid 26962] times({tms_utime=1, tms_stime=0, tms_cutime=0, tms_cstime=0}) = 654341762
            [pid 26962] times({tms_utime=1, tms_stime=0, tms_cutime=0, tms_cstime=0}) = 654341762
            [pid 26962] write(5, "0\0\0\0\2\0\20\0\1\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\354\240\4\206\0\0\0\0"..., 48) = 48
            [pid 26962] fcntl(5, F_GETFL)           = 0x802 (flags O_RDWR|O_NONBLOCK)
            [pid 26962] fcntl(5, F_SETFL, O_RDWR)   = 0
            [pid 26962] close(5)                    = 0
            Bad:
            [pid 17153] socket(PF_FILE, SOCK_STREAM, 0) = 5
            [pid 17153] access("/var/tmp/.oracle/sOHASD_UI_SOCKET", F_OK) = 0
            [pid 17153] connect(5, {sa_family=AF_FILE, path="/var/tmp/.oracle/sOHASD_UI_SOCKET"...}, 110) = 0
            [pid 17153] fcntl(5, F_SETFD, FD_CLOEXEC) = 0
            [pid 17153] rt_sigaction(SIGPIPE, {0x1, ~[ILL ABRT BUS FPE SEGV USR2 XCPU XFSZ SYS RTMIN RT_1], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0x3fb8c0ebe0}, {SIG_DFL, [], 0}, 8) = 0
            [pid 17153] fcntl(5, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
            [pid 17153] write(5, "0\0\0\0\1\0\1\0\1\0\1\0\1\0\0\0\0\0\0\0\1\0\0\0\272p\342u\0\0\0\0"..., 48) = 48
            [pid 17153] open("/etc/localtime", O_RDONLY) = 6
            [pid 17153] fstat(6, {st_mode=S_IFREG|0644, st_size=2652, ...}) = 0
            [pid 17153] fstat(6, {st_mode=S_IFREG|0644, st_size=2652, ...}) = 0
            [pid 17153] mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2aaaaac77000
            [pid 17153] read(6, "TZif2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\6\0\0\0\6\0\0\0\0"..., 4096) = 2652
            [pid 17153] lseek(6, -1701, SEEK_CUR)   = 951
            [pid 17153] read(6, "TZif2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\10\0\0\0\10\0\0\0\0"..., 4096) = 1701
            [pid 17153] close(6)                    = 0
            [pid 17153] munmap(0x2aaaaac77000, 4096) = 0
            [pid 17153] times({tms_utime=2, tms_stime=2, tms_cutime=0, tms_cstime=0}) = 1915764345
            [pid 17153] poll([{fd=4, events=POLLIN|POLLRDNORM}, {fd=5, events=POLLIN|POLLOUT|POLLRDNORM}], 2, -1) = 1 ([{fd=5, revents=POLLIN|POLLOUT|POLLRDNORM}])    *****Hung here
            [pid 17153] times({tms_utime=2, tms_stime=2, tms_cutime=0, tms_cstime=0}) = 1915764345
            [pid 17153] times({tms_utime=2, tms_stime=2, tms_cutime=0, tms_cstime=0}) = 1915764345
            [pid 17153] read(5, "0\0\0\0\1\0\1\0\1\0\1\0\1\0\0\0\0\0\0\0\1\0\0\0\272p\342u\0\0\0\0"..., 32768) = 163
            [pid 17153] times({tms_utime=2, tms_stime=2, tms_cutime=0, tms_cstime=0}) = 1915764345
            [pid 17153] times({tms_utime=2, tms_stime=2, tms_cutime=0, tms_cstime=0}) = 1915764345
            [pid 17153] write(5, "4\0\0\0\1\0\2\0\1\1\1\1\2\0\0\0\0\0\0\0\1\0\0\0\272p\342u\0\0\0\0"..., 52) = 52
            [pid 17153] times({tms_utime=2, tms_stime=2, tms_cutime=0, tms_cstime=0}) = 1915764345
            [pid 17153] open("/crsexe/GRID/11202/auth/ohasd/itromdb101/A6119445/93558eb7", O_WRONLY|O_CREAT|O_EXCL, 0644) = 6
            [pid 17153] fchmod(6, 0644)             = 0
            [pid 17153] write(6, "\226\t\210\335", 4) = 4
            [pid 17153] close(6)                    = 0
            [pid 17153] write(5, "4\0\0\0\2\0\2\0\1\1\1\1\3\0\0\0\0\0\0\0\1\0\0\0\272p\342u\0\0\0\0"..., 52) = 52
            [pid 17153] write(5, "8\0\0\0\3\0\2\0\1\1\1\1\4\0\0\0\0\0\0\0\1\0\0\0\272p\342u\0\0\0\0"..., 56) = 56
            [pid 17153] times({tms_utime=2, tms_stime=2, tms_cutime=0, tms_cstime=0}) = 1915764345
            [pid 17153] times({tms_utime=2, tms_stime=2, tms_cutime=0, tms_cstime=0}) = 1915764345
            [pid 17153] poll([{fd=4, events=POLLIN|POLLRDNORM}, {fd=5, events=POLLIN|POLLRDNORM}], 2, -1) = 1 ([{fd=5, revents=POLLIN|POLLRDNORM}])
            [pid 17153] times({tms_utime=2, tms_stime=2, tms_cutime=0, tms_cstime=0}) = 1915765914
            [pid 17153] times({tms_utime=2, tms_stime=2, tms_cutime=0, tms_cstime=0}) = 1915765914
            [pid 17153] read(5, "0\0\0\0\2\0\20\0\1\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\272p\342u\0\0\0\0"..., 32768) = 48
            [pid 17153] times({tms_utime=2, tms_stime=2, tms_cutime=0, tms_cstime=0}) = 1915765914
            [pid 17153] times({tms_utime=2, tms_stime=2, tms_cutime=0, tms_cstime=0}) = 1915765914
            [pid 17153] times({tms_utime=2, tms_stime=2, tms_cutime=0, tms_cstime=0}) = 1915765914
            [pid 17153] times({tms_utime=2, tms_stime=2, tms_cutime=0, tms_cstime=0}) = 1915765914
            [pid 17153] times({tms_utime=2, tms_stime=2, tms_cutime=0, tms_cstime=0}) = 1915765914
            [pid 17153] fcntl(5, F_GETFL)           = 0x802 (flags O_RDWR|O_NONBLOCK)
            [pid 17153] fcntl(5, F_SETFL, O_RDWR)   = 0
            [pid 17153] close(5)                    = 0