This discussion is archived
2 Replies Latest reply: Sep 11, 2012 6:50 AM by 877841 RSS

CRS-4639 when "crsctl" with a different user

877841 Newbie
Currently Being Moderated
Hi,
I'm deploying OMS 12c agents on several two-node RAC database, 11.2.0.2 on RedHat x86_64.
The agent will run with a different user then "root" or "oracle" (owner of GRID) but the agent user ("agentusr") has the secondary group "oinstall".

All works well but on a RAC installation one node agent see cluster resources, the other not.
After a deep analysis we find that "agentusr" on one node can run "crsctl check cluster" without problem but on the other node it got:

CRS-4639: Could not contact Oracle High Availability Services
CRS-4000: Command Check failed, or completed with errors.

The "crsctl check cluster" works well with "oracle" user.

We checked some file permission but we can't find any issue:

$ ll /var/tmp/.oracle/sCRSD*
srwxrwxrwx 1 root root 0 Mar 23 11:39 /var/tmp/.oracle/sCRSD_IPC_SOCKET_11
srwxrwxrwx 1 root root 0 Mar 23 11:39 /var/tmp/.oracle/sCRSD_UI_SOCKET

We also run the "crsctl check cluster" with "strace" but there isn't any problem on file access.

Which are the permission (file or groups) needed to run "crsctl" comand?
Thanks.
P.
  • 1. Re: CRS-4639 when "crsctl" with a different user
    758358 Pro
    Currently Being Moderated
    Hi,

    Did you solve this? When you took the strace from crsctl did you save it to a file? If so could you check the following?

    egrep 'access|exec' <strace_output_file>

    Any messages in your crs log files on this node? e.g. $CRS_HOME/log/<host_name/alert<host_name>.log

    If this were a pure permissions issue then I'd expect a more relevant error to be returned, something else looks wrong here.

    I did a quick test case on a cluster I have and just creating a brand new user with no relevant groups I can still get a clean execution of that command:
    [agentusr@rac1~]$ groups
    oemagent
    [agentusr@rac1~]$ /u01/app/11.2.0.3/grid/bin/crsctl check cluster
    CRS-4537: Cluster Ready Services is online
    CRS-4529: Cluster Synchronization Services is online
    CRS-4533: Event Manager is online
    [agentusr@rac1~]$ ls -l /u01/app/11.2.0.3/grid/bin/crsctl
    -rwxr-xr-x 1 root oinstall 8279 Jun  3 14:08 /u01/app/11.2.0.3/grid/bin/crsctl
    Thanks
    Paul

    Edited by: PaulDTill on Sep 10, 2012 5:20 PM
  • 2. Re: CRS-4639 when "crsctl" with a different user
    877841 Newbie
    Currently Being Moderated
    Hi.
    We solved using agent user on one node and oracle user for the other (I know it's not perfect, but works now).
    No, there isn't any message in alert log file for this issue.

    I suspect a permission problem, I'm not sure however. Running strace I notice that "bad" node hung on polling from a socket [fd=5 => /var/tmp/.oracle/sOHASD_UI_SOCKET] (see below "bad"), while good node runs fine.

    I checked that file "/var/tmp/.oracle/sOHASD_UI_SOCKET" but permission are the same.
    Thank you for the answer.



    Good:
    [pid 26962] socket(PF_FILE, SOCK_STREAM, 0) = 5
    [pid 26962] access("/var/tmp/.oracle/sOHASD_UI_SOCKET", F_OK) = 0
    [pid 26962] connect(5, {sa_family=AF_FILE, path="/var/tmp/.oracle/sOHASD_UI_SOCKET"...}, 110) = 0
    [pid 26962] fcntl(5, F_SETFD, FD_CLOEXEC) = 0
    [pid 26962] rt_sigaction(SIGPIPE, {0x1, ~[ILL ABRT BUS FPE SEGV USR2 XCPU XFSZ SYS RTMIN RT_1], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0x320880ebe0}, {SIG_DFL, [], 0}, 8) = 0
    [pid 26962] fcntl(5, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
    [pid 26962] write(5, "0\0\0\0\1\0\1\0\1\0\1\0\1\0\0\0\0\0\0\0\1\0\0\0\354\240\4\206\0\0\0\0"..., 48) = 48
    [pid 26962] open("/etc/localtime", O_RDONLY) = 6
    [pid 26962] fstat(6, {st_mode=S_IFREG|0644, st_size=2652, ...}) = 0
    [pid 26962] fstat(6, {st_mode=S_IFREG|0644, st_size=2652, ...}) = 0
    [pid 26962] mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2aaaaac77000
    [pid 26962] read(6, "TZif2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\6\0\0\0\6\0\0\0\0"..., 4096) = 2652
    [pid 26962] lseek(6, -1701, SEEK_CUR)   = 951
    [pid 26962] read(6, "TZif2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\10\0\0\0\10\0\0\0\0"..., 4096) = 1701
    [pid 26962] close(6)                    = 0
    [pid 26962] munmap(0x2aaaaac77000, 4096) = 0
    [pid 26962] times({tms_utime=1, tms_stime=0, tms_cutime=0, tms_cstime=0}) = 654341758
    [pid 26962] poll([{fd=4, events=POLLIN|POLLRDNORM}, {fd=5, events=POLLIN|POLLOUT|POLLRDNORM}], 2, -1) = 1 ([{fd=5, revents=POLLIN|POLLOUT|POLLRDNORM}])
    [pid 26962] times({tms_utime=1, tms_stime=0, tms_cutime=0, tms_cstime=0}) = 654341758
    [pid 26962] times({tms_utime=1, tms_stime=0, tms_cutime=0, tms_cstime=0}) = 654341758
    [pid 26962] read(5, "0\0\0\0\1\0\1\0\1\0\1\0\1\0\0\0\0\0\0\0\1\0\0\0\354\240\4\206\0\0\0\0"..., 32768) = 48
    [pid 26962] times({tms_utime=1, tms_stime=0, tms_cutime=0, tms_cstime=0}) = 654341758
    [pid 26962] times({tms_utime=1, tms_stime=0, tms_cutime=0, tms_cstime=0}) = 654341758
    [pid 26962] write(5, "4\0\0\0\1\0\2\0\1\1\1\1\2\0\0\0\0\0\0\0\1\0\0\0\354\240\4\206\0\0\0\0"..., 52) = 52
    [pid 26962] times({tms_utime=1, tms_stime=0, tms_cutime=0, tms_cstime=0}) = 654341758
    [pid 26962] times({tms_utime=1, tms_stime=0, tms_cutime=0, tms_cstime=0}) = 654341758
    [pid 26962] poll([{fd=4, events=POLLIN|POLLRDNORM}, {fd=5, events=POLLIN|POLLRDNORM}], 2, -1) = 1 ([{fd=5, revents=POLLIN|POLLRDNORM}])
    [pid 26962] times({tms_utime=1, tms_stime=0, tms_cutime=0, tms_cstime=0}) = 654341759
    [pid 26962] times({tms_utime=1, tms_stime=0, tms_cutime=0, tms_cstime=0}) = 654341759
    [pid 26962] read(5, "r\0\0\0\1\0\2\0\1\1\1\1\2\0\0\0\0\0\0\0\1\0\0\0\354\240\4\206\0\0\0\0"..., 32768) = 114
    [pid 26962] open("/crsexe/GRID/11202/auth/ohasd/itromdb102/A7867194/2bba18a", O_WRONLY|O_CREAT|O_EXCL, 0644) = 6
    [pid 26962] fchmod(6, 0644)             = 0
    [pid 26962] write(6, "oUXc", 4)         = 4
    [pid 26962] close(6)                    = 0
    [pid 26962] write(5, "4\0\0\0\2\0\2\0\1\1\1\1\3\0\0\0\0\0\0\0\1\0\0\0\354\240\4\206\0\0\0\0"..., 52) = 52
    [pid 26962] write(5, "8\0\0\0\3\0\2\0\1\1\1\1\4\0\0\0\0\0\0\0\1\0\0\0\354\240\4\206\0\0\0\0"..., 56) = 56
    [pid 26962] times({tms_utime=1, tms_stime=0, tms_cutime=0, tms_cstime=0}) = 654341759
    [pid 26962] times({tms_utime=1, tms_stime=0, tms_cutime=0, tms_cstime=0}) = 654341759
    [pid 26962] poll([{fd=4, events=POLLIN|POLLRDNORM}, {fd=5, events=POLLIN|POLLRDNORM}], 2, -1) = 1 ([{fd=5, revents=POLLIN|POLLRDNORM}])
    [pid 26962] times({tms_utime=1, tms_stime=0, tms_cutime=0, tms_cstime=0}) = 654341759
    [pid 26962] times({tms_utime=1, tms_stime=0, tms_cutime=0, tms_cstime=0}) = 654341759
    [pid 26962] read(5, "8\0\0\0\3\0\2\0\1\1\1\1\3\0\0\0\0\0\0\0\1\0\0\0\354\240\4\206\0\0\0\0"..., 32768) = 56
    [pid 26962] write(5, "+\2\0\0PC\0\0\v\2\2\0\5\0\0\0\0\0\0\0\1\0\0\0\354\240\4\206\0\0\0\0"..., 555) = 555
    [pid 26962] times({tms_utime=1, tms_stime=0, tms_cutime=0, tms_cstime=0}) = 654341759
    [pid 26962] poll([{fd=4, events=POLLIN|POLLRDNORM}, {fd=5, events=POLLIN|POLLRDNORM}], 2, -1) = 1 ([{fd=5, revents=POLLIN|POLLRDNORM}])
    [pid 26962] times({tms_utime=1, tms_stime=0, tms_cutime=0, tms_cstime=0}) = 654341762
    [pid 26962] times({tms_utime=1, tms_stime=0, tms_cutime=0, tms_cstime=0}) = 654341762
    [pid 26962] read(5, "\32\4\0\0PC\0\0\v\2\2\0\4\0\0\0\0\0\0\0\1\0\0\0\354\240\4\206\0\0\0\0"..., 32768) = 1050
    [pid 26962] brk(0x1f4eb000)             = 0x1f4eb000
    [pid 26962] brk(0x1f513000)             = 0x1f513000
    [pid 26962] times({tms_utime=1, tms_stime=0, tms_cutime=0, tms_cstime=0}) = 654341762
    [pid 26962] times({tms_utime=1, tms_stime=0, tms_cutime=0, tms_cstime=0}) = 654341762
    [pid 26962] times({tms_utime=1, tms_stime=0, tms_cutime=0, tms_cstime=0}) = 654341762
    [pid 26962] write(5, "0\0\0\0\2\0\20\0\1\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\354\240\4\206\0\0\0\0"..., 48) = 48
    [pid 26962] fcntl(5, F_GETFL)           = 0x802 (flags O_RDWR|O_NONBLOCK)
    [pid 26962] fcntl(5, F_SETFL, O_RDWR)   = 0
    [pid 26962] close(5)                    = 0
    Bad:
    [pid 17153] socket(PF_FILE, SOCK_STREAM, 0) = 5
    [pid 17153] access("/var/tmp/.oracle/sOHASD_UI_SOCKET", F_OK) = 0
    [pid 17153] connect(5, {sa_family=AF_FILE, path="/var/tmp/.oracle/sOHASD_UI_SOCKET"...}, 110) = 0
    [pid 17153] fcntl(5, F_SETFD, FD_CLOEXEC) = 0
    [pid 17153] rt_sigaction(SIGPIPE, {0x1, ~[ILL ABRT BUS FPE SEGV USR2 XCPU XFSZ SYS RTMIN RT_1], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0x3fb8c0ebe0}, {SIG_DFL, [], 0}, 8) = 0
    [pid 17153] fcntl(5, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
    [pid 17153] write(5, "0\0\0\0\1\0\1\0\1\0\1\0\1\0\0\0\0\0\0\0\1\0\0\0\272p\342u\0\0\0\0"..., 48) = 48
    [pid 17153] open("/etc/localtime", O_RDONLY) = 6
    [pid 17153] fstat(6, {st_mode=S_IFREG|0644, st_size=2652, ...}) = 0
    [pid 17153] fstat(6, {st_mode=S_IFREG|0644, st_size=2652, ...}) = 0
    [pid 17153] mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2aaaaac77000
    [pid 17153] read(6, "TZif2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\6\0\0\0\6\0\0\0\0"..., 4096) = 2652
    [pid 17153] lseek(6, -1701, SEEK_CUR)   = 951
    [pid 17153] read(6, "TZif2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\10\0\0\0\10\0\0\0\0"..., 4096) = 1701
    [pid 17153] close(6)                    = 0
    [pid 17153] munmap(0x2aaaaac77000, 4096) = 0
    [pid 17153] times({tms_utime=2, tms_stime=2, tms_cutime=0, tms_cstime=0}) = 1915764345
    [pid 17153] poll([{fd=4, events=POLLIN|POLLRDNORM}, {fd=5, events=POLLIN|POLLOUT|POLLRDNORM}], 2, -1) = 1 ([{fd=5, revents=POLLIN|POLLOUT|POLLRDNORM}])    *****Hung here
    [pid 17153] times({tms_utime=2, tms_stime=2, tms_cutime=0, tms_cstime=0}) = 1915764345
    [pid 17153] times({tms_utime=2, tms_stime=2, tms_cutime=0, tms_cstime=0}) = 1915764345
    [pid 17153] read(5, "0\0\0\0\1\0\1\0\1\0\1\0\1\0\0\0\0\0\0\0\1\0\0\0\272p\342u\0\0\0\0"..., 32768) = 163
    [pid 17153] times({tms_utime=2, tms_stime=2, tms_cutime=0, tms_cstime=0}) = 1915764345
    [pid 17153] times({tms_utime=2, tms_stime=2, tms_cutime=0, tms_cstime=0}) = 1915764345
    [pid 17153] write(5, "4\0\0\0\1\0\2\0\1\1\1\1\2\0\0\0\0\0\0\0\1\0\0\0\272p\342u\0\0\0\0"..., 52) = 52
    [pid 17153] times({tms_utime=2, tms_stime=2, tms_cutime=0, tms_cstime=0}) = 1915764345
    [pid 17153] open("/crsexe/GRID/11202/auth/ohasd/itromdb101/A6119445/93558eb7", O_WRONLY|O_CREAT|O_EXCL, 0644) = 6
    [pid 17153] fchmod(6, 0644)             = 0
    [pid 17153] write(6, "\226\t\210\335", 4) = 4
    [pid 17153] close(6)                    = 0
    [pid 17153] write(5, "4\0\0\0\2\0\2\0\1\1\1\1\3\0\0\0\0\0\0\0\1\0\0\0\272p\342u\0\0\0\0"..., 52) = 52
    [pid 17153] write(5, "8\0\0\0\3\0\2\0\1\1\1\1\4\0\0\0\0\0\0\0\1\0\0\0\272p\342u\0\0\0\0"..., 56) = 56
    [pid 17153] times({tms_utime=2, tms_stime=2, tms_cutime=0, tms_cstime=0}) = 1915764345
    [pid 17153] times({tms_utime=2, tms_stime=2, tms_cutime=0, tms_cstime=0}) = 1915764345
    [pid 17153] poll([{fd=4, events=POLLIN|POLLRDNORM}, {fd=5, events=POLLIN|POLLRDNORM}], 2, -1) = 1 ([{fd=5, revents=POLLIN|POLLRDNORM}])
    [pid 17153] times({tms_utime=2, tms_stime=2, tms_cutime=0, tms_cstime=0}) = 1915765914
    [pid 17153] times({tms_utime=2, tms_stime=2, tms_cutime=0, tms_cstime=0}) = 1915765914
    [pid 17153] read(5, "0\0\0\0\2\0\20\0\1\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\272p\342u\0\0\0\0"..., 32768) = 48
    [pid 17153] times({tms_utime=2, tms_stime=2, tms_cutime=0, tms_cstime=0}) = 1915765914
    [pid 17153] times({tms_utime=2, tms_stime=2, tms_cutime=0, tms_cstime=0}) = 1915765914
    [pid 17153] times({tms_utime=2, tms_stime=2, tms_cutime=0, tms_cstime=0}) = 1915765914
    [pid 17153] times({tms_utime=2, tms_stime=2, tms_cutime=0, tms_cstime=0}) = 1915765914
    [pid 17153] times({tms_utime=2, tms_stime=2, tms_cutime=0, tms_cstime=0}) = 1915765914
    [pid 17153] fcntl(5, F_GETFL)           = 0x802 (flags O_RDWR|O_NONBLOCK)
    [pid 17153] fcntl(5, F_SETFL, O_RDWR)   = 0
    [pid 17153] close(5)                    = 0

Legend

  • Correct Answers - 10 points
  • Helpful Answers - 5 points