7 Replies Latest reply: Mar 1, 2013 9:49 PM by JimKlimov

Convergence intermittently fails to display Attachments (ISS error)

JimKlimov Newbie
I am setting up a POC for a customer (single-host, following the Wiki instructions), and found that ISS often returns an HTTP-500 error, which the Convergence interface shows as an "An error occured while accessing the ISS service" dialog window.

To reproduce in Convergence, click into the Attachments folder (of an account with about 1000 messages), then press refresh or just scroll around so that pieces of the attachment list have to be fetched by the interface. In about a quarter of such requests (logged as accesses to the /rest webapp) the application returns an error and the user is kicked out of the Attachments folder back into INBOX. Sometimes it happens instantly upon a click into the Attachments folder, but more often it needs some scrolling around to reproduce. Alas, it does not take long...

This happens both with "indexeradmin" proxy-auth enabled and disabled.

As a hint, there may be some problem with IMQ on this deployment - it consistently hoards over a thousand "connections". Things used to fail during startup when it hit the thousand mark; increasing the limits in the IMQ Broker configs to 20000 and enforcing "ulimit -n 65535" in the initscripts which launch the system solved that. Still, maybe some such limitation remains.

# imqcmd list cxn -u admin -passfile /tmp/p | grep jissuser | wc -l
1157

I have no idea why it hangs on to this many "connections" or what they are (or how to properly purge them short of recreating the IMQ server). Or if they are related to the problem...
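A minimal sketch for breaking that total down by broker user, which may help to see who owns the connections. It assumes that data rows of `imqcmd list cxn` have six whitespace-separated fields starting with a numeric connection ID (ID, user, service, producers, consumers, host); the function name `count_cxn_by_user` is made up for this example.

```shell
# Summarize broker connections per user from `imqcmd list cxn` output on stdin.
# Assumption: a data row has exactly six fields and begins with a numeric
# connection ID; header, separator and status lines are skipped.
count_cxn_by_user() {
    awk 'NF == 6 && $1 ~ /^[0-9]+$/ { n[$2]++ }
         END { for (u in n) print n[u], u }' | LC_ALL=C sort -k 2
}

# Usage (same credentials and passfile as in the command above):
#   imqcmd list cxn -u admin -passfile /tmp/p | count_cxn_by_user
```

Each output line is a count followed by the user name, so a single dominant user stands out at a glance.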

Thanks for any insights,
//Jim Klimov

PS: Alternately, maybe this is a known bug fixed in a later CommSuite patch? The customer intends to buy the product anyway, so accessing MOS and updating the production installation sooner or later won't be a problem. But they would be relieved to know that they don't pay for nothing - and this inconvenience would go away ;)

Edited by: JimKlimov on Feb 27, 2013 5:55 AM

PPS: Banging the /rest servlet with the same requests, I got hold of the headers in the HTTP-500 error message (which now pops up in closer to half of the cases). Unfortunately these get lost in the logging of the appserver's requests to itself.

HTTP/1.1 500 $Proxy28 cannot be cast to com.sun.comms.iss.search.Search
X-Powered-By: JSP/2.1
Set-Cookie: JSESSIONID=99ae48111ed983a27ff3056900ae; Path=/rest
Content-Type: text/html;charset=UTF-8
Date: Wed, 27 Feb 2013 03:03:34 GMT
Connection: close


<html>
<head>
<title>Error Page</title>
</head>
<body>
<h2>Error Page</h2>
The server encountered an error. See the log file for details.
</body>
</html>

I failed to google up anything relevant to ISS so far, though this ("$Proxy28 cannot be cast to") seems like a somewhat common response in the Spring framework :(
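To put a number on "closer to half of cases", the status lines of repeated requests can be tallied. This is only a sketch: the helper name `tally_status` is made up, and `$REST_URL` in the usage comment is a placeholder, since the exact request is not shown in this post.

```shell
# Tally HTTP status codes from response status lines read on stdin,
# e.g. "HTTP/1.1 500 $Proxy28 cannot be cast to ...".
tally_status() {
    awk '$1 ~ /^HTTP\// { n[$2]++ }
         END { for (c in n) print c, n[c] }' | LC_ALL=C sort
}

# Usage, assuming curl is available and REST_URL holds the request under test
# (a placeholder, not a real ISS URL):
#   for i in 1 2 3 4 5 6 7 8 9 10; do
#       curl -s -o /dev/null -D - "$REST_URL" | head -n 1
#   done | tally_status
```

The output is one line per status code with its count, so the failure ratio can be read off directly.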
  • 1. Re: Convergence intermittently fails to display Attachments (ISS error)
    JimKlimov Newbie
    Hmmm... Having created a new mail user, added a few messages with attachments, and bootstrapped him for ISS, I could not reproduce the problem. Maybe the original admin user's mailbox search index got corrupted somehow. Quite likely: during the experiments I wanted to abort some issadmin operation with Ctrl+C. Could this have left the store in an indeterminate state (sloppy programming for a DB, then)?..

    So now I did "issadmin deleteuser" and also removed remaining files from "his" path in ISS storage (a subdirectory under /var/opt/sun/comms/jiss/index/store/01) and bootstrapped this user's mailbox, and also can't reproduce the error in the Web-GUI however long I click. However, now each request I do with telnet (the tests I did before) returns a similar HTTP-500 Error, but mentions Proxy30.

    So far I'm not sure if I resolved the problem, but if I have (and don't report back with more gory details) - the workaround is to not only delete the presumed-corrupted ISS user account, but also to wipe his backend storage, otherwise it is seemingly reused during next bootstrapping, and remains corrupted.

    //Jim Klimov
  • 2. Re: Convergence intermittently fails to display Attachments (ISS error)
    993767 Newbie
    This is definitely an error on the ISS side. What version of ISS are you running? The first thing to figure out is why so many connections are being created. Can you tell me the values of the following parameters in your <iss-base>/etc/jiss.conf file:
    iss.rest.proxypool.size
    iss.indexsvc.indexthread.count
    iss.jmqconsumer.thread.count

    Also can you provide the output of imqcmd list dst.

    | HTTP/1.1 500 $Proxy28 cannot be cast to com.sun.comms.iss.search.Search

    This error can also be caused by not restarting glassfish after a patch is installed, and by a bug in earlier versions where connections are not cleaned up when the rest war is redeployed. Restarting glassfish fixes both of these problems.
    -
    Jeff Bilicki
  • 3. Re: Convergence intermittently fails to display Attachments (ISS error)
    JimKlimov Newbie
    Thanks for a quick response, replies below:

    | Values of the following parameters in <iss-base>/etc/jiss.conf file:
    | iss.rest.proxypool.size
    | iss.indexsvc.indexthread.count
    | iss.jmqconsumer.thread.count

    iss.rest.proxypool.size = 512
    iss.indexsvc.indexthread.count = 768
    iss.jmqconsumer.thread.count = 128

    I don't think I've ever touched these, so I guess they are at defaults?

    The IMQ broker properties are now as follows (the two 20000 maximums were added on top of the CommSuite Wiki suggestions):

    # cat /var/imq/instances/imqbroker/props/config.properties | grep -v '#'
    imq.system.max_count=20000
    imq.destination.DMQ.truncateBody=true
    imq.portmapper.backlog=-1
    imq.autocreate.reaptime=7200
    imq.instanceconfig.version=300
    imq.autocreate.destination.maxNumProducers=-1
    imq.jms.max_threads=20000
    imq.autocreate.destination.limitBehavior=REMOVE_OLDEST

    | Also can you provide the output of imqcmd list dst.

    # imqcmd list dst -u admin -passfile /.imqpass
    Listing all the destinations on the broker specified by:

    -------------------------
    Host Primary Port
    -------------------------
    localhost 7676

    ---------------------------------------------------------------------------------------------------
    Name               Type   State    Producers        Consumers        Msgs
                                       Total  Wildcard  Total  Wildcard  Count  Remote  UnAck  Avg Size
    ---------------------------------------------------------------------------------------------------
    AccountState.iss1  Topic  RUNNING  0      0         3      2         0      0       0      0.0
    INDEXMS            Queue  RUNNING  24     -         1      -         0      0       0      0.0
    Indexiss1          Queue  RUNNING  128    -         1      -         0      0       0      0.0
    OUCSIM             Topic  RUNNING  0      0         1      0         0      0       0      0.0
    OUCS               Queue  RUNNING  24     -         0      -         0      0       0      0.0
    SearchTopic        Topic  RUNNING  1024   0         1      0         0      0       0      0.0
    mq.sys.dmq         Queue  RUNNING  0      -         0      -         1000   0       0      892.481

    Successfully listed destinations.


    # imqcmd list cxn -u admin -passfile /.imqpass | grep jissuser | wc -l
    1157

    ### Note, this value is pretty static - it is reached soon after startup of the complex and has stayed the same for days now. I thought it could be related to the amount of attachments in the system, but failed to prove any correlation.


    # imqcmd list cxn -u admin -passfile /.imqpass | grep jissuser | while read _TAG TAIL; do \
    echo $TAIL; done | sort | uniq -c

    5 jissuser jms 0 1 10.0.16.60
    1152 jissuser jms 1 1 10.0.16.60



    # imqcmd list cxn -u admin -passfile /.imqpass | grep -v jissuser
    Listing all the connections on the broker specified by:

    -------------------------
    Host Primary Port
    -------------------------
    localhost 7676

    -----------------------------------------------------------------------
    Connection ID User Service Producers Consumers Host
    -----------------------------------------------------------------------
    657311228090714624 guest jms 0 1 10.0.16.60
    657311240558612992 jesuser jms 8 0 10.0.16.60
    657311240558746112 jesuser jms 8 0 10.0.16.60
    657311240558922752 jesuser jms 8 0 10.0.16.60
    657311240559055872 jesuser jms 8 0 10.0.16.60
    657311240559255808 jesuser jms 8 0 10.0.16.60
    657311240559388928 jesuser jms 8 0 10.0.16.60
    657311247065615616 jesuser jms 0 1 10.0.16.60
    657311265783109120 admin admin 1 1 127.0.0.1

    Successfully listed connections.

    --

    As for Glassfish - it is 2.1.1 with HADB, as required by CommSuite Wiki. I have not yet tested whether newer versions (patches up from the 2.1.1 release, or the recent 3.1.2.2) would work, or differ subtly and catastrophically. This is as much a pristine OUCS 7u2 installation as possible :)

    The whole glassfish domain was restarted in case of suspicions that app-reloading might be the cause, and the operating environment as well on a couple of occasions.

    One "wild idea" was that maybe there are some timing problems caused by the stack running in a VM (vs. hardware)?

    Thanks,
    //Jim Klimov
  • 4. Re: Convergence intermittently fails to display Attachments (ISS error)
    993767 Newbie
    | SearchTopic Topic RUNNING 1024 0 1 0 0 0 0 0.0

    This is what is causing the large number of connections. It corresponds to iss.rest.proxypool.size in jiss.conf, and it looks like it is double what it should be. Try shutting down glassfish on the system and see how many producers are listed for SearchTopic in imqcmd list dst. It almost looks like two ISS instances are pointed at the same broker. Did the restart fix the "HTTP/1.1 500 $Proxy28 cannot be cast to com.sun.comms.iss.search.Search" issues? Or can you still not perform searches?
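A rough sketch of that comparison, assuming the fourth whitespace-separated field of the SearchTopic row in `imqcmd list dst` output is the total producer count (as in the quoted line above). The function name `searchtopic_ratio` is made up for this example.

```shell
# Compare the SearchTopic producer count with the configured REST proxy pool.
# Reads `imqcmd list dst` output on stdin; $1 is iss.rest.proxypool.size.
# Assumption: field 4 of the SearchTopic row is the total number of producers.
searchtopic_ratio() {
    pool_size=$1
    LC_ALL=C awk -v pool="$pool_size" '
        pool > 0 && $1 == "SearchTopic" {
            printf "%d producers, %.1fx the pool size of %d\n", $4, $4 / pool, pool
        }'
}

# Usage:
#   imqcmd list dst -u admin -passfile /.imqpass | searchtopic_ratio 512
```

With 1024 producers and a pool size of 512 the ratio comes out as 2.0, which is what hints at two copies of the REST webapp sharing one broker.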

    Edited by: user12607993 on Feb 27, 2013 3:30 PM
  • 5. Re: Convergence intermittently fails to display Attachments (ISS error)
    JimKlimov Newbie
    Right-on! At least, this led me to halving the amount of such hanging requests, and finding another possible misconfiguration detail.

    So, the Glassfish server had two virtual servers on one set of default http listeners; one server for admin stuff (DA, DSCC), another for Convergence. The JISS webapps were "targeted" to both virtual hosts - and apparently ran twice (I somehow expected them to be one app, one context, two entry points - seems they are separate app instances, and maybe even conflicting at that). Attaching them to only one host (with Convergence) halved the connections.
    I also revised iwc/config/configuration.xml and jiss/config/jiss.conf so that the hostname for JISS matches that of the one virtual host used now; it turned out that lately the two configs had also referenced different virtual hosts. I am not sure if this could be fatal either, but "unclean" it was. So both configs now reference the one vhost of the appserver, and things now seem to work stably in the Convergence GUI.

    Now I have 645 "jiss" connections hanging, which seems like 512+128+5:

    # imqcmd list cxn -u admin -passfile /.imqpass | grep jissuser | while read _TAG TAIL; do \
    echo $TAIL; done | sort | uniq -c
    5 jissuser jms 0 1 10.0.16.60
    640 jissuser jms 1 1 10.0.16.60
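The counts before and after the fix fit one simple formula. This is a sketch under the assumption (inferred only from the numbers in this thread, not from ISS documentation) that each copy of the REST webapp opens iss.rest.proxypool.size connections, while the remaining 128 + 5 connections exist once. The function name `expected_jiss_cxn` is made up.

```shell
# Expected number of jissuser connections for a given number of REST webapp copies.
# Arguments: copies, proxy pool size, indexer-side connections, other connections.
# Assumption: only the proxy pool is multiplied by the number of webapp copies.
expected_jiss_cxn() {
    copies=$1; pool=$2; indexer=$3; other=$4
    echo $(( copies * pool + indexer + other ))
}

expected_jiss_cxn 2 512 128 5   # prints 1157, the count seen with two virtual servers
expected_jiss_cxn 1 512 128 5   # prints 645, the count seen after the fix
```

Both values match the observed totals (1152 + 5 before, 640 + 5 after), which supports the idea that the webapp was running twice.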

    Thanks for the suggestions, I hope this case is closed for good! :)

    Edited by: JimKlimov on Mar 1, 2013 8:18 AM

    PS: Also made sure that messaging server's msg.conf references the correct virtual host in service.imap.indexer.hostname.
  • 6. Re: Convergence intermittently fails to display Attachments (ISS error)
    993767 Newbie
    | Now I have 645 "jiss" connections hanging, which seems like 512+128+5:

    Just to be clear they are not hanging, they are waiting to be utilized. ;)
  • 7. Re: Convergence intermittently fails to display Attachments (ISS error)
    JimKlimov Newbie
    Right, thanks :)

    Also, I see that this results in a lot of "ESTABLISHED" TCP connections (below). Was I right to boost file descriptor limit with "ulimit -n 65535" (up from default 256) during my configuration experiments (asenv.conf for glassfish, initscript for imq IIRC)? This step is not required by docs, but I figured it was reasonable and indeed it solved some of the first symptoms. Since things work out of the box for others, I guess the increase of ulimit is done (within allowed soft limits, or by some other tunables' defaults I didn't find) by the software itself, by default?

    # telnet localhost 7676 | grep NORMAL
    jms tcp NORMAL 32804

    # netstat -an | grep ESTA | grep 32804 | wc -l
    1306

    Note: since both server and client are on this machine, the count seems "doubled". This is 1306/2 = 653 persistent connections.
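The same count can be sketched with a stricter match on the port, so that unrelated lines containing the digits elsewhere are not included. It assumes `netstat -an` prints addresses ending in `.PORT` (Solaris style) or `:PORT`; the function name `count_established` is made up for this example.

```shell
# Count ESTABLISHED connections involving a TCP port, from `netstat -an` on stdin.
# When client and server share one host every connection is listed twice,
# so the second number printed is the halved figure.
count_established() {
    port=$1
    awk -v port="$port" '
        /ESTABLISHED/ {
            # Look for an address field ending in ".PORT" or ":PORT".
            for (i = 1; i <= NF; i++)
                if ($i ~ ("[.:]" port "$")) { n++; break }
        }
        END { printf "%d lines, %d connections\n", n, n / 2 }'
}

# Usage:
#   netstat -an | count_established 32804
```

The halving is only valid while both ends of every connection are on this machine; with remote clients the first number would be the more meaningful one.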

    Also, if my instability problem was indeed due to two copies of ISS webapps running and sometimes failing to instantiate, this concerns me about possible HA-clustering of the solution. Was it a bug (i.e. 2 virthosts in one JVM are not supported by ISS, or old Glassfish, or...)? Should I expect that separate individual appserver JVMs running on other hosts in the cluster won't show such conflicts?

    Thanks again,
    //Jim
