Several users have run into a java.rmi.ConnectException message during deployment, and need some information on how to troubleshoot this. We'll be adding this to our documentation, but thought it might be useful now.
In NoSQL DB R2.0, the show plan -id <id> command displays plan status and any errors that may have occurred. If you see this exception listed in the error section, it means that the Admin service was unable to reach one of the NoSQL DB components while the system was trying to execute an administrative command.
The first step is check on the overall status of the store. One way to do that is to use the show topology command, followed by the verify command. The show topology command will display the layout of the store, while verify will check the status of each component. A component that can't be reached will display a status of UNREACHABLE.
In general, NoSQL DB attempts to make any troubleshooting information you need available through the Admin CLI, through commands like show plan, show events, show faults. A ConnectException message means that a communication channel either was not established, or failed, and in those cases, troubleshooting information may not have been conveyed to the Admin service. This is particularly true if there was a communication failure during initial deployment of a component.
The next step would be to look in the NoSQL log files for more detailed error information. Look first in the aggregated storewide log, which can be found in the node that is hosting the master admin service, under the KVROOT/<storename>/logs/<storename>*.log. You can locate the master Admin through the "verify" command. This log file contains information from all the different components in the store.
Suppose Replication Node rg1-rn3, on Storage Node sn3, is not responsive. Look through the <storename>.log for entries made by those components. Each log entry is prefixed with the name of the component that issued the log message. Sometimes the aggregated storewide log has too much information, or sometimes information from a component was not transmitted to the Admin, and therefore isn't included in the aggregated log. In that case, it can be more helpful to look at the Replication Node or Storage Node logs directly, which can be found on their host, in the <KVROOT>/<storename>/logs directory.
If the problem occurs during an initial deployment, it can be particularly helpful to review the Storage Node logs to make sure that the Storage Node Agent on that node was created correctly, and that the process came up as expected, according to the installation directions, and the Replication Node logs. Some of the common reasons why a Replication Node might not come up are that here is inadequate heap and memory on the node, or that a initial configuration parameter is misspelled or has an invalid value, or that the time skew between components is greater than NoSQL's acceptable limit.