I'm running an Oracle NoSQL Database store (2.0.23) in a two-node environment.
In the web admin interface under "Logs" I find the message
+"UTC WARNING [rg1-rn2] Exception in ResolveHandlerThread when contacting:rg1-rn2"+
repeating every second.
A more detailed error message can be found in the log file rg1-rn2_0.log, which resides on the second node of my store:
+2013-01-23 14:14:37.703 UTC WARNING [rg1-rn2] Exception in ResolveHandlerThread when contacting:rg1-rn2+
java.lang.NullPointerException
	at oracle.kv.impl.util.registry.RegistryUtils.getRequestHandler(RegistryUtils.java:125)
	at oracle.kv.impl.api.rgstate.RepNodeState$ReqHandlerRef.resolve(RepNodeState.java:592)
	at oracle.kv.impl.api.rgstate.RepNodeState.resolveReqHandlerRef(RepNodeState.java:181)
	at oracle.kv.impl.api.rgstate.RepNodeStateUpdateThread$ResolveHandler.run(RepNodeStateUpdateThread.java:395)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:662)
When validating/verifying the store (either via the web interface or via the CLI), everything seems to be OK.
Output from the CLI (strangely, there is no performance information for rg1-rn2, the node that reports the messages above):

kv-> show topology
store=sicstore numPartitions=300 sequence=307
+dc=[dc1] name=SicDC repFactor=2+
+sn=[sn1] dc=dc1 than-pc:5000 capacity=1 RUNNING+
+[rg1-rn1] RUNNING+
single-op avg latency=0.041355196 ms multi-op avg latency=1.6138855 ms
+sn=[sn2] dc=dc1 10.74.10.221:5000 capacity=1 RUNNING+
+[rg1-rn2] RUNNING+
No performance info available
+shard=[rg1] num partitions=300+
kv-> topology validate
Validation for the current deployed topology:
No problems
kv->
From the web interface: Verification complete, no violations.
Is the error message the result of a slight (< 1 sec) difference in time between the two nodes? I'm not
able to sync the time on both machines via NTP, as on one of them I do not have the rights to modify
time settings (the system is for training purposes only).
If so, can I ignore the warning, or do I have to expect some side effects?
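As a quick sanity check, the clock difference can be gauged without changing either machine's time settings by comparing epoch timestamps taken from both nodes. The sketch below is illustrative only: `node2` is a placeholder hostname, it assumes passwordless SSH from the Linux node, and the timestamps shown are example values, not measurements from this store.

```shell
# Capture epoch seconds on the local node and on the remote node.
# In a real run you would use:
#   local_ts=$(date -u +%s)
#   remote_ts=$(ssh node2 date -u +%s)
local_ts=1358950478   # example value from node 1
remote_ts=1358950477  # example value from node 2

# Positive skew means the local clock is ahead of the remote clock.
skew=$((local_ts - remote_ts))
echo "clock skew: ${skew}s"
```

Note that the SSH round trip itself adds some latency, so this only detects skew on the order of a second or more, which is enough to check the "< 1 sec" hypothesis above.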
This is an interesting and unusual problem. While it doesn't affect you operationally, it's something that should be investigated, especially given the NullPointerException.
It'd help to get the following information. The log files should be emailed off-forum to george dot feinberg at you-know-where.
1. What version of the system are you using? CE or EE?
2. Complete log files for the two RNs (rg1-rn1 and rg1-rn2). You'll need to get these from the machines themselves. If the log files have rolled and there's more than one, send them all (the entire contents of the log directories would be OK if that's easier).
One thing regarding the installation of the store: one of the nodes runs on Oracle Linux, the other on Windows 7.
Just additional information, in case it is relevant to the described situation.
I'm updating this thread. Thomas and I have had some offline communication regarding the log files and the store. The brief summary is that it's not yet clear why this situation arose but there are some clues:
1. The 2-node store started off with a clock skew problem that caused it to not work at all
2. Thomas fixed the skew and retried the deployment plans, keeping the store intact (most people tend to make a new store when this happens)
3. The store became functional but in a state where an internal thread would log a warning every second
The development team is still looking into this.
Note: the fact that Windows 7 was involved is probably not a factor.
Disclaimer: technically Oracle NoSQL Database is not supported on that platform.
Another update. The cause has been found, and the fix will make its way into the code base, but it is not critical. In this case the problem was related to the fact that the store started out life as a single node with a replication factor of 1, and was later expanded to include a second node and a replication factor of 2. The clock skew and other environmental factors may or may not have contributed.
In the end the store was functional, but was logging a lot of unnecessary warnings.
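For readers who hit the same log spam: the expansion path described above (one node, RF=1, grown to two nodes, RF=2) looks roughly like the following sequence in the 2.x admin CLI. This is a sketch, not Thomas's actual session; the datacenter id, hostnames, topology name, and the default `AllStorageNodes` pool are illustrative.

```shell
# Run inside the admin CLI, e.g.:
#   java -jar $KVHOME/lib/kvstore.jar runadmin -port 5000 -host node1

# Deploy the second storage node (dc1, node2, and the port are placeholders).
plan deploy-sn -dc dc1 -host node2 -port 5000 -wait

# Clone the current topology, raise the replication factor, and deploy it.
topology clone -current -name rf2topo
topology change-repfactor -name rf2topo -pool AllStorageNodes -rf 2 -dc dc1
plan deploy-topology -name rf2topo -wait
```

Per the update above, a store grown this way could end up in the state where the ResolveHandlerThread warning is logged every second; the store remains functional and the warnings are noise until the fix lands.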