OAM server Coherence node fails to join the Coherence cluster !!

Dec 2018

Hi Guys,

In today’s post we will troubleshoot a Coherence cluster issue in which the OAM server’s Coherence node is unable to join the Coherence cluster.

Issue:

On one server, I installed OID (12c) and OAM (12c) and created a separate domain for each. The OID domain runs in Production mode and the OAM domain runs in Development mode.

When I start the OID Admin Server and managed servers, they start successfully, but when I start the OAM managed server it throws the error below.

Error:

2.1.3.0 <Error> (thread=Cluster, member=n/a): This member could not join the cluster because of a mismatch between Coherence license types. This member was attempting to run in dev mode. Rejected by Member(Id=1, Timestamp=2018-12-23 12:09:04.483, Address=192.168.0.150:51939, MachineId=8191, Location=site:hussain.net,machine:oidhost1,process:2097,member:wls_ods1, Role=WeblogicServer).>
<Dec 23, 2018 12:41:06,852 PM GST> <Error> <com.oracle.coherence> <BEA-000000> <2018-12-23 12:41:06.852/22.348 Oracle Coherence GE 12.2.1.3.0 <Error> (thread=[ACTIVE] ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)', member=n/a): Error while starting cluster: java.lang.RuntimeException: Failed to start Service "Cluster" (ServiceState=SERVICE_STOPPED, STATE_JOINING)
at com.tangosol.coherence.component.util.daemon.queueProcessor.Service.start(Service.CDB:38)
at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.start(Grid.CDB:6)
at com.tangosol.coherence.component.net.Cluster.startSystemServices(Cluster.CDB:4)
at com.tangosol.coherence.component.net.Cluster.onStart(Cluster.CDB:53)
at com.tangosol.coherence.component.net.Cluster.start(Cluster.CDB:12)
at com.tangosol.coherence.component.util.SafeCluster.startCluster(SafeCluster.CDB:4)
at com.tangosol.coherence.component.util.SafeCluster.restartCluster(SafeCluster.CDB:10)
at com.tangosol.coherence.component.util.SafeCluster.ensureRunningCluster(SafeCluster.CDB:32)
at com.tangosol.coherence.component.util.SafeCluster.getRunningCluster(SafeCluster.CDB:7)
at com.tangosol.coherence.component.util.SafeCluster.start(SafeCluster.CDB:5)
at com.tangosol.net.CacheFactory.ensureCluster(CacheFactory.java:580)
at weblogic.cacheprovider.coherence.EnsureClusterService.initialize(EnsureClusterService.java:51)
at weblogic.cacheprovider.coherence.EnsureClusterService.start(EnsureClusterService.java:34)
at weblogic.server.AbstractServerService.postConstruct(AbstractServerService.java:76)
at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
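The rejection line already says which member refused the join and in which mode the newcomer was running. As a rough sketch (the regular expression and the function name `parse_rejection` are my own, written against the single log line shown above, so treat the pattern as an assumption about the message layout), the relevant fields can be pulled out like this:

```python
import re

# Pattern written against the one rejection message shown in this post;
# other Coherence versions may format the message differently.
REJECT_RE = re.compile(
    r"attempting to run in (?P<mode>\w+) mode\. "
    r"Rejected by Member\(Id=(?P<id>\d+).*?"
    r"Address=(?P<address>[\d.]+:\d+).*?"
    r"machine:(?P<machine>[^,]+),"
    r"process:(?P<process>\d+),"
    r"member:(?P<member>[^,]+),"
)


def parse_rejection(line):
    """Return a dict describing the rejection, or None if the line does not match."""
    match = REJECT_RE.search(line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    sample = (
        "This member could not join the cluster because of a mismatch between "
        "Coherence license types. This member was attempting to run in dev mode. "
        "Rejected by Member(Id=1, Timestamp=2018-12-23 12:09:04.483, "
        "Address=192.168.0.150:51939, MachineId=8191, "
        "Location=site:hussain.net,machine:oidhost1,process:2097,member:wls_ods1, "
        "Role=WeblogicServer)."
    )
    print(parse_rejection(sample))
```

Here the `member` field comes back as `wls_ods1` on machine `oidhost1`, which points at a server in the OID domain as the one rejecting the OAM node.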

Solution:

After investigating, I found the following.

Cause:

The problem is that multiple products installed on the same machine are using the same Coherence cluster name but with differing configurations. In this case, the OID installation runs in Production mode, while the OAM server cluster is configured to use the default Development mode. Since both installations use the same cluster name, their instances try to form a single shared cluster, and the difference in cluster mode triggers the error. A principal difference between the production and development modes is the length of time a node will attempt to join a cluster before aborting.
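To make the cause concrete, here is a toy model of the check described above. It is not Coherence's implementation: the `Node` class, `try_join` function and the server name `oam_server1` are hypothetical, and real discovery also depends on addresses and ports, which this sketch ignores. It only illustrates that nodes with the same cluster name end up in the same cluster, and that a mode mismatch inside that cluster leads to a rejection.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str          # server name, e.g. wls_ods1
    cluster_name: str  # Coherence cluster name the server is configured with
    mode: str          # "dev" or "prod"


def try_join(running, newcomer):
    """Toy join check: return (joined, reason). Not the real Coherence logic."""
    peers = [n for n in running if n.cluster_name == newcomer.cluster_name]
    if not peers:
        return True, "no running cluster with this name, so the node starts its own"
    senior = peers[0]
    if senior.mode != newcomer.mode:
        return False, "rejected by %s: %s mode vs %s mode" % (
            senior.name, newcomer.mode, senior.mode)
    return True, "joined the cluster that %s belongs to" % senior.name


if __name__ == "__main__":
    running = [Node("wls_ods1", "defaultCoherenceCluster", "prod")]

    # Same cluster name, different mode: the join is refused.
    print(try_join(running, Node("oam_server1", "defaultCoherenceCluster", "dev")))

    # Different cluster name: the OAM node no longer meets the OID node at all.
    print(try_join(running, Node("oam_server1", "defaultCoherenceCluster1", "dev")))
```

The second call is the idea behind the fix: once the OAM servers use their own cluster name, the two domains never compare modes.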

Fix:

There is no reason for the OID and OAM installations to share the same cluster, and doing so might introduce unwanted dependencies, so the solution is to configure one of them to use a different cluster. In this case, since the problem shows up in the OAM installation, a new cluster should be created in the OAM domain and used instead of the defaultCoherenceCluster.

Please follow the action plan below to resolve the issue:

1. Shut down all servers in the OAM domain.
2. Create a Coherence cluster similar to the existing defaultCoherenceCluster using the WebLogic console of the OAM domain (the Admin Server must be running to use the console). Go through all the tabs of the existing defaultCoherenceCluster and create a new cluster with a different name, for example defaultCoherenceCluster1, using the same settings as the defaultCoherenceCluster.
3. Target the OAM servers to the new Coherence cluster instead of the defaultCoherenceCluster.
4. Delete the defaultCoherenceCluster from the OAM domain.
5. Shut down any OAM server instance that is still running, then start the OAM Admin Server and managed servers.