
BUG: cluster stops working after cluster_manager detects "Double master"

### summary:

cluster_manager stops working and the whole cluster stops providing service.

### environment:

CentOS 7.6
mcu server: 4.3.x

### steps:

1. Two cluster_managers are running: A as master, B as slave.

2. The network drops briefly and B becomes master.

3. The network recovers, so A and B are both master. B's life_time is shorter, so the B process exits.

4. The RabbitMQ queue "owt-cluster" is deleted, so A can no longer receive any messages from it, and all other services lose their connection to cluster_manager.

### reason:

amqp_client.js:157

```
handler.close = function() {
    request_q && request_q.destroy();
    request_q = undefined;
    exc && exc.destroy(true);
    exc = undefined;
};
```

handler.close is called when the cluster_manager process exits via "process.exit(1);",
and request_q.destroy() deletes the queue even if the other cluster_manager is still connected to it.
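
To make the failure mode concrete, here is a toy in-memory model of it. This is only a sketch: `ToyBroker` and `simulateExit` are hypothetical names made up for illustration, and no real RabbitMQ or amqp client code is involved. It assumes both cluster_managers consume from "owt-cluster" and that B destroys the queue on exit.

```javascript
// Toy model only: ToyBroker is hypothetical and stands in for the real broker.
class ToyBroker {
    constructor() {
        this.queues = new Map(); // queue name -> Set of consumer names
    }

    subscribe(name, consumer) {
        if (!this.queues.has(name)) {
            this.queues.set(name, new Set());
        }
        this.queues.get(name).add(consumer);
    }

    // Without options the queue is deleted even while consumers are attached.
    // With {ifUnused: true} the delete is refused while any consumer remains.
    destroy(name, options) {
        const consumers = this.queues.get(name);
        if (!consumers) {
            return false;
        }
        if (options && options.ifUnused && consumers.size > 0) {
            return false;
        }
        this.queues.delete(name);
        return true;
    }

    hasQueue(name) {
        return this.queues.has(name);
    }
}

// A and B both consume from "owt-cluster"; B then exits and destroys the queue.
// Returns whether the queue still exists for A afterwards.
function simulateExit(destroyOptions) {
    const broker = new ToyBroker();
    broker.subscribe('owt-cluster', 'A');
    broker.subscribe('owt-cluster', 'B');
    broker.destroy('owt-cluster', destroyOptions);
    return broker.hasQueue('owt-cluster');
}
```

In this model, `simulateExit()` leaves A without a queue, which matches step 4 above, while `simulateExit({ifUnused: true})` leaves the queue in place.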

### manually reproduce:

Start the master cluster_manager A.

For the slave cluster_manager B, modify the last lines of clusterManager.js to simulate the problem:

```
exports.run = function (topicChannel, clusterName, id, spec) {
    var manager = new ClusterManager(clusterName, id, spec);
    runAsCandidate(topicChannel, manager); // for the simulation, call runAsMaster(topicChannel, manager); here instead
};
```

Start cluster_manager B.

cluster_manager B detects "Double master" and exits.

cluster_manager A stops providing service.

 

### suggestion:
Replace request_q.destroy(); with request_q.destroy({ifUnused : true}); so that the queue is deleted only when no other consumer is still using it.
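
A minimal sketch of what the changed close handler could look like. `makeHandler` is a hypothetical wrapper added here only so the snippet stands on its own; in amqp_client.js, `request_q` and `exc` come from the surrounding code. I have not tested this against a live RabbitMQ, so treat it as a starting point rather than a verified patch.

```javascript
// Sketch only: makeHandler is a hypothetical wrapper, not code from amqp_client.js.
// request_q and exc are assumed to expose destroy() the way the original snippet uses them.
function makeHandler(request_q, exc) {
    var handler = {};

    handler.close = function () {
        // Ask for deletion only if the queue has no consumers left,
        // so an exiting cluster_manager does not remove a queue that is still in use.
        request_q && request_q.destroy({ifUnused: true});
        request_q = undefined;
        exc && exc.destroy(true);
        exc = undefined;
    };

    return handler;
}
```

One thing to check: with ifUnused set, the broker may report an error on the channel when it refuses the delete. Since the process is exiting anyway this is probably harmless, but that is an assumption.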
