[ovs-dev] ovsdb failures in cluster mode in IPv6 setup

Riccardo Ravaioli rravaiol at redhat.com
Thu Oct 7 12:48:57 UTC 2021


If it helps, I did a packet capture in both IPv4 and IPv6 setups:
- in IPv4, I see the remove_server_request and remove_server_reply messages
for the deleted db server;
- in IPv6, these messages do not appear. I see them neither on the affected
node nor on a different node where another db server instance runs.
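
To repeat the same comparison on the debug logs rather than on a capture, a minimal Python sketch along these lines could count the remove_server_* messages and collect the server IDs named in "misrouted message" errors. It assumes each message sits on a single log line and that the error text matches the one quoted below. The function name summarize_raft_log is illustrative and not part of OVS.

```python
import re

# Error text as quoted in the report below; the closing parenthesis is
# treated as optional because the exact log formatting is an assumption.
MISROUTED = re.compile(
    r"misrouted message \(addressed to (\w+) but we're (\w+)\)?"
)
REMOVE_MSG = re.compile(r"\bremove_server_(request|reply)\b")


def summarize_raft_log(lines):
    """Return (remove_server counts, sorted IDs that misrouted messages were addressed to)."""
    counts = {"request": 0, "reply": 0}
    stale_ids = set()
    for line in lines:
        removed = REMOVE_MSG.search(line)
        if removed:
            counts[removed.group(1)] += 1
        misrouted = MISROUTED.search(line)
        if misrouted:
            # group(1) is the ID the message was addressed to (the old server)
            stale_ids.add(misrouted.group(1))
    return counts, sorted(stale_ids)
```

Feeding it the lines of ovsdb-server-nb's log from each setup should show zero remove_server_* counts and a non-empty list of stale IDs in the IPv6 case, if the logs agree with the capture.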

Riccardo

On Wed, Oct 6, 2021 at 10:02 PM Riccardo Ravaioli <rravaiol at redhat.com>
wrote:

> Hi all,
>
> I have an issue with ovsdb in cluster mode when an instance of a db server
> fails.
>
> I'm running an HA single-stack IPv6 ovn-kubernetes Kind cluster, where
> ovnnb_db and ovnsb_db are replicated on three nodes. All control traffic
> is IPv6.
> To simulate a node failure, I pick one node, delete its db files, and
> also delete the pod that hosts the db server.
> The pod is recreated as well as the db files, but "ovs-appctl
> cluster/status OVN_Northbound" still shows the *old* server instance, along
> with the new one.
>
> Indeed, when I look at the ovsdb-server-nb debug logs on the affected
> node, I see that it is still receiving heartbeat messages addressed to
> both the new server (to which it correctly replies) and the old one (for
> which it raises an error: "syntax error: Parsing raft append_request RPC
> failed: misrouted message (addressed to 0227 but we're bcda)").
>
> On the other hand, in an HA single-stack IPv4 cluster, everything works as
> expected:
> 1) for a few tens of seconds, the cluster/status command from above
> shows both the old and the new server, as in the IPv6 case;
> 2) then, the old server is removed, as the new one is correctly added to
> the cluster.
>
> This is confirmed in ovsdb-server-nb.log, where I see the
> remove_server_request and remove_server_reply messages.
>
> However, in an HA IPv6 cluster, I keep seeing 4 servers and no
> "remove_server_*" messages in the logs, so it stays stuck at step 1
> above.
>
> Is this a bug? Is there anything I can do to debug this further?
>
> Thanks!
>
> Riccardo
>

