[ovs-discuss] [IDL] unhandled/unexpected OVSDB reply to set_db_change_aware method

Dumitru Ceara dceara at redhat.com
Fri Jul 10 11:39:32 UTC 2020


On 7/9/20 9:02 PM, Dumitru Ceara wrote:
> On 7/9/20 3:40 PM, Dan Williams wrote:
>> On Mon, 2020-07-06 at 23:57 -0700, Girish Moodalbail wrote:
>>> Hello all,
>>>
>>> In the OVN Kubernetes project, we are seeing a ton of following
>>> errors in
>>> ovsdb-server-nb.log and ovsdb-server-sb.log.
>>>
>>> ---------8<------------8<------
>>> 2020-07-07T06:30:53.320Z|20733|jsonrpc|WARN|unix#3145: send error:
>>> Broken
>>> pipe
>>> 2020-07-07T06:30:53.320Z|20734|reconnect|WARN|unix#3145: connection
>>> dropped
>>> (Broken pipe)
>>> ---------8<------------8<------
>>>
>>> (Note: The N in uniX#N used for unbound unix sockets is monotonically
>>> increasing number. So, with unix#3145 there are more than 3000 such
>>> errors
>>> in the log file)
>>>
>>> I am going to be vague in explaining why it might be happening, so
>>> sincere
>>> apologies for that. So, this is what we are seeing and how to
>>> reproduce
>>> the issue.
>>>
>>> 1. Unhandled reply case
>>> ========================
>>>
>>> (a) Client
>>> ~~~~~~~~~~~
>>> $  ovn-sbctl -vjsonrpc -vstream --no-leader-only
>>> --db=unix:/var/run/openvswitch/ovnsb_db.sock get SB_Global .
>>> options:e2e_timestamp
>>> <output snipped>
>>> 2020-07-
>>> 07T04:54:22Z|00006|jsonrpc|DBG|unix:/var/run/openvswitch/ovnsb_db.soc
>>> k:
>>> send request, method="set_db_change_aware", params=[true], id=3
>>> <output snipped>
>>> "1594097605"  (<-- returned value of e2e_timestamp)
>>>
>>> (b) Server
>>> ~~~~~~~~~~~
>>> <output snipped>
>>> 2020-07-07T06:30:53.320Z|20731|jsonrpc|DBG|unix#3145: received
>>> request,
>>> method="set_db_change_aware", params=[true], id=3
>>> 2020-07-07T06:30:53.320Z|20732|jsonrpc|DBG|unix#3145: send reply,
>>> result={}, id=3
>>> 2020-07-07T06:30:53.320Z|20733|jsonrpc|WARN|unix#3145: send error:
>>> Broken
>>> pipe
>>> 2020-07-07T06:30:53.320Z|20734|reconnect|WARN|unix#3145: connection
>>> dropped
>>> (Broken pipe)
>>> <output snipped>
>>>
>>> So, the OVSDB server did try to send the reply. However, before it
>>> could
>>> send it through stream-fd.c`fd_send() the client program exits and
>>> closed
>>> the Unix Socket and we see broken pipe on the sender
>>>
> 
> Hi Girish, Dan,
> 
> A bit as a side note, I wonder if there's actually a benefit from using
> "set_db_change_aware" on clients that use "short lived" connections to
> the DB. It does make sense for long lived connections though.
> 
> The problem here is that we always call ovsdb_idl_send_db_change_aware()
> after monitor_cond_since(data). And the replies will also arrive in the
> same order:
> - first monitor_cond_since reply
> - then set_db_change_aware reply
> 
> If the second one is delayed (e.g. network issues), the IDL might
> process the monitor_cond_since(data) reply and the client (ovn-sbctl
> will close the connection).
> 
> After a quick glance at the code I think it should be safe to first do
> set_db_change_aware and afterwards monitor_cond_since(data).
> 

I went ahead and sent a patch for this operation order change:

https://patchwork.ozlabs.org/project/openvswitch/patch/1594380801-32134-1-git-send-email-dceara@redhat.com/

I tested it locally and it works fine. I had to simulate a delay in
processing the set_db_change_aware request on the server side such that
the IDL client (ovn-sbctl) closes the connection before the server has a
chance to reply to set_db_change_aware. With the patch applied, the
server will always send the reply to set_db_change_aware before sending
the reply to monitor_cond_since(data).

I couldn't find a way to add a unit test for this scenario though. Ideas
are welcome.

Thanks,
Dumitru



More information about the discuss mailing list