[ovs-discuss] [IDL] unhandled/unexpected OVSDB reply to set_db_change_aware method

Dumitru Ceara dceara at redhat.com
Thu Jul 9 19:02:30 UTC 2020


On 7/9/20 3:40 PM, Dan Williams wrote:
> On Mon, 2020-07-06 at 23:57 -0700, Girish Moodalbail wrote:
>> Hello all,
>>
>> In the OVN Kubernetes project, we are seeing a ton of following
>> errors in
>> ovsdb-server-nb.log and ovsdb-server-sb.log.
>>
>> ---------8<------------8<------
>> 2020-07-07T06:30:53.320Z|20733|jsonrpc|WARN|unix#3145: send error:
>> Broken
>> pipe
>> 2020-07-07T06:30:53.320Z|20734|reconnect|WARN|unix#3145: connection
>> dropped
>> (Broken pipe)
>> ---------8<------------8<------
>>
>> (Note: The N in uniX#N used for unbound unix sockets is monotonically
>> increasing number. So, with unix#3145 there are more than 3000 such
>> errors
>> in the log file)
>>
>> I am going to be vague in explaining why it might be happening, so
>> sincere
>> apologies for that. So, this is what we are seeing and how to
>> reproduce
>> the issue.
>>
>> 1. Unhandled reply case
>> ========================
>>
>> (a) Client
>> ~~~~~~~~~~~
>> $  ovn-sbctl -vjsonrpc -vstream --no-leader-only
>> --db=unix:/var/run/openvswitch/ovnsb_db.sock get SB_Global .
>> options:e2e_timestamp
>> <output snipped>
>> 2020-07-
>> 07T04:54:22Z|00006|jsonrpc|DBG|unix:/var/run/openvswitch/ovnsb_db.soc
>> k:
>> send request, method="set_db_change_aware", params=[true], id=3
>> <output snipped>
>> "1594097605"  (<-- returned value of e2e_timestamp)
>>
>> (b) Server
>> ~~~~~~~~~~~
>> <output snipped>
>> 2020-07-07T06:30:53.320Z|20731|jsonrpc|DBG|unix#3145: received
>> request,
>> method="set_db_change_aware", params=[true], id=3
>> 2020-07-07T06:30:53.320Z|20732|jsonrpc|DBG|unix#3145: send reply,
>> result={}, id=3
>> 2020-07-07T06:30:53.320Z|20733|jsonrpc|WARN|unix#3145: send error:
>> Broken
>> pipe
>> 2020-07-07T06:30:53.320Z|20734|reconnect|WARN|unix#3145: connection
>> dropped
>> (Broken pipe)
>> <output snipped>
>>
>> So, the OVSDB server did try to send the reply. However, before it
>> could
>> send it through stream-fd.c`fd_send() the client program exits and
>> closed
>> the Unix Socket and we see broken pipe on the sender
>>

Hi Girish, Dan,

A bit as a side note, I wonder if there's actually a benefit from using
"set_db_change_aware" on clients that use "short lived" connections to
the DB. It does make sense for long lived connections though.

The problem here is that we always call ovsdb_idl_send_db_change_aware()
after monitor_cond_since(data). And the replies will also arrive in the
same order:
- first monitor_cond_since reply
- then set_db_change_aware reply

If the second one is delayed (e.g. network issues), the IDL might
process the monitor_cond_since(data) reply and the client (ovn-sbctl
will close the connection).

After a quick glance at the code I think it should be safe to first do
set_db_change_aware and afterwards monitor_cond_since(data).

>> 2. Unexpected reply case
>> =========================
>> (a) Client
>> ~~~~~~~~~~~
>> $ ovn-sbctl -v --no-leader-only
>> --db=unix:/var/run/openvswitch/ovnsb_db.sock get datapath_binding
>> 3b735851-db34-489a-a54c-8bb947431a74 external_ids:logical-switch
>>
>> <output snipped>
>> 2020-07-
>> 07T06:43:18Z|00021|ovsdb_idl|DBG|unix:/var/run/openvswitch/ovnsb_db.s
>> ock:
>> DATA_MONITOR_COND_SINCE_REQUESTED -> MONITORING at lib/ovsdb-
>> idl.c:764
>> 2020-07-
>> 07T06:43:18Z|00022|jsonrpc|DBG|unix:/var/run/openvswitch/ovnsb_db.soc
>> k:
>> received reply, result={}, id=3
>> 2020-07-
>> 07T06:43:18Z|00023|ovsdb_idl|DBG|unix:/var/run/openvswitch/ovnsb_db.s
>> ock:
>> received unexpected reply message: {"error":null,"id":3,"result":{}}
>> <output snipped>
>>
>>
>> (b) Server
>> ~~~~~~~~~~~
>> <output snipped>
>> 2020-07-07T06:43:18.402Z|21691|jsonrpc|DBG|unix#3281: received
>> request,
>> method="set_db_change_aware", params=[true], id=3
>> 2020-07-07T06:43:18.402Z|21692|jsonrpc|DBG|unix#3281: send reply,
>> result={}, id=3
>> <output snipped>
>>
>> So, in the 2nd case the Server's reply makes it all the way to the
>> Client,
>> but the client is not expecting it.
>>
>> So, looking at the lib/ovsdb-idl.c's ovsdb_idl_process_response()....
>>
>> ovsdb_idl_process_response() {
>>
>>     case IDL_S_SERVER_MONITOR_COND_REQUESTED:
>>         if (ok) {
>>             idl->server.monitoring = OVSDB_IDL_MONITORING_COND;
>>             ovsdb_idl_db_parse_monitor_reply(&idl->server, msg-
>>> result,
>>                                              OVSDB_IDL_MM_MONITOR_CON
>> D);
>>             if (ovsdb_idl_check_server_db(idl)) {
>>                 ovsdb_idl_send_db_change_aware(idl);
>>             }
>>         } else {
>>
>> }
>>
>> ... I see that we call the OVSDB set_db_change_aware method through
>> ovsdb_idl_send_db_change_aware(), but there is no state machine to
>> process
>> the reply for it. In the 1st case, the client exits early and
>> connection is
>> broken? Whilst, in the 2nd case the reply comes through but the code
>> is not
>> expecting it.

For case 2, it would be possible to handle the set_db_change_aware
result. As far as I see though the "unexpected reply message" DBG log in
the current implementation is quite benign. And the handler would
basically just ignore the reply because the reply is always the same:

https://github.com/openvswitch/ovs/blob/master/Documentation/ref/ovsdb-server.7.rst#4118-database-change-awareness

> 
> I guess we're not seeing this at Red Hat because we're using
> monitor_all still... anyone have thoughts about this issue?
> 
> Dan
> 

For the first case (ovn-sbctl closing the connection too early), we
process messages from the server in batches [0] so we'd only hit the
issue if the reply for set_db_change_aware is delayed and is not
processed in the same batch as the reply for monitor_cond_since. Maybe
that's why we don't hit the issue too often.

Regards,
Dumitru

[0] https://github.com/openvswitch/ovs/blob/master/lib/ovsdb-idl.c#L923



More information about the discuss mailing list