[ovs-dev] [OVN][nbctld][bug] ovn-nbctl daemon hits an infinite loop?

Dumitru Ceara dceara at redhat.com
Mon Dec 21 16:00:13 UTC 2020


On 12/18/20 12:47 PM, Dumitru Ceara wrote:
> On 12/17/20 7:55 PM, Ben Pfaff wrote:
>> On Thu, Dec 17, 2020 at 08:54:56AM -0800, Girish Moodalbail wrote:
>>> Hello all,
>>>
>>> Say ovn-nbctl is started in daemon mode with the certificate options set,
>>> but those certs do not exist on the file system. For example, in the
>>> following invocation assume that the `/ovn-cert` folder is empty:
>>>
>>> ovn-nbctl -vconsole:dbg --pidfile=/tmp/ovn-nbctl.pid \
>>>     --db=ssl:10.0.64.7:6641,ssl:10.0.64.6:6641,ssl:10.0.64.4:6641 \
>>>     --log-file=/tmp/ovn-nbctl.log --detach \
>>>     -p /ovn-cert/ovncontroller-privkey.pem \
>>>     -c /ovn-cert/ovncontroller-cert.pem -C /ovn-cert/ca-cert.pem
>>>
>>> Now, if we run a command against that daemon via:
>>>
>>> ovs-appctl -t /var/run/ovn/ovn-nbctl.32254.ctl list-commands
>>
>> [...]
>>
>>> This is my theory. In ovn-nbctl.c`server_loop(), we have this infinite loop:
>>>
>>>     for (;;) {
>>>         if (ovsdb_idl_has_ever_connected(idl)) {
>>>             daemonize_complete();
>>>             unixctl_server_run(server);
>>>         }
>>>         ovsdb_idl_wait(idl);
>>>         unixctl_server_wait(server);
>>>         poll_block();
>>>     }
>>>
>>> Since ovsdb_idl_has_ever_connected() is not true due to the missing certs,
>>> we never get a chance to run the command from ovs-appctl. The request
>>> stays pending, so poll_block() returns immediately every time and the
>>> loop spins?
>>
>> (The above is a partial snippet, there's actually more in the loop.)
>>
>> It's always an infinite loop, it's just that it wastes CPU in that case.
>> I think that you're right about the cause.  I think we should only call
>> unixctl_server_wait() if we'd call unixctl_server_run(), so the right
>> thing to do appears to be to move the unixctl_server_wait() call into
>> the "if" block.
> 
> In that case an "ovn-appctl -t ... <command>" will just block until the
> IDL connects at least once.
> 
> Instead, would there be a concern with calling unixctl_server_run()
> unconditionally?
> 
> This would allow the users to actually interact with the nbctl daemon
> and, for example, gracefully stop it if it can't connect for whatever
> reason:
> 
> # Start ovn-nbctl daemon without first starting the NB DB:
> # This blocks because the IDL cannot connect.
> export OVN_NB_DAEMON=$(ovn-nbctl --detach)
> 
> # In a different terminal, enable debug logs, exit, etc.
> ovn-appctl -t /var/run/ovn/ovn-nbctl.18042.ctl vlog/set dbg
> ovn-appctl -t /var/run/ovn/ovn-nbctl.18042.ctl exit
> 

I went ahead and sent a patch in this direction:

http://patchwork.ozlabs.org/project/ovn/patch/1608566295-1324-1-git-send-email-dceara@redhat.com/
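
For illustration only, the spin discussed above can be reproduced without
the OVS libraries. The sketch below is not the patch and does not use the
real unixctl or IDL code: a pipe stands in for the unixctl socket, plain
poll() stands in for poll_block(), and the names simulate() and
count_immediate_wakeups() are made up for this example. It counts how
often the loop wakes up immediately when a request is pending, depending
on whether the "run" step is gated on the connection state.

```c
#include <poll.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical stand-in for the server loop.  'fd' plays the role of the
 * unixctl socket and 'connected' the role of
 * ovsdb_idl_has_ever_connected().  Returns how many of the 'iterations'
 * blocking poll() calls woke up because 'fd' was readable. */
static int
count_immediate_wakeups(int fd, bool connected, bool run_unconditionally,
                        int iterations)
{
    int wakeups = 0;

    for (int i = 0; i < iterations; i++) {
        if (connected || run_unconditionally) {
            /* "run": service the pending request, if any, so that the
             * fd stops being readable. */
            struct pollfd probe = { .fd = fd, .events = POLLIN };
            if (poll(&probe, 1, 0) > 0 && (probe.revents & POLLIN)) {
                char buf[64];
                ssize_t n = read(fd, buf, sizeof buf);
                (void) n;
            }
        }

        /* "wait" + "block": always register the fd, then sleep until it
         * is readable or 10 ms have passed. */
        struct pollfd pfd = { .fd = fd, .events = POLLIN };
        if (poll(&pfd, 1, 10) > 0) {
            wakeups++;
        }
    }
    return wakeups;
}

/* Queues one request on a pipe and runs five loop iterations against it.
 * Returns the number of immediate wakeups, or -1 on setup failure. */
static int
simulate(bool connected, bool run_unconditionally)
{
    int fds[2];
    int wakeups = -1;

    if (pipe(fds) != 0) {
        return -1;
    }
    if (write(fds[1], "x", 1) == 1) {
        wakeups = count_immediate_wakeups(fds[0], connected,
                                          run_unconditionally, 5);
    }
    close(fds[0]);
    close(fds[1]);
    return wakeups;
}
```

With these assumptions, simulate(false, false) models the reported case:
the request is never serviced, so every iteration wakes up at once.
simulate(false, true) models calling the "run" step unconditionally: the
request is consumed on the first pass and the loop goes back to sleeping.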

Regards,
Dumitru


