[ovs-dev] [PATCH v3] ovn-nbctl: Fix the ovn-nbctl test "LBs - daemon" which fails during rpm build
blp at ovn.org
Wed Oct 31 20:59:09 UTC 2018
On Thu, Oct 25, 2018 at 03:27:41PM +0530, nusiddiq at redhat.com wrote:
> From: Numan Siddique <nusiddiq at redhat.com>
> When 'make check' is called by the mock rpm build (which disables networking),
> the test "ovn-nbctl: LBs - daemon" fails when it runs the command
> "ovn-nbctl lb-add lb0 220.127.116.11a 192.168.10.10:80,192.168.10.20:80". ovn-nbctl
> extracts the vip by calling the socket util function 'inet_parse_active()',
> and this function blocks when libunbound function ub_resolve() is called
> further down. ub_resolve() is a blocking function without timeout and all the
> ovs/ovn utilities use this function.
> As reported by Timothy Redaelli, the issue can also be reproduced by running
> the below commands
> $ sudo unshare -mn -- sh -c 'ip addr add dev lo 127.0.0.1 && \
> mount --bind /dev/null /etc/resolv.conf && runuser $SUDO_USER'
> $ make sandbox SANDBOXFLAGS="--ovn"
> $ ovn-nbctl -vsocket_util:off lb-add lb0 18.104.22.168a \
> To address this issue, this patch adds a new function - inet_parse_ip_addr_and_port()
> which expects IP:[port] address in the 'target_' argument and disables resolving
> the host. This new function is now used in ovn-northd, ovn-nbctl and ovn-trace.
> It is fine to use this function as load balancer VIP cannot be a hostname.
> Reported-by: Timothy Redaelli <tredaelli at redhat.com>
> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1641672
> Tested-by: Timothy Redaelli <tredaelli at redhat.com>
> Signed-off-by: Numan Siddique <nusiddiq at redhat.com>
I have multiple thoughts here.
First, if the resolver in OVS never times out, then that seems like a
bug in the OVS resolver. Yifeng, you wrote the DNS code. Is it true
that it never times out? If so, should we fix that.
Second, about the mock RPM build with disabled networking. Does this
environment have a /etc/resolv.conf that specifies a DNS server? If it
does, then that seems like a bug in the build environment. If it does
not, then that seems like a bug in our DNS resolver code, because DNS
resolution should immediately fail if no DNS servers are available.
Third, again about naming. If we are going to have two functions that
act similarly, with the only difference being that one resolves DNS
names and the other does not, then the naming should reflect that
clearly. It still isn't obvious to me with the new names.
More information about the dev