[ovs-dev] [PATCH] ovn-northd: Only run idl loop if something changed.

Fri Dec 4 08:17:06 UTC 2015

On 3 December 2015 at 23:49, Numan Siddique <nusiddiq at redhat.com> wrote:
> On 12/04/2015 12:39 PM, Justin Pettit wrote:
>>> On Dec 3, 2015, at 10:55 PM, Ben Pfaff <blp at ovn.org> wrote:
>>>
>>> On Thu, Dec 03, 2015 at 05:11:49PM -0800, Joe Stringer wrote:
>>>> Before refactoring the main loop to reuse ovsdb_idl_loop_* functions, we
>>>> would use a sequence to see if anything changed in NB database to
>>>> compute and notify the SB database, and vice versa. This logic got
>>>> dropped with the refactor, causing a testsuite failure in the ovn-sbctl
>>>> test. Reintroduce the IDL sequence number checking.
>>>>
>>>> Fixes: 331e7aefe1c6 ("ovn-northd: Refactor main loop to use ovsdb_idl_loop_*
>>>> functions")
>>>> Suggested-by: Numan Siddique <nusiddiq at redhat.com>
>>>> Signed-off-by: Joe Stringer <joe at ovn.org>
>>> Acked-by: Ben Pfaff <blp at ovn.org>
>> I pushed this myself so we can branch for 2.5 with your Acked-by and Cascardo's Tested-by.
> Thanks Justin.
>
> I see the "send error: Broken pipe" warn logs [1] in the file - tests/testsuite.dir/1713/ovn-nb/ovsdb-server.log when I run the test case
> [ovn -- 3 HVs, 3 LS, 3 lports/LS, 1 LR] at tests/ovn.at (Line no 842) and make it fail at the end by putting
> "AT_CHECK([echo hi], [1])" at line 1106 before clean up.

Thanks for the report. Does that mean that we're filtering the broken
pipe errors out in that test case? I'd expect the test case to fail if
these logs showed up, without needing someone to modify it.

> I tested this without the refactored ovn-northd main loop code and I still see the same logs in the ovsdb-server.log.
> Not sure what the cause of this is. Maybe the issue is somewhere else.

It sounds like this patch just makes the bug less likely (perhaps by
virtue of sending fewer transactions).
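
For anyone following along, here is a rough sketch of the idea behind the
sequence number check, i.e. why it would result in fewer transactions. This
is a toy model and not the ovn-northd code: struct toy_idl, struct toy_loop,
toy_loop_init() and toy_loop_run() are hypothetical names made up for
illustration, standing in for the IDL and its change sequence number. The
loop only does work when the seqno differs from the one seen at the last
recompute.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for an IDL connection (hypothetical, not the OVS struct).
 * 'seqno' is assumed to be bumped whenever the local replica changes. */
struct toy_idl {
    unsigned int seqno;
};

/* Per-database loop state: remembers the seqno seen at the last recompute. */
struct toy_loop {
    struct toy_idl *idl;
    unsigned int last_seqno;
    unsigned int recomputes;    /* Number of iterations that did work. */
};

void
toy_loop_init(struct toy_loop *loop, struct toy_idl *idl)
{
    assert(loop != NULL && idl != NULL);
    loop->idl = idl;
    loop->last_seqno = idl->seqno;
    loop->recomputes = 0;
}

/* One main-loop iteration.  Returns true if the database changed since the
 * last iteration and the (expensive) recompute was performed. */
bool
toy_loop_run(struct toy_loop *loop)
{
    if (loop->idl->seqno == loop->last_seqno) {
        return false;           /* Nothing changed: skip the work. */
    }
    loop->last_seqno = loop->idl->seqno;
    loop->recomputes++;         /* Placeholder for the real recompute. */
    return true;
}
```

With this shape, an idle iteration costs one integer comparison and sends
nothing to the other database.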

> Also with this patch, I can see the issue [2] again. I guess it's not a major issue.

Are you able to tell us a bit more about your setup and why you're
able to reproduce it?

I have only been able to reproduce the broken pipe issue on build
systems like Travis where the underlying CPU resource is shared with
other users, and I suspect they're using nested virtualization which
can exacerbate certain types of failures. I had been assuming that
this is related to why it fails.