[ovs-dev] Improved OVN CI

Mark Michelson mmichels at redhat.com
Tue May 19 00:13:54 UTC 2020


On 5/18/20 2:47 PM, aginwala wrote:
> Hi Mark:
> 
> Sounds like a decent plan. However, I think regular patches should cover 
> system tests too. Let me know which make commands need to be included, 
> and I can give it a shot at enabling them apart from make distcheck.

I was thinking that `make check-kernel` and `make check-system-userspace` 
would expand the number of tests run. I'm not sure how feasible this 
actually is on travis systems, though, because I don't know to what 
degree we have control over the test system's parameters.
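For reference, a rough sketch of what those extra testsuite invocations
might look like in a CI script. The make targets are real OVN targets, but
the wrapper below is hypothetical and only echoes the commands, since
actually running them needs root and kernel access:

```shell
# Hypothetical CI wrapper around the OVN system testsuites. The make
# targets are real, but root access and kernel-module control are the
# open questions on travis, so this sketch only echoes the commands.
system_test_targets="check-kernel check-system-userspace"

run_system_tests() {
    for target in $system_test_targets; do
        # RECHECK=yes retries failed tests once, which helps with flaky cases.
        echo "would run: sudo make $target RECHECK=yes"
    done
}

run_system_tests
```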

> For 
> ovn-k8s, shouldn't we enable integration tests on the ovn-k8s repo, 
> where it can always use the latest ovn master and report failures? 

My thought process here was that we could more easily verify that a 
change in OVN caused a test failure in ovn-k8s. It would also make it 
easier to determine the specific OVN commit that caused the failure.

I also figure that running ovn-kubernetes tests means more test coverage 
of OVN code, meaning we're more likely to uncover bugs we may not have 
otherwise noticed.
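As a concrete illustration of the per-commit advantage: once every commit
has a CI result, finding the offending OVN commit is just a bisect between
the last green build and the first red one. The refs and test script below
are made up, and the plan is echoed so the sketch stands alone:

```shell
# Sketch of bisecting to the OVN commit that broke the ovn-k8s tests.
# The refs (HEAD, v20.03.0) and the test script name are hypothetical;
# the steps are echoed rather than executed to keep the sketch
# self-contained.
bisect_plan() {
    echo "git bisect start"
    echo "git bisect bad HEAD        # first build that failed ovn-k8s tests"
    echo "git bisect good v20.03.0   # last build that passed"
    echo "git bisect run ./run-ovn-k8s-tests.sh"
}

bisect_plan
```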

There's nothing to prevent ovn-k8s from also doing what you suggest 
here. However, if I were setting up their tests, I'd either
a) Run against a consistent version of OVN. This way they ensure that if 
a test failure occurs, it's because of the recent change to ovn-k8s, not 
OVN.
b) Run against multiple candidate versions of OVN (e.g. OVN latest 
master, OVN 20.03.0, etc.). This way, they can determine whether tests 
fail only against certain versions of OVN, and it again helps determine 
whether the test failures are due to an ovn-k8s change or an OVN change.
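Option (b) could be sketched as a small matrix loop. The version list and
the build/test steps here are assumptions, not an existing script:

```shell
# Hypothetical test matrix: run the ovn-k8s suite against several OVN
# versions. The refs below are examples from this discussion, not a
# maintained list; the work is echoed so the sketch is dry-runnable.
ovn_versions="master v20.03.0"

plan_matrix() {
    for ref in $ovn_versions; do
        echo "build OVN at $ref, then run the ovn-k8s tests against it"
    done
}

plan_matrix
```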

> Also, 
> using multiple ovs versions is OK, but I don't think we need to run 
> them for regular patch merges. Maybe we can run those when we do a new 
> release. What do you think?

That's the point of continuous integration though, right? :)

If we wait until release time, we're now testing a lot of changes in the 
hopes that everything works. By testing one commit at a time, we can 
catch the problem early. It's much easier to fix the problem when it's 
new and has no other commits made on top of it.

> 
> On Fri, Mar 27, 2020 at 11:24 AM Mark Michelson <mmichels at redhat.com 
> <mailto:mmichels at redhat.com>> wrote:
> 
>     Hi everyone,
> 
>     I've taken some time recently to look into the current CI for OVS/OVN,
>     and I think there is room for improvement. Here I will outline a 4 part
>     plan for improving CI for OVN.
> 
> 
>     Part 1: Report travis build failures as replies to patch submissions.
> 
>     Currently, 0-day robot will respond to patches that fail to apply or
>     fail to pass checkpatch.py. However, builds are handled by travis and
>     reported out-of-band to the ovs-build list. By having build failures
>     reported in response to the patch, it makes it much easier to correlate
>     results to a patch series and does not require signing up for a
>     separate
>     mailing list.
> 
>     I suspect this will lead to some noise at first as some tests may need
>     to be stabilized. If we are diligent about fixing such volatile tests,
>     then this should reduce the noise after a while. Once we reach a point
>     where the signal-to-noise ratio is acceptable, we can move on to...
> 
> 
>     Part 2: Expand testing done by travis.
> 
>     Travis runs `make distcheck` now. This could be expanded to run the
>     following:
>     * system tests
>     * ovn-kubernetes tests
>     * Run tests against all supported versions of OVS
>     CI could also be expanded to run some scale tests, but that likely
>     would
>     need to be done outside of travis.
> 
>     Like with part 1, this likely will be noisy as more tests are added.
>     Again, though, as we fix these problems, it will lead to a more stable
>     suite of tests. We can then move on to...
> 
> 
>     Part 3: Correlate tests with patchwork checks
> 
>     Patchwork has "checks" that can be set on submissions which correspond
>     to external tests. The tests we add in parts 1 and 2 can correspond to
>     patchwork checks. The checks can provide an easily-referenced way to
>     determine the status of tests for a given patch series. Adding
>     checks to
>     patchwork also lays the groundwork for Part 4.
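For what it's worth, reporting a CI result as a patchwork check could look
roughly like this. The endpoint shape and token auth follow patchwork's
REST API, but the instance URL, patch id, and context name are invented:

```shell
# Rough sketch of reporting a CI result as a patchwork "check" via the
# REST API. The endpoint shape and token auth follow patchwork's API,
# but the instance URL, patch id, and context name below are invented.
PW_URL="https://patchwork.example.org/api/1.1"   # hypothetical instance
PATCH_ID=12345                                   # hypothetical patch

build_check_payload() {
    state="$1"; context="$2"; desc="$3"
    printf '{"state": "%s", "context": "%s", "description": "%s"}' \
        "$state" "$context" "$desc"
}

payload=$(build_check_payload success ovn-system-tests "all tests passed")
echo "would POST to $PW_URL/patches/$PATCH_ID/checks/: $payload"
# A real bot would then run something like:
# curl -H "Authorization: Token $PW_TOKEN" -H "Content-Type: application/json" \
#      -d "$payload" "$PW_URL/patches/$PATCH_ID/checks/"
```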
> 
> 
>     (Note: It may make sense to switch parts 2 and 3. We may want to
>     establish patchwork checks prior to adding more tests)
> 
> 
>     Part 4: Gating
> 
>     If we have a stable and extensive testsuite, then we can attempt to
>     enforce gating on patch submissions. In other words, patches that do
>     not pass all checks cannot be pushed. This can be enforced in a few
>     ways:
> 
>     * Convention: We regulate ourselves. We agree as a community to only
>     push changes if tests have passed, and we refuse to push patches that
>     have failed CI.
>     * Automation: The bot that monitors patchwork for new submissions and
>     starts CI could potentially merge patches that have received ACKs and
>     have passed CI. Humans would not push patches themselves.
>     * Push hook: When pushing patches, if patchwork CI checks have not
>     passed (and if the patch has not been ACKed), then the hook would
>     reject
>     the push.
>     * Pull Requests: Switching from mailing list patches to pull requests
>     would allow for github to enforce gating.
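The push-hook option could be sketched as a small decision function like
this. The patchwork query is stubbed out; nothing here is an existing hook:

```shell
# Hypothetical server-side push-hook logic: refuse a push unless the
# patchwork CI check for the commit passed. check_state_for() is a
# stand-in for a real patchwork API query, hard-coded here so the
# sketch is self-contained.
check_state_for() {
    # A real hook would look up the patchwork check state for commit "$1".
    echo "success"
}

allow_push() {
    commit="$1"
    state=$(check_state_for "$commit")
    # Accept only when CI reported success; committers could still
    # override out of band for false positives.
    [ "$state" = "success" ]
}

if allow_push "abc123"; then
    echo "push accepted"
else
    echo "push rejected: CI checks have not passed" >&2
fi
```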
> 
>     In any of the above cases, there will be ways for committers to
>     override the system if absolutely necessary (e.g. CI false positives).
> 
> 
>     What are your thoughts on this expansion of CI for OVN? Do you have
>     ideas for other areas where it can be expanded? What are your thoughts
>     on gating in particular?
> 
>     Thanks for your feedback,
>     Mark Michelson
> 
>     _______________________________________________
>     dev mailing list
>     dev at openvswitch.org <mailto:dev at openvswitch.org>
>     https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> 


