[ovs-dev] [PATCH v9 ovn] ovn-northd: introduce new allow-stateless ACL verb

Fri May 7 23:05:42 UTC 2021

On Fri, May 7, 2021 at 12:54 AM Dumitru Ceara <dceara at redhat.com> wrote:
>
> On 5/7/21 7:35 AM, Han Zhou wrote:
> > Hi Ihar,
> >
> > Thanks for the patch, and it's great to see the significant performance
> > improvement!
> > Please see my comments below.
> >
> > On Thu, May 6, 2021 at 7:35 AM Ihar Hrachyshka <ihrachys at redhat.com>
wrote:
> >>
> >> For allow-stateless ACLs, bypass connection tracking by avoiding
> >> setting ct hints for matching traffic. Avoid sending all traffic to ct
> >> when a stateful ACL is present.
> >>
> >> ===
> >>
> >> Reusing an existing 'allow' verb for stateless matching would have its
> >> drawbacks, specifically, when it comes to backwards incompatibility of
> >> the new behavior with existing environments. When using "allow" ACLs
> >> in mixed allow/allow-related environment, we still commit "allow"
> >> traffic to conntrack. This unnecessarily hits performance when mixed
> >> ACL action types were used for the same datapath. This is why we
> >> introduce a new action verb to describe stateless behavior.
> >>
> >> Another complexity to consider is the fact that with stateless
> >> matching, one would not be able to rely on 'related' magic that
> >> guarantees that reply traffic is passed through. Instead, the user
> >> would have to accurately define matching rules both for request and
> >> reply directions of a protocol session. Specifically, when allowing
> >> ICMP for a specific peer host, one has to define 'allow-stateless'
> >> rules that would match against ip.dst for request direction and ip.src
> >> for reply direction. Other protocols and scenarios will require their
> >> own fine grained matching approaches implemented by the user.
> >>
> >> ===
> >>
> >> For performance measurements, qperf was used. Tests were executed on
two
> >> setups:
> >>
> >> 1) ovn-fake-multinode virtual environment;
> >> 2) Supermicro SYS-5039MS-H8TRF 3-node RH-OSP 16.x cluster with 2
compute
> >>    nodes, NIC: "Intel Corporation Ethernet Controller XXV710 for 25GbE
> >>    SFP28 (rev 02)", using fake_nova_driver to avoid qemu overhead.
> >>
> >> 1) ovn-fake-multinode:
> >>
> >> Performance measured between two virtual nodes, two ports
> >> that belong to different LSs connected via router. Using qperf,
> >> performance was measured for UDP, TCP, SCTP protocols (using
> >> <proto>_lat and <proto>_bw tests). The qperf version used:
> >> 0.4.9-16.fc31.x86_64.  Each test scenario was executed five times and
> >> averages compared.
> >>
> >> Tests were executed with `allow-stateless` rules for the tested
> >> protocol and `allow-related` for another protocol set for both ports,
> >> both directions, e.g. for TCP scenario, the following ACLs were
> >> defined:
> >>
> >> ovn-nbctl acl-add sw0 to-lport 100 tcp allow-stateless
> >> ovn-nbctl acl-add sw0 from-lport 100 tcp allow-stateless
> >> ovn-nbctl acl-add sw1 to-lport 100 tcp allow-stateless
> >> ovn-nbctl acl-add sw1 from-lport 100 tcp allow-stateless
> >>
> >> ovn-nbctl acl-add sw0 to-lport 100 sctp allow-related
> >> ovn-nbctl acl-add sw0 from-lport 100 sctp allow-related
> >> ovn-nbctl acl-add sw1 to-lport 100 sctp allow-related
> >> ovn-nbctl acl-add sw1 from-lport 100 sctp allow-related
> >>
> >> In this particular environment, improvement was seen in send_bw,
> >> latency, and msg_rate measurements, where applicable, for all three
> >> protocols under test.
> >>
> >> for UDP, send_bw: 293.6 MB/sec => 313.2 MB/sec (+6.68%)
> >>          latency: 16 us => 14.08 us (-12%)
> >>          msg_rate: 62.56 K/sec => 71.06 K/sec (+13.59%)
> >>
> >> for TCP, latency: 18.6 us => 14.88 us (-20%)
> >>          msg_rate: 53.8 K/sec => 67.28 K/sec (+25.06%)
> >>
> >> for SCTP, latency: 21.98 us => 19.42 us (-11.65%)
> >>           msg_rate: 45.58 K/sec => 51.54 K/sec (+13.08%)
> >>
> >> Interestingly, some performance improvement was also seen for the same
> >> scenarios with no ACLs set at all, albeit significantly more
> >> negligible.
> >>
> >> for UDP, send_bw: 320.0 MB/sec => 338.6 MB/sec (+5.81%)
> >>          latency: 13.74 us => 12.88 us (-6.68%)
> >>          msg_rate: 73.02 K/sec => 77.84 K/sec (+6.6%)
> >>
> >> for TCP, latency: 15.62 us => 14.26 us (-9.54%)
> >>          msg_rate: 64.02 K/sec => 70.26 K/sec (+9.75%)
> >>
> >> for SCTP, latency: 19.56 us => 18.16 us (-7.16%)
> >>           msg_rate: 51.16 K/sec => 55.12 K/sec (+7.74%)
> >>
> >> 2) Supermicro RH-OSP cluster:
> >>
> >> Two scenarios executed:
> >> - connectivity between two ports on the same switch
> >> - connectivity between two ports on different switches connected via
> >>   router
> >>
> >> For the same switch, improvements are as follows:
> >> - TCP latency: -5.3%, UDP latency: -13.5%, SCTP latency: -10.8%
> >> - TCP bw: +0.8%, UDP bw, +12.25%, SCTP bw: +21.29%
> >>
> >> For different switches, improvements are as follows:
> >> - TCP latency: -6%, UDP: -11.2%, SCTP: -0.2%
> >> - TCP bw: -0.7%, UDP bw: +2%, SCTP bw: +9.9%
> >>
> >> The effect is more noticeable in same switch scenario.
> >
> > This is interesting. Any idea why with different switches the result is
so
> > different? I'd expect similar improvement just like in same switch,
because
> > in this kind of test the flow should be cached in kernel datapath
> > (regardless of same/different logical switches) and in both cases the
cost
> > of traversing conntrack is saved.
> >
> >>
> >> Comparable numbers can be captured with iperf. It may be useful to run
> >> more tests in a more elaborate (bare metal) environment.
> >>
> >> ===
> >>
> >> The patch takes inspiration from a now abandoned patch:
> >>
> >> "ovn-northd: Support mixing stateless/stateful ACLs with
> >> Stateless_Filter." by Dumitru Ceara.
> >>
> >> The original patch assumed CMS doesn't require full flexibility of
> >> matching rules for stateless matching (for example, to be used by
> >> OpenShift). But other CMS interfaces may require the same
> >> customizability for stateless as well as stateful matching, like in
> >> OpenStack Neutron API. Which is why this patch reuses existing ACL
> >> object type to describe stateless rules.
> >>
> >> Signed-off-by: Ihar Hrachyshka <ihrachys at redhat.com>
> >>
> >> ---
> >>
> >> v1: initial version.
> >> v2: rebased after conflict.
> >> v3: added ddlog implementation.
> >> v3: apply stateless skip-hint-set rules to appropriate direction only.
> >> v3: added more background as to implementation in commit message.
> >> v3: test stateless scenarios with ddlog too.
> >> v3: rebased after conflict.
> >> v4: introduce a separate allow-stateless ACL match verb.
> >> v5: rebased.
> >> v6: updated docs for new allow-stateless approach.
> >> v6: removed no longer valid comments.
> >> v6: removed acl_is_stateless.
> >> v7: bump north db schema version to 5.31.0 -> 5.32.0.
> >> v8: fixed checkpatch failure on a too long line in dbschema.
> >> v8: added perf data on baremetal lab testing.
> >> v9: fixed ovsdb checksum.
> >> ---
> >>  NEWS                    |   2 +
> >>  northd/ovn-northd.8.xml |   8 +-
> >>  northd/ovn-northd.c     |  65 ++++++++-
> >>  northd/ovn_northd.dl    |  31 ++++
> >>  ovn-nb.ovsschema        |   9 +-
> >>  ovn-nb.xml              |   9 +-
> >>  tests/ovn-northd.at     | 309 ++++++++++++++++++++++++++++++++++++++++
> >>  utilities/ovn-nbctl.c   |   6 +-
> >>  8 files changed, 428 insertions(+), 11 deletions(-)
> >>
> >> diff --git a/NEWS b/NEWS
> >> index 1ddde15f8..d72139e76 100644
> >> --- a/NEWS
> >> +++ b/NEWS
> >> @@ -11,6 +11,8 @@ Post-v21.03.0
> >>      'use_parallel_build' to enable it.  It is disabled by default.
> >>    - Support vlan-passthru mode for tag=0 localnet ports.
> >>    - Support custom 802.11ad EthType for localnet ports.
> >> +  - Introduce a new "allow-stateless" ACL verb to always bypass
> > connection
> >> +    tracking. The existing "allow" verb behavior is left intact.
> >>
> >>  OVN v21.03.0 - 12 Mar 2021
> >>  -------------------------
> >> diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml
> >> index 54e88d3fa..08c4ea852 100644
> >> --- a/northd/ovn-northd.8.xml
> >> +++ b/northd/ovn-northd.8.xml
> >> @@ -419,7 +419,9 @@
> >>        before eventually advancing to ingress table <code>ACLs</code>.
If
> >>        special ports such as route ports or localnet ports can't use
> > ct(), a
> >>        priority-110 flow is added to skip over stateful ACLs. IPv6
> > Neighbor
> >> -      Discovery and MLD traffic also skips stateful ACLs.
> >> +      Discovery and MLD traffic also skips stateful ACLs. For
> > "allow-stateless"
> >> +      ACLs, a flow is added to bypass setting the hint for connection
> > tracker
> >> +      processing.
> >
> > This approach clearly achieves the goal of bypassing conntrack and
there is
> > great dataplane performance gain, but it also doubles the ACL logical
flows
> > which in turn could significantly increase the control plane cost,
> > especially in ovn-controller. In large scale environments where lots of
> > ACLs referencing large port-group/address-sets are used, it is already a
> > bottleneck of ovn-controller flow computing. It would be better if we
could
> > avoid doubling this cost. Instead of directly add the ACL match here,
can
> > we set a new hint (e.g. REGBIT_BYPASS_CT = 1), and in the later stage
> > pre-stateful add a higher priority flow, if this hint is matched then
> > directly "next" without going through CT? I hope this can achieve the
same
> > goal but not increasing control plane cost. What do you think?
> >
>
> Wouldn't this break ACL priority?  E.g., with the following ACLs:
>
> ACL1: from-lport, prio=100, ip.dst=1.1.1.1, allow-stateless
> ACL2: from-lport, prio=50, ip, drop
>
> If we don't add the lflow for ACL1 in the ls_in_acl stage then traffic
> will be dropped by the lflow added for ACL2.
>
You are right. Actually I was thinking about matching the hint again in acl
stage and just move next, but it would skip all the stateful ACLs that
could have a higher priority. This would work only if the stateless ACLs
can always be treated as higher priority than stateful ACLs, but it seems
unreasonable. It seems all the ACL matches have to be evaluated at the same
stage to ensure the priorities are followed. So, forget about this proposal.

> On the other hand, IIUC, this feature is designed to be used in
> combination with "allow-related" ACLs defined on the same switch,
> replacing "allow" ACLs in that case.
>
> Before this change an "allow" ACL would've been treated as
> "allow-related" if there was at least one other "allow-related" ACL
> applied on the logical switch.  Essentially generating 2 logical flows
> per "allow" ACL:
>
> lflow1: "REGBIT_ACL_HINT_ALLOW_NEW==1 && acl.match" then
> "REGBIT_CONNTRACK_COMMIT=1, next"
> lflow2: "REGBIT_ACL_HINT_ALLOW==1 && acl.match" then "next"
>
> If I'm not wrong, changing this ACL to "allow-stateless" would also
> generate 2 logical flows, so at least from a number of logical flows
> perspective, the result is the same.
>
You are right. But this also means that when there is no stateful rules,
"allow-stateless" is worse than "allow" because "allow" would generate only
a single lflow in that case, while "allow-stateless" always generates
doubled lflows. So, I think at least we could do the simple improvement to
make "allow-stateless" perform the same as "allow" when there are NO
stateful ACLs. (Otherwise, my comment on the documentation regarding
'"allow-stateless" is equivalent to "allow" when there are no stateful
ACLs' would be wrong). We can move the logic of adding the stateless
matching flows of pre-acl stage into a if-block (only when has_stateful is
true). Would this work?

Thanks,
Han