[ovs-dev] [PATCH 00/25] netdev datapath vxlan offload

Sriharsha Basavapatna sriharsha.basavapatna at broadcom.com
Mon Feb 3 17:05:09 UTC 2020


Hi Eli,

Thanks for sending this patchset. I have some questions about the
design; please see my comments below.

Thanks,
-Harsha

On Mon, Jan 20, 2020 at 8:39 PM Eli Britstein <elibr at mellanox.com> wrote:
>
> In the netdev datapath, packets arriving over a tunnel are processed by
> more than one flow. For example a packet that should be decapsulated and
> forwarded to another port is processed by two flows:
> 1. in_port=<PF>, outer-header matches, dst-port=4789 (VXLAN),
>    actions:tnl_pop(vxlan_sys_4789).
> 2. in_port=vxlan_sys_4789, tunnel matches (VNI for example),
>    inner-header matches, actions: vm1.
>
> In order to offload such a multi-flow processing path, a multi-table HW
> model is used.

Did you explore other ways to avoid this multi-flow processing model?
For example, maybe the offload layer could support some additional
logic to merge the two datapath flow rules for decap into a single
flow rule?
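
For instance, just as a sketch (the VNI, inner matches, port ids and
function name below are made up, and masks are omitted so the default
item masks apply), the merged rule could be a single rte_flow:

    #include <rte_flow.h>

    /* Hypothetical merged rule: flow #1's outer matches plus flow #2's
     * tunnel/inner matches, with decap and forward in one rte_flow. */
    static struct rte_flow *
    offload_merged_decap(uint16_t pf_port_id, uint32_t vm1_port_id)
    {
        struct rte_flow_error error;
        /* transfer, so the packet can be forwarded to another port. */
        struct rte_flow_attr attr = { .ingress = 1, .transfer = 1 };
        struct rte_flow_item_udp udp = { .hdr.dst_port = RTE_BE16(4789) };
        struct rte_flow_item_vxlan vxlan = { .vni = { 0, 0, 100 } };
        struct rte_flow_item pattern[] = {
            { .type = RTE_FLOW_ITEM_TYPE_ETH },                   /* outer */
            { .type = RTE_FLOW_ITEM_TYPE_IPV4 },
            { .type = RTE_FLOW_ITEM_TYPE_UDP, .spec = &udp },
            { .type = RTE_FLOW_ITEM_TYPE_VXLAN, .spec = &vxlan },
            { .type = RTE_FLOW_ITEM_TYPE_ETH },                   /* inner */
            { .type = RTE_FLOW_ITEM_TYPE_END },
        };
        struct rte_flow_action_port_id vm1 = { .id = vm1_port_id };
        struct rte_flow_action actions[] = {
            { .type = RTE_FLOW_ACTION_TYPE_VXLAN_DECAP },
            { .type = RTE_FLOW_ACTION_TYPE_PORT_ID, .conf = &vm1 },
            { .type = RTE_FLOW_ACTION_TYPE_END },
        };

        return rte_flow_create(pf_port_id, &attr, pattern, actions, &error);
    }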

> Flow #1 matches are offloaded as is, but tnl_pop is
> translated to MARK and JUMP (to a tnl-port table) actions.

The response to the tnl_pop action is now different between the
non-offload and offload datapaths, since the offload layer changes it
into entirely different actions. The user would expect a consistent
response to the tnl_pop action, especially if you treat this flow
rule as an independent rule by itself (i.e., assuming there's no
flow #2 subsequently).

Regarding the JUMP action:
IMO, the offload framework should avoid anything that is too vendor
specific, and should make use of simple/common abstractions. I'm not
sure the JUMP action is really needed to support offload of the
tnl_pop action. Also, it may not map to the real use case for which
the JUMP action seems to have been defined, which is to support flow
grouping (see rte_flow.h).
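
For reference, this is roughly how I read the proposed flow #1
translation in rte_flow terms (the mark id, group number and function
name are illustrative):

    #include <rte_flow.h>

    /* Flow #1 as proposed: outer matches in the root table; tnl_pop
     * becomes MARK (so SW can recover the vport on a miss) plus JUMP
     * to a per-tnl-port table. */
    static struct rte_flow *
    offload_flow1_mark_jump(uint16_t pf_port_id)
    {
        struct rte_flow_error error;
        struct rte_flow_attr attr = { .ingress = 1 };  /* root table */
        struct rte_flow_item_udp udp = { .hdr.dst_port = RTE_BE16(4789) };
        struct rte_flow_item pattern[] = {
            { .type = RTE_FLOW_ITEM_TYPE_ETH },        /* outer matches */
            { .type = RTE_FLOW_ITEM_TYPE_IPV4 },
            { .type = RTE_FLOW_ITEM_TYPE_UDP, .spec = &udp },
            { .type = RTE_FLOW_ITEM_TYPE_END },
        };
        struct rte_flow_action_mark mark = { .id = 1 };    /* -> vxlan_sys_4789 */
        struct rte_flow_action_jump jump = { .group = 1 }; /* tnl-port table */
        struct rte_flow_action actions[] = {
            { .type = RTE_FLOW_ACTION_TYPE_MARK, .conf = &mark },
            { .type = RTE_FLOW_ACTION_TYPE_JUMP, .conf = &jump },
            { .type = RTE_FLOW_ACTION_TYPE_END },
        };

        return rte_flow_create(pf_port_id, &attr, pattern, actions, &error);
    }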

> Flow #2 is
> offloaded to that tnl-port table, its tunnel matches are translated to
> outer-header matches, followed by the inner-header ones. The actions
> start with an implicit DECAP action followed by the flow's actions.
> The DECAP is done in flow #2. If it was done in flow #1, the outer
> headers of the packets would have been lost, and they could not have
> been matched in flow #2.
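
If I read this right, flow #2 then reuses the same outer+VXLAN+inner
pattern as the merged sketch earlier, but is installed in the
tnl-port table with the DECAP first, roughly (group and port id
illustrative):

    struct rte_flow_attr attr = { .group = 1, .ingress = 1, .transfer = 1 };
    struct rte_flow_action_port_id vm1 = { .id = 5 };
    struct rte_flow_action actions[] = {
        { .type = RTE_FLOW_ACTION_TYPE_VXLAN_DECAP },   /* implicit DECAP */
        { .type = RTE_FLOW_ACTION_TYPE_PORT_ID, .conf = &vm1 },
        { .type = RTE_FLOW_ACTION_TYPE_END },
    };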

The only missing match field that would allow decap in flow #1 itself
seems to be the tunnel-id. If that is provided in flow #1, then we may
not need to match it again in flow #2, and flow #2 could just match on
the inner packet headers. This would also make handling a flow-miss
simple, since you wouldn't need the recovery logic (the offload layer
popping the header, etc.): the packet would be in the right state for
the next stage of processing in SW if there's a flow-miss after the
first stage in HW. It also means tnl_pop can be done in flow #1,
making the behavior consistent with non-offload processing.
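
As a sketch of what I mean (the VNI, group and function name are
illustrative):

    #include <rte_flow.h>

    /* Alternative flow #1: also match the tunnel-id (VNI), so the
     * decap can be done immediately; the next table then only needs
     * the inner-header matches. */
    static struct rte_flow *
    offload_flow1_with_vni(uint16_t pf_port_id)
    {
        struct rte_flow_error error;
        struct rte_flow_attr attr = { .ingress = 1 };
        struct rte_flow_item_udp udp = { .hdr.dst_port = RTE_BE16(4789) };
        struct rte_flow_item_vxlan vxlan = { .vni = { 0, 0, 100 } };
        struct rte_flow_item pattern[] = {
            { .type = RTE_FLOW_ITEM_TYPE_ETH },
            { .type = RTE_FLOW_ITEM_TYPE_IPV4 },
            { .type = RTE_FLOW_ITEM_TYPE_UDP, .spec = &udp },
            { .type = RTE_FLOW_ITEM_TYPE_VXLAN, .spec = &vxlan },
            { .type = RTE_FLOW_ITEM_TYPE_END },
        };
        struct rte_flow_action_jump jump = { .group = 1 };
        struct rte_flow_action actions[] = {
            { .type = RTE_FLOW_ACTION_TYPE_VXLAN_DECAP }, /* tnl_pop here */
            { .type = RTE_FLOW_ACTION_TYPE_JUMP, .conf = &jump },
            { .type = RTE_FLOW_ACTION_TYPE_END },
        };

        return rte_flow_create(pf_port_id, &attr, pattern, actions, &error);
    }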

The other advantage of providing the tunnel-id in flow #1 is that we
would then not need to match the entire packet, including the outer
headers, in the second stage; only the inner header fields would need
to be matched in flow #2.

> Furthermore, once the HW processing path is composed of a multi table
> processing, there is a need to handle a case where the processing starts
> in HW, but is not completed in HW. For example, only flow #1 is
> offloaded while flow #2 doesn't exist. For that, the MARK value set in
> the tnl_pop action is received in SW, the packet is recovered by the
> mapped vport, and processing continues in SW.
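
If I follow the recovery path correctly, the SW side looks
conceptually like this (the struct and helper names here are
hypothetical, not the actual API in the series):

    #include <stdbool.h>
    #include <stdint.h>

    struct pkt {
        bool has_mark;
        uint32_t mark;      /* MARK value set by flow #1 in HW */
        uint32_t in_port;   /* datapath port the packet arrived on */
    };

    /* Hypothetical helpers: map a mark back to its tunnel vport, and
     * pop the outer header the way tnl_pop would have. */
    extern bool mark_to_vport(uint32_t mark, uint32_t *vport);
    extern void pop_outer_header(struct pkt *pkt);

    static void
    recover_hw_miss(struct pkt *pkt)
    {
        uint32_t vport;

        if (pkt->has_mark && mark_to_vport(pkt->mark, &vport)) {
            /* HW executed flow #1 but missed in the tnl-port table:
             * finish tnl_pop in SW and continue from flow #2's
             * stage. */
            pop_outer_header(pkt);
            pkt->in_port = vport;
        }
    }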

If it is not possible to provide the tunnel-id in flow #1, then the
other alternative might be to defer offloading until flow #2 is seen,
and offload only flow #2. You are anyway matching almost the entire
outer headers and the inner headers in flow #2, and the decap action
is also added implicitly to flow #2. Why not avoid offloading flow #1
altogether? That way, none of the miss-handling code is required.

> As vports are virtual and cannot have rte_flows, offloading a vport
> generates rte_flows for every PF in the system, as we cannot know
> which one the packets will arrive from.

Is it possible to track the "original in-port" through this two-stage
flow processing and generate the rte_flow only on the specific PF
that received the packet?

Thanks,
-Harsha
>
> This series adds support for IPv6, tunnel push actions, and the
> multi-table model for the decap processing.
>
> v1 Travis passed: https://travis-ci.org/elibritstein/OVS/builds/639484233
>
> Eli Britstein (14):
>   dpif-netdev: Add mega ufid in flow add log
>   netdev-offload-dpdk: Remove pre-validate of patterns function
>   netdev-offload-dpdk: Add IPv6 pattern matching
>   netdev-offload-dpdk: Support offload of clone tnl_push/output actions
>   netdev-offload-dpdk: Support tnl/push using vxlan encap attribute
>   netdev-offload: Don't use zero flow mark
>   netdev-offload-dpdk: Introduce map APIs for table id and miss context
>   netdev-offload-dpdk: Implement HW miss packet recover for vport
>   netdev-offload-dpdk: Refactor disassociate ufid function
>   netdev-offload-dpdk: Support tunnel pop action
>   netdev-offload-dpdk: Implement flow dump create/destroy APIs
>   netdev-dpdk: Getter function for dpdk devargs API
>   netdev-dpdk: Introduce get netdev by devargs function
>   netdev-dpdk-offload: Add vxlan pattern matching function
>
> Ilya Maximets (4):
>   netdev: Allow storing dpif type into netdev structure.
>   netdev-offload: Use dpif type instead of class.
>   netdev-offload: Allow offloading to netdev without ifindex.
>   netdev-offload: Disallow offloading to unrelated tunneling vports.
>
> Ophir Munk (6):
>   dpif-netdev: Make mark allocation public API
>   dpif-netdev: Add HW miss packet state recover logic
>   netdev-dpdk: Introduce an API to query if a dpdk port is an uplink
>     port
>   netdev-dpdk-offload: Infrastructure for multiple rte_flows per UFID
>   netdev-dpdk: Add flow_api support for netdev vxlan vports
>   netdev-offload-dpdk: Support offload for vxlan vport
>
> Oz Shlomo (1):
>   netdev-offload: Add HW miss packet state recover API
>
>  Documentation/howto/dpdk.rst  |    5 +-
>  NEWS                          |    6 +-
>  lib/dpif-netdev.c             |   70 +--
>  lib/dpif-netlink.c            |   23 +-
>  lib/dpif.c                    |   21 +-
>  lib/netdev-dpdk.c             |   68 +++
>  lib/netdev-dpdk.h             |    6 +
>  lib/netdev-offload-dpdk.c     | 1275 +++++++++++++++++++++++++++++++++++------
>  lib/netdev-offload-provider.h |    6 +
>  lib/netdev-offload-tc.c       |   11 +-
>  lib/netdev-offload.c          |  115 ++--
>  lib/netdev-offload.h          |   23 +-
>  lib/netdev-provider.h         |    3 +-
>  lib/netdev.c                  |   16 +
>  lib/netdev.h                  |    2 +
>  ofproto/ofproto-dpif-upcall.c |    5 +-
>  tests/dpif-netdev.at          |   14 +-
>  tests/ofproto-macros.at       |    3 +-
>  18 files changed, 1394 insertions(+), 278 deletions(-)
>
> --
> 2.14.5
>