[ovs-dev] [PATCH ovs V1 1/9] dpif-hw-acc: New dpif provider

Joe Stringer joe at ovn.org
Mon Nov 14 20:44:03 UTC 2016


On 1 November 2016 at 07:53, Paul Blakey <paulb at mellanox.com> wrote:
> Added infrastructure for a new provider that will be able
> to send some flows to supporting HW for offloading.
>
> Signed-off-by: Paul Blakey <paulb at mellanox.com>
> Signed-off-by: Shahar Klein <shahark at mellanox.com>

The amount of boilerplate makes me wonder if this could be structured
a bit better.

At minimum, I think that there are a bunch of functions that could be
used directly from dpif-netlink.c if they were exported by a private
header file, eg lib/dpif-netlink-priv.h, which would be included from
both dpif-netlink and dpif-hw-acc.
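
To make that concrete, here is a minimal sketch of what such a private
header could look like (the declarations below are placeholders to show
the shape of the idea, not the actual symbols currently in
dpif-netlink.c):

/* lib/dpif-netlink-priv.h (hypothetical): exposes pieces of
 * dpif-netlink that a second netlink-based provider could reuse
 * instead of copying them. */
#ifndef DPIF_NETLINK_PRIV_H
#define DPIF_NETLINK_PRIV_H

#include "dpif-provider.h"

/* Placeholder declarations; the real candidates would be whichever
 * helpers dpif-hw-acc currently duplicates from dpif-netlink.c. */
int dpif_netlink_do_open(const struct dpif_class *, const char *name,
                         bool create, struct dpif **dpifp);
int dpif_netlink_do_port_add(struct dpif *, struct netdev *,
                             odp_port_t *port_nop);
int dpif_netlink_do_recv(struct dpif *, uint32_t handler_id,
                         struct dpif_upcall *, struct ofpbuf *);

#endif /* dpif-netlink-priv.h */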

A more invasive question is whether it makes more sense to push the
flow hardware offloads down to the netdev layer? As far as I follow,
the Linux netdev community has decided that there is supposed to be
one way to configure these offloads, and it is device-centric.
Furthermore, it has implications for how things like QoS are
configured. So, if the netdev interface provided a set of 'hardware
flow offload' APIs (similar to the flow APIs in dpif today), then the
netdev-linux implementation should be able to reason about how the
ordering of these QoS and offload pieces works.

What I'm proposing is something like:
netdev_flow_flush(netdev *)
netdev_flow_dump_create(netdev *)
netdev_flow_dump_destroy(netdev *)
netdev_flow_dump_next(netdev *, struct match *, int max_flows)
netdev_flow_put(netdev *, struct match *, struct ufid *)
netdev_flow_del(netdev *, struct match *, struct ufid *)
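
To make that a bit more concrete, here is a rough sketch of how those
could be expressed as per-netdev hooks (the struct, the
netdev_get_flow_api() helper and struct netdev_flow_dump below are
invented for illustration; they are not existing OVS symbols):

#include <errno.h>
#include "match.h"               /* struct match */
#include "openvswitch/types.h"   /* ovs_u128 */

struct netdev;                   /* from lib/netdev.h */
struct netdev_flow_dump;         /* hypothetical opaque dump state */

/* Hypothetical hooks that a netdev implementation could fill in;
 * netdev-linux would back them with tc flower, and other
 * implementations could plug in their own mechanism or leave them
 * NULL. */
struct netdev_flow_api {
    int (*flow_flush)(struct netdev *);
    int (*flow_dump_create)(struct netdev *, struct netdev_flow_dump **);
    int (*flow_dump_destroy)(struct netdev_flow_dump *);
    bool (*flow_dump_next)(struct netdev_flow_dump *, struct match *,
                           int max_flows);
    int (*flow_put)(struct netdev *, struct match *, const ovs_u128 *ufid);
    int (*flow_del)(struct netdev *, const ovs_u128 *ufid);
};

/* Hypothetical accessor for the hooks above. */
const struct netdev_flow_api *netdev_get_flow_api(const struct netdev *);

/* A thin wrapper in lib/netdev.c could then fall back cleanly when a
 * netdev does not implement the hooks, so callers in the dpif layer
 * can simply keep using the software path. */
int
netdev_flow_put(struct netdev *netdev, struct match *match,
                const ovs_u128 *ufid)
{
    const struct netdev_flow_api *api = netdev_get_flow_api(netdev);

    return api && api->flow_put ? api->flow_put(netdev, match, ufid)
                                : EOPNOTSUPP;
}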

Part of the argument is from a code-sharing perspective. Each
dpif-provider roughly handles 5 things today (see the rough outline
sketched after this list):
* Port management
* Upcalls
* Downcall / execution
* Flow management
* Conntrack management
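
For reference, a simplified outline of how those five areas map onto
the provider callbacks (illustrative and abbreviated; the authoritative
interface is struct dpif_class in lib/dpif-provider.h, and the struct
name and exact signatures below are not meant to be exact):

#include "dpif-provider.h"   /* struct dpif, struct dpif_op, ... */
#include "ct-dpif.h"         /* struct ct_dpif_dump_state */

/* Simplified grouping of provider callbacks by area (illustrative
 * only; see lib/dpif-provider.h for the real interface). */
struct dpif_class_outline {
    /* Port management. */
    int (*port_add)(struct dpif *, struct netdev *, odp_port_t *);
    int (*port_del)(struct dpif *, odp_port_t);

    /* Upcalls. */
    int (*recv)(struct dpif *, uint32_t handler_id,
                struct dpif_upcall *, struct ofpbuf *);

    /* Downcall / execution. */
    void (*operate)(struct dpif *, struct dpif_op **ops, size_t n_ops);

    /* Flow management. */
    int (*flow_flush)(struct dpif *);
    struct dpif_flow_dump *(*flow_dump_create)(const struct dpif *,
                                               bool terse);

    /* Conntrack management. */
    int (*ct_dump_start)(struct dpif *, struct ct_dpif_dump_state **,
                         const uint16_t *zone);
    int (*ct_flush)(struct dpif *, const uint16_t *zone);
};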

From what I can tell, there are maybe a few differences for dpif-hw-acc
here in port management, and primarily in flow management. If the
netdev layer allowed insertion, fetch, dump and delete of some
variation on "struct match" (plus maybe UFID), what would be left in
this file? Then my question becomes: is there a specific reason to
force users to set the bridge datapath_type, or can OVS just
intelligently probe for offloads at startup and then attempt to use
them if available?

To explore the above question a little further, I noticed four maps in
this implementation:
* port_to_netdev - Used to translate ifindex to netdev (this could be
a generic library function in lib/netdev.c); also used for collecting
all of the ports so they can be iterated over to flush/dump/etc. I'm
not so sure how to handle this part, but I'm sure something can be
figured out.
* ufid_to_handle/handle_to_ufid - I'm a little confused by these. A
UFID is supposed to uniquely identify a flow, and that's what a handle
does (at least for a particular device, right?), so at a naive first
look I would have thought that these could be translated between
statelessly, without map storage (a rough sketch of the direction I
mean follows after this list). There are a couple of different places
where puthandle() is called, so I wasn't exactly sure how the handle
is determined.
* mask_to_prio - I think this is just a cache to avoid malloc() calls?
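
Here is the kind of stateless mapping I have in mind for the
UFID-to-handle direction (a sketch only; whether the reverse,
handle-to-UFID direction can also avoid a map depends on what can be
recovered from the kernel, which is exactly the part I'm unsure about):

#include "hash.h"                /* hash_words() */
#include "openvswitch/types.h"   /* ovs_u128 */

/* Derive a 32-bit tc-style handle deterministically from the 128-bit
 * UFID, so that flow_put()/flow_del() would not need a ufid_to_handle
 * map. Collisions would still need some policy; this just shows the
 * idea. */
static uint32_t
ufid_to_tc_handle(const ovs_u128 *ufid)
{
    return hash_words((const uint32_t *) ufid,
                      sizeof *ufid / sizeof(uint32_t), 0);
}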

This direction might suggest that each platform can choose how to do
its hardware offloads and that we put that logic into the 'struct
netdev' in userspace; then we don't need an additional dpif just for
tc flower hardware offloads. This assumes that each platform comes up
with one way to do hardware offloads.

If particular platforms or implementations decide not to include a
separate API for hardware offloads, they can always do the offloads
transparently as they already do, and just ignore this interface.

I had some discussions with Johann offline, and there was some concern
that we end up with, say, four dpifs with potentially arbitrary
ordering between them. That is a lot of permutations. If the above
logic makes sense, then the offloads are an attribute of the netdevice
and not a dedicated dpif, so there are fewer possible permutations. In
terms of handling the different possible configurations of two dpifs,
like shadowing or duplicating flows etc., this logic could exist within
the hw-offload-policy module.
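
For what it's worth, the decision I'd imagine living in that module is
fairly small (hypothetical sketch; the policy names below are invented
here, not taken from this series):

#include <stdbool.h>

/* Hypothetical policy knob for how a flow should be placed when both a
 * hardware path and the kernel datapath are available. */
enum hw_offload_policy {
    HW_OFFLOAD_NONE,      /* Software datapath only. */
    HW_OFFLOAD_PREFER,    /* Try hardware first, fall back to software. */
    HW_OFFLOAD_SHADOW,    /* Install in both, e.g. for stats/failover. */
};

static bool
should_install_sw_flow(enum hw_offload_policy policy, bool hw_install_ok)
{
    /* Keep the kernel datapath flow when hardware was skipped, failed,
     * or when the policy asks for a shadow copy in both places. */
    return policy == HW_OFFLOAD_NONE
           || policy == HW_OFFLOAD_SHADOW
           || !hw_install_ok;
}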

The last part would be to extend dpif-netlink a bit so that during
startup there is an attempt to query the hardware offload capabilities.
That way, if there are no offloads available, there is a simple switch
to turn this logic off and there is minimal change for that path; and
ideally it gathers enough information so that the typical offloads case
doesn't involve attempting and failing to install a hardware flow,
followed by installing the software flow.
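
As a rough sketch of what I mean (the probe function and where it would
be called from are hypothetical; the point is just to ask once and
cache the answer):

#include <stdbool.h>
#include "ovs-thread.h"   /* ovsthread_once */

struct dpif_netlink;

/* Hypothetical: some way of asking the kernel/driver whether flow
 * offload is available, e.g. by installing and removing a trivial
 * classifier on one of the datapath ports. */
static bool probe_hw_offload_capability(struct dpif_netlink *);

/* Hypothetical one-time probe in dpif-netlink: query once at startup
 * and cache the result, so the common case never goes through a
 * try-hardware-then-fail step per flow. */
static bool
dpif_netlink_probe_hw_offload(struct dpif_netlink *dpif)
{
    static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER;
    static bool hw_offload_available = false;

    if (ovsthread_once_start(&once)) {
        hw_offload_available = probe_hw_offload_capability(dpif);
        ovsthread_once_done(&once);
    }
    return hw_offload_available;
}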

Thoughts?

