[ovs-dev] [patch net-next RFC 10/12] openvswitch: add support for datapath hardware offload

Jiri Pirko jiri at resnulli.us
Tue Aug 26 14:06:30 UTC 2014

Tue, Aug 26, 2014 at 03:50:21PM CEST, roopa at cumulusnetworks.com wrote:
>On 8/25/14, 3:50 PM, Thomas Graf wrote:
>>On 08/25/14 at 12:15pm, Jamal Hadi Salim wrote:
>>>On 08/25/14 10:17, Thomas Graf wrote:
>>>>On 08/25/14 at 09:53am, Jamal Hadi Salim wrote:
>>>>fdb_add() *is* flow based. At least in my understanding, the whole
>>>>point here is to extend the idea of fdb_add() and make it understand
>>>>L2-L4 in a more generic way for the most common protocols.
>>>>The reason fdb_add() is not reused is because it is Netlink specific
>>>>and only suitable for User -> HW offload. Kernel -> HW offload is
>>>>technically possible but not clean.
>>>I dont think we have a problem handling any of this today.
>>Yes we do. It's restricted to L2 and we can't extend it easily
>>because it is based on NDA_*. The use of Netlink makes in-kernel
>>usage a pain. To me this is the sole reason for not using fdb_add()
>>in the first place. It seems absolutely clear though that fdb_add()
>>should be removed after the more generic ndo is in place providing
>>a superset of what fdb_add() can do today.
>>>This is where our (shall i say strong) disagreement is.
>>>I think you will find it non-trivial to show me how you can
>>>actually take the simple L2 bridge and map it to a "flow".
>>>Since your starting point is "everything can be represented via a flow
>>>and some table" - we are at a crosspath.
>>OK, let me do the conversion for you:
>>NDA_DST       unused
>>NDA_LLADDR    sw_flow_key.eth.dst
>>NDA_PROBES    unused
>>NDA_VLAN      sw_flow_key.eth.tci
>>NDA_PORT      unused
>>NDA_VNI       sw_flow_key.tun_key.tun_id
>>NDA_IFINDEX   sw_flow_key.phys.in_port
>>NDA_MASTER    unused
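
The mapping in the table above can be sketched in C as follows. This is a minimal illustration only: `struct fdb_entry`, `struct flow_key`, `struct flow_match` and `fdb_to_flow()` are hypothetical, simplified stand-ins, not the real `struct sw_flow_key` or any existing kernel function. Byte order and the VLAN "tag present" bit are ignored for brevity.

```c
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6

/* Hypothetical stand-in for an fdb entry as carried by NDA_* attributes. */
struct fdb_entry {
	uint8_t  lladdr[ETH_ALEN];	/* NDA_LLADDR */
	uint16_t vlan;			/* NDA_VLAN, 0 = not set */
	uint32_t vni;			/* NDA_VNI, 0 = not set */
	uint32_t ifindex;		/* NDA_IFINDEX, 0 = not set */
};

/* Hypothetical, flattened subset of a flow key (not struct sw_flow_key). */
struct flow_key {
	uint8_t  eth_dst[ETH_ALEN];
	uint16_t eth_tci;
	uint64_t tun_id;
	uint32_t in_port;
};

/* A key plus a mask; a zero mask bit means "wildcard this bit". */
struct flow_match {
	struct flow_key key;
	struct flow_key mask;
};

/* Express an L2 fdb entry as a flow match: exact match on the fields
 * the fdb entry carries, wildcard on everything else. */
static void fdb_to_flow(const struct fdb_entry *fdb, struct flow_match *m)
{
	memset(m, 0, sizeof(*m));

	memcpy(m->key.eth_dst, fdb->lladdr, ETH_ALEN);
	memset(m->mask.eth_dst, 0xff, ETH_ALEN);

	if (fdb->vlan) {
		m->key.eth_tci = fdb->vlan & 0x0fff;
		m->mask.eth_tci = 0x0fff;
	}
	if (fdb->vni) {
		m->key.tun_id = fdb->vni;
		m->mask.tun_id = UINT64_MAX;
	}
	if (fdb->ifindex) {
		m->key.in_port = fdb->ifindex;
		m->mask.in_port = UINT32_MAX;
	}
}
```

In this sketch the fdb entry becomes a flow that matches exactly on destination MAC (and VLAN, VNI, input port when given) and wildcards the L3/L4 fields.
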
>>>The tc filter API seems to be doing just that.
>>>You have different types of classifiers - the h/w may not be able
>>>to support some classifier types - but that is a capability discovery
>>Agreed but tc is only one out of many possible existing interfaces
>>we have. macvtap (given we want to extend beyond L2), routing,
>>OVS, bridge and eventually even things like a team device can and
>>should make use of offloads.
>>>I am saying two things:
>>>1) There are a few "fundamental" interfaces; L2 and L3 being some.
>>>Add crypto offload and a few i mentioned in  my presentation. We
>>Can you share that preso? I was not present.
>>>know how to do those. example; there is nothing i cant do with
>>>the rtmsg that is L3. or the fdb/port/vlan filter for L2.
>>>This flow thing should stay out of those.
>>Let me remind you about the name of the structure behind all L3
>>forwarding decisions:
>>	struct flowi4 {
>>		[...]
>>	}
>>Adding a route means adding a flow. Can we please stop the flow
>>bashing? The concept of a flow is very generic, well known and already
>>very present in the kernel.
>>The sw_flow_key proposed comes close to flowi4. Some fields are
>>different. They can eventually get merged. The strict IPv4/IPv6
>>separation is what makes it non obvious and probably why Jiri chose
>>the OVS representation. If you say rtmsg is complete then that clearly
>>is not the case. In particular VTEP fields, ARP, and TCP flags are
>>clearly missing for many uses.
>>Again, I'm not saying flow is the ultimate answer to everything. It
>>is not. But a lot of hardware out there is aware of flows in combination
>>with some form of action execution. Non flow based hardware can have
>>their own classifier.
>>>2) The flow thing should allow a variety of classifiers to be
>>>handled. Again capability discovery would take care of differences.
>>So you want the flow to represent something that is not a flow. Again,
>>this comes back to the conversation in the other email. If this is
>>all about having a single ndo I'm sure we can find common grounds on
>From what i understood (trying to summarize here for my own benefit):
>the switchdev api currently under review proposes every switch asic offload
>abstraction as a flow.
>It does not mandate this via code, however, there seems to be some discussion
>along those lines.
>The switchdev api flow ndo's need to stay for switch asic drivers that
>support flows directly or possibly want all their hw offload abstraction to
>be represented by the flow abstraction (openvswitch, the rocker dev). The
>details of how the flow is mapped to hw lie in the corresponding switch
>driver code.


>We think rtnetlink is the api to model switch asic hw tables.
>We have a working model (Cumulus) that maps rtnetlink to switch
>asic hw tables (via snooping rtnetlink msgs). This can be done by extending
>the switchdev api with new ndo's for l2 and l3.
>  new switchdev ndo's for fdb_add/fdb_del
>  new switchdev ndo's for l3


>Now we only need working patches that implement switchdev api ndo ops for
>l2/l3 (this is in the works).
>As long as the current patches under review allow the extension of the api to
>cover non-flow based l2/l3 switch asic offloads, we might be good (?).

Yes. Flows are phase one. The api will be extended for whatever is
needed for l2/l3, as you said. I also see a possibility to implement the
l2/l3 use case with flows. But generally, as with every in-kernel
api, we can extend it and change it.
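
A rough sketch of what such an extension could look like, assuming an ops structure with optional callbacks. All names here (`struct swdev_ops`, `swdev_offload_l2()`, the `demo_*` callbacks) are hypothetical and are not taken from the patchset; the point is only that a driver can expose flow ops, l2 ops, or both, and the caller picks whatever is available.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical L2 entry: destination MAC plus VLAN id. */
struct l2_entry {
	uint8_t  mac[6];
	uint16_t vid;
};

/* Hypothetical driver ops; a NULL callback means "not supported". */
struct swdev_ops {
	int (*flow_insert)(const struct l2_entry *as_flow);	/* phase one */
	int (*fdb_add)(const struct l2_entry *e);		/* later l2 extension */
};

/* Prefer a native l2 op, fall back to expressing the entry as a flow. */
static int swdev_offload_l2(const struct swdev_ops *ops,
			    const struct l2_entry *e)
{
	if (ops->fdb_add)
		return ops->fdb_add(e);
	if (ops->flow_insert)
		return ops->flow_insert(e);
	return -EOPNOTSUPP;
}

/* Demo callbacks that only count how often they were called. */
static int demo_flow_calls;
static int demo_fdb_calls;

static int demo_flow_insert(const struct l2_entry *as_flow)
{
	(void)as_flow;
	demo_flow_calls++;
	return 0;
}

static int demo_fdb_add(const struct l2_entry *e)
{
	(void)e;
	demo_fdb_calls++;
	return 0;
}
```

A flow-only driver would leave `fdb_add` NULL and still get l2 entries through `flow_insert`; a non-flow l2/l3 asic driver would do the opposite.
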

