[ovs-dev] [patch net-next RFC 10/12] openvswitch: add support for datapath hardware offload

Jiri Pirko jiri at resnulli.us
Sat Aug 23 09:24:58 UTC 2014


Sat, Aug 23, 2014 at 12:53:34AM CEST, sfeldma at cumulusnetworks.com wrote:
>
>On Aug 22, 2014, at 12:39 PM, John Fastabend <john.fastabend at gmail.com> wrote:
>
>> On 08/21/2014 09:19 AM, Jiri Pirko wrote:
>>> Benefit from the possibility to work with flows in switch devices and
>>> use the swdev api to offload flow datapath.
>> 
>> we should add a description here on the strategy being used.
>> 
>> If I read this correctly this will try to add any flow to the
>> hardware along with the actions and duplicate it in software.
>> 
>> There are a couple things I don't like,
>> 
>> - this requires OVS to be loaded to work. If all I want is
>>   direct access to the hardware flow tables requiring openvswitch.ko
>>   shouldn't be needed IMO. For example I may want to use the
>>   hardware flow tables with something not openvswitch and we
>>   shouldn't preclude that.
>> 
>
>The intent is to use openvswitch.ko’s struct sw_flow to program hardware via the ndo_swdev_flow_* ops, but otherwise be independent of OVS.  So the upper layer of the driver is struct sw_flow and any module above the driver can construct a struct sw_flow and push it down via ndo_swdev_flow_*.  So your non-OVS use-case should be handled.  OVS is another use-case.  struct sw_flow should not be OVS-aware, but rather a generic flow match/action sufficient to offload the data plane to HW.

Yes. I was thinking about a simple Netlink API that would expose direct
sw_flow manipulation (an ndo_swdev_flow_* wrapper) to userspace. I will
think about that more and perhaps add it to my next patchset version.

>
>> - Also there is no programmatic way to learn which flows are
>>   in hardware and which in software. There is a pr_warn but
>>   that doesn't help when interacting with the hardware remotely.
>>   I need some mechanism to dump the set of hardware tables and
>>   the set of software tables.
>
>Agreed, we need a way to annotate which flows are installed in hardware.

Yes, we discussed that already. We need to make the OVS daemon
hw-offload aware, indicating which flows it wants/prefers to be
offloaded. I believe this is an easily extendable feature that can be
added whenever the time is right.

>
>> - Simply duplicating the software flow/action into
>>   hardware may not optimally use the hardware tables. If I have
>>   a TCAM in hardware for instance. (This is how I read the patch
>>   let me know if I missed something)
>
>The hardware-specific driver is the right place to handle optimizing the flow/action in hardware since only the driver can know the size/shape of the device.  struct sw_flow is a generic flow description; how (or if) a flow gets programmed into hardware must be handled in the swdev driver.  If the device driver can’t make the sw_flow fit into HW because of resource limitations or the flow simply can’t be represented in HW, then the flow is SW only.  
>
>In the rocker driver posted in this patch set, the steps are to parse the struct sw_flow to figure out what type of flow match/action we’re dealing with (L2 or L3 or L4, ucast or mcast, ipv4 or ipv6, etc) and then install the correct entries into the corresponding device tables within the constraints of the device’s pipeline.  Any optimizations, like coalescing HW entries, is something only the driver can do.
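
To make the parse step above concrete, here is a minimal sketch of classifying a flow before choosing device tables. It is not the rocker code: struct demo_flow_key is a hypothetical, much simplified stand-in for the match part of struct sw_flow, and every demo_* name is an assumption made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the match fields of struct sw_flow. */
struct demo_flow_key {
	uint16_t eth_type;   /* host byte order: 0x0800 = IPv4, 0x86DD = IPv6 */
	uint8_t  eth_dst[6]; /* destination MAC address */
	uint8_t  ip_proto;   /* 6 = TCP, 17 = UDP, 0 = not matched */
	uint16_t tp_dst;     /* transport destination port, 0 = not matched */
};

enum demo_flow_layer { DEMO_FLOW_L2, DEMO_FLOW_L3, DEMO_FLOW_L4 };

/* Result of the parse: what kind of flow this is. */
struct demo_flow_class {
	enum demo_flow_layer layer;
	bool ipv6;
	bool mcast;
};

/* Work out L2/L3/L4, ucast/mcast and ipv4/ipv6 from the key. */
static struct demo_flow_class demo_classify(const struct demo_flow_key *key)
{
	struct demo_flow_class c = { DEMO_FLOW_L2, false, false };

	assert(key != NULL);

	/* The low bit of the first MAC octet is the group (multicast) bit. */
	c.mcast = (key->eth_dst[0] & 0x01) != 0;

	if (key->eth_type == 0x0800 || key->eth_type == 0x86DD) {
		c.layer = DEMO_FLOW_L3;
		c.ipv6 = (key->eth_type == 0x86DD);

		/* Only call it L4 if a TCP/UDP port is actually matched. */
		if ((key->ip_proto == 6 || key->ip_proto == 17) &&
		    key->tp_dst != 0)
			c.layer = DEMO_FLOW_L4;
	}
	return c;
}
```

A driver would then switch on the result to pick which device tables get entries, and report failure when the class has no representation in its pipeline, in which case the flow stays SW only.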
>
>> 
>> - I need a way to specify: put this flow/action in hardware,
>>   put this flow/action in software, or put this in both software
>>   and hardware.
>> 
>
>This seems above the swdev layer.  In other words, don’t call ndo_swdev_flow_* if you don’t want the flow match/action installed in HW.
>
>>   We did this with a bitmask in the fdb L2 stuff and it seems to
>>   work reasonably well so maybe something like that would help.
>> 
>>   For example if I don't have this what happens if I have an
>>   entry to decrement TTL in both hardware and software. If the
>>   flow hits both the hardware path and software path the TTL
>>   gets decremented. Here userspace needs to indicate where to
>>   do the decrement to avoid the duplication.
>
>I’m not following why a flow would hit both HW and SW paths.  That seems bad, and negates the effort of offloading the flow to HW in the first place.  My simple view is that if a flow hits the HW path, then the SW path is unaware.  Clearly work is needed to provide a coherent view to the user with respect to stat counters and such, but I believe it is do-able.
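
For reference, the hw/sw bitmask idea could be handled above the swdev layer roughly as follows. This is only a sketch under stated assumptions: the DEMO_PLACE_* flags, struct demo_flow_entry and the demo_hw_insert_fn callback are hypothetical names, and the callback merely stands in for a call into the driver's ndo_swdev_flow_* op, whose real signature is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical placement flags, in the spirit of the fdb L2 bitmask. */
#define DEMO_PLACE_SW 0x1u
#define DEMO_PLACE_HW 0x2u

/* Stand-in for the driver's flow insert op; returns 0 on success. */
typedef int (*demo_hw_insert_fn)(const void *flow);

struct demo_flow_entry {
	const void  *flow;      /* opaque flow description */
	unsigned int requested; /* where the caller wants the flow */
	unsigned int installed; /* where the flow actually ended up */
};

/* Stub drivers for illustration: one always fits the flow, one never does. */
static int demo_hw_accept(const void *flow) { (void)flow; return 0; }
static int demo_hw_reject(const void *flow) { (void)flow; return -1; }

/*
 * Install a flow according to the requested placement. The HW op is
 * only called when HW placement was requested; if the driver cannot
 * fit the flow, it stays SW only (when SW was requested as well).
 */
static int demo_flow_install(struct demo_flow_entry *e,
			     demo_hw_insert_fn hw_insert)
{
	assert(e != NULL);
	e->installed = 0;

	if ((e->requested & DEMO_PLACE_HW) && hw_insert &&
	    hw_insert(e->flow) == 0)
		e->installed |= DEMO_PLACE_HW;

	if (e->requested & DEMO_PLACE_SW)
		e->installed |= DEMO_PLACE_SW;

	/* Fail only if the flow ended up nowhere. */
	return e->installed ? 0 : -1;
}
```

Keeping the installed mask next to the requested one would also give a dump operation something to report, which touches on the earlier point about learning which flows are in hardware and which are in software.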
>
>> 
>> I think if we can pull this out of OVS and add the hw/sw bitmask (or
>> maybe a better implementation of that idea) then this should work
>> for the stuff I'm looking at. I want to try and get it working on
>> the i40e driver as a fdir replacement but it might take me a bit
>> to get to it.
>
>That sounds cool and would really help get the interface in place.  Take another look at the way Jiri has busted out sw_flow.h and see if this works for you outside of an OVS context.  If not, we need to fix it.

Great. John, please keep us posted about your progress.
Let me know if you need any help.

>
>
>> 
>> Thanks,
>> John
>> 
>> 
>> -- 
>> John Fastabend         Intel Corporation
>
>
>-scott
>
>
>


