[ovs-dev] [PATCH v2 0/8] OVS-DPDK flow offload with rte_flow

Yuanhan Liu yliu at fridaylinux.org
Mon Sep 11 09:11:36 UTC 2017


On Sun, Sep 10, 2017 at 04:12:47PM +0000, Chandran, Sugesh wrote:
> Hi Yuanhan,
> 
> Thank you for sending out the patch series. 

Hi Sugesh,

Thank you for taking the time to review it!


> We are also looking into something similar to enable full offload in OVS-DPDK.

Good to know!

> It is based on ' http://dpdk.org/ml/archives/dev/2017-September/074746.html' and some other rte_flow extension in DPDK.

I saw the patches; I will take some time to read them.

> 
> It is noted that the patch series doesn't work very well for some of our requirements.
> Please find below for the high level comments. I have also provided specific comments on the following patches.
> 
> 1) Looks to me the patch series enables/uses just one functionality of the NIC (the MARK action). In a multi-hardware environment it is necessary to have a feature discovery mechanism to decide what gets installed in the hardware based on its capabilities, e.g. MARK+QUEUE, MARK only, number of supported flow entries, supported flow fields, etc. This is very important for supporting different hardware NICs and making flow install easy.

Yes, you are right. I have also observed this issue while coding this
patch.

> In our implementation we have a feature discovery at the OVS init. It will also populate the OVSDB to expose the device capability to higher management layers. The new table introduced in OVSDB is like below.

The solution I want to pursue, however, is a bit different. I was
thinking of introducing a few DPDK rte_flow APIs and structs to describe
the NIC flow capabilities.

I think this would help in the long run, as the capabilities would be
updated along with new features as new DPDK versions are released. With
the solution you proposed, OVS wouldn't be able to work with multiple
DPDK versions (assuming they provide different rte_flow capabilities).
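
To make it a bit more concrete, below is roughly what I have in mind.
Note that rte_flow_query_capa() and struct rte_flow_capa do not exist
in DPDK today; every name here is made up just to illustrate the idea:

/* Purely illustrative sketch: rte_flow_query_capa() and struct
 * rte_flow_capa are made-up names, not part of DPDK. */
#include <stdbool.h>
#include <stdint.h>
#include <rte_flow.h>

struct rte_flow_capa {
    uint64_t actions;     /* bitmask of supported RTE_FLOW_ACTION_TYPE_* */
    uint64_t items;       /* bitmask of supported RTE_FLOW_ITEM_TYPE_* */
    uint32_t max_flows;   /* max number of flow rules, 0 if unknown */
};

/* Hypothetical query, to be filled in by the PMD. */
int rte_flow_query_capa(uint16_t port_id, struct rte_flow_capa *capa,
                        struct rte_flow_error *error);

static bool
port_can_mark_and_queue(uint16_t port_id)
{
    struct rte_flow_capa capa;
    struct rte_flow_error error;

    if (rte_flow_query_capa(port_id, &capa, &error) < 0) {
        return false;
    }
    return (capa.actions & (UINT64_C(1) << RTE_FLOW_ACTION_TYPE_MARK))
        && (capa.actions & (UINT64_C(1) << RTE_FLOW_ACTION_TYPE_QUEUE));
}

That way OVS could query the capabilities at runtime, and they would
stay in sync with whatever the loaded DPDK/PMD actually supports.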

>   <table name="hw_offload">
>     <p>
>       Hardware switching configuration and capabilities.
>     </p>
>     <column name="name">
>       The name of hardware acceleration device.
>     </column>
>     <column name="dev_id" type='{"type": "integer", "minInteger": 0, "maxInteger": 7}'>
>       The integer device id of hardware accelerated NIC.
>     </column>
>      <column name="pci_id" type='{"type": "string"}'>
>       The PCI ID of the hardware acceleration device. The broker id/PF id.
>      </column>
>      <column name="features" key="n_vhost_ports" type='{"type": "integer"}'>
>       The number of supported vhost ports in the hardware switch.
>      </column>
>   </table>
> 
> The features column can be extended with more fields as necessary.
> IMO the proposed partial offload doesn't need to populate the OVSDB; however, it's necessary to have some kind of feature discovery at init.
> 
> 2) I feel it's better to keep the hardware offload functionalities in netdev as much as possible, similar to the kernel implementation. I see changes in upcall and dpif.

I agree with you. But unfortunately, due to some driver and hardware
limitations, that's the best I can get.
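
Ideally the offload logic would sit entirely behind a small set of
netdev-level hooks, roughly like the sketch below, so that dpif and
upcall wouldn't need to change. The names and signatures here are made
up for illustration; they are not the actual OVS netdev provider
interface:

#include <stddef.h>
#include <stdint.h>

/* Illustrative only: a netdev provider would implement these hooks and
 * hide all the hardware specifics behind them. */
struct hw_flow_ops {
    /* Install or modify the flow identified by 'ufid' on the device. */
    int (*flow_put)(void *netdev, const void *match,
                    const void *actions, size_t actions_len,
                    const void *ufid);
    /* Remove a previously offloaded flow. */
    int (*flow_del)(void *netdev, const void *ufid);
    /* Read back the statistics of an offloaded flow. */
    int (*flow_stats)(void *netdev, const void *ufid,
                      uint64_t *n_packets, uint64_t *n_bytes);
};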

> 3) The cost of flow install. PMDs are blocked while a hardware flow install is happening. It's an issue when a lot of short-lived flows are getting installed in the DP.

I wasn't aware of it. Thank you for letting me know that!

> One option to handle this would be to move the flow install into revalidate. The advantage of this approach is that hardware offload would happen only when a flow has been in use for at least some time, something like how the revalidator thread handles the flow modify operation.

Yes, that sounds workable. However, the MARK and QUEUE workaround won't
work then: we need to record the rxq first. And again, I know the
workaround is far from perfect.
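
For reference, with the rte_flow API such a rule looks roughly like
below (the helper name and the ETH-only pattern are just for
illustration); the queue index has to be known when rte_flow_create()
is called, which is why deferring the install to revalidate doesn't fit
the workaround:

#include <rte_flow.h>

/* Minimal sketch of the MARK + QUEUE workaround: have the NIC tag the
 * matched packets with a flow mark while steering them to the rxq they
 * would have landed on anyway.  A real rule would carry the matched
 * header fields in 'pattern'. */
static struct rte_flow *
offload_with_mark_and_queue(uint16_t port_id, uint16_t rxq,
                            uint32_t mark_id, struct rte_flow_error *error)
{
    struct rte_flow_attr attr = { .ingress = 1 };
    struct rte_flow_item pattern[] = {
        { .type = RTE_FLOW_ITEM_TYPE_ETH },
        { .type = RTE_FLOW_ITEM_TYPE_END },
    };
    struct rte_flow_action_mark mark = { .id = mark_id };
    struct rte_flow_action_queue queue = { .index = rxq };
    struct rte_flow_action actions[] = {
        { .type = RTE_FLOW_ACTION_TYPE_MARK,  .conf = &mark },
        { .type = RTE_FLOW_ACTION_TYPE_QUEUE, .conf = &queue },
        { .type = RTE_FLOW_ACTION_TYPE_END },
    };

    return rte_flow_create(port_id, &attr, pattern, actions, error);
}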

> 4) AFAIK, this hardware programmability is per NIC, not per port, i.e. the FDIR/RSS hash configuration is device specific. Won't this be an issue if a NIC is shared between kernel and DPDK drivers?

That might be NIC specific. What do you mean by sharing between kernel
and DPDK? For most NICs I'm aware of, it's required to unbind the kernel
driver first, thus the NIC won't be shared. For Mellanox, the control
unit is based on queues, thus it could be shared correctly.

	--yliu

