[ovs-dev] [PATCH v2 6/8] netdev-dpdk: retry with queue action

Finn Christensen fc at napatech.com
Thu Sep 14 08:14:25 UTC 2017



-----Original Message-----
From: ovs-dev-bounces at openvswitch.org [mailto:ovs-dev-bounces at openvswitch.org] On Behalf Of Darrell Ball
Sent: 13. september 2017 18:18
To: Simon Horman <simon.horman at netronome.com>
Cc: dev at openvswitch.org
Subject: Re: [ovs-dev] [PATCH v2 6/8] netdev-dpdk: retry with queue action



On 9/13/17, 2:57 AM, "Simon Horman" <simon.horman at netronome.com> wrote:

    On Tue, Sep 12, 2017 at 08:36:19AM +0000, Darrell Ball wrote:
    > 
    > On 9/10/17, 11:14 PM, "ovs-dev-bounces at openvswitch.org on behalf of Yuanhan Liu" <ovs-dev-bounces at openvswitch.org on behalf of yliu at fridaylinux.org> wrote:
    > 
    >     On Fri, Sep 08, 2017 at 06:48:50PM +0200, Simon Horman wrote:
    >     > On Tue, Sep 05, 2017 at 05:22:59PM +0800, Yuanhan Liu wrote:
    >     > > From: Finn Christensen <fc at napatech.com>
    >     > > 
    >     > > AFAIK, most (if not all) NICs (including Mellanox and Intel) do not
    >     > > support a pure MARK action.  It's required to be used together with
    >     > > some other actions, like QUEUE.
    >     > > 
    >     > > To workaround it, retry with a queue action when first try failed.
    >     > > 
    >     > > Moreover, some Intel's NIC (say XL710) needs the QUEUE action set
    >     > > before the MARK action.
    >     > > 
    >     > > Co-authored-by: Yuanhan Liu <yliu at fridaylinux.org>
    >     > > Signed-off-by: Finn Christensen <fc at napatech.com>
    >     > > Signed-off-by: Yuanhan Liu <yliu at fridaylinux.org>
    >     > 
    >     > This feels a bit like the tail wagging the dog.
    >     > Is this the lowest level at which it makes sense to implement
    >     > this logic?
    >     > 
    >     > If so then I wonder if some sort of probing would be in order
    >     > to avoid the cost of trying to add the flow twice to hardware
    >     > where the queue is required.
    >     
    >     Do you mean something like rte_flow capability query, like whether
    >     a queue action is needed for a mark action? If so, yes, I do think
    >     we miss an interface like this.
    >     
    >     Note that even in this solution, the flow won't be created twice
    >     to the hardware, because the first try would be failed.
    > 
    > [Darrell]
    > 
    >               Having an api to quey capability and avoid the first try to HW would be nice, but there are dependencies
    >                on RTE, drivers etc and I don’t know definitive the api would be.
    > 
    >              Also, as nics are added this capability needs to be done and state needs to be kept in all cases.
    > 
    >            It is an enhancement and if done should be reliable.
    
    Agreed. Though I was more thinking of probing the hardware rather than
    having a capability API - I expect this would remove several of the
    dependencies you describe above.


[Darrell] I have been pondering the probing option as well. It is certainly a valid option; we use it in other cases such as datapath probing. One of the aspects that worries me here is maintaining the correct per interface (essentially; although the attribute is per nic) state across various events such as new ports being added, vswitchd restarts, races with flow creation. It would be non-trivial I guess and probably appropriate for the next patch series, if done.

In this case, we have what seems like a clear distinction b/w Napatech which does not need the queue action workaround and everything else, which does.
Besides the non-Napatech behavior, which is worrisome, maintaining the difference for flow handling under the covers is concerning.

I wonder if we should be upfront as possible here and just have a dpdk interface configuration – maybe something like “supports native HWOL mark action” since the better behavior is the exception?
The interface config would be more robust than probing.
This would need documentation, of course.

[Finn]
I think the rte queue action should never be used here when using partial HWOL. Not the way OVS handles multi queues today. 
Maybe a "default queue" could be used in the dpdk PMDs when no queue is specified in rte flow?
Essentially, this is a mismatch between the rte flow impl functionality in PMD and the needs in OVS for flow classification, in a partial HWOL setup.
However, we would not mind Darells proposal, it makes sense since most nic PMDs unfortunately needs this and it is only relevant for partial HWOL. We are mostly concerned with full HWOL and therefore see partial HWOL as a failover when full HWOL could not be handled in nic, if enabled.


I think anyways we need documentation describing the difference b/w nics in the dpdk documentation (howto part).

    
    Assuming no such enhancement is appropriate at this time I would
    still like to ask if this is the best place for this hardware-specific code?

[Darrell]

For OVS, the netdev-dpdk layer is the lowest layer.
This kind of workaround is hard to hide, since we are messing with the rxq, so I think OVS needs to know that it is in effect anyways. An alternative is to supply a mark and an ‘optional queue’ and let the driver decide if the queue is needed and report back whether it was. This would be hard to do across various drivers. Supporting in the rte layer would require both rte and driver support, so even more support.


    
    >            A separate comment is we need to document which nics need the queue action.
    > 
    >          Also, I think we should check errno in the present code.
    

_______________________________________________
dev mailing list
dev at openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


More information about the dev mailing list