[ovs-dev] [PATCH V3 0/4] netdev datapath flush offloaded flows

Eli Britstein elibr at nvidia.com
Sun Jan 17 08:22:33 UTC 2021


On 1/17/2021 9:48 AM, Sriharsha Basavapatna wrote:
> On Sun, Jan 10, 2021 at 12:25 PM Eli Britstein <elibr at nvidia.com> wrote:
>>
>> On 1/7/2021 9:10 PM, Sriharsha Basavapatna wrote:
>>> Hi Eli,
>>>
>>> On Mon, Dec 28, 2020 at 11:43 PM Eli Britstein <elibr at nvidia.com> wrote:
>>>> Netdev datapath offloads are done in a separate thread, using messaging
>>>> between the threads. With port removal there is a race between the offload
>>>> thread removing the offloaded rules and the actual port removal, so some
>>>> rules will not be removed.
>>> Can you provide details on this sequence, to get more clarity on how
>>> some rules will not be removed?
>> A deletion sequence, from the dpif point of view, starts with
>> dpif_port_del. It calls dpif->dpif_class->port_del (implemented by
>> dpif_netdev_port_del) and netdev_ports_remove.
>>
>> flow_mark_flush is called
>> (dpif_netdev_port_del->do_del_port->reconfigure_datapath->reload_affected_pmds).
>> This function only posts deletion requests to the queue
>> (queue_netdev_flow_del), to be later handled by the offload thread.
>>
>> When the offload thread wakes up to handle the request
>> (dp_netdev_flow_offload_del->mark_to_flow_disassociate), it looks up
>> the netdev object with netdev_ports_get in the port_to_netdev map,
>> racing with the execution of netdev_ports_remove, which removes that
>> mapping.
>>
>> If the mapping is removed before the handling, the removal of the HW
>> rule won't take place, leaking it and the related allocated memory.
> Thanks for the details; the above commit message could be reworded to
> be more clear like this:
>
> With port removal there is a race -->
>
> "With port removal, there is a race due to which, by the time the flow
> delete request is processed by the offload thread, the corresponding
> netdev would have already been removed (in dpif_port_del()). The
> offload thread then fails to find the netdev in
> mark_to_flow_disassociate(), so the flow is not deleted from HW."
The series got merged.
>
>>>> thread removing the offloaded rules and the actual port removal, so some
>>>> rules will not be removed.
>>>> In OVS the offload objects are not freed (memory
>>>> leak).
>>> Since dp_netdev_free_flow_offload() frees the offload object, not sure
>>> how the memory leak happens ?
>> Since netdev_offload_dpdk_flow_del is eventually not called, the
>> objects in ufid_to_rte_flow that should have been freed by
>> ufid_to_rte_flow_disassociate are leaked (and the rte_flows are not
>> destroyed).
> Ok; the commit message could be more specific about the objects that
> are leaked.
>
>>>> In HW, whether the rules remain depends on PMD behavior.
>>> A few related questions:
>>>
>>> 1.  In Patch-1, netdev_flow_flush() is called from do_del_port().
>>> Doesn't it violate the layering rules, since port deletion shouldn't
>>> be directly involved with netdev/flow related operations (that would
>>> otherwise be processed as a part of datapath reconfig) ?
>> netdev/flow operations are called from the dpif-netdev layer, which
>> includes do_del_port. We can call the offload functions under
>> dp->port_mutex, which is locked in that function.
>>
>> Which layering rules are violated?
> With this change, a port directly operates on the flows, as opposed to
> letting the normal datapath routines delete the flow in the context of
> a specific pmd thread.
>
> Before this change, queue_netdev_flow_del() was the only entry point
> for flow deletion and it was called by either flow_mark_flush() for a
> given pmd thread or by dp_netdev_pmd_remove_flow() again for a given
> pmd thread. But with the new change, the flow gets deleted without the
> context of a pmd thread.  Also, we are now entirely bypassing the
> offload thread.
The change only removes the offloads of the port. It doesn't handle 
removing the flows themselves.
>>> 2. Since the rte_flow is deleted first, doesn't it leave the
>>> dpif_netdev_flow in an inconsistent state, since the flow is still
>>> active from the pmd thread's perspective (megaflow_to_mark and
>>> mark_to_flow cmap entries are still valid, indicating that it is
>>> offloaded).
>> That is an existing sequence, unrelated to this patch set, that occurs normally.
>>
>> Currently there are two types of offloads (both are ingress):
>>
>> - Partial. Packets are handled by SW, with or without offload. Packets
>> are fully and correctly processed by the SW either if they don't have
>> a mark or if their mark is not found by mark_to_flow_find.
>>
>> - Full. Packets will be handled correctly by the SW as they arrive with
>> no mark.
>>
>> The mark<->megaflow mapping disassociation is handled correctly even if
>> the rte_flow is not removed.
>>
>> However, indeed, this is a good point to take into consideration when
>> implementing egress offloading.
> I understand how full and partial offloads work w.r.t mark. That was
> not my point here. For an offloaded flow (partial or full), entries
> are added to the "megaflow_to_mark" and "mark_to_flow" cmaps after the
> flow is successfully offloaded by rte. Those entries get removed, when
> the flow gets deleted, through dp_netdev_flow_offload_del(). The
> cmap entries are removed at the same time the flow is deleted from
> rte. But with this change, the deletion of a flow occurs in two
> separate sequences, leaving the entries in cmap (for sometime) while
> there's no corresponding offloaded rte_flow.
>
> Instead of flushing flows from do_del_port(), why can't we just wait
> for the netdev reference count to drop, since you would have taken a
> reference on the netdev for each rte_flow [ while offloading, in
> ufid_to_rte_flow_associate() ].
> Let reconfigure_datapath()->reload_affected_pmds() issue delete
> requests to the offload thread; delete requests get processed by the
> offload thread and the reference on the netdev is released for each
> flow. Wait on the refcnt to drop in do_del_port() before returning.
> This way, you go through the normal steps to delete each flow (via
> offload thread), also ensuring that the cmap entries and the rte_flow
> are deleted together.

As explained, it doesn't matter whether the mark cmaps and the rte_flows 
are removed "together" or not, as is the case with this change.

This approach is valid, but it also has some cons. We would need to wait 
in the do_del_port thread for the offload thread, because we must not 
remove the port mapping before the offload requests are handled. 
Furthermore, such requests may eventually take longer, as they will be 
queued (and handled) after any already-pending requests. As this series 
is already merged, any change should be done on top of it.

>
>>> 3. Also, how does this work with partially offloaded flows (same
>>> megaflow mapped to multiple pmd threads) ?
>> Also not related to this patch set. The flow is ref-counted. Upon the
>> first ref the rte_flow is created, and it is destroyed with the last
>> unref (see mark_to_flow_disassociate).
>>> Thanks,
>>> -Harsha
>>>> This patch-set resolves this issue using flush.
>>>>
>>>> v1->v2:
>>>> - Fix for pending offload requests.
>>>> v2->v3:
>>>> - Rebase.
>>>>
>>>> Travis:
>>>> v1: https://travis-ci.org/github/elibritstein/OVS/builds/747022942
>>>> v2: https://travis-ci.org/github/elibritstein/OVS/builds/748788786
>>>> v3: https://travis-ci.org/github/elibritstein/OVS/builds/751787939
>>>>
>>>> GitHub Actions:
>>>> v1: https://github.com/elibritstein/OVS/actions/runs/394296553
>>>> v2: https://github.com/elibritstein/OVS/actions/runs/413379295
>>>> v3: https://github.com/elibritstein/OVS/actions/runs/448541155
>>>>
>>>> Eli Britstein (4):
>>>>     dpif-netdev: Flush offload rules upon port deletion
>>>>     netdev-offload-dpdk: Keep netdev in offload object
>>>>     netdev-offload-dpdk: Refactor disassociate and flow destroy
>>>>     netdev-offload-dpdk: Implement flow flush
>>>>
>>>>    lib/dpif-netdev.c         |  2 +
>>>>    lib/netdev-offload-dpdk.c | 85 +++++++++++++++++++++++++--------------
>>>>    2 files changed, 56 insertions(+), 31 deletions(-)
>>>>
>>>> --
>>>> 2.28.0.546.g385c171
>>>>
>>>> _______________________________________________
>>>> dev mailing list
>>>> dev at openvswitch.org
>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev

