[ovs-dev] [PATCH ovs-dev v1] netdev-offload: Flush offloaded rules when ports removed

Ilya Maximets i.maximets at ovn.org
Mon Jun 15 11:13:15 UTC 2020


On 6/14/20 1:40 PM, Tonghao Zhang wrote:
> On Sun, Jun 14, 2020 at 1:31 PM Roi Dayan <roid at mellanox.com> wrote:
>>
>>
>>
>> On 2020-06-11 1:36 PM, xiangxia.m.yue at gmail.com wrote:
>>> From: Tonghao Zhang <xiangxia.m.yue at gmail.com>
>>>
>>> When ports were removed from bridge, we should flush the
>>> offload rules on the ports. The main reason is two factors:
>>>
>>> * The ports removed from bridge, will be managed by OvS, so
>>>   flush the rules which installed by OvS.
>>> * If using the TC flower offload, for example, tc rules still
>>>   are on the ports. And the information still are maintained by
>>>   OvS, such as the mapping for tc and ufid.
>>>   Then if adding the port to bridge and installing the rules
>>>   to it again, *del_filter_and_ufid_mapping will be invoked,
>>>   and delete the tc rule using tc handle which may not exist
>>>   (offload init api flushed them.) on kernel or is used by other
>>>   previous rules (if so, that rules will be deleted that is not
>>>   we expected.).
>>>
>>> Cc: Simon Horman <simon.horman at netronome.com>
>>> Cc: Paul Blakey <paulb at mellanox.com>
>>> Cc: Roi Dayan <roid at mellanox.com>
>>> Cc: Ben Pfaff <blp at ovn.org>
>>> Cc: William Tu <u9012063 at gmail.com>
>>> Cc: Ilya Maximets <i.maximets at ovn.org>
>>> Tested-at: https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Ftravis-ci.com%2Fgithub%2Fovn-open-virtual-networks%2Fovs%2Fbuilds%2F170832624&data=02%7C01%7Croid%40mellanox.com%7Cf0bc38aa88074693105108d80df36979%7Ca652971c7d2e4d9ba6a4d149256f461b%7C0%7C0%7C637274686378147436&sdata=Dlq4jnSclj0vXQoldO7Oq2Wwx%2BGEHjK%2FftPUDHnJcKM%3D&reserved=0
>>> Co-authored-by: Wengang Hou <houwengang at didiglobal.com>
>>> Signed-off-by: Tonghao Zhang <xiangxia.m.yue at gmail.com>
>>> ---
>>>  lib/netdev-offload.c | 1 +
>>>  1 file changed, 1 insertion(+)
>>>
>>> diff --git a/lib/netdev-offload.c b/lib/netdev-offload.c
>>> index ab97a292ebac..964566caab1e 100644
>>> --- a/lib/netdev-offload.c
>>> +++ b/lib/netdev-offload.c
>>> @@ -593,6 +593,7 @@ netdev_ports_remove(odp_port_t port_no, const struct dpif_class *dpif_class)
>>>      data = netdev_ports_lookup(port_no, dpif_class);
>>>      if (data) {
>>>          dpif_port_destroy(&data->dpif_port);
>>> +        netdev_flow_flush(data->netdev); /* flush offloaded rules. */
>>>          netdev_close(data->netdev); /* unref and possibly close */
>>>          hmap_remove(&port_to_netdev, &data->portno_node);
>>>          hmap_remove(&ifindex_to_port, &data->ifindex_node);
>>>
>>
> Hi Roi
> Thanks for your review. The case is reproduced as below:
> # ovs-appctl dpctl/show
> system at ovs-system:
>   lookups: hit:0 missed:0 lost:0
>   flows: 0
>   masks: hit:0 total:0 hit/pkt:0.00
>   port 0: ovs-system (internal)
>   port 1: br-int (internal)
>   port 2: enp130s0f0_0
>   port 3: enp130s0f0_1
>   port 4: vxlan_sys_4789 (vxlan: packet_type=ptap)
> 
> # cat /tmp/tmp-ovs-debug.sh
> set -x
> ovs-appctl dpctl/add-flow
> 'recirc_id(0),in_port(2),eth(dst=00:11:22:33:44:66),eth_type(0x0800),ipv4(dst=1.1.1.200)'
> 3
> ovs-appctl dpctl/dump-flows
> ovs-vsctl del-port br-int enp130s0f0_0
> tc filter show dev enp130s0f0_0 ingress
> ovs-appctl dpctl/dump-flows
> 
> # sh /tmp/tmp-ovs-debug.sh
> + ovs-appctl dpctl/add-flow
> 'recirc_id(0),in_port(2),eth(dst=00:11:22:33:44:66),eth_type(0x0800),ipv4(dst=1.1.1.200)'
> 3
> + ovs-appctl dpctl/dump-flows
> recirc_id(0),in_port(2),eth(dst=00:11:22:33:44:66),eth_type(0x0800),ipv4(dst=1.1.1.200),
> packets:0, bytes:0, used:0.020s, actions:3
> + ovs-vsctl del-port br-int enp130s0f0_0
> + tc filter show dev enp130s0f0_0 ingress
> filter protocol ip pref 2 flower chain 0
> filter protocol ip pref 2 flower chain 0 handle 0x1
>   dst_mac 00:11:22:33:44:66
>   eth_type ipv4
>   dst_ip 1.1.1.200
>   in_hw in_hw_count 1
> action order 1: mirred (Egress Redirect to device enp130s0f0_1) stolen
> index 1 ref 1 bind 1
> cookie 40ca20750c4add5a9cc1c880ccb3d0e8
> no_percpu
> used_hw_stats delayed
> 
> + ovs-appctl dpctl/dump-flows
> 
> The tc rules on port enp130s0f0_0 are not deleted, we can use the tc
> command to show them.
> 
>> it looks ok but I wonder if it's redundant?
>> I tested and when I removed a port with tc rules on it,
>> all tc rules got deleted using the flow api flow del,
>> we get into netdev_tc_flow_del() for each flow.
> I didn't find the netdev_tc_flow_del was invoked while the port was deleting.
> The debug step is:
> 1. run the /tmp/tmp-ovs-debug.sh
> 2. and while gdb the ovs-vswitchd:
> gdb /root/local/openvswitch-master-dpdk/sbin/ovs-vswitchd -ex 'b
> netdev_tc_flow_del' -ex c --pid [ovs-vswitchd pid]
> 
> the breakpoint was not triggered.
>> So you shouldn't have a tc rule left or ufid mapping.
> As I debug, the  netdev_tc_flow_del was not invoked, and the ufids of
> flow still are in the mapping.
> When we add the port back and the install the rules again, the new
> rules(e.g. tc handle 10) may be deleted by the later new rules(which
> ufid mapping not
> deleted and the handle may be 10), via  del_filter_and_ufid_mapping.
>> How did you test that you saw tc rules were still on the port?
> Yes, I test it using the tc command.

Hi.

{add/del/mod}-flow dpctl commands are for *debugging purposes only*.  And this
is clearly stated in documentation:

"DATAPATH FLOW TABLE DEBUGGING COMMANDS
 ...
 Do not use commands to add or remove or modify datapath flows if ovs-vswitchd
 is running because it interferes with ovs-vswitchd's own datapath flow
 management.  Use ovs-ofctl(8), instead, to work with OpenFlow flow entries."

As you're injecting flows directly to datapath you can't expect higher layers
to handle them correctly.  The issue here is that you're using OVS in a way
it's not intended to be used.  Please, configure OVS rules via OpenFlow or
be sure that you're removing flows by hands from the datapath as you're adding
them.

In case above scenario leaves ufids in OVS that cannot be removed, I'd suggest
reverting of the previous patch that allowed offloading via add-flow since this
is effectively a memory leak and even debugging commands should not produce
a memory leak.

---

BTW, IMHO, it's a very confusing design that we're installing rules directly to
tc.  The issue here is that we have to maintain list of ports we offloaded to
with ufids completely in userspace and we can not re-create these mappings after
OVS restart like we do in case of kernel datapath.  The main issue is that we
do not know which ports are ours.  Datapath is actually split between kernel
and userspace with all the metadata in ovs-vswitchd and flows in tc in kernel.
This leads to big number of issues.  If we could install tc rules from the
inside of the usual kernel datapath that might solve a lot of configuration issues.
Might create some new, but anyway, I think, common architecture would be much
more clear.
For example, in userspace datapath we're installing flows to usual userspace
flow table AND to offload provider. This allows us to manage flows in more
natural way.  And even if we will lost some flows inside offload provider
implementation we will still clean up all the hanging data while removing flows
from the usual userspace datapath flow tables.  So, offloading is actually
joined part of userspace datapath and not a side thing like tc for kernel datapath.
So, what is this part about: kernel datapath has persistent ports and flows
(persistent in terms of ovs-vswitchd restarts), but tc offloading has persistent
flows and non-persistent ports (i.e. list of ports we're affloading to) and
non-persistent flows metadata (ufids of flows we offlaoded).  That introduces
a lot of complications.


Best regards, Ilya Maximets.


More information about the dev mailing list