[ovs-dev] [PATCH ovs-dev v1] netdev-offload: Flush offloaded rules when ports removed

Tonghao Zhang xiangxia.m.yue at gmail.com
Tue Jun 16 08:25:21 UTC 2020


On Mon, Jun 15, 2020 at 7:13 PM Ilya Maximets <i.maximets at ovn.org> wrote:
>
> On 6/14/20 1:40 PM, Tonghao Zhang wrote:
> > On Sun, Jun 14, 2020 at 1:31 PM Roi Dayan <roid at mellanox.com> wrote:
> >>
> >>
> >>
> >> On 2020-06-11 1:36 PM, xiangxia.m.yue at gmail.com wrote:
> >>> From: Tonghao Zhang <xiangxia.m.yue at gmail.com>
> >>>
> >>> When ports were removed from bridge, we should flush the
> >>> offload rules on the ports. The main reason is two factors:
> >>>
> >>> * The ports removed from bridge, will be managed by OvS, so
> >>>   flush the rules which installed by OvS.
> >>> * If using the TC flower offload, for example, tc rules still
> >>>   are on the ports. And the information still are maintained by
> >>>   OvS, such as the mapping for tc and ufid.
> >>>   Then if adding the port to bridge and installing the rules
> >>>   to it again, *del_filter_and_ufid_mapping will be invoked,
> >>>   and delete the tc rule using tc handle which may not exist
> >>>   (offload init api flushed them.) on kernel or is used by other
> >>>   previous rules (if so, that rules will be deleted that is not
> >>>   we expected.).
> >>>
> >>> Cc: Simon Horman <simon.horman at netronome.com>
> >>> Cc: Paul Blakey <paulb at mellanox.com>
> >>> Cc: Roi Dayan <roid at mellanox.com>
> >>> Cc: Ben Pfaff <blp at ovn.org>
> >>> Cc: William Tu <u9012063 at gmail.com>
> >>> Cc: Ilya Maximets <i.maximets at ovn.org>
> >>> Tested-at: https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Ftravis-ci.com%2Fgithub%2Fovn-open-virtual-networks%2Fovs%2Fbuilds%2F170832624&data=02%7C01%7Croid%40mellanox.com%7Cf0bc38aa88074693105108d80df36979%7Ca652971c7d2e4d9ba6a4d149256f461b%7C0%7C0%7C637274686378147436&sdata=Dlq4jnSclj0vXQoldO7Oq2Wwx%2BGEHjK%2FftPUDHnJcKM%3D&reserved=0
> >>> Co-authored-by: Wengang Hou <houwengang at didiglobal.com>
> >>> Signed-off-by: Tonghao Zhang <xiangxia.m.yue at gmail.com>
> >>> ---
> >>>  lib/netdev-offload.c | 1 +
> >>>  1 file changed, 1 insertion(+)
> >>>
> >>> diff --git a/lib/netdev-offload.c b/lib/netdev-offload.c
> >>> index ab97a292ebac..964566caab1e 100644
> >>> --- a/lib/netdev-offload.c
> >>> +++ b/lib/netdev-offload.c
> >>> @@ -593,6 +593,7 @@ netdev_ports_remove(odp_port_t port_no, const struct dpif_class *dpif_class)
> >>>      data = netdev_ports_lookup(port_no, dpif_class);
> >>>      if (data) {
> >>>          dpif_port_destroy(&data->dpif_port);
> >>> +        netdev_flow_flush(data->netdev); /* flush offloaded rules. */
> >>>          netdev_close(data->netdev); /* unref and possibly close */
> >>>          hmap_remove(&port_to_netdev, &data->portno_node);
> >>>          hmap_remove(&ifindex_to_port, &data->ifindex_node);
> >>>
> >>
> > Hi Roi
> > Thanks for your review. The case is reproduced as below:
> > # ovs-appctl dpctl/show
> > system at ovs-system:
> >   lookups: hit:0 missed:0 lost:0
> >   flows: 0
> >   masks: hit:0 total:0 hit/pkt:0.00
> >   port 0: ovs-system (internal)
> >   port 1: br-int (internal)
> >   port 2: enp130s0f0_0
> >   port 3: enp130s0f0_1
> >   port 4: vxlan_sys_4789 (vxlan: packet_type=ptap)
> >
> > # cat /tmp/tmp-ovs-debug.sh
> > set -x
> > ovs-appctl dpctl/add-flow
> > 'recirc_id(0),in_port(2),eth(dst=00:11:22:33:44:66),eth_type(0x0800),ipv4(dst=1.1.1.200)'
> > 3
> > ovs-appctl dpctl/dump-flows
> > ovs-vsctl del-port br-int enp130s0f0_0
> > tc filter show dev enp130s0f0_0 ingress
> > ovs-appctl dpctl/dump-flows
> >
> > # sh /tmp/tmp-ovs-debug.sh
> > + ovs-appctl dpctl/add-flow
> > 'recirc_id(0),in_port(2),eth(dst=00:11:22:33:44:66),eth_type(0x0800),ipv4(dst=1.1.1.200)'
> > 3
> > + ovs-appctl dpctl/dump-flows
> > recirc_id(0),in_port(2),eth(dst=00:11:22:33:44:66),eth_type(0x0800),ipv4(dst=1.1.1.200),
> > packets:0, bytes:0, used:0.020s, actions:3
> > + ovs-vsctl del-port br-int enp130s0f0_0
> > + tc filter show dev enp130s0f0_0 ingress
> > filter protocol ip pref 2 flower chain 0
> > filter protocol ip pref 2 flower chain 0 handle 0x1
> >   dst_mac 00:11:22:33:44:66
> >   eth_type ipv4
> >   dst_ip 1.1.1.200
> >   in_hw in_hw_count 1
> > action order 1: mirred (Egress Redirect to device enp130s0f0_1) stolen
> > index 1 ref 1 bind 1
> > cookie 40ca20750c4add5a9cc1c880ccb3d0e8
> > no_percpu
> > used_hw_stats delayed
> >
> > + ovs-appctl dpctl/dump-flows
> >
> > The tc rules on port enp130s0f0_0 are not deleted, we can use the tc
> > command to show them.
> >
> >> it looks ok but I wonder if it's redundant?
> >> I tested and when I removed a port with tc rules on it,
> >> all tc rules got deleted using the flow api flow del,
> >> we get into netdev_tc_flow_del() for each flow.
> > I didn't find the netdev_tc_flow_del was invoked while the port was deleting.
> > The debug step is:
> > 1. run the /tmp/tmp-ovs-debug.sh
> > 2. and while gdb the ovs-vswitchd:
> > gdb /root/local/openvswitch-master-dpdk/sbin/ovs-vswitchd -ex 'b
> > netdev_tc_flow_del' -ex c --pid [ovs-vswitchd pid]
> >
> > the breakpoint was not triggered.
> >> So you shouldn't have a tc rule left or ufid mapping.
> > As I debug, the  netdev_tc_flow_del was not invoked, and the ufids of
> > flow still are in the mapping.
> > When we add the port back and the install the rules again, the new
> > rules(e.g. tc handle 10) may be deleted by the later new rules(which
> > ufid mapping not
> > deleted and the handle may be 10), via  del_filter_and_ufid_mapping.
> >> How did you test that you saw tc rules were still on the port?
> > Yes, I test it using the tc command.
>
> Hi.
>
> {add/del/mod}-flow dpctl commands are for *debugging purposes only*.  And this
> is clearly stated in documentation:
>
> "DATAPATH FLOW TABLE DEBUGGING COMMANDS
>  ...
>  Do not use commands to add or remove or modify datapath flows if ovs-vswitchd
>  is running because it interferes with ovs-vswitchd's own datapath flow
>  management.  Use ovs-ofctl(8), instead, to work with OpenFlow flow entries."
>
> As you're injecting flows directly to datapath you can't expect higher layers
> to handle them correctly.  The issue here is that you're using OVS in a way
> it's not intended to be used.  Please, configure OVS rules via OpenFlow or
> be sure that you're removing flows by hands from the datapath as you're adding
> them.
Hi
Thanks for your review. I test it with openflow, It works fine because
the *revalidator
will clean up the flows(which may be offloaded).
> In case above scenario leaves ufids in OVS that cannot be removed, I'd suggest
> reverting of the previous patch that allowed offloading via add-flow since this
> is effectively a memory leak and even debugging commands should not produce
> a memory leak.
yes, we can revert the previous patch. But I think the command is
useful for us to
debug offload(rte flow offload and tc offload), and test the offload function.
For this issue, can we use this little patch to fix it ? or other better way ?
> ---
>
> BTW, IMHO, it's a very confusing design that we're installing rules directly to
> tc.  The issue here is that we have to maintain list of ports we offloaded to
> with ufids completely in userspace and we can not re-create these mappings after
> OVS restart like we do in case of kernel datapath.  The main issue is that we
> do not know which ports are ours.  Datapath is actually split between kernel
> and userspace with all the metadata in ovs-vswitchd and flows in tc in kernel.
> This leads to big number of issues.  If we could install tc rules from the
> inside of the usual kernel datapath that might solve a lot of configuration issues.
> Might create some new, but anyway, I think, common architecture would be much
> more clear.
> For example, in userspace datapath we're installing flows to usual userspace
> flow table AND to offload provider. This allows us to manage flows in more
> natural way.  And even if we will lost some flows inside offload provider
> implementation we will still clean up all the hanging data while removing flows
> from the usual userspace datapath flow tables.  So, offloading is actually
> joined part of userspace datapath and not a side thing like tc for kernel datapath.
> So, what is this part about: kernel datapath has persistent ports and flows
> (persistent in terms of ovs-vswitchd restarts), but tc offloading has persistent
> flows and non-persistent ports (i.e. list of ports we're affloading to) and
> non-persistent flows metadata (ufids of flows we offlaoded).  That introduces
> a lot of complications.
>
>
> Best regards, Ilya Maximets.



-- 
Best regards, Tonghao


More information about the dev mailing list