[ovs-dev] [PATCH ovs-dev v1] netdev-offload: Flush offloaded rules when ports removed

Simon Horman simon.horman at netronome.com
Thu Jun 18 11:18:13 UTC 2020


On Tue, Jun 16, 2020 at 04:25:21PM +0800, Tonghao Zhang wrote:
> On Mon, Jun 15, 2020 at 7:13 PM Ilya Maximets <i.maximets at ovn.org> wrote:
> >
> > On 6/14/20 1:40 PM, Tonghao Zhang wrote:
> > > On Sun, Jun 14, 2020 at 1:31 PM Roi Dayan <roid at mellanox.com> wrote:
> > >>
> > >>
> > >>
> > >> On 2020-06-11 1:36 PM, xiangxia.m.yue at gmail.com wrote:
> > >>> From: Tonghao Zhang <xiangxia.m.yue at gmail.com>
> > >>>
> > >>> When ports were removed from bridge, we should flush the
> > >>> offload rules on the ports. The main reason is two factors:
> > >>>
> > >>> * The ports removed from bridge, will be managed by OvS, so
> > >>>   flush the rules which installed by OvS.
> > >>> * If using the TC flower offload, for example, tc rules still
> > >>>   are on the ports. And the information still are maintained by
> > >>>   OvS, such as the mapping for tc and ufid.
> > >>>   Then if adding the port to bridge and installing the rules
> > >>>   to it again, *del_filter_and_ufid_mapping will be invoked,
> > >>>   and delete the tc rule using tc handle which may not exist
> > >>>   (offload init api flushed them.) on kernel or is used by other
> > >>>   previous rules (if so, that rules will be deleted that is not
> > >>>   we expected.).
> > >>>
> > >>> Cc: Simon Horman <simon.horman at netronome.com>
> > >>> Cc: Paul Blakey <paulb at mellanox.com>
> > >>> Cc: Roi Dayan <roid at mellanox.com>
> > >>> Cc: Ben Pfaff <blp at ovn.org>
> > >>> Cc: William Tu <u9012063 at gmail.com>
> > >>> Cc: Ilya Maximets <i.maximets at ovn.org>
> > >>> Tested-at: https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Ftravis-ci.com%2Fgithub%2Fovn-open-virtual-networks%2Fovs%2Fbuilds%2F170832624&data=02%7C01%7Croid%40mellanox.com%7Cf0bc38aa88074693105108d80df36979%7Ca652971c7d2e4d9ba6a4d149256f461b%7C0%7C0%7C637274686378147436&sdata=Dlq4jnSclj0vXQoldO7Oq2Wwx%2BGEHjK%2FftPUDHnJcKM%3D&reserved=0
> > >>> Co-authored-by: Wengang Hou <houwengang at didiglobal.com>
> > >>> Signed-off-by: Tonghao Zhang <xiangxia.m.yue at gmail.com>
> > >>> ---
> > >>>  lib/netdev-offload.c | 1 +
> > >>>  1 file changed, 1 insertion(+)
> > >>>
> > >>> diff --git a/lib/netdev-offload.c b/lib/netdev-offload.c
> > >>> index ab97a292ebac..964566caab1e 100644
> > >>> --- a/lib/netdev-offload.c
> > >>> +++ b/lib/netdev-offload.c
> > >>> @@ -593,6 +593,7 @@ netdev_ports_remove(odp_port_t port_no, const struct dpif_class *dpif_class)
> > >>>      data = netdev_ports_lookup(port_no, dpif_class);
> > >>>      if (data) {
> > >>>          dpif_port_destroy(&data->dpif_port);
> > >>> +        netdev_flow_flush(data->netdev); /* flush offloaded rules. */
> > >>>          netdev_close(data->netdev); /* unref and possibly close */
> > >>>          hmap_remove(&port_to_netdev, &data->portno_node);
> > >>>          hmap_remove(&ifindex_to_port, &data->ifindex_node);
> > >>>
> > >>
> > > Hi Roi
> > > Thanks for your review. The case is reproduced as below:
> > > # ovs-appctl dpctl/show
> > > system at ovs-system:
> > >   lookups: hit:0 missed:0 lost:0
> > >   flows: 0
> > >   masks: hit:0 total:0 hit/pkt:0.00
> > >   port 0: ovs-system (internal)
> > >   port 1: br-int (internal)
> > >   port 2: enp130s0f0_0
> > >   port 3: enp130s0f0_1
> > >   port 4: vxlan_sys_4789 (vxlan: packet_type=ptap)
> > >
> > > # cat /tmp/tmp-ovs-debug.sh
> > > set -x
> > > ovs-appctl dpctl/add-flow
> > > 'recirc_id(0),in_port(2),eth(dst=00:11:22:33:44:66),eth_type(0x0800),ipv4(dst=1.1.1.200)'
> > > 3
> > > ovs-appctl dpctl/dump-flows
> > > ovs-vsctl del-port br-int enp130s0f0_0
> > > tc filter show dev enp130s0f0_0 ingress
> > > ovs-appctl dpctl/dump-flows
> > >
> > > # sh /tmp/tmp-ovs-debug.sh
> > > + ovs-appctl dpctl/add-flow
> > > 'recirc_id(0),in_port(2),eth(dst=00:11:22:33:44:66),eth_type(0x0800),ipv4(dst=1.1.1.200)'
> > > 3
> > > + ovs-appctl dpctl/dump-flows
> > > recirc_id(0),in_port(2),eth(dst=00:11:22:33:44:66),eth_type(0x0800),ipv4(dst=1.1.1.200),
> > > packets:0, bytes:0, used:0.020s, actions:3
> > > + ovs-vsctl del-port br-int enp130s0f0_0
> > > + tc filter show dev enp130s0f0_0 ingress
> > > filter protocol ip pref 2 flower chain 0
> > > filter protocol ip pref 2 flower chain 0 handle 0x1
> > >   dst_mac 00:11:22:33:44:66
> > >   eth_type ipv4
> > >   dst_ip 1.1.1.200
> > >   in_hw in_hw_count 1
> > > action order 1: mirred (Egress Redirect to device enp130s0f0_1) stolen
> > > index 1 ref 1 bind 1
> > > cookie 40ca20750c4add5a9cc1c880ccb3d0e8
> > > no_percpu
> > > used_hw_stats delayed
> > >
> > > + ovs-appctl dpctl/dump-flows
> > >
> > > The tc rules on port enp130s0f0_0 are not deleted, we can use the tc
> > > command to show them.
> > >
> > >> it looks ok but I wonder if it's redundant?
> > >> I tested and when I removed a port with tc rules on it,
> > >> all tc rules got deleted using the flow api flow del,
> > >> we get into netdev_tc_flow_del() for each flow.
> > > I didn't find the netdev_tc_flow_del was invoked while the port was deleting.
> > > The debug step is:
> > > 1. run the /tmp/tmp-ovs-debug.sh
> > > 2. and while gdb the ovs-vswitchd:
> > > gdb /root/local/openvswitch-master-dpdk/sbin/ovs-vswitchd -ex 'b
> > > netdev_tc_flow_del' -ex c --pid [ovs-vswitchd pid]
> > >
> > > the breakpoint was not triggered.
> > >> So you shouldn't have a tc rule left or ufid mapping.
> > > As I debug, the  netdev_tc_flow_del was not invoked, and the ufids of
> > > flow still are in the mapping.
> > > When we add the port back and the install the rules again, the new
> > > rules(e.g. tc handle 10) may be deleted by the later new rules(which
> > > ufid mapping not
> > > deleted and the handle may be 10), via  del_filter_and_ufid_mapping.
> > >> How did you test that you saw tc rules were still on the port?
> > > Yes, I test it using the tc command.
> >
> > Hi.
> >
> > {add/del/mod}-flow dpctl commands are for *debugging purposes only*.  And this
> > is clearly stated in documentation:
> >
> > "DATAPATH FLOW TABLE DEBUGGING COMMANDS
> >  ...
> >  Do not use commands to add or remove or modify datapath flows if ovs-vswitchd
> >  is running because it interferes with ovs-vswitchd's own datapath flow
> >  management.  Use ovs-ofctl(8), instead, to work with OpenFlow flow entries."
> >
> > As you're injecting flows directly to datapath you can't expect higher layers
> > to handle them correctly.  The issue here is that you're using OVS in a way
> > it's not intended to be used.  Please, configure OVS rules via OpenFlow or
> > be sure that you're removing flows by hands from the datapath as you're adding
> > them.
> Hi
> Thanks for your review. I test it with openflow, It works fine because
> the *revalidator
> will clean up the flows(which may be offloaded).
> > In case above scenario leaves ufids in OVS that cannot be removed, I'd suggest
> > reverting of the previous patch that allowed offloading via add-flow since this
> > is effectively a memory leak and even debugging commands should not produce
> > a memory leak.
> yes, we can revert the previous patch. But I think the command is
> useful for us to
> debug offload(rte flow offload and tc offload), and test the offload function.
> For this issue, can we use this little patch to fix it ? or other better way ?

I am confused. Which little patch?

> > ---
> >
> > BTW, IMHO, it's a very confusing design that we're installing rules directly to
> > tc.  The issue here is that we have to maintain list of ports we offloaded to
> > with ufids completely in userspace and we can not re-create these mappings after
> > OVS restart like we do in case of kernel datapath.  The main issue is that we
> > do not know which ports are ours.  Datapath is actually split between kernel
> > and userspace with all the metadata in ovs-vswitchd and flows in tc in kernel.
> > This leads to big number of issues.  If we could install tc rules from the
> > inside of the usual kernel datapath that might solve a lot of configuration issues.
> > Might create some new, but anyway, I think, common architecture would be much
> > more clear.
> > For example, in userspace datapath we're installing flows to usual userspace
> > flow table AND to offload provider. This allows us to manage flows in more
> > natural way.  And even if we will lost some flows inside offload provider
> > implementation we will still clean up all the hanging data while removing flows
> > from the usual userspace datapath flow tables.  So, offloading is actually
> > joined part of userspace datapath and not a side thing like tc for kernel datapath.
> > So, what is this part about: kernel datapath has persistent ports and flows
> > (persistent in terms of ovs-vswitchd restarts), but tc offloading has persistent
> > flows and non-persistent ports (i.e. list of ports we're affloading to) and
> > non-persistent flows metadata (ufids of flows we offlaoded).  That introduces
> > a lot of complications.
> >
> >
> > Best regards, Ilya Maximets.
> 
> 
> 
> -- 
> Best regards, Tonghao


More information about the dev mailing list