[ovs-discuss] [ovs-dev] ovn-controller is taking 100% CPU all the time in one deployment

Numan Siddique nusiddiq at redhat.com
Sat Aug 31 06:59:57 UTC 2019


On Sat, Aug 31, 2019 at 2:05 AM Han Zhou <zhouhan at gmail.com> wrote:

>
>
> On Fri, Aug 30, 2019 at 1:25 PM Numan Siddique <nusiddiq at redhat.com>
> wrote:
> >
> > Hi Han,
> >
> > I am thinking of this approach to solve this problem. I still need to test it.
> > If you have any comments or concerns do let me know.
> >
> >
> > **************************************************
> > diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
> > index 9a2222282..a83b56362 100644
> > --- a/northd/ovn-northd.c
> > +++ b/northd/ovn-northd.c
> > @@ -6552,6 +6552,41 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
> >
> >          }
> >
> > +        /* Handle GARP reply packets received on a distributed router gateway
> > +         * port. GARP reply broadcast packets could be sent by external
> > +         * switches. We don't want them to be handled by all the
> > +         * ovn-controllers if they receive it. So add a priority-92 flow to
> > +         * apply the put_arp action on a redirect chassis and drop it on
> > +         * other chassis.
> > +         * Note that we are already adding a priority-90 logical flow in the
> > +         * table S_ROUTER_IN_IP_INPUT to apply the put_arp action if
> > +         * arp.op == 2.
> > +         * */
> > +        if (op->od->l3dgw_port && op == op->od->l3dgw_port
> > +                && op->od->l3redirect_port) {
> > +            for (int i = 0; i < op->lrp_networks.n_ipv4_addrs; i++) {
> > +                ds_clear(&match);
> > +                ds_put_format(&match,
> > +                              "inport == %s && is_chassis_resident(%s) && "
> > +                              "eth.bcast && arp.op == 2 && arp.spa == %s/%u",
> > +                              op->json_key, op->od->l3redirect_port->json_key,
> > +                              op->lrp_networks.ipv4_addrs[i].network_s,
> > +                              op->lrp_networks.ipv4_addrs[i].plen);
> > +                ovn_lflow_add(lflows, op->od, S_ROUTER_IN_IP_INPUT, 92,
> > +                              ds_cstr(&match),
> > +                              "put_arp(inport, arp.spa, arp.sha);");
> > +                ds_clear(&match);
> > +                ds_put_format(&match,
> > +                              "inport == %s && !is_chassis_resident(%s) && "
> > +                              "eth.bcast && arp.op == 2 && arp.spa == %s/%u",
> > +                              op->json_key, op->od->l3redirect_port->json_key,
> > +                              op->lrp_networks.ipv4_addrs[i].network_s,
> > +                              op->lrp_networks.ipv4_addrs[i].plen);
> > +                ovn_lflow_add(lflows, op->od, S_ROUTER_IN_IP_INPUT, 92,
> > +                              ds_cstr(&match), "drop;");
> > +            }
> > +        }
> > +
> >          /* A set to hold all load-balancer vips that need ARP responses. */
> >          struct sset all_ips = SSET_INITIALIZER(&all_ips);
> >          int addr_family;
> > *************************************************
> >
> > If a physical switch sends GARP request packets, we have existing logical
> > flows which handle them only on the gateway chassis.
> >
> > But if the physical switch sends GARP reply packets, then these packets
> > are handled by ovn-controllers where bridge mappings are configured.
> > I think it's good enough if the gateway chassis handles these packets.
> >
> > In the deployment where we are seeing this issue, the physical switch
> > sends GARP reply packets.
> >
> > Thanks
> > Numan
> >
> >
> Hi Numan,
>
> I think both GARP request and reply should be handled on all chassis. It
> should work not only for physical switches, but also for virtual workloads.
> At least our current use cases rely on that.
>

I think you might have misunderstood what I am trying to say. Maybe I
didn't state it properly.
Let me give an example.

Suppose we have the logical switches and router below:

*******
switch dd80005a-a638-4c41-b5fc-fffc97722f38 (sw1)
    port sw1-port2
        addresses: ["40:54:00:00:00:04 20.0.0.4"]
    port sw1-port1
        addresses: ["40:54:00:00:00:03 20.0.0.3"]
    port sw1-lr0
        type: router
        addresses: ["00:00:00:00:ff:02"]
        router-port: lr-sw1
switch 8e23a4da-a269-4a46-8088-411b5e6371a5 (public)
    port ln-public
        type: localnet
        addresses: ["unknown"]
    port public-lr0
        type: router
        router-port: lr0-public
switch 231d1c57-0540-4584-9a37-28d8eb227ba3 (sw0)
    port sw0-port1
        addresses: ["50:54:00:00:00:03 10.0.0.3"]
    port sw0-lr0
        type: router
        addresses: ["00:00:00:00:ff:01"]
        router-port: lr0-sw0
    port sw0-port2
        addresses: ["50:54:00:00:00:04 10.0.0.4"]
router 46dbf486-5540-42ab-8d01-ed5af90b79f6 (lr0)
    port lr0-sw0
        mac: "00:00:00:00:ff:01"
        networks: ["10.0.0.1/24"]
    port lr0-public
        mac: "00:00:20:20:12:13"
        networks: ["172.168.0.100/24"]
        gateway chassis: [chassis-1]
    port lr0-sw1
        mac: "00:00:00:00:ff:02"
        networks: ["21.0.0.1/24"]
*********

sw0 and sw1 are private Geneve logical switches, and the 'public' logical
switch is a provider network.

Below are the logical flows related to put_arp that ovn-northd adds to the
router ingress pipeline:

******
 Flow 1:  table=1 (lr_in_ip_input     ), priority=90   , match=(arp.op ==
2), action=(put_arp(inport, arp.spa, arp.sha);)

 Flow 2:  table=1 (lr_in_ip_input     ), priority=90   , match=(inport ==
"lr0-public" && arp.spa == 172.168.0.0/24 && arp.tpa == 172.168.0.100 &&
arp.op == 1 && is_chassis_resident("cr-lr0-public")),
action=(put_arp(inport, arp.spa, arp.sha); eth.dst = eth.src; eth.src =
00:00:20:20:12:13; arp.op = 2; /* ARP reply */ arp.tha = arp.sha; arp.sha =
00:00:20:20:12:13; arp.tpa = arp.spa; arp.spa = 172.168.0.100; outport =
"lr0-public"; flags.loopback = 1; output;)

 Flow 3:  table=1 (lr_in_ip_input     ), priority=90   , match=(inport ==
"lr0-sw0" && arp.spa == 10.0.0.0/24 && arp.tpa == 10.0.0.1 && arp.op == 1),
action=(put_arp(inport, arp.spa, arp.sha); eth.dst = eth.src; eth.src =
00:00:00:00:ff:01; arp.op = 2; /* ARP reply */ arp.tha = arp.sha; arp.sha =
00:00:00:00:ff:01; arp.tpa = arp.spa; arp.spa = 10.0.0.1; outport =
"lr0-sw0"; flags.loopback = 1; output;)

 Flow 4: table=1 (lr_in_ip_input     ), priority=90   , match=(inport ==
"lr0-sw1" && arp.spa == 21.0.0.0/24 && arp.tpa == 21.0.0.1 && arp.op == 1),
action=(put_arp(inport, arp.spa, arp.sha); eth.dst = eth.src; eth.src =
00:00:00:00:ff:02; arp.op = 2; /* ARP reply */ arp.tha = arp.sha; arp.sha =
00:00:00:00:ff:02; arp.tpa = arp.spa; arp.spa = 21.0.0.1; outport =
"lr0-sw1"; flags.loopback = 1; output;)

 Flow 5:  table=1 (lr_in_ip_input     ), priority=80   , match=(inport ==
"lr0-public" && arp.spa == 172.168.0.0/24 && arp.op == 1 &&
is_chassis_resident("cr-lr0-public")), action=(put_arp(inport, arp.spa,
arp.sha);)

 Flow 6:  table=1 (lr_in_ip_input     ), priority=80   , match=(inport ==
"lr0-sw0" && arp.spa == 10.0.0.0/24 && arp.op == 1),
action=(put_arp(inport, arp.spa, arp.sha);)

 Flow 7: table=1 (lr_in_ip_input     ), priority=80   , match=(inport ==
"lr0-sw1" && arp.spa == 21.0.0.0/24 && arp.op == 1),
action=(put_arp(inport, arp.spa, arp.sha);)
*******

If a physical switch sends a Gratuitous ARP request packet (with arp.op ==
1), then it is handled only on the gateway chassis (Flow 5 above).

I am trying to do the same for Gratuitous ARP reply packets (arp.op == 2
and eth.dst == ff:ff:ff:ff:ff:ff, with arp.spa == arp.tpa and arp.sha ==
arp.tha) so that these packets are handled only on the gateway chassis.
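
To make the definition above concrete, here is a minimal sketch of that
check in C. The struct arp_fields type and both helper functions are
hypothetical and only for illustration; they are not OVS/OVN code. Note that
the proposed logical flows match only on eth.bcast, arp.op == 2 and the
arp.spa subnet, so they are slightly broader than this strict definition.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified view of the fields used in the matches above.
 * This is not an OVS/OVN structure. */
struct arp_fields {
    uint8_t  eth_dst[6];  /* eth.dst */
    uint16_t op;          /* arp.op: 1 = request, 2 = reply */
    uint8_t  sha[6];      /* arp.sha */
    uint32_t spa;         /* arp.spa, host byte order */
    uint8_t  tha[6];      /* arp.tha */
    uint32_t tpa;         /* arp.tpa, host byte order */
};

/* True if 'mac' is ff:ff:ff:ff:ff:ff. */
static bool
eth_is_broadcast(const uint8_t mac[6])
{
    static const uint8_t bcast[6] = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff };
    return memcmp(mac, bcast, sizeof bcast) == 0;
}

/* True for a gratuitous ARP reply as described above: a broadcast ARP
 * reply whose sender and target addresses are identical. */
static bool
is_garp_reply(const struct arp_fields *a)
{
    return a->op == 2
           && eth_is_broadcast(a->eth_dst)
           && a->spa == a->tpa
           && memcmp(a->sha, a->tha, sizeof a->sha) == 0;
}
```

With this sketch, a broadcast reply announcing 172.168.0.5 with identical
sender and target fields is classified as a GARP reply, while the same
packet with arp.op == 1 or a unicast eth.dst is not.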

So with the proposed approach, the logical flows will look like this:

******************
New Flow 1:  table=1 (lr_in_ip_input     ), priority=92   , match=(inport
== "lr0-public" && !is_chassis_resident("cr-lr0-public") && eth.bcast &&
arp.op == 2 && arp.spa == 172.168.0.0/24), action=(drop;)

New Flow 2:  table=1 (lr_in_ip_input     ), priority=92   , match=(inport
== "lr0-public" && is_chassis_resident("cr-lr0-public") && eth.bcast &&
arp.op == 2 && arp.spa == 172.168.0.0/24), action=(put_arp(inport, arp.spa,
arp.sha);)

Flow 3:  table=1 (lr_in_ip_input     ), priority=90   , match=(arp.op ==
2), action=(put_arp(inport, arp.spa, arp.sha);)

Flow 4:  table=1 (lr_in_ip_input     ), priority=90   , match=(inport ==
"lr0-public" && arp.spa == 172.168.0.0/24 && arp.tpa == 172.168.0.100 &&
arp.op == 1 && is_chassis_resident("cr-lr0-public")),
action=(put_arp(inport, arp.spa, arp.sha); eth.dst = eth.src; eth.src =
00:00:20:20:12:13; arp.op = 2; /* ARP reply */ arp.tha = arp.sha; arp.sha =
00:00:20:20:12:13; arp.tpa = arp.spa; arp.spa = 172.168.0.100; outport =
"lr0-public"; flags.loopback = 1; output;)

 Flow 5:  table=1 (lr_in_ip_input     ), priority=90   , match=(inport ==
"lr0-sw0" && arp.spa == 10.0.0.0/24 && arp.tpa == 10.0.0.1 && arp.op == 1),
action=(put_arp(inport, arp.spa, arp.sha); eth.dst = eth.src; eth.src =
00:00:00:00:ff:01; arp.op = 2; /* ARP reply */ arp.tha = arp.sha; arp.sha =
00:00:00:00:ff:01; arp.tpa = arp.spa; arp.spa = 10.0.0.1; outport =
"lr0-sw0"; flags.loopback = 1; output;)

 Flow 6:  table=1 (lr_in_ip_input     ), priority=90   , match=(inport ==
"lr0-sw1" && arp.spa == 21.0.0.0/24 && arp.tpa == 21.0.0.1 && arp.op == 1),
action=(put_arp(inport, arp.spa, arp.sha); eth.dst = eth.src; eth.src =
00:00:00:00:ff:02; arp.op = 2; /* ARP reply */ arp.tha = arp.sha; arp.sha =
00:00:00:00:ff:02; arp.tpa = arp.spa; arp.spa = 21.0.0.1; outport =
"lr0-sw1"; flags.loopback = 1; output;)

 Flow 7:  table=1 (lr_in_ip_input     ), priority=80   , match=(inport ==
"lr0-public" && arp.spa == 172.168.0.0/24 && arp.op == 1 &&
is_chassis_resident("cr-lr0-public")), action=(put_arp(inport, arp.spa,
arp.sha);)

 Flow 8:  table=1 (lr_in_ip_input     ), priority=80   , match=(inport ==
"lr0-sw0" && arp.spa == 10.0.0.0/24 && arp.op == 1),
action=(put_arp(inport, arp.spa, arp.sha);)

 Flow 9:  table=1 (lr_in_ip_input     ), priority=80   , match=(inport ==
"lr0-sw1" && arp.spa == 21.0.0.0/24 && arp.op == 1),
action=(put_arp(inport, arp.spa, arp.sha);)
*******************

The first two flows apply only to gratuitous ARP reply packets received on
lr0-public whose arp.spa is in the CIDR of the public network. On the
non-gateway chassis these packets will be dropped, and on the gateway
chassis the put_arp action will be applied. Flow 3 will continue to handle
the ARP reply packets from the virtual networks without any change.
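
As a rough illustration of how the ARP reply flows above are selected, the
sketch below models only the two new priority-92 flows and the existing
priority-90 flow (Flow 3). The struct pkt type, the enum and the function
names are hypothetical, the ARP request flows are left out, and this is not
how ovn-controller evaluates logical flows internally.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical outcome of the modelled lr_in_ip_input flows. */
enum arp_action { ACT_PUT_ARP, ACT_DROP, ACT_NO_MATCH };

/* Hypothetical, simplified packet view; not an OVS/OVN structure. */
struct pkt {
    bool     inport_is_lr0_public;  /* inport == "lr0-public" */
    bool     eth_bcast;             /* eth.bcast */
    uint16_t arp_op;                /* arp.op */
    uint32_t arp_spa;               /* arp.spa, host byte order */
};

/* True if 'ip' is inside 'network'/'plen', e.g. arp.spa == 172.168.0.0/24. */
static bool
in_cidr(uint32_t ip, uint32_t network, unsigned int plen)
{
    uint32_t mask = plen ? (uint32_t) (UINT32_MAX << (32 - plen)) : 0;
    return (ip & mask) == (network & mask);
}

/* Returns the action of the highest-priority modelled flow that matches.
 * 'cr_resident' stands for is_chassis_resident("cr-lr0-public"). */
static enum arp_action
arp_reply_action(const struct pkt *p, bool cr_resident)
{
    const uint32_t public_net = 0xACA80000u;  /* 172.168.0.0 */

    /* Priority 92: New Flow 1 (drop) and New Flow 2 (put_arp). */
    if (p->inport_is_lr0_public && p->eth_bcast && p->arp_op == 2
        && in_cidr(p->arp_spa, public_net, 24)) {
        return cr_resident ? ACT_PUT_ARP : ACT_DROP;
    }

    /* Priority 90: Flow 3, any other ARP reply. */
    if (p->arp_op == 2) {
        return ACT_PUT_ARP;
    }

    return ACT_NO_MATCH;
}
```

In this model a broadcast ARP reply from 172.168.0.5 on lr0-public is
dropped on a non-gateway chassis and learned on the gateway chassis, while
an ARP reply from 10.0.0.3 on lr0-sw0 still falls through to Flow 3 on
every chassis.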

Also, it doesn't make much sense for all the chassis to process these GARP
reply packets, as eventually all of them will discard these packets because
the mac_binding value would not have changed.

Is this approach breaking your existing use case? If so, can you please
explain how?

Thanks
Numan


> Thanks,
> Han
>