[ovs-discuss] OVN /OVS openvswitch: ovs-system: deferred action limit reached, drop recirc action

Ammad Syed syedammad83 at gmail.com
Tue Nov 2 06:59:02 UTC 2021


Hi,

I just upgraded my OVN and OVS to the latest releases, i.e. OVN 21.09 and OVS
2.16.0. I am still getting the same messages in my dmesg logs.

The issue can be reproduced with the following steps:

- Add a Neutron router.
- Set its external gateway.
- Attach a local network subnet to the router. In my case Geneve is the tenant
  network and VLAN is the provider external network.
- Now try to access the SNAT public/external IP that is assigned to the
  router by any means (you can simply enter that IP in your web browser); you
  will see the logs below in dmesg.
- The logs appear only on the external gateway chassis.

[Tue Nov  2 11:48:12 2021] openvswitch: ovs-system: deferred action limit reached, drop recirc action
[Tue Nov  2 11:48:19 2021] openvswitch: ovs-system: deferred action limit reached, drop recirc action
[Tue Nov  2 11:48:39 2021] openvswitch: ovs-system: deferred action limit reached, drop recirc action
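A minimal Python sketch of the reproduce-and-check loop described above, under stated assumptions: probe_tcp and deferred_drops are hypothetical helper names, the IP and port you pass to probe_tcp are whatever SNAT address and service you are testing, and the parser assumes the two dmesg line shapes seen in this thread (human-readable timestamps from dmesg -T, or raw seconds since boot). It opens a TCP session to the router's external IP, then lists the timestamps of any matching kernel messages.

```python
import re
import socket
from typing import List

# Kernel message quoted in this thread.
DROP_MSG = "deferred action limit reached, drop recirc action"

# Assumed line shapes, both seen in this thread:
#   "[Tue Nov  2 11:48:12 2021] openvswitch: ovs-system: deferred action ..."  (dmesg -T)
#   "[275612.826698] openvswitch: ovs-system: deferred action ..."            (plain dmesg)
LINE_RE = re.compile(r"^\[(?P<ts>[^\]]+)\]\s+openvswitch:\s+\S+:\s+(?P<msg>.*)$")


def probe_tcp(ip: str, port: int, timeout: float = 3.0) -> bool:
    """Open (and immediately close) a TCP session to ip:port.

    Returns True if the handshake completed, False on refusal, timeout or
    any other socket error. Meant to be run from a host outside OVN.
    """
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False


def deferred_drops(dmesg_text: str) -> List[str]:
    """Return the timestamp of every 'deferred action limit' line in dmesg output."""
    stamps = []
    for line in dmesg_text.splitlines():
        m = LINE_RE.match(line.strip())
        if m and m.group("msg").strip() == DROP_MSG:
            stamps.append(m.group("ts"))
    return stamps
```

On the gateway chassis the dmesg text could come from subprocess.run(["dmesg", "-T"], capture_output=True, text=True).stdout. The probe only generates the traffic; the messages appear on the gateway chassis, not on the host running the probe.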

- Ammad


On Thu, Sep 9, 2021 at 7:25 PM Odintsov Vladislav <VlOdintsov at croc.ru> wrote:

> Hi Han,
>
> I’ll try to answer the first question to move this discussion forward.
>
> Below is the output of ovs-appctl ofproto/trace <flow> | ovn-detrace for my
> topology. It covers the last stages of the lr egress pipeline and the jump
> to lr ingress.
> The full output is in the attachment.
> Hope this helps.
>
>
> 25. metadata=0x4, priority 0, cookie 0xb4d0917
> resubmit(,26)
>   *  Logical datapaths:
>   *      "lr0-edge" (c55eb989-eda9-47b9-8b34-e898dc1c6be2) [ingress]
>   *      "lr0" (f0acad28-1531-4c32-98f1-6e95c528c2a5) [ingress]
>   *  Logical flow: table=17 (lr_in_larger_pkts), priority=0, match=(1), actions=(next;)
> 26. reg15=0x1,metadata=0x4, priority 50, cookie 0x1d634149
> set_field:0x2->reg15
> resubmit(,27)
>   *  Logical datapaths:
>   *      "lr0-edge" (c55eb989-eda9-47b9-8b34-e898dc1c6be2) [ingress]
>   *  Logical flow: table=18 (lr_in_gw_redirect), priority=50, match=(outport == "lr0-wan"), actions=(outport = "cr-lr0-wan"; next;)
>    *  Logical Router Port: lr0-wan mac 0e:01:aa:29:41:03 networks ['172.16.0.1/32'] ipv6_ra_configs {}
> 27. metadata=0x4, priority 0, cookie 0x433abe7d
> resubmit(,37)
>   *  Logical datapaths:
>   *      "lr0-edge" (c55eb989-eda9-47b9-8b34-e898dc1c6be2) [ingress]
>   *      "lr0" (f0acad28-1531-4c32-98f1-6e95c528c2a5) [ingress]
>   *  Logical flow: table=19 (lr_in_arp_request), priority=0, match=(1), actions=(output;)
> 37. priority 0
> resubmit(,38)
> 38. reg15=0x2,metadata=0x4, priority 100, cookie 0xf7faafb5
> set_field:0x1->reg15
> set_field:0x9->reg11
> set_field:0xb->reg12
> resubmit(,39)
>   *  Logical datapath: "lr0-edge" (c55eb989-eda9-47b9-8b34-e898dc1c6be2)
>   *  Port Binding: logical_port "cr-lr0-wan", tunnel_key 2, chassis-name
> "ai10", chassis-str "ai10.ai315t.int.c2.croc.ru"
> 39. priority 0
> set_field:0->reg0
> set_field:0->reg1
> set_field:0->reg2
> set_field:0->reg3
> set_field:0->reg4
> set_field:0->reg5
> set_field:0->reg6
> set_field:0->reg7
> set_field:0->reg8
> set_field:0->reg9
> resubmit(,40)
> 40. ip,metadata=0x4, priority 50, cookie 0x851809e6
> set_field:0x1/0x1->reg10
> ct(table=41,zone=NXM_NX_REG11[0..15],nat)
> nat
> -> A clone of the packet is forked to recirculate. The forked pipeline will be resumed at table 41.
> -> Sets the packet to an untracked state, and clears all the conntrack fields.
>   *  Logical datapaths:
>   *      "lr0-edge" (c55eb989-eda9-47b9-8b34-e898dc1c6be2) [egress]
>   *  Logical flow: table=0 (lr_out_undnat), priority=50, match=(ip), actions=(flags.loopback = 1; ct_dnat;)
>
> Final flow:
> recirc_id=0x3223,eth,tcp,reg10=0x1,reg11=0x9,reg12=0xb,reg14=0x1,reg15=0x1,metadata=0x4,in_port=132,vlan_tci=0x0000,dl_src=0e:01:aa:29:41:03,dl_dst=0e:01:aa:29:41:03,nw_src=10.0.0.21,nw_dst=172.16.0.1,nw_tos=0,nw_ecn=0,nw_ttl=56,tp_src=0,tp_dst=22,tcp_flags=0
> Megaflow:
> recirc_id=0x3223,ct_state=+new-est-rel-rpl-inv+trk,ct_label=0/0x1,eth,ip,in_port=132,dl_src=84:3d:c6:da:5f:ff,dl_dst=0e:01:aa:29:41:03,nw_src=0.0.0.0/1,nw_dst=172.16.0.1,nw_ttl=57,nw_frag=no
> Datapath actions:
> set(eth(src=0e:01:aa:29:41:03)),set(ipv4(ttl=56)),ct(zone=9,nat),recirc(0x329c)
>
>
> ===============================================================================
> recirc(0x329c) - resume conntrack with default ct_state=trk|new (use --ct-next to customize)
> Replacing src/dst IP/ports to simulate NAT:
> Initial flow:
> Modified flow:
>
> ===============================================================================
>
> Flow:
> recirc_id=0x329c,ct_state=new|trk,ct_zone=9,eth,tcp,reg10=0x1,reg11=0x9,reg12=0xb,reg14=0x1,reg15=0x1,metadata=0x4,in_port=132,vlan_tci=0x0000,dl_src=0e:01:aa:29:41:03,dl_dst=0e:01:aa:29:41:03,nw_src=10.0.0.21,nw_dst=172.16.0.1,nw_tos=0,nw_ecn=0,nw_ttl=56,tp_src=0,tp_dst=22,tcp_flags=0
>
> bridge("internet")
> ------------------
> thaw
> Resuming from table 41
> 41. ct_state=+new+trk,ip,metadata=0x4, priority 50, cookie 0xc26ec65c
> ct(commit,zone=NXM_NX_REG11[0..15],nat(src))
> nat(src)
> -> Sets the packet to an untracked state, and clears all the conntrack fields.
> resubmit(,42)
>   *  Logical datapaths:
>   *      "lr0-edge" (c55eb989-eda9-47b9-8b34-e898dc1c6be2) [egress]
>   *  Logical flow: table=1 (lr_out_post_undnat), priority=50, match=(ip && ct.new), actions=(ct_commit { } ; next; )
> 42. metadata=0x4, priority 0, cookie 0xa404a92a
> resubmit(,43)
>   *  Logical datapaths:
>   *      "lr0-edge" (c55eb989-eda9-47b9-8b34-e898dc1c6be2) [egress]
>   *      "lr0" (f0acad28-1531-4c32-98f1-6e95c528c2a5) [egress]
>   *  Logical flow: table=2 (lr_out_snat), priority=0, match=(1), actions=(next;)
> 43. ip,reg15=0x1,metadata=0x4,nw_dst=172.16.0.1, priority 100, cookie 0x48dfa8d3
> clone(ct_clear,move:NXM_NX_REG15[]->NXM_NX_REG14[],set_field:0->reg15,push:NXM_OF_ETH_SRC[],push:NXM_OF_ETH_DST[],pop:NXM_OF_ETH_SRC[],pop:NXM_OF_ETH_DST[],set_field:0->reg10,set_field:0x1/0x1->reg10,set_field:0/0xffffffff000000000000000000000000->xxreg0,set_field:0/0xffffffff0000000000000000->xxreg0,set_field:0/0xffffffff00000000->xxreg0,set_field:0/0xffffffff->xxreg0,set_field:0/0xffffffff000000000000000000000000->xxreg1,set_field:0/0xffffffff0000000000000000->xxreg1,set_field:0/0xffffffff00000000->xxreg1,set_field:0/0xffffffff->xxreg1,set_field:0/0xffffffff00000000->xreg4,set_field:0/0xffffffff->xreg4,set_field:0x1/0x1->xreg4,resubmit(,8))
> ct_clear
> move:NXM_NX_REG15[]->NXM_NX_REG14[]
> -> NXM_NX_REG14[] is now 0x1
> set_field:0->reg15
> push:NXM_OF_ETH_SRC[]
> push:NXM_OF_ETH_DST[]
> pop:NXM_OF_ETH_SRC[]
> -> NXM_OF_ETH_SRC[] is now 0e:01:aa:29:41:03
> pop:NXM_OF_ETH_DST[]
> -> NXM_OF_ETH_DST[] is now 0e:01:aa:29:41:03
> set_field:0->reg10
> set_field:0x1/0x1->reg10
> set_field:0/0xffffffff000000000000000000000000->xxreg0
> set_field:0/0xffffffff0000000000000000->xxreg0
> set_field:0/0xffffffff00000000->xxreg0
> set_field:0/0xffffffff->xxreg0
> set_field:0/0xffffffff000000000000000000000000->xxreg1
> set_field:0/0xffffffff0000000000000000->xxreg1
> set_field:0/0xffffffff00000000->xxreg1
> set_field:0/0xffffffff->xxreg1
> set_field:0/0xffffffff00000000->xreg4
> set_field:0/0xffffffff->xreg4
> set_field:0x1/0x1->xreg4
> resubmit(,8)
>   *  Logical datapaths:
>   *      "lr0-edge" (c55eb989-eda9-47b9-8b34-e898dc1c6be2) [egress]
>   *  Logical flow: table=3 (lr_out_egr_loop), priority=100, match=(ip4.dst == 172.16.0.1 && outport == "lr0-wan" && is_chassis_resident("cr-lr0-wan")), actions=(clone { ct_clear; inport = outport; outport = ""; eth.dst <-> eth.src; flags = 0; flags.loopback = 1; reg0 = 0; reg1 = 0; reg2 = 0; reg3 = 0; reg4 = 0; reg5 = 0; reg6 = 0; reg7 = 0; reg8 = 0; reg9 = 0; reg9[0] = 1; next(pipeline=ingress, table=0); };)
>    *  NAT: external IP 172.16.0.1 external_mac [] logical_ip 192.168.0.0/16 logical_port [] type snat
> 8. reg14=0x1,metadata=0x4,dl_dst=0e:01:aa:29:41:03, priority 50, cookie 0xce77dac6
> set_field:0xe01aa2941030000000000000000/0xffffffffffff0000000000000000->xxreg0
> resubmit(,9)
>   *  Logical datapaths:
>   *      "lr0-edge" (c55eb989-eda9-47b9-8b34-e898dc1c6be2) [ingress]
>   *  Logical flow: table=0 (lr_in_admission), priority=50, match=(eth.dst == 0e:01:aa:29:41:03 && inport == "lr0-wan" && is_chassis_resident("cr-lr0-wan")), actions=(xreg0[0..47] = 0e:01:aa:29:41:03; next;)
>    *  Logical Router Port: lr0-wan mac 0e:01:aa:29:41:03 networks ['172.16.0.1/32'] ipv6_ra_configs {}
> 9. metadata=0x4, priority 0, cookie 0x27b6069b
> set_field:0x4/0x4->xreg4
> resubmit(,10)
>   *  Logical datapaths:
>   *      "lr0-edge" (c55eb989-eda9-47b9-8b34-e898dc1c6be2) [ingress]
>   *      "lr0" (f0acad28-1531-4c32-98f1-6e95c528c2a5) [ingress]
>   *  Logical flow: table=1 (lr_in_lookup_neighbor), priority=0, match=(1), actions=(reg9[2] = 1; next;)
> 10. reg9=0x4/0x4,metadata=0x4, priority 100, cookie 0xebcd30a8
> resubmit(,11)
>   *  Logical datapaths:
>   *      "lr0-edge" (c55eb989-eda9-47b9-8b34-e898dc1c6be2) [ingress]
>   *      "lr0" (f0acad28-1531-4c32-98f1-6e95c528c2a5) [ingress]
>   *  Logical flow: table=2 (lr_in_learn_neighbor), priority=100, match=(reg9[2] == 1), actions=(next;)
>
>
> Regards,
> Vladislav Odintsov
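To make the log message easier to reason about, here is a toy Python model, not the kernel code. The trace above shows each pass through lr_out_undnat issuing ct(...,nat) followed by recirc, and lr_out_egr_loop cloning a packet whose ip4.dst is the SNAT IP back into the ingress pipeline. The model assumes every recirculation is queued in a bounded per-packet FIFO of deferred actions (the real kernel datapath only defers beyond a recursion threshold, and FIFO_CAPACITY is an illustrative value, not a kernel constant), so a packet that keeps looping eventually finds the queue full and is dropped, roughly the situation the "deferred action limit reached" message reports.

```python
from collections import deque
from typing import Deque, Optional, Tuple

# ASSUMPTION: illustrative capacity only; not taken from this thread and not
# guaranteed to match any kernel version.
FIFO_CAPACITY = 10


class DeferredFifo:
    """Toy bounded FIFO of deferred actions for a single packet.

    Slots are not reused while that packet is still being processed, so a
    packet that keeps queueing new actions eventually runs out of room.
    """

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.pending: Deque[str] = deque()
        self.used = 0  # slots consumed so far by this packet

    def put(self, action: str) -> bool:
        if self.used >= self.capacity:
            return False  # queue exhausted: the action is dropped
        self.pending.append(action)
        self.used += 1
        return True

    def get(self) -> Optional[str]:
        return self.pending.popleft() if self.pending else None


def process_packet(dst_ip: str, snat_ip: str,
                   capacity: int = FIFO_CAPACITY) -> Tuple[int, bool]:
    """Return (pipeline passes, dropped) for one packet in this toy model.

    Each pass consumes one queued recirc. When dst_ip equals snat_ip, the
    egr_loop step sends the packet back to ingress, which queues the recirc
    for the next pass; otherwise the packet leaves after this pass.
    """
    fifo = DeferredFifo(capacity)
    if not fifo.put("recirc"):
        return 0, True
    passes = 0
    while fifo.get() is not None:
        passes += 1
        if dst_ip == snat_ip and not fifo.put("recirc"):
            return passes, True
    return passes, False
```

With the defaults, process_packet("172.16.0.1", "172.16.0.1") makes 10 passes and then reports a drop, while a packet to any other destination makes a single pass. The only point of the model is that a bounded queue turns an endless loop into a drop plus a log line; the real pass count depends on the datapath.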
>
> On 4 Aug 2021, at 21:02, Han Zhou <hzhou at ovn.org> wrote:
>
>
>
> On Wed, Aug 4, 2021 at 6:41 AM Numan Siddique <numans at ovn.org> wrote:
> >
> > On Wed, Aug 4, 2021 at 4:17 AM Krzysztof Klimonda
> > <kklimonda at syntaxhighlighted.com> wrote:
> > >
> > > Hi Ammad,
> > >
> > > (Re-adding ovs-discuss at openvswitch.org to CC to keep track of the discussion)
> > >
> > > Thanks for testing it with SNAT enabled/disabled and verifying that it seems to be related.
> > >
> > > As for the impact of this bug, I have to say I'm unsure. I have
> > > theorized that this could be the cause of (or at least connected to)
> > > BFD sessions being dropped between gateway chassis, but I couldn't
> > > really validate it.
> > >
> > > My linked patch is pretty old and no longer applies cleanly on master,
> > > but I'd be interested in getting some feedback from developers on
> > > whether I'm even fixing the right thing.
> >
> > Hi Krzysztof,
> >
> > Your patch is in the "change requested" stage.  I see from the comment
> > that the ddlog part of the code is missing.
> >
> > Seems like a valid case to me. The issue is seen when the packet is
> > destined to the router port IP, right?
> >
> > In the case of ovn-kubernetes, the router port IP is also used as a
> > load balancer backend IP.
> >
> > Will your patch have any impact if the logical router has this load
> > balancer configured (for the system test case you've added)?
> >
> > ovn-nbctl lb-add lb1 172.16.1.254:90 192.168.1.100:90
> > ovn-nbctl lr-lb-add R1 lb1
> >
> > Can you please repost the patch for further review.  It would be great
> > if you can add ddlog code.  Or you can repost the patch
> > and the ddlog part can be added if the reviewers are fine with the patch.
> >
> > Thanks
> > Numan
> >
>
> Thanks Krzysztof, this is interesting. Could you share more on the root
> cause, since you debugged it? How did the loop happen? When a packet
> destined to the SNAT IP hits the router ingress pipeline, what's the next
> hop? How is the L2 dst populated for the dst IP, and how is the packet
> forwarded back to the router pipeline? How did a /32 IP (instead of a
> subnet) in the SNAT config make a difference?
>
> > >
> > > Regards,
> > > Krzysztof
> > >
> > > On Wed, Aug 4, 2021, at 09:02, Ammad Syed wrote:
> > > > I am able to reproduce this issue with an SNAT-enabled network:
> > > > accessing the SNAT IP from the external network reproduces it.
> > > > If I keep SNAT disabled, I don't see these logs in syslog.
> > > >
> > > > Ammad
> > > >
> > > > On Tue, Aug 3, 2021 at 6:39 PM Ammad Syed <syedammad83 at gmail.com> wrote:
> > > > > Thanks. Let me try to reproduce it this way.
> > > > >
> > > > > Can you please advise whether this bug will cause any trouble in
> > > > > production? Is there any workaround to avoid this issue?
> > > > >
> > > > > Ammad
> > > > >
> > > > > On Tue, Aug 3, 2021 at 5:56 PM Krzysztof Klimonda <kklimonda at syntaxhighlighted.com> wrote:
> > > > >> Hi,
> > > > >>
> > > > >> To reproduce it (on OpenStack, although the issue does not seem to
> > > > >> be OpenStack-specific) I created a network with SNAT enabled (which
> > > > >> is the default) and set its external gateway to my external network.
> > > > >> Next, I tried establishing a TCP session from the outside to the IP
> > > > >> address assigned to the router and checked dmesg on the chassis that
> > > > >> the port is assigned to for "ovs-system: deferred action limit
> > > > >> reached, drop recirc action" messages.
> > > > >>
> > > > >> Best Regards,
> > > > >> Krzysztof
> > > > >>
> > > > >> On Tue, Aug 3, 2021, at 09:05, Ammad Syed wrote:
> > > > >> > Hi Krzysztof,
> > > > >> >
> > > > >> > Yes, I might be hitting this issue. How can I check whether there
> > > > >> > is any loop in lflow-list?
> > > > >> >
> > > > >> > Ammad
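On the question of checking lflow-list for a loop: a minimal Python sketch that scans logical-flow text (for example saved ovn-sbctl lflow-list output, or ovn-detrace lines like the trace earlier in this thread) for lr_out_egr_loop flows whose actions send the packet back into the ingress pipeline. The line format encoded in FLOW_RE is an assumption based on that trace, only IPv4 matches are extracted, and egress_loopback_flows is a hypothetical name. A hit only shows where a packet can be re-injected into ingress; it does not by itself prove an endless loop.

```python
import re
from typing import List, Tuple

# ASSUMPTION: one logical flow per line, roughly as printed by
# "ovn-sbctl lflow-list" or ovn-detrace, e.g.
#   table=3 (lr_out_egr_loop), priority=100, match=(ip4.dst == ...), actions=(clone {...};)
# Stage names may be space-padded; the keyword may be "action" or "actions".
FLOW_RE = re.compile(
    r"table=\d+\s*\(\s*(?P<stage>\w+)\s*\)\s*,\s*priority=(?P<prio>\d+)\s*,\s*"
    r"match=\((?P<match>.*)\)\s*,\s*actions?=\((?P<actions>.*)\)\s*$"
)
DST_RE = re.compile(r"ip4\.dst\s*==\s*([0-9.]+)")


def egress_loopback_flows(lflow_text: str) -> List[Tuple[int, str]]:
    """Return (priority, ip4.dst) for lr_out_egr_loop flows that re-enter ingress."""
    hits = []
    for line in lflow_text.splitlines():
        m = FLOW_RE.search(line)
        if not m or m.group("stage") != "lr_out_egr_loop":
            continue
        # Only flows that jump back into the ingress pipeline are interesting.
        if "next(pipeline=ingress" not in m.group("actions"):
            continue
        dst = DST_RE.search(m.group("match"))
        hits.append((int(m.group("prio")), dst.group(1) if dst else "?"))
    return hits
```

Each returned destination is an address (here, typically the SNAT IP) for which egress traffic is cloned back into router ingress; comparing that list with the NAT entries on the router is a reasonable next step.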
> > > > >> >
> > > > >> > On Tue, Aug 3, 2021 at 2:14 AM Krzysztof Klimonda
> > > > >> > <kklimonda at syntaxhighlighted.com> wrote:
> > > > >> > > Hi,
> > > > >> > >
> > > > >> > > Not sure if it's related, but I've seen this bug in the OVN 20.12
> > > > >> > > release, where a routing loop was related to flows created to
> > > > >> > > handle SNAT. I sent an RFC patch a few months back but haven't
> > > > >> > > really had time to follow up on it since then to get feedback:
> > > > >> > > https://www.mail-archive.com/ovs-dev@openvswitch.org/msg53195.html
> > > > >> > > I was planning on re-testing it with the 21.06 release and
> > > > >> > > following up on the patch.
> > > > >> > >
> > > > >> > > On Mon, Aug 2, 2021, at 21:31, Han Zhou wrote:
> > > > >> > > >
> > > > >> > > >
> > > > >> > > > On Mon, Aug 2, 2021 at 5:07 AM Ammad Syed <syedammad83 at gmail.com> wrote:
> > > > >> > > > >
> > > > >> > > > > Hello,
> > > > >> > > > >
> > > > >> > > > > I am using OpenStack with OVN 20.12 and OVS 2.15.0 on Ubuntu
> > > > >> > > > > 20.04, with a Geneve tenant network and a VLAN provider network.
> > > > >> > > > >
> > > > >> > > > > I am continuously getting the messages below in my dmesg logs,
> > > > >> > > > > but only on compute node 1; the other two compute nodes have no
> > > > >> > > > > such messages.
> > > > >> > > > >
> > > > >> > > > > [275612.826698] openvswitch: ovs-system: deferred action limit reached, drop recirc action
> > > > >> > > > > [275683.750343] openvswitch: ovs-system: deferred action limit reached, drop recirc action
> > > > >> > > > > [276102.200772] openvswitch: ovs-system: deferred action limit reached, drop recirc action
> > > > >> > > > > [276161.575494] openvswitch: ovs-system: deferred action limit reached, drop recirc action
> > > > >> > > > > [276210.262524] openvswitch: ovs-system: deferred action limit reached, drop recirc action
> > > > >> > > > >
> > > > >> > > > > I have tried reinstalling compute node 1 (OS and everything), but
> > > > >> > > > > I still get the same errors.
> > > > >> > > > >
> > > > >> > > > > I need your advice.
> > > > >> > > > >
> > > > >> > > > > --
> > > > >> > > > > Regards,
> > > > >> > > > >
> > > > >> > > > >
> > > > >> > > > > Syed Ammad Ali
> > > > >> > > > > _______________________________________________
> > > > >> > > > > discuss mailing list
> > > > >> > > > > discuss at openvswitch.org
> > > > >> > > > > https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
> > > > >> > > >
> > > > >> > > > Hi Syed,
> > > > >> > > >
> > > > >> > > > Could you check if you have routing loops (i.e. a packet being
> > > > >> > > > routed back and forth between logical routers infinitely) in your
> > > > >> > > > logical topology?
> > > > >> > > >
> > > > >> > > > Thanks,
> > > > >> > > > Han
> > > > >> > > >
> > > > >> > >
> > > > >> > >
> > > > >> > > --
> > > > >> > >   Krzysztof Klimonda
> > > > >> > >   kklimonda at syntaxhighlighted.com
> > > > >> >
> > > > >> >
> > > > >> > --
> > > > >> > Regards,
> > > > >> >
> > > > >> >
> > > > >> > Syed Ammad Ali
> > > > >>
> > > > >>
> > > > >> --
> > > > >>   Krzysztof Klimonda
> > > > >>   kklimonda at syntaxhighlighted.com
> > > > > --
> > > > > Regards,
> > > > >
> > > > >
> > > > > Syed Ammad Ali
> > > >
> > > >
> > > > --
> > > > Regards,
> > > >
> > > >
> > > > Syed Ammad Ali
> > >
> > >
> > > --
> > >   Krzysztof Klimonda
> > >   kklimonda at syntaxhighlighted.com
> > >
>
>
>


-- 
Regards,


Syed Ammad Ali