[ovs-dev] [PATCH v1] OVN: Fix the ovn-controller 100% usage issue with put_mac_bindings

Ankur Sharma ankur.sharma at nutanix.com
Fri May 17 00:28:45 UTC 2019


Hi Numan,

Excellent find. I missed the POP while iterating the hmap.
Submitted a V2, added relevant tags as well.

Please take a look.

Regards,
Ankur

From: Numan Siddique <nusiddiq at redhat.com>
Sent: Thursday, May 16, 2019 11:31 AM
To: Ankur Sharma <ankur.sharma at nutanix.com>
Cc: Han Zhou <zhouhan at gmail.com>; ovs-dev at openvswitch.org
Subject: Re: [ovs-dev] [PATCH v1] OVN: Fix the ovn-controller 100% usage issue with put_mac_bindings



On Thu, May 16, 2019 at 12:26 AM Numan Siddique <nusiddiq at redhat.com<mailto:nusiddiq at redhat.com>> wrote:


On Thu, May 16, 2019 at 12:20 AM Ankur Sharma <ankur.sharma at nutanix.com<mailto:ankur.sharma at nutanix.com>> wrote:
Hi Numan,

Just confirming, anything else pending from my side?
or given that patch has been Acked, hence someone will apply it to master?

I don't think you have anything pending unless the committer has any comments.

Thanks for the fix :)

Numan


Thanks

Regards,
Ankur

From: Numan Siddique <nusiddiq at redhat.com<mailto:nusiddiq at redhat.com>>
Sent: Wednesday, May 15, 2019 11:43 AM
To: Ankur Sharma <ankur.sharma at nutanix.com<mailto:ankur.sharma at nutanix.com>>
Cc: Han Zhou <zhouhan at gmail.com<mailto:zhouhan at gmail.com>>; ovs-dev at openvswitch.org<mailto:ovs-dev at openvswitch.org>
Subject: Re: [ovs-dev] [PATCH v1] OVN: Fix the ovn-controller 100% usage issue with put_mac_bindings

Gentle ping on this patch. This fix is a bit critical to fix the ovn-controller CPU usage.

Thanks
Numan

On Sat, May 11, 2019 at 2:34 PM Numan Siddique <nusiddiq at redhat.com<mailto:nusiddiq at redhat.com>> wrote:


On Sat, May 11, 2019 at 2:09 PM Numan Siddique <nusiddiq at redhat.com<mailto:nusiddiq at redhat.com>> wrote:


On Sat, May 11, 2019 at 5:38 AM Ankur Sharma <ankur.sharma at nutanix.com<mailto:ankur.sharma at nutanix.com>> wrote:
Hi Han,

Thanks for review.

Yup, looks like it, although I did not try the scenario myself, but yes entry are not getting removed once added,
hence, new mac bindings will not be added after some time (as existing count reaches 1000).

Regards,
Ankur

From: Han Zhou <zhouhan at gmail.com<mailto:zhouhan at gmail.com>>
Sent: Friday, May 10, 2019 4:47 PM
To: Ankur Sharma <ankur.sharma at nutanix.com<mailto:ankur.sharma at nutanix.com>>
Cc: ovs-dev at openvswitch.org<mailto:ovs-dev at openvswitch.org>
Subject: Re: [ovs-dev] [PATCH v1] OVN: Fix the ovn-controller 100% usage issue with put_mac_bindings



On Fri, May 10, 2019 at 2:46 PM Ankur Sharma <ankur.sharma at nutanix.com<mailto:ankur.sharma at nutanix.com><mailto:ankur.sharma at nutanix.com<mailto:ankur.sharma at nutanix.com>>> wrote:
>
> ISSUE:
> a. As soon as entries get added to put_mac_bindings in pinctrl.c,
>    ovn-controller CPU consumption reached 100%.
>
> b. This happens because in wait_put_mac_bindings,
>    if !hmap_is_empty(&put_mac_bindings) succeeds and
>    calls poll_immediat_wake().
>
>    ovn-controller.log:
>    "2019-05-10T19:43:28.035Z|00035|poll_loop|INFO|wakeup due to
>     0-ms timeout at ovn/controller/pinctrl.c:2520 (99% CPU usage)"
>
> ROOT_CAUSE:
> a. Earlier it used to work fine, because in run_put_mac_bindings
>    all the bindings in put_mac_bindings would be flushed.
>
> b. Looks like, as a part of adding a new thread in pinctrl, this
>    line got replaced with calling buffer_put_mac_bindings.
>
>     "
>     .
>     .
>     .
>        /* Move the mac bindings from 'put_mac_bindings' hmap to
>      * 'buffered_mac_bindings' and notify the pinctrl_handler.
>      * pinctrl_handler will reinject the buffered packets. */
>     if (!hmap_is_empty(&put_mac_bindings)) {
>         buffer_put_mac_bindings();
>         notify_pinctrl_handler();
>     }
>     "
>
> c. Because of which put_mac_bindings is never emptied and wait_put_mac_bindings
>    ends up doing poll_immediate_wake.

Hi Ankur,

I was very curious why this 100% CPU issue came now and why we didn't see earlier.
I looked into  the actual commit which added the pinctrl thread [1]
The function run_put_mac_bindings() didn't have any issues earlier as the code was like below

*******
...
/* Move the mac bindings from 'put_mac_bindings' hmap to
 * 'buffered_mac_bindings' and notify the pinctrl_handler.
 * pinctrl_handler will reinject the buffered packets. */
 if (!hmap_is_empty(&put_mac_bindings)) {
     buffer_put_mac_bindings();
     notify_pinctrl_handler();
 }
********

and the function buffer_put_mac_bindings() popped the put_mac_bindings hmap.

The recent commit [2] has introduced this issue because it deleted the function buffer_put_mac_bindings().

I think its better to correct the commit message  and also add "Fixes" tag.

[1] - https://github.com/openvswitch/ovs/commit/3594ffab6b4b423aa635a313f6b304180d7dbaf7#diff-426c527fd2f3cf3c57def4b2747a48c3 [github.com]<https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_openvswitch_ovs_commit_3594ffab6b4b423aa635a313f6b304180d7dbaf7-23diff-2D426c527fd2f3cf3c57def4b2747a48c3&d=DwMFaQ&c=s883GpUCOChKOHiocYtGcg&r=mZwX9gFQgeJHzTg-68aCJgsODyUEVsHGFOfL90J6MJY&m=XA7ByJtXDRAQSk6npDCYocdoN-a3Bktrhxx0flh5B4A&s=Vxr6-c-Anu0LEhO9OwySZ5A5v_hpNVq6UQWkGy1JyMI&e=>
[2] - https://github.com/openvswitch/ovs/commit/1c24b2f490ba002bbfeb23006965188a7c5b9747#diff-426c527fd2f3cf3c57def4b2747a48c3 [github.com]<https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_openvswitch_ovs_commit_1c24b2f490ba002bbfeb23006965188a7c5b9747-23diff-2D426c527fd2f3cf3c57def4b2747a48c3&d=DwMFaQ&c=s883GpUCOChKOHiocYtGcg&r=mZwX9gFQgeJHzTg-68aCJgsODyUEVsHGFOfL90J6MJY&m=XA7ByJtXDRAQSk6npDCYocdoN-a3Bktrhxx0flh5B4A&s=qKoaN2qjrET5DGYwQBS9Po2Bj4Bi8TIbeBE-K_BJU4Y&e=>


Thanks
Numan

>
> FIX:
> a. Added call to flush_put_mac_bindings back in
>    run_put_mac_bindings.
>
> b. Additioanlly, logic mentioned in ROOT_CAUSE.b has been changed
>    in 1c24b2f490ba002bbfeb23006965188a7c5b9747, hence updated
>    the documentation in pinctrl.c accordingly.
>
> Signed-off-by: Ankur Sharma <ankur.sharma at nutanix.com<mailto:ankur.sharma at nutanix.com><mailto:ankur.sharma at nutanix.com<mailto:ankur.sharma at nutanix.com>>>
> ---
>  ovn/controller/pinctrl.c | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/ovn/controller/pinctrl.c b/ovn/controller/pinctrl.c
> index 8ae1f9b..b7bb4c9 100644
> --- a/ovn/controller/pinctrl.c
> +++ b/ovn/controller/pinctrl.c
> @@ -91,11 +91,13 @@ VLOG_DEFINE_THIS_MODULE(pinctrl);
>   *
>   *   - arp/nd_ns      - These actions generate an ARP/IPv6 Neighbor solicit
>   *                      requests. The original packets are buffered and
> - *                      injected back when put_arp/put_nd actions are called.
> + *                      injected back when put_arp/put_nd resolves
> + *                      corresponding ARP/IPv6 Neighbor solicit requests.
>   *                      When pinctrl_run(), writes the mac bindings from the
>   *                      'put_mac_bindings' hmap to the MAC_Binding table in
> - *                      SB DB, it moves these mac bindings to another hmap -
> - *                      'buffered_mac_bindings'.
> + *                      SB DB, run_buffered_binding will add the buffered
> + *                      packets to buffered_mac_bindings and notify
> + *                      pinctrl_handler.
>   *
>   *                      The pinctrl_handler thread calls the function -
>   *                      send_mac_binding_buffered_pkts(), which uses
> @@ -2468,6 +2470,7 @@ run_put_mac_bindings(struct ovsdb_idl_txn *ovnsb_idl_txn,
>                              sbrec_mac_binding_by_lport_ip,
>                              pmb);
>      }
> +    "flush_put_mac_bindings();

Thanks for the patch and finding the issue. This is a big mistake from me when I added the pinctrl_thread.

Instead of calling  "flush_put_mac_bindings()" in the function "run_put_mac_bindings()"
I would suggest to use - HMAP_FOR_EACH_POP in run_put_mac_bindings() and then free
pmb after the call to run_put_mac_binding().

If you see flush_put_mac_bindings() also does the same thing. We can avoid running the for loop twice.

You can ignore my suggestion. Calling flush_put_mac_bindings() seems much cleaner.

Acked-by: Numan Siddique <nusiddiq at redhat.com<mailto:nusiddiq at redhat.com>>

Thanks
Numan


Thanks
Numan

>  }
>
>  static void
> --
> 1.8.3.1
>
> _______________________________________________
> dev mailing list
> dev at openvswitch.org<mailto:dev at openvswitch.org><mailto:dev at openvswitch.org<mailto:dev at openvswitch.org>>
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev [mail.openvswitch.org]<https://urldefense.proofpoint.com/v2/url?u=https-3A__mail.openvswitch.org_mailman_listinfo_ovs-2Ddev&d=DwMFaQ&c=s883GpUCOChKOHiocYtGcg&r=mZwX9gFQgeJHzTg-68aCJgsODyUEVsHGFOfL90J6MJY&m=wyCWCurDrDdYULVOHFu1TJyFW2oQoNtJtlISXqkQEOU&s=kwv6Xpj9J47S5qVPfEF92S5wBd_7KHBtvOeOua67ijM&e=> [mail.openvswitch.org [mail.openvswitch.org]<https://urldefense.proofpoint.com/v2/url?u=http-3A__mail.openvswitch.org&d=DwMFaQ&c=s883GpUCOChKOHiocYtGcg&r=mZwX9gFQgeJHzTg-68aCJgsODyUEVsHGFOfL90J6MJY&m=wyCWCurDrDdYULVOHFu1TJyFW2oQoNtJtlISXqkQEOU&s=jQd-m5xirQqrCOBwkOE4D0fAkYZjwy2No-tgJksKVCk&e=>]<https://urldefense.proofpoint.com/v2/url?u=https-3A__mail.openvswitch.org_mailman_listinfo_ovs-2Ddev&d=DwMFaQ&c=s883GpUCOChKOHiocYtGcg&r=mZwX9gFQgeJHzTg-68aCJgsODyUEVsHGFOfL90J6MJY&m=P-XiYg1pCA7c5_H6MKaFGMBLWLPE7GAfuUnfELvcIGY&s=6QBH6hjY9c7Vc-Bh4_AzkwWJBBGuOH3Y2v7sXIIU-kc&e=>

Thanks for the fix. Does it mean the the mac-binding will not work after some time since it is never flushed, and the check "hmap_count(&put_mac_bindings) >= 1000" in pinctrl_handle_put_mac_binding() will prevent any new mac-binding being handled?

The fix looks good to me.

Acked-by: Han Zhou <hzhou8 at ebay.com<mailto:hzhou8 at ebay.com><mailto:hzhou8 at ebay.com<mailto:hzhou8 at ebay.com>>>
_______________________________________________
dev mailing list
dev at openvswitch.org<mailto:dev at openvswitch.org>
https://mail.openvswitch.org/mailman/listinfo/ovs-dev [mail.openvswitch.org]<https://urldefense.proofpoint.com/v2/url?u=https-3A__mail.openvswitch.org_mailman_listinfo_ovs-2Ddev&d=DwMFaQ&c=s883GpUCOChKOHiocYtGcg&r=mZwX9gFQgeJHzTg-68aCJgsODyUEVsHGFOfL90J6MJY&m=wyCWCurDrDdYULVOHFu1TJyFW2oQoNtJtlISXqkQEOU&s=kwv6Xpj9J47S5qVPfEF92S5wBd_7KHBtvOeOua67ijM&e=>


More information about the dev mailing list