[ovs-dev] [PATCH ovn v2 1/4] inc-proc-eng: Call clear_tracked_data before recompute.

Mark Gray mark.d.gray at redhat.com
Wed Apr 28 08:27:25 UTC 2021


On 27/04/2021 18:27, Zhen Wang (SW-CLOUD) wrote:
> 
> 
> From: Han Zhou <hzhou at ovn.org>
> Sent: Tuesday, April 27, 2021 9:53 AM
> To: Mark Gray <mark.d.gray at redhat.com>
> Cc: Han Zhou <hzhou at ovn.org>; ovs dev <dev at openvswitch.org>; Zhen Wang (SW-CLOUD) <zhewang at nvidia.com>; Girish Moodalbail <gmoodalbail at nvidia.com>
> Subject: Re: [ovs-dev] [PATCH ovn v2 1/4] inc-proc-eng: Call clear_tracked_data before recompute.
> 
> External email: Use caution opening links or attachments
> 
> 
> 
> On Tue, Apr 27, 2021 at 8:39 AM Mark Gray <mark.d.gray at redhat.com<mailto:mark.d.gray at redhat.com>> wrote:
>>
>> Hi Han,
>>
>> Thanks for fixing this. I reviewed this series but I am not an expert on
>> the code. Please have a look at my suggestions but I suggest also
>> waiting for an ack from Girish or Krzystof as they will probably test it.
>>
> 
> Thanks Mark for the review.
> 
> + Winson who verified for the same environment where Girish was reporting the issue. (Both of them are now my colleagues :) )
> Winson would you add your Tested-by to this series?
> 
> In our 600+  node k8s cluster,  scale up 1000 pods in namespace with NetworkPolicy  “deny-from-other-namespaces”
> 
> Without Han’s Patch:
> K8s node br-int OF increased  around 800K.
> 
> With Han’s patch:
> K8s node br-int OF increased  around 6K

Nice!

> 
> Regards,
> Winson
> 
>> Mark
>>
>> On 22/04/2021 21:14, Han Zhou wrote:
>>> Cleanup particially tracked data due to some of the change handler
>> s/particially/partially?
> 
> Ack
> 
>>> executions before falling back to recompute. This is done already
>>> in the en_runtime_data_run() implementation, but this patch makes
>>> it a generic behavior of the I-P engine.
>>>
>>> Signed-off-by: Han Zhou <hzhou at ovn.org<mailto:hzhou at ovn.org>>
>>> ---
>>> v1->v2: no change
>>>
>>>  controller/ovn-controller.c | 17 -----------------
>>>  lib/inc-proc-eng.c          |  5 +++++
>>>  2 files changed, 5 insertions(+), 17 deletions(-)
>>>
>>> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
>>> index 6f7c9ea61..13c03131c 100644
>>> --- a/controller/ovn-controller.c
>>> +++ b/controller/ovn-controller.c
>>> @@ -1412,23 +1412,6 @@ en_runtime_data_run(struct engine_node *node, void *data)
>>>      struct sset *local_lport_ids = &rt_data->local_lport_ids;
>>>      struct sset *active_tunnels = &rt_data->active_tunnels;
>>>
>>> -    /* Clear the (stale) tracked data if any. Even though the tracked data
>>> -     * gets cleared in the beginning of engine_init_run(),
>>> -     * any of the runtime data handler might have set some tracked
>>> -     * data and later another runtime data handler might return false
>>> -     * resulting in full recompute of runtime engine and rendering the tracked
>>> -     * data stale.
>>> -     *
>>> -     * It's possible that engine framework can be enhanced to indicate
>>> -     * the node handlers (in this case flow_output_runtime_data_handler)
>>> -     * that its input node had a full recompute. However we would still
>>> -     * need to clear the tracked data, because we don't want the
>>> -     * stale tracked data to be accessed outside of the engine, since the
>>> -     * tracked data is cleared in the engine_init_run() and not at the
>>> -     * end of the engine run.
>>> -     * */
>>> -    en_runtime_data_clear_tracked_data(data);
>>> -
>>>      static bool first_run = true;
>>>      if (first_run) {
>>>          /* don't cleanup since there is no data yet */
>>> diff --git a/lib/inc-proc-eng.c b/lib/inc-proc-eng.c
>>> index a6337a1d9..161327404 100644
>>> --- a/lib/inc-proc-eng.c
>>> +++ b/lib/inc-proc-eng.c
>>> @@ -327,6 +327,11 @@ engine_recompute(struct engine_node *node, bool forced, bool allowed)
>>>      }
>>>
>>>      /* Run the node handler which might change state. */
>> Can you move this^ comment down to above the run function as I think it
>> is relevant to that code?
> 
> Well, I added it this way because the major step here is to "run()", i.e. recompute, and I added a minor/sub step which is clearing tracked data first. Is this reasonable? I can change it if you think the other way is better.
> 
>>> +    /* Clear tracked data before calling run() so that partially tracked data
>>> +     * from some of the change handler executions are cleared. */
>>> +    if (node->clear_tracked_data) {
>>> +        node->clear_tracked_data(node->data);
>>> +    }
>>>      node->run(node, node->data);
>>>      node->stats.recompute++;
>>>  }
>>>
>>



More information about the dev mailing list