[ovs-dev] [PATCH ovn] ofctrl: Fix the assert seen when flood removing flows.
Ilya Maximets
i.maximets at ovn.org
Tue Feb 16 11:25:34 UTC 2021
On 2/16/21 12:11 PM, Ilya Maximets wrote:
> On 2/16/21 8:02 AM, numans at ovn.org wrote:
>> From: Numan Siddique <numans at ovn.org>
>
> Hi, Numan.
> Thanks for the fix! The logic seems correct, but I'd suggest to
> implement the fix a bit differently just to avoid unnecessary
> complications. This code is already very hard to understand.
>
> Test would be nice, but it could be a separate change. BTW, this
> module is kind of isolated, so it should be possible to write real
> unit test.
>
> Comments inline.
>
> Best regards, Ilya Maximets.
>
>>
>> In one of the scaled deployments, ovn-controller is asserting with the
>> below stack trace
>>
>> ****************
>> (gdb) bt
>> #0 0x00007fa6aed4970f in raise () from /lib64/libc.so.6
>> #1 0x00007fa6aed33b25 in abort () from /lib64/libc.so.6
>> #2 0x000055d46594a714 in ovs_abort_valist (err_no=err_no at entry=0, format=format at entry=0x55d465a2a190 "%s: assertion %s failed in %s()", args=args at entry=0x7ffd24307ba0) at lib/util.c:419
>> #3 0x000055d465952504 in vlog_abort_valist (module_=<optimized out>, message=0x55d465a2a190 "%s: assertion %s failed in %s()", args=args at entry=0x7ffd24307ba0) at lib/vlog.c:1249
>> #4 0x000055d4659525aa in vlog_abort (module=module at entry=0x55d465ce6880 <this_module>, message=message at entry=0x55d465a2a190 "%s: assertion %s failed in %s()") at lib/vlog.c:1263
>> #5 0x000055d46594a42b in ovs_assert_failure (where=where at entry=0x55d465a05529 "controller/ofctrl.c:1198", function=function at entry=0x55d465a05ca0 <__func__.33798> "flood_remove_flows_for_sb_uuid",
>> condition=condition at entry=0x55d465a05a80 "ovs_list_is_empty(&f->list_node)") at lib/util.c:86
>> #6 0x000055d465877aa2 in flood_remove_flows_for_sb_uuid (flow_table=flow_table at entry=0x55d46688d080, sb_uuid=sb_uuid at entry=0x55d53dbec538, flood_remove_nodes=flood_remove_nodes at entry=0x7ffd24307ed0) at controller/ofctrl.c:1205
>> #7 0x000055d465877c2e in flood_remove_flows_for_sb_uuid (flow_table=flow_table at entry=0x55d46688d080, sb_uuid=sb_uuid at entry=0x55d546553898, flood_remove_nodes=flood_remove_nodes at entry=0x7ffd24307ed0) at controller/ofctrl.c:1230
>> #8 0x000055d465877c2e in flood_remove_flows_for_sb_uuid (flow_table=flow_table at entry=0x55d46688d080, sb_uuid=sb_uuid at entry=0x55d55eafebf0, flood_remove_nodes=flood_remove_nodes at entry=0x7ffd24307ed0) at controller/ofctrl.c:1230
>> #9 0x000055d465877dc2 in ofctrl_flood_remove_flows (flow_table=0x55d46688d080, flood_remove_nodes=flood_remove_nodes at entry=0x7ffd24307ed0) at controller/ofctrl.c:1250
>> #10 0x000055d465872b24 in lflow_handle_changed_ref (ref_type=ref_type at entry=REF_TYPE_PORTGROUP, ref_name=ref_name at entry=0x55d49375a050 "5564_pg_6415729e_58ec_4b8b_bc99_2ceef5c44bac", l_ctx_in=l_ctx_in at entry=0x7ffd24307fd0,
>> l_ctx_out=l_ctx_out at entry=0x7ffd24307f90, changed=changed at entry=0x7ffd24307f8f) at controller/lflow.c:612
>> #11 0x000055d46588f2f8 in _flow_output_resource_ref_handler (node=<optimized out>, data=<optimized out>, ref_type=REF_TYPE_PORTGROUP) at controller/ovn-controller.c:2181
>> #12 0x000055d4658a8163 in engine_compute (recompute_allowed=<optimized out>, node=<optimized out>) at lib/inc-proc-eng.c:306
>> #13 engine_run_node (recompute_allowed=true, node=0x7ffd2430d4b0) at lib/inc-proc-eng.c:352
>> #14 engine_run (recompute_allowed=recompute_allowed at entry=true) at lib/inc-proc-eng.c:377
>> #15 0x000055d46586613d in main (argc=<optimized out>, argv=<optimized out>) at controller/ovn-controller.c:2794
>>
>> ***************
>
> Thanks for providing the backtrace. It's hard to read though and
> it contains too much unnecessary information for a commit message.
>
> I'd say we can strip it like this to be more readable:
>
> ***
> (gdb) bt
> 0 raise () from /lib64/libc.so.6
> 1 abort () from /lib64/libc.so.6
> 2 ovs_abort_valist ("%s: assertion %s failed in %s()") at lib/util.c:419
> 3 vlog_abort_valist ("%s: assertion %s failed in %s()") at lib/vlog.c:1249
> 4 vlog_abort ("%s: assertion %s failed in %s()") at lib/vlog.c:1263
> 5 ovs_assert_failure (where="controller/ofctrl.c:1198",
> function="flood_remove_flows_for_sb_uuid",
> condition="ovs_list_is_empty(&f->list_node)") at lib/util.c:86
> 6 flood_remove_flows_for_sb_uuid (sb_uuid=...538,
> flood_remove_nodes=...ed0) at controller/ofctrl.c:1205
> 7 flood_remove_flows_for_sb_uuid (sb_uuid=...898,
> flood_remove_nodes=...ed0) at controller/ofctrl.c:1230
> 8 flood_remove_flows_for_sb_uuid (sb_uuid=...bf0,
> flood_remove_nodes=...ed0) at controller/ofctrl.c:1230
> 9 ofctrl_flood_remove_flows (flood_remove_nodes=...ed0) at controller/ofctrl.c:1250
> 10 lflow_handle_changed_ref (ref_type=REF_TYPE_PORTGROUP,
> ref_name= "5564_pg_64...bac") at controller/lflow.c:612
> 11 _flow_output_resource_ref_handler (ref_type=REF_TYPE_PORTGROUP)
> at controller/ovn-controller.c:2181
> 12 engine_compute () at lib/inc-proc-eng.c:306
> 13 engine_run_node (recompute_allowed=true) at lib/inc-proc-eng.c:352
> 14 engine_run (recompute_allowed=true) at lib/inc-proc-eng.c:377
> 15 main () at controller/ovn-controller.c:2794
> ***
>
>>
>> This assertion is seen when a port group gets updated and it is referenced by many
>> logical flows (with conj actions). The function
>> ofctrl_flood_remove_flows(), calls flood_remove_flows_for_sb_uuid() for
>> each sb uuid in the hmap - flood_remove_nodes using HMAP_FOR_EACH (flood_remove_nodes).
>> flood_remove_flows_for_sb_uuid() also takes the hmap
>> 'flood_remove_nodes' as an argument and it inserts few items into it
>> when it has to call itself recursively. When an item is inserted, its possible that
>> the hmap may get expanded. And if this happens, the HMAP_FOR_EACH ()
>> skips few entries causing some of the desired flows not getting cleared.
>
> Line wrapping might be a bit better in above text, I think. :)
> Sorry for nitpicking.
>
>>
>> Later when ofctrl_add_or_append_flow() is called, there would be
>> multiple 'struct sb_flow_ref' references for the same desired flow.
>> And this causes the above assertion later when the same port group gets
>> updated.
>>
>> This patch fixes this issue by cloning the hmap 'flood_remove_nodes' and
>> using it to iterate the flood remove nodes.
>
> We're only copying this hmap for iteration purposes, so there is no real
> reason to copy it as a hmap. Simple array will save memory and will
> be much faster. And it will be much more readable, IMHO.
>
> Something like this (not tested):
>
> ---
> diff --git a/controller/ofctrl.c b/controller/ofctrl.c
> index 9d62e1260..a2cdbb507 100644
> --- a/controller/ofctrl.c
> +++ b/controller/ofctrl.c
> @@ -1247,11 +1247,24 @@ ofctrl_flood_remove_flows(struct ovn_desired_flow_table *flow_table,
> struct hmap *flood_remove_nodes)
> {
> struct ofctrl_flood_remove_node *ofrn;
> + struct uuid *sb_uuids;
> + int i, n = 0;
> +
> + /* flood_remove_flows_for_sb_uuid() will modify the 'flood_remove_nodes'
> + * hash map by inserting new items, so we can't use it for iteration.
> + * Copying. */
> + sb_uuids = xmalloc(hmap_count(flood_remove_nodes) * sizeof *sb_uuids);
> HMAP_FOR_EACH (ofrn, hmap_node, flood_remove_nodes) {
> - flood_remove_flows_for_sb_uuid(flow_table, &ofrn->sb_uuid,
> + sb_uuids[n++] = ofrn->sb_uuid;
> + }
> +
> + for (i = 0; i < n; i++) {
> + flood_remove_flows_for_sb_uuid(flow_table, &sb_uuids[i],
> flood_remove_nodes);
> }
>
> + free(sb_uuids);
> +
> /* remove any related group and meter info */
> HMAP_FOR_EACH (ofrn, hmap_node, flood_remove_nodes) {
> ovn_extend_table_remove_desired(groups, &ofrn->sb_uuid);
> ---
>
>>
>> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1928012
>> CC: Han Zhou <hzhou at ovn.org>
>> CC: Dumitru Ceara <dceara at ovn.org>
^^ This is not a correct email. Got bounced. :)
CC: correct one.
>> Signed-off-by: Numan Siddique <numans at ovn.org>
>> ---
>> controller/ofctrl.c | 18 ++++++++++++++++++
>> 1 file changed, 18 insertions(+)
>>
>> diff --git a/controller/ofctrl.c b/controller/ofctrl.c
>> index 9d62e1260..42443acb2 100644
>> --- a/controller/ofctrl.c
>> +++ b/controller/ofctrl.c
>> @@ -1247,10 +1247,28 @@ ofctrl_flood_remove_flows(struct ovn_desired_flow_table *flow_table,
>> struct hmap *flood_remove_nodes)
>> {
>> struct ofctrl_flood_remove_node *ofrn;
>> +
>> + /* flood_remove_flows_for_sb_uuid() modifies the hmap by inserting
>> + * few entries when it calls itself recursively. Clone the
>> + * hmap 'flood_remove_nodes' before calling. Inserting an item to
>> + * hmap may expand it. */
>> + struct hmap flood_remove_uuids = HMAP_INITIALIZER(&flood_remove_uuids);
>> HMAP_FOR_EACH (ofrn, hmap_node, flood_remove_nodes) {
>> + ofctrl_flood_remove_add_node(&flood_remove_uuids, &ofrn->sb_uuid);
>> + }
>> +
>> + /* Iterate using the cloned hmap - 'flood_remove_uuids', but pass
>> + * the hmap 'flood_remove_nodes' provided by the caller.
>> + * flood_remove_flows_for_sb_uuid() will delete the other lflows
>> + * referenced by the sb_uuid, which needs to be re-added later
>> + * if those sb_uuids were not deleted. The caller
>> + * (in lflow.c) will add re-add lflows which are not deleted. */
>
> These comments seems very complex and overloaded with information. It makes
> understanding of what is going on not easy. I'd replace them with something
> simple, e.g. the comment that I have in me example above.
>
>> + HMAP_FOR_EACH_POP (ofrn, hmap_node, &flood_remove_uuids) {
>> flood_remove_flows_for_sb_uuid(flow_table, &ofrn->sb_uuid,
>> flood_remove_nodes);
>> + free(ofrn);
>> }
>> + hmap_destroy(&flood_remove_uuids);
>>
>> /* remove any related group and meter info */
>> HMAP_FOR_EACH (ofrn, hmap_node, flood_remove_nodes) {
>>
>
More information about the dev
mailing list