[ovs-dev] [PATCH ovn] ofctrl: Don't append actions for conj flows if called twice for same flow uuid.
Dumitru Ceara
dceara at redhat.com
Mon Feb 22 09:41:14 UTC 2021
On 2/22/21 10:23 AM, Numan Siddique wrote:
> On Mon, Feb 22, 2021 at 2:17 PM Dumitru Ceara <dceara at redhat.com> wrote:
>>
>> On 2/21/21 12:34 PM, numans at ovn.org wrote:
>>> From: Numan Siddique <numans at ovn.org>
>>>
>>> In one of the scaled deployments, ovn-controller is asserting with the
>>> below stack trace
>>>
>>> ***
>>> (gdb) bt
>>> 0 raise () from /lib64/libc.so.6
>>> 1 abort () from /lib64/libc.so.6
>>> 2 ovs_abort_valist ("%s: assertion %s failed in %s()") at lib/util.c:419
>>> 3 vlog_abort_valist ("%s: assertion %s failed in %s()") at lib/vlog.c:1249
>>> 4 vlog_abort ("%s: assertion %s failed in %s()") at lib/vlog.c:1263
>>> 5 ovs_assert_failure (where="controller/ofctrl.c:1216",
>>> function="flood_remove_flows_for_sb_uuid",
>>> condition="ovs_list_is_empty(&f->list_node)") at lib/util.c:86
>>> 6 flood_remove_flows_for_sb_uuid (sb_uuid=...134,
>>> flood_remove_nodes=...ef0) at controller/ofctrl.c:1216
>>> 9 ofctrl_flood_remove_flows (flood_remove_nodes=...ef0) at controller/ofctrl.c:1280
>>> 10 lflow_handle_changed_ref (ref_type=REF_TYPE_PORTGROUP,
>>> ref_name= "5564_pg_14...aac") at controller/lflow.c:522
>>> 11 _flow_output_resource_ref_handler (ref_type=REF_TYPE_PORTGROUP)
>>> at controller/ovn-controller.c:2220
>>> 12 engine_compute () at lib/inc-proc-eng.c:306
>>> 13 engine_run_node (recompute_allowed=true) at lib/inc-proc-eng.c:352
>>> 14 engine_run (recompute_allowed=true) at lib/inc-proc-eng.c:377
>>> 15 main () at controller/ovn-controller.c:2887
>>> ***
>>>
>>> The reason for this assert is different from the assertion bug fixed in
>>> the commit 858d1dd716("ofctrl: Fix the assert seen when flood removing
>>> flows.")
>>>
>>> The above assertion is seen if lflow.c calls ofctrl_add_or_append_flow()
>>> twice for the same flow uuid 'S'. When this function is called for
>>> the first time, a new desired flow 'f' is created and 'f->references'
>>> will have [(S, f)]. When it is called for the 2nd time, the list
>>> 'f->references' is updated as [(S, f), (S,f)]. Later when
>>> flood_remove_flows_for_sb_uuid() is called for 'S', the above assertion
>>> is seen.
>>>
>>> This scenarion can happen when ovn-controller receives an update message
>>> from SB ovsdb-server in which a new logical flow 'F' belonging to a
>>> datapath 'D' is added which would result in conjunction flows. If this
>>> datapath 'D' is added to 'local_datapaths' in the same loop, then
>>> logical flow 'F' is first processed in lflow_add_flows_for_datapath()
>>> and the second time in lflow_handle_changed_flows().
>>>
>>> This issue can be fixed in 2 ways
>>> 1. Flood remove all the tracked flows in lflow_handle_changed_flows()
>>> first and then process the modified or newly added logical flows.
>>>
>>> 2. In the function ofctrl_add_or_append_flow(), check if there is
>>> already a reference for the flow uuid 'S' in the existing desired
>>> flows and only append the actions if it doesn't exist.
>>>
>>> This patch has taken the approach (2) as that seems better and
>>> effecient.
>>>
>>> Signed-off-by: Numan Siddique <numans at ovn.org>
>>> ---
>>
>> Hi Numan,
>>
>> Nice work tracking this down!
>>
>> This is not a full review, just an initial question to see if there's
>> also an option #3 to fix this issue:
>>
>> Can we try to avoid processing the same lflow twice? E.g., in
>> lflow_add_flows_for_datapath(), call consider_logical_flow__() only for
>> logical flows that are not on the table's 'track_list', so have not
>> changed in the SB.
>
> Hi Dumitru,
>
> Thanks for the comments. I think this would require running a for
> loop of tracked
> flows for every logical flow for the datapath and checking if it is in
> the tracked list or not.
OVS IDL doesn't explicitly expose an API to check if a record has been
changed (i.e., is on the track list). We could add one, and use it in
lflow_add_flows_for_datapath() when iterating over lflows, to just skip
records that were marked as updates already.
> I think we could do this too. But in my opinion we should support
> ofctrl_add_or_append_flow() to be called twice for the same uuid.
I'm not sure about this. Why should we support adding the same flow
twice? It seems like a bug to me.
> When ofctrl_add_or_append_flow() is called twice, it should result in a no-op.
What if we implement (3) and also add an assert or check in
link_flow_to_sb() to not allow the same flow to be added twice to a SB uuid?
>
> We could end up in a similar scenario later when incrementally processing
> other changes.
As far as I see, we currently call ofctrl_add_or_append_flow() only for
logical flows so, at least for now, no other incrementally processed SB
record types can cause this issue.
>
> In my opinion we can take (2) to not result in this assert scenario.
> We can also take your suggested approach (3) so that we don't call
> ofctrl_add_or_append_flow() twice for the reported scenario.
Right now ofctrl_add_or_append_flow() has the implicit requirement that
a single flow shouldn't be added twice. This seems reasonable to me,
and I think it should be the responsibility of the caller to avoid that.
However, as you mentioned, we could also check inside the ofctrl_*()
implementation and make sure the caller isn't adding the same flow
twice, but I think that can be simplified by adding the check to
link_flow_to_sb(). The way I see it we have two options when such a
check would fail:
- either assert
- or ignore the second addition while logging an error but this might be
easily overlooked and so the real problem of adding the same flow twice
will not be addressed.
>
> What do you think?
>
> Thanks
> Numan
>
Regards,
Dumitru
More information about the dev
mailing list