[ovs-dev] [PATCH] odp-util.c: Fix dp_hash execution with slowpath actions.

Ben Pfaff blp at ovn.org
Tue Jun 16 21:40:11 UTC 2020

On Fri, May 15, 2020 at 12:17:47AM -0700, Han Zhou wrote:
> When dp_hash is executed with slowpath actions, it results in endless
> recirc loop in kernel datapath, and finally drops the packet, with
> kernel logs:
> openvswitch: ovs-system: deferred action limit reached, drop recirc action
> The root cause is that the dp_hash value calculated by slowpath is not
> passed to datapath when executing the recirc action, thus when the recirced
> packet miss upcall comes to userspace again, it generates the dp_hash
> and recirc action again, with same recirc_id, which in turn generates
> a megaflow with recirc action with the recird_id same as the recirc_id in
> its match condition, which causes a loop in datapath.
> For example, this can be reproduced with below setup of OVN environment:
>                          LS1            LS2
>                           |              |
>                           |------R1------|
>         VIF--LS0---R0-----|              |------R3
>                           |------R2------|
> Assume there is a route from the VIF to R3: R0 -> R1 -> R3, and there are two
> routes (ECMP) from R3 to the VIF:
> R3 -> R1 -> R0
> R3 -> R2 -> R0
> Now if we ping from the VIF to R3, the OVS flow execution on the HV of the VIF
> will hit the R3's datapath which has flows that responds to the ICMP packet
> by setting ICMP fields, which requires slowpath actions, and in later flow
> tables it will hit the "group" action that selects between the ECMP routes.
> By default OVN uses "dp_hash" method for the "group" action.
> For the first miss upcall packet, dp_hash value is empty, so the group action
> will be translated to "dp_hash" and "recirc".
> During action execution, because of the previous actions that sets ICMP fields,
> the whole execution requires slowpath, so it tries to execute all actions in
> userspace in odp_execute_actions(), including dp_hash action, except the
> recirc action, which can only be executed in datapath. So the dp_hash value
> is calculated in userspace, and then the packet is injected to datapath for
> recirc action execution.
> However, the dp_hash calculated by the userspace is not passed to datapath.
> Because of this, the packet recirc in datapath doesn't have dp_hash value,
> and the miss upcall for the recirced packet hits the same flow tables and
> triggers same "dp_hash" and "recirc" action again, with exactly same recirc_id!
> This time, the new upcall doesn't require any slowpath execution, so both
> the dp_hash and recirc actions are executed in datapath, after creating a
> datapath megaflow like:
> recirc_id(XYZ),..., actions:hash(l4(0)),recirc(XYZ)
> with match recirc_id equals the recirc id in the action, thus creating a loop.
> This patch fixes the problem by passing the calculated dp_hash value to
> datapath in odp_key_from_dp_packet().
> Signed-off-by: Han Zhou <hzhou at ovn.org>

I remember that we discussed whether dp_hash should be passed to and
from the kernel back when we introduced the field, in commit
572f732ab078 ("dpif-netdev: user space datapath recirculation").  The
concern was that userspace and the datapath might compute different hash
values, which could cause a problem for flows that ended up being
handled two different ways with different hash values.  However, not
doing that caused the problem that this commit solves.

I added a Fixes: tag for that original commit, ran the tests, and pushed
this to master.  Thank you!

Do you want this backported at all?

More information about the dev mailing list