[ovs-dev] [PATCH] odp-util.c: Fix dp_hash execution with slowpath actions.

Han Zhou hzhou at ovn.org
Tue Jun 16 22:51:06 UTC 2020


On Tue, Jun 16, 2020 at 2:40 PM Ben Pfaff <blp at ovn.org> wrote:
>
> On Fri, May 15, 2020 at 12:17:47AM -0700, Han Zhou wrote:
> > When dp_hash is executed with slowpath actions, it results in endless
> > recirc loop in kernel datapath, and finally drops the packet, with
> > kernel logs:
> >
> > openvswitch: ovs-system: deferred action limit reached, drop recirc
action
> >
> > The root cause is that the dp_hash value calculated by slowpath is not
> > passed to datapath when executing the recirc action, thus when the
recirced
> > packet miss upcall comes to userspace again, it generates the dp_hash
> > and recirc action again, with same recirc_id, which in turn generates
> > a megaflow with recirc action with the recird_id same as the recirc_id
in
> > its match condition, which causes a loop in datapath.
> >
> > For example, this can be reproduced with below setup of OVN environment:
> >
> >                          LS1            LS2
> >                           |              |
> >                           |------R1------|
> >         VIF--LS0---R0-----|              |------R3
> >                           |------R2------|
> >
> > Assume there is a route from the VIF to R3: R0 -> R1 -> R3, and there
are two
> > routes (ECMP) from R3 to the VIF:
> > R3 -> R1 -> R0
> > R3 -> R2 -> R0
> >
> > Now if we ping from the VIF to R3, the OVS flow execution on the HV of
the VIF
> > will hit the R3's datapath which has flows that responds to the ICMP
packet
> > by setting ICMP fields, which requires slowpath actions, and in later
flow
> > tables it will hit the "group" action that selects between the ECMP
routes.
> >
> > By default OVN uses "dp_hash" method for the "group" action.
> >
> > For the first miss upcall packet, dp_hash value is empty, so the group
action
> > will be translated to "dp_hash" and "recirc".
> >
> > During action execution, because of the previous actions that sets ICMP
fields,
> > the whole execution requires slowpath, so it tries to execute all
actions in
> > userspace in odp_execute_actions(), including dp_hash action, except the
> > recirc action, which can only be executed in datapath. So the dp_hash
value
> > is calculated in userspace, and then the packet is injected to datapath
for
> > recirc action execution.
> >
> > However, the dp_hash calculated by the userspace is not passed to
datapath.
> >
> > Because of this, the packet recirc in datapath doesn't have dp_hash
value,
> > and the miss upcall for the recirced packet hits the same flow tables
and
> > triggers same "dp_hash" and "recirc" action again, with exactly same
recirc_id!
> >
> > This time, the new upcall doesn't require any slowpath execution, so
both
> > the dp_hash and recirc actions are executed in datapath, after creating
a
> > datapath megaflow like:
> >
> > recirc_id(XYZ),..., actions:hash(l4(0)),recirc(XYZ)
> >
> > with match recirc_id equals the recirc id in the action, thus creating
a loop.
> >
> > This patch fixes the problem by passing the calculated dp_hash value to
> > datapath in odp_key_from_dp_packet().
> >
> > Signed-off-by: Han Zhou <hzhou at ovn.org>
>
> I remember that we discussed whether dp_hash should be passed to and
> from the kernel back when we introduced the field, in commit
> 572f732ab078 ("dpif-netdev: user space datapath recirculation").  The
> concern was that userspace and the datapath might compute different hash
> values, which could cause a problem for flows that ended up being
> handled two different ways with different hash values.  However, not
> doing that caused the problem that this commit solves.
>
> I added a Fixes: tag for that original commit, ran the tests, and pushed
> this to master.  Thank you!
>
> Do you want this backported at all?

Thanks Ben. I think it is better to be backported to at least 2.12, 2.11. I
am not sure about older branches.


More information about the dev mailing list