[ovs-dev] [PATCH net-next] net: openvswitch: Add support to lookup invalid packet in ct action.

Numan Siddique nusiddiq at redhat.com
Tue Oct 6 12:19:38 UTC 2020

On Tue, Oct 6, 2020 at 4:46 PM Florian Westphal <fw at strlen.de> wrote:
> nusiddiq at redhat.com <nusiddiq at redhat.com> wrote:
> > From: Numan Siddique <nusiddiq at redhat.com>
> >
> > For a tcp packet which is part of an existing committed connection,
> > nf_conntrack_in() will return err and set skb->_nfct to NULL if it is
> > out of tcp window. ct action for this packet will set the ct_state
> > to +inv which is as expected.
> This is because from conntrack p.o.v., such TCP packet is NOT part of
> the existing connection.
> For example, because it is considered part of a previous incarnation
> of the same connection.
> > But a controller cannot add an OVS flow as
> >
> > table=21,priority=100,ct_state=+inv, actions=drop
> >
> > to drop such packets. That is because when ct action is executed on other
> > packets which are not part of existing committed connections, ct_state
> > can be set to invalid. Few such cases are:
> >    - ICMP reply packets.
> Can you elaborate? Echo reply should not be invalid. Conntrack should
> mark it as established (unless such echo reply came out of the blue).

Hi Florian,

Thanks for providing the comments.

Sorry for not being very clear.

Let me brief about the present problem we see in OVN (which is a
controller using ovs)

When a VM/container sends a packet (in the ingress direction), we don't send all
the packets to conntrack. If a packet is destined to an OVN load
balancer virtual ip,
only then we send the packet to conntrack in the ingress direction and
then we do dnat
to the backend.

Eg. in the ingress direction

table=1, match = (ip && ip4.dst == VIP) action = ct(table=2)
tablle=2, ct_state=+new+trk && ip4.dst == VIP, action = ct(commit,

However for the egress direction (when the packet is to be delivered
to the VM/container),
we send all the packets to conntrack and if the ct.est is set, we do
undnat before delivering
the packet to the VM/container.
table=40, match = ip, action = ct(table=41)
table=41, match = ct_state=+est+trk, action = ct(nat)

What I mean here is that, since we send all the packets in the egress
pipeline to conntrack,
we can't add a flow like - match = ct_state=+inv, action = drop.

i.e When a VM/container sends an ICMP request packet, it will not be
sent to conntrack, but
the reply ICMP will be sent to conntrack and it will be marked as invalid.

So is the case with TCP, the TCP SYN from the VM is not sent to
conntrack, but the SYN/ACK
from the server would be sent to conntrack and it will be marked as invalid.

> >    - TCP SYN/ACK packets during connection establishment.
> SYN/ACK should also be established state.
> INVALID should only be matched for packets that were never seen
> by conntrack, or that are deemed out of date / corrupted.
> > To distinguish between an invalid packet part of committed connection
> > and others, this patch introduces as a new ct attribute
> > OVS_CT_ATTR_LOOKUP_INV. If this is set in the ct action (without commit),
> > it tries to find the ct entry and if present, sets the ct_state to
> > +inv,+trk and also sets the mark and labels associated with the
> > connection.
> >
> > With this,  a controller can add flows like
> >
> > ....
> > ....
> > table=20,ip, action=ct(table=21, lookup_invalid)
> > table=21,priority=100,ct_state=+inv+trk,ct_label=0x2/0x2 actions=drop
> > table=21,ip, actions=resubmit(,22)
> > ....
> > ....
> What exactly is the feature/problem that needs to be solved?
> I suspect this would help me to provide better feedback than the
> semi-random comments below .... :-)
> My only problem with how conntrack does things ATM is that the ruleset
> cannot distinguish:
> 1. packet was not even seen by conntrack
> 2. packet matches existing connection, but is "bad", for example:
>   - contradicting tcp flags
>   - out of window
>   - invalid checksum

We want the below to be solved (using OVS flows) :
  - If the packet is marked as invalid due to (2) which you mentioned above,
    we would like to read the ct_mark and ct_label fields as the packet is
    part of existing connection, so that we can add an OVS flow like

ct_state=+inv+trk,ct_label=0x2 actions=drop

Right now it is not possible.

This patch does another lookup if skb->_nfct is NULL after
nf_conntrack_in() to check
if (2) is the case. If the lookup is successful, it updates the ct flow
key with the ct_mark and ct_label. This is made optional using a
netlink attribute.

I'm not sure if it's possible for nf_conntrack_in() to provide this
information for
its callers so that the caller can come to know that the state is
invalid because of (2).

I tested by setting 'be_liberal' sysctl flag and since skb->_nfct was
set for (2), OVS
datapath module set the ct_state to +est.


> There are a few sysctls to modify default behaviour, e.g. relax window
> checks, or ignore/skip checksum validation.
> The other problem i see (solveable for sure by yet-another-sysctl but i
> see that as last-resort) is usual compatibility problem:
> ct state invalid drop
> ct mark gt 0 accept
> If standard netfilter conntrack were to set skb->_nfct e.g. even if
> state is invalid, we could still make the above work via some internal
> flag.
> But if you reverse it, you get different behaviour:
> ct mark gt 0 accept
> ct state invalid drop
> First rule might now accept out-of-window packet even when "be_liberal"
> sysctl is off.

More information about the dev mailing list