[ovs-dev] [PATCH net-next] net: openvswitch: Add support to lookup invalid packet in ct action.

Tue Oct 6 18:52:04 UTC 2020

Numan Siddique <nusiddiq at redhat.com> wrote:
> On Tue, Oct 6, 2020 at 4:46 PM Florian Westphal <fw at strlen.de> wrote:
> >
> > nusiddiq at redhat.com <nusiddiq at redhat.com> wrote:
> > > From: Numan Siddique <nusiddiq at redhat.com>
> > >
> > > For a tcp packet which is part of an existing committed connection,
> > > nf_conntrack_in() will return err and set skb->_nfct to NULL if it is
> > > out of tcp window. ct action for this packet will set the ct_state
> > > to +inv which is as expected.
> >
> > This is because from conntrack p.o.v., such TCP packet is NOT part of
> > the existing connection.
> >
> > For example, because it is considered part of a previous incarnation
> > of the same connection.
> >
> > > But a controller cannot add an OVS flow as
> > >
> > > table=21,priority=100,ct_state=+inv, actions=drop
> > >
> > > to drop such packets. That is because when ct action is executed on other
> > > packets which are not part of existing committed connections, ct_state
> > > can be set to invalid. Few such cases are:
> > >    - ICMP reply packets.
> >
> > Can you elaborate? Echo reply should not be invalid. Conntrack should
> > mark it as established (unless such echo reply came out of the blue).
> 
> Hi Florian,
> 
> Thanks for providing the comments.
> 
> Sorry for not being very clear.
> 
> Let me brief about the present problem we see in OVN (which is a
> controller using ovs)
> 
> When a VM/container sends a packet (in the ingress direction), we don't send all
> the packets to conntrack. If a packet is destined to an OVN load
> balancer virtual ip,
> only then we send the packet to conntrack in the ingress direction and
> then we do dnat
> to the backend.

Ah, okay.  That explains the INVALID then, but in that case I think this
patch/direction is less desirable from conntrack point of view.

More below.

> table=1, match = (ip && ip4.dst == VIP) action = ct(table=2)
> tablle=2, ct_state=+new+trk && ip4.dst == VIP, action = ct(commit,
> nat=BACKEND_IP)
> ...
> ..
> 
> However for the egress direction (when the packet is to be delivered
> to the VM/container),
> we send all the packets to conntrack and if the ct.est is set, we do
> undnat before delivering
> the packet to the VM/container.
> ...
> table=40, match = ip, action = ct(table=41)
> table=41, match = ct_state=+est+trk, action = ct(nat)
> ...
> 
> What I mean here is that, since we send all the packets in the egress
> pipeline to conntrack,
> we can't add a flow like - match = ct_state=+inv, action = drop.

Basically this is like a normal routing/netfitler (no ovs), where
the machine in question only sees unidirectional traffic (reply packets
taking different route).

> i.e When a VM/container sends an ICMP request packet, it will not be
> sent to conntrack, but
> the reply ICMP will be sent to conntrack and it will be marked as invalid.

Yes.  For plain netfilter, the solution would be to accept INVALID icmp
replies in the iptables/nftables ruleset.

> So is the case with TCP, the TCP SYN from the VM is not sent to
> conntrack, but the SYN/ACK
> from the server would be sent to conntrack and it will be marked as invalid.

Right.

> > 1. packet was not even seen by conntrack
> > 2. packet matches existing connection, but is "bad", for example:
> >   - contradicting tcp flags
> >   - out of window
> >   - invalid checksum
> 
> We want the below to be solved (using OVS flows) :
>   - If the packet is marked as invalid due to (2) which you mentioned above,
>     we would like to read the ct_mark and ct_label fields as the packet is
>     part of existing connection, so that we can add an OVS flow like
>
> ct_state=+inv+trk,ct_label=0x2 actions=drop

Wouldn't it make more sense to let tcp conntrack skip the in-window
check?  AFAIU thats the only problem and its identical to other cases
that we have at the moment, for example, conntrack supports mid-stream
pickup (on by default) which also disables tcp window checks.

> I tested by setting 'be_liberal' sysctl flag and since skb->_nfct was
> set for (2), OVS
> datapath module set the ct_state to +est.

Yes.  This flag can be set programatically on a per-ct basis.

See nft_flow_offload_eval() for example (BE_LIBERAL).
Given we now have multiple places that set this I think it would make
sense to add a helper, say, e.g.

void nf_ct_tcp_be_liberal(struct nf_conn *ct);
?

(Or perhaps nf_ct_tcp_disable_window_checks() , might be more
 clear/descriptive as to what this is doing).

Do you think that this resolves the OVN problem?