[ovs-dev] [PATCH] conntrack: Reset ct_state when entering a new zone.

Dumitru Ceara dceara at redhat.com
Wed Mar 4 19:44:09 UTC 2020


On 3/4/20 7:45 PM, Ilya Maximets wrote:
> On 3/4/20 2:01 PM, Dumitru Ceara wrote:
>> On 1/30/20 3:16 PM, Dumitru Ceara wrote:
>>> When a new conntrack zone is entered, the ct_state field is zeroed in
>>> order to avoid using state information from different zones.
>>>
>>> One such scenario is when a packet is double NATed. Assuming two zones
>>> and 3 flows performing the following actions in order on the packet:
>>> 1. ct(zone=5,nat), recirc
>>> 2. ct(zone=1), recirc
>>> 3. ct(zone=1,nat)
>>>
>>> If at step #1 the packet matches an existing NAT entry, it will get
>>> translated and pkt->md.ct_state is set to CS_DST_NAT or CS_SRC_NAT.
>>> At step #2 the new tuple might match an existing connection and
>>> pkt->md.ct_zone is set to 1.
>>> If at step #3 the packet matches an existing NAT entry in zone 1,
>>> handle_nat() will be called to perform the translation but it will
>>> return early because the packet's zone matches the conntrack zone and
>>> the ct_state field still contains CS_DST_NAT or CS_SRC_NAT from the
>>> translations in zone 5.
>>>
>>> In order to reliably detect when a packet enters a new conntrack zone
>>> we also need to zero out the pkt->md.ct_zone field when initializing
>>> metadata in pkt_metadata_init().
>>>
>>> CC: Darrell Ball <dlu998 at gmail.com>
>>> Signed-off-by: Dumitru Ceara <dceara at redhat.com>
>>> ---
>>>  lib/conntrack.c | 5 +++++
>>>  lib/packets.h   | 5 +++++
>>>  2 files changed, 10 insertions(+)
>>>
>>> diff --git a/lib/conntrack.c b/lib/conntrack.c
>>> index ff5a894..e4d934a 100644
>>> --- a/lib/conntrack.c
>>> +++ b/lib/conntrack.c
>>> @@ -1277,6 +1277,11 @@ process_one(struct conntrack *ct, struct dp_packet *pkt,
>>>              const struct nat_action_info_t *nat_action_info,
>>>              ovs_be16 tp_src, ovs_be16 tp_dst, const char *helper)
>>>  {
>>> +    /* Reset ct_state whenever entering a new zone. */
>>> +    if (pkt->md.ct_zone != zone) {
>>> +        pkt->md.ct_state = 0;
>>> +    }
>>> +
>>>      bool create_new_conn = false;
>>>      conn_key_lookup(ct, &ctx->key, ctx->hash, now, &ctx->conn, &ctx->reply);
>>>      struct conn *conn = ctx->conn;
>>> diff --git a/lib/packets.h b/lib/packets.h
>>> index 5d7f82c..fae64bb 100644
>>> --- a/lib/packets.h
>>> +++ b/lib/packets.h
>>> @@ -161,6 +161,11 @@ pkt_metadata_init(struct pkt_metadata *md, odp_port_t port)
>>>       */
>>>      memset(md, 0, offsetof(struct pkt_metadata, ct_orig_tuple_ipv6));
>>>  
>>> +    /* Explicitly zero out ct_zone in order to be able to properly determine
>>> +     * when a packet enters a new conntrack zone.
>>> +     */
>>> +    md->ct_zone = 0;
> 

Hi Ilya,

Thanks for reviewing this!

> I'm not an expert in conntrack, but I'm bothered about this change in
> init function.  This function works for every single packet and any
> modification might significantly affect performance.

I agree, I'm not too happy with this either, but it seemed the safest
option from a correctness perspective.

> 
> There is an assumption that conntrack fields of packet metadata are
> never used if ct_state is zero.  So, every user of these fields must
> be sure that ct_state is correctly initialized.

Yes, but as far as I understand, this doesn't imply that if ct_state is
non-zero the pkt->md.ct_zone field is initialized and safe to use.

> Will it work if we'll change your 'if' statement in 'process_one'
> function to something like:
> 
>     if (pkt->md.ct_state && pkt->md.ct_zone != zone) {
>         pkt->md.ct_state = 0;
>     }
> 
> and will not change metadata initialization code?
> 

I think one example of reading an uninitialized pkt->md.ct_zone is when
process_one() was called for a packet and returned early in the
CT_CONN_TYPE_UN_NAT case:

https://github.com/openvswitch/ovs/blob/master/lib/conntrack.c#L1303

Here we set pkt->md.ct_state to "CS_TRACKED | CS_INVALID" but we don't
touch pkt->md.ct_zone. I think there are cases when this packet could
reach process_one() again further down the pipeline and we'd be reading
the uninitialized pkt->md.ct_zone.

Right now I don't see any other places where we set pkt->md.ct_state
while leaving pkt->md.ct_zone uninitialized. If that's true, I can make
sure ct_zone is properly initialized in that early-return path too. The
check you suggested would then be ok in all cases and we could get rid
of the change in pkt_metadata_init().
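
As a rough, standalone sketch of that idea (not the actual patch):
struct pkt_md is a hypothetical cut-down stand-in for struct
pkt_metadata, the CS_* values are only illustrative, and both helper
names are made up for this example. The point is that ct_zone is
written wherever ct_state is set, so the guarded check never reads a
stale zone.

```c
#include <stdint.h>

/* Illustrative flag values, assumed for this sketch only. */
#define CS_INVALID 0x10
#define CS_TRACKED 0x20

/* Hypothetical cut-down stand-in for struct pkt_metadata. */
struct pkt_md {
    uint32_t ct_state;
    uint16_t ct_zone;
};

/* The suggested check: ct_zone is only consulted when ct_state is
 * non-zero, so it may stay uninitialized for untracked packets. */
static void
reset_state_on_zone_change(struct pkt_md *md, uint16_t zone)
{
    if (md->ct_state && md->ct_zone != zone) {
        md->ct_state = 0;
    }
}

/* Early-return path (as in the CT_CONN_TYPE_UN_NAT case): record the
 * zone together with the state so a later check can trust ct_zone. */
static void
mark_invalid(struct pkt_md *md, uint16_t zone)
{
    md->ct_state = CS_TRACKED | CS_INVALID;
    md->ct_zone = zone;
}
```

With this, a packet marked invalid in zone 5 keeps its state if it is
processed in zone 5 again, and gets ct_state cleared on entering zone 1.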

I'll give it a try and test it out to see if all scenarios are covered.

Thanks,
Dumitru

>>> +
>>>      /* It can be expensive to zero out all of the tunnel metadata. However,
>>>       * we can just zero out ip_dst and the rest of the data will never be
>>>       * looked at. */
>>>
>>
>> Hi,
>>
>> Just a reminder about this patch.
>>
>> OVN LB hairpinning system tests on OVN master and OVN-20.03 with the
>> OVS userspace datapath fail because of the issue this patch addresses.
>>
>> Thanks,
>> Dumitru
>>