[ovs-dev] [PATCH] conntrack: Reset ct_state when entering a new zone.
Dumitru Ceara
dceara at redhat.com
Wed Mar 4 19:44:09 UTC 2020
On 3/4/20 7:45 PM, Ilya Maximets wrote:
> On 3/4/20 2:01 PM, Dumitru Ceara wrote:
>> On 1/30/20 3:16 PM, Dumitru Ceara wrote:
>>> When a new conntrack zone is entered, the ct_state field is zeroed in
>>> order to avoid using state information from different zones.
>>>
>>> One such scenario is when a packet is double NATed. Assuming two zones
>>> and 3 flows performing the following actions in order on the packet:
>>> 1. ct(zone=5,nat), recirc
>>> 2. ct(zone=1), recirc
>>> 3. ct(zone=1,nat)
>>>
>>> If at step #1 the packet matches an existing NAT entry, it will get
>>> translated and pkt->md.ct_state is set to CS_DST_NAT or CS_SRC_NAT.
>>> At step #2 the new tuple might match an existing connection and
>>> pkt->md.ct_zone is set to 1.
>>> If at step #3 the packet matches an existing NAT entry in zone 1,
>>> handle_nat() will be called to perform the translation but it will
>>> return early because the packet's zone matches the conntrack zone and
>>> the ct_state field still contains CS_DST_NAT or CS_SRC_NAT from the
>>> translations in zone 5.
>>>
>>> In order to reliably detect when a packet enters a new conntrack zone
>>> we also need to zero out the pkt->md.ct_zone field when initializing
>>> metadata in pkt_metadata_init().
>>>
>>> CC: Darrell Ball <dlu998 at gmail.com>
>>> Signed-off-by: Dumitru Ceara <dceara at redhat.com>
>>> ---
>>> lib/conntrack.c | 5 +++++
>>> lib/packets.h | 5 +++++
>>> 2 files changed, 10 insertions(+)
>>>
>>> diff --git a/lib/conntrack.c b/lib/conntrack.c
>>> index ff5a894..e4d934a 100644
>>> --- a/lib/conntrack.c
>>> +++ b/lib/conntrack.c
>>> @@ -1277,6 +1277,11 @@ process_one(struct conntrack *ct, struct dp_packet *pkt,
>>> const struct nat_action_info_t *nat_action_info,
>>> ovs_be16 tp_src, ovs_be16 tp_dst, const char *helper)
>>> {
>>> + /* Reset ct_state whenever entering a new zone. */
>>> + if (pkt->md.ct_zone != zone) {
>>> + pkt->md.ct_state = 0;
>>> + }
>>> +
>>> bool create_new_conn = false;
>>> conn_key_lookup(ct, &ctx->key, ctx->hash, now, &ctx->conn, &ctx->reply);
>>> struct conn *conn = ctx->conn;
>>> diff --git a/lib/packets.h b/lib/packets.h
>>> index 5d7f82c..fae64bb 100644
>>> --- a/lib/packets.h
>>> +++ b/lib/packets.h
>>> @@ -161,6 +161,11 @@ pkt_metadata_init(struct pkt_metadata *md, odp_port_t port)
>>> */
>>> memset(md, 0, offsetof(struct pkt_metadata, ct_orig_tuple_ipv6));
>>>
>>> + /* Explicitly zero out ct_zone in order to be able to properly determine
>>> + * when a packet enters a new conntrack zone.
>>> + */
>>> + md->ct_zone = 0;
>
Hi Ilya,
Thanks for reviewing this!
> I'm not an expert in conntrack, but I'm bothered about this change in
> init function. This function works for every single packet and any
> modification might significantly affect performance.
I agree, I'm not too happy with this either but it seemed the safest
from a correctness perspective.
>
> There is an assumption that conntrack fields of packet metadata are
> never used if ct_state is zero. So, every user of these fields must
> be sure that ct_state is correctly initialized.
Yes, but as far as I understand, this doesn't imply that if ct_state is
non-zero the pkt->md.ct_zone field is initialized and safe to use.
> Will it work if we'll change your 'if' statement in 'process_one'
> function to something like:
>
> if (pkt->md.ct_state && pkt->md.ct_zone != zone) {
> pkt->md.ct_state = 0;
> }
>
> and will not change metadata initialization code?
>
I think one example of reading unitialized pkt->md.ct_zone is if
process_one() was called for a packet and returned early in case of
CT_CONN_TYPE_UN_NAT:
https://github.com/openvswitch/ovs/blob/master/lib/conntrack.c#L1303
Here we set pkt->md.ct_state to "CS_TRACKED | CS_INVALID" but we don't
touch pkt->md.ct_zone. I think there are cases when this packet could
reach process_one() again further down the pipeline and we'd be reading
the uninitialized pkt->md.ct_zone.
Right now I don't see any other places where we set pkt->md.ct_state
while leaving pkt->md.ct_zone uninitialized. If that's true I can make
sure ct_zone is properly initialized here too and then the check you
suggested would be ok in all cases and we could get rid of the change in
pkt_metadata_init().
I'll give it a try and test it out to see if all scenarios are covered.
Thanks,
Dumitru
>>> +
>>> /* It can be expensive to zero out all of the tunnel metadata. However,
>>> * we can just zero out ip_dst and the rest of the data will never be
>>> * looked at. */
>>>
>>
>> Hi,
>>
>> Just a reminder about this patch.
>>
>> OVN LB harpinning system tests on OVN master and OVN-20.03 with OVS
>> userspace datapath fail because of the issue this patch addresses.
>>
>> Thanks,
>> Dumitru
>>
>> _______________________________________________
>> dev mailing list
>> dev at openvswitch.org
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>
>
More information about the dev
mailing list