[ovs-discuss] Bad checksums observed with nsh encapsulation

Tue Jun 26 12:25:39 UTC 2018

Hi, Jaime

Thank you so much for trying this. Jiri Benc is an expert on this, so I am cc'ing him.

Jiri, do we need to calculate the checksum before push_nsh if checksum offload is on?

-----Original Message-----
From: ovs-discuss-bounces at openvswitch.org [mailto:ovs-discuss-bounces at openvswitch.org] On Behalf Of Jaime Caamaño Ruiz
Sent: Monday, June 25, 2018 5:45 PM
To: jcaamano at suse.com; ovs-discuss at openvswitch.org
Subject: Re: [ovs-discuss] Bad checksums observed with nsh encapsulation

Hello

I looked a bit more into the issue.

This happens when OVS receives a packet with CHECKSUM_PARTIAL. In a normal vm2vm scenario without NSH, OVS hands the packet to the receiver with the same CHECKSUM_PARTIAL, so the receiver does not verify the checksum.

But when we push NSH headers, the first receiver may not be the final receiver. The CHECKSUM_PARTIAL state may then never reach the final receiver, which will verify the checksum and reject it as bad.
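To make the mechanism concrete, here is a small userspace C sketch. It rests on one assumption about the kernel path: with CHECKSUM_PARTIAL, the sending stack writes only the folded pseudo-header sum into the TCP checksum field, and whoever completes the checksum later (the NIC, or software such as skb_checksum_help()) sums the segment and stores the complement. All function names below are hypothetical illustrations, not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add big-endian 16-bit words to a running ones'-complement sum (RFC 1071). */
static uint32_t csum_add(uint32_t sum, const uint8_t *p, size_t len)
{
    for (; len > 1; p += 2, len -= 2)
        sum += ((uint32_t)p[0] << 8) | p[1];
    if (len)
        sum += (uint32_t)p[0] << 8;     /* pad a trailing odd byte with zero */
    return sum;
}

/* Fold carries back into the low 16 bits (no final complement). */
static uint16_t fold(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Sender with tx offload (CHECKSUM_PARTIAL): only the folded
 * pseudo-header sum goes into the checksum field. */
static void write_partial(uint8_t *seg, size_t csum_offset, uint32_t pseudo_sum)
{
    uint16_t v = fold(pseudo_sum);

    seg[csum_offset] = (uint8_t)(v >> 8);
    seg[csum_offset + 1] = (uint8_t)(v & 0xff);
}

/* Completion step: sum the whole segment, checksum field included
 * (it already carries the pseudo-header sum), and store the complement. */
static void complete_partial(uint8_t *seg, size_t len, size_t csum_offset)
{
    uint16_t c;

    assert(csum_offset + 1 < len);
    c = (uint16_t)~fold(csum_add(0, seg, len));
    seg[csum_offset] = (uint8_t)(c >> 8);
    seg[csum_offset + 1] = (uint8_t)(c & 0xff);
}

/* A verifying receiver: pseudo-header plus segment must fold to 0xffff. */
static int segment_ok(uint32_t pseudo_sum, const uint8_t *seg, size_t len)
{
    return fold(csum_add(pseudo_sum, seg, len)) == 0xffff;
}
```

As an illustration, take a made-up TCP FIN+ACK from 10.0.0.1 to 10.0.0.2:80 carrying 4 payload bytes. Its pseudo-header sum is 0x1421. segment_ok() fails while the field holds only that partial value and passes once complete_partial() has stored 0x8442. A receiver that honours CHECKSUM_PARTIAL never checks the field, so a problem only shows up if the partial state is lost before a receiver that verifies.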

So I think it may be necessary to handle the CHECKSUM_PARTIAL case in nsh_push, for example by adding something like:

if (skb->ip_summed == CHECKSUM_PARTIAL) {
    int err = skb_checksum_help(skb); /* finish the checksum before pushing NSH */
    if (err)
        return err;
}
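The following userspace model sketches the effect of the change proposed above. It assumes the picture described in this message, where CHECKSUM_PARTIAL is lost somewhere before the final receiver, and models that final receiver as one that simply verifies. struct pkt, resolve_partial(), push_encap() and inner_ok() are hypothetical names. Unlike the kernel, where csum_start is relative to skb->head, offsets here are relative to data[0], so pushing a header has to shift csum_start.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for skb->ip_summed values. */
enum summed_state { SUMMED_NONE, SUMMED_PARTIAL };

/* Hypothetical packet model; offsets are relative to data[0]. */
struct pkt {
    uint8_t data[128];
    size_t len;
    enum summed_state summed;
    size_t csum_start;   /* first byte covered by the L4 checksum */
    size_t csum_offset;  /* checksum field, relative to csum_start */
};

/* RFC 1071 ones'-complement sum over big-endian 16-bit words. */
static uint32_t csum_add(uint32_t sum, const uint8_t *p, size_t len)
{
    for (; len > 1; p += 2, len -= 2)
        sum += ((uint32_t)p[0] << 8) | p[1];
    if (len)
        sum += (uint32_t)p[0] << 8;
    return sum;
}

static uint16_t fold(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Software completion of a partial checksum, in the spirit of skb_checksum_help(). */
static int resolve_partial(struct pkt *p)
{
    size_t field = p->csum_start + p->csum_offset;
    uint16_t c;

    if (p->summed != SUMMED_PARTIAL)
        return 0;
    if (field + 1 >= p->len)
        return -1;                          /* bogus offsets */
    c = (uint16_t)~fold(csum_add(0, p->data + p->csum_start,
                                 p->len - p->csum_start));
    p->data[field] = (uint8_t)(c >> 8);
    p->data[field + 1] = (uint8_t)(c & 0xff);
    p->summed = SUMMED_NONE;
    return 0;
}

/* Hypothetical push step. resolve_first = 1 follows the proposed change;
 * resolve_first = 0 leaves the checksum partial, as in the report. */
static int push_encap(struct pkt *p, const uint8_t *hdr, size_t hdr_len,
                      int resolve_first)
{
    if (resolve_first) {
        int err = resolve_partial(p);
        if (err)
            return err;
    }
    if (p->len + hdr_len > sizeof p->data)
        return -1;                          /* no room for the header */
    memmove(p->data + hdr_len, p->data, p->len);
    memcpy(p->data, hdr, hdr_len);
    p->len += hdr_len;
    p->csum_start += hdr_len;               /* inner segment moved */
    return 0;
}

/* Final receiver that did not get the partial hint and verifies the sum. */
static int inner_ok(uint32_t pseudo_sum, const struct pkt *p)
{
    assert(p->csum_start <= p->len);
    return fold(csum_add(pseudo_sum, p->data + p->csum_start,
                         p->len - p->csum_start)) == 0xffff;
}
```

With resolve_first = 1, the inner segment verifies after the push. With resolve_first = 0, the packet still carries SUMMED_PARTIAL, and any receiver that verifies without that hint sees a bad checksum. Resolving before the push costs a software checksum per offloaded packet, so it may be worth restricting this to cases where the partial state cannot be preserved.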

I tried that and it got rid of my problem.

Any thoughts?

BR
Jaime.

-----Original Message-----
From: Jaime Caamaño Ruiz <jcaamano at suse.de>
Reply-To: jcaamano at suse.com
To: jcaamano at suse.com, ovs-discuss at openvswitch.org
Subject: Re: [ovs-discuss] Bad checksums observed with nsh encapsulation
Date: Thu, 14 Jun 2018 18:15:10 +0200

Hello

I have done a follow-up test very similar to the previous one, but this time using two compute nodes, with the client and server on one of them and the vnf on the other. This means that packets from either the client or the server that are NSH-encapsulated are forwarded to the vnf compute node, egressing through a vxlan tunnel port (vxlan+eth+nsh+payload).

In this scenario I don't observe the checksum problem. So the checksum is sometimes observed to be incorrect only with the combination of NSH encapsulation and tap port egress.

BR
Jaime.

-----Original Message-----
From: Jaime Caamaño Ruiz  <jcaamano at suse.de>
Reply-To: jcaamano at suse.com
To: ovs-discuss at openvswitch.org, jcaamano at suse.de
Subject: [ovs-discuss] Bad checksums observed with nsh encapsulation
Date: Wed, 13 Jun 2018 12:51:59 +0200

Hello

I am facing a problem where eth+nsh encapsulated packets egress OVS with an incorrect checksum.

The scenario is

client ---- vnf ---- server

All guests are on the same host, so this is vm2vm traffic; the tap ports are added directly to the OVS bridge. TCP traffic to/from server port 80 is encapsulated with eth+nsh and traverses the vnf. I exercise the traffic using nc on both the client and the server.

I include captures at the client [1] and at the vnf [2], in which I attempt three TCP connections on port 80. The general observation is that packets generated on the client/server show wrong checksums there, due to offloading, but arrive at the vnf with correct checksums. But not all of them. For the first connection attempt, the SYN (frame 74) and ACK (78) are OK, but the FIN (79) is not. A retransmitted FIN (80) is still not OK, and a further FIN retransmission (93) is OK. Much the same happens for the second attempt. The third attempt shows a bad SYN (104) coming from the server.
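One way to cross-check individual frames such as 79, 80 and 104 is to recompute the TCP checksum over the IPv4 pseudo-header and segment yourself. The function below is a hypothetical helper, not part of OVS or any capture tool. It expects a pointer to the inner IPv4 header, so for frames captured at the vnf the outer Ethernet, NSH and inner Ethernet headers must be skipped first. For frames captured on the sending guest with tx offload on, a failure is expected, because only the partial value has been written at that point.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 ones'-complement sum over big-endian 16-bit words. */
static uint32_t csum_add(uint32_t sum, const uint8_t *p, size_t len)
{
    for (; len > 1; p += 2, len -= 2)
        sum += ((uint32_t)p[0] << 8) | p[1];
    if (len)
        sum += (uint32_t)p[0] << 8;     /* pad a trailing odd byte with zero */
    return sum;
}

static uint16_t fold(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Returns 1 if the TCP checksum of an unfragmented IPv4 packet verifies,
 * 0 if it does not, and -1 if the buffer is not a complete IPv4/TCP packet. */
static int ipv4_tcp_csum_ok(const uint8_t *ip, size_t caplen)
{
    size_t ihl, tot_len, tcp_len;
    uint8_t tail[4];
    uint32_t sum;

    assert(ip != NULL);
    if (caplen < 20 || (ip[0] >> 4) != 4)
        return -1;
    ihl = (size_t)(ip[0] & 0x0f) * 4;
    tot_len = ((size_t)ip[2] << 8) | ip[3];
    if (ihl < 20 || tot_len < ihl + 20 || tot_len > caplen || ip[9] != 6)
        return -1;                          /* truncated, or not TCP */
    tcp_len = tot_len - ihl;

    sum = csum_add(0, ip + 12, 8);          /* source and destination address */
    tail[0] = 0;
    tail[1] = ip[9];                        /* protocol */
    tail[2] = (uint8_t)(tcp_len >> 8);      /* TCP length */
    tail[3] = (uint8_t)(tcp_len & 0xff);
    sum = csum_add(sum, tail, sizeof tail);
    sum = csum_add(sum, ip + ihl, tcp_len); /* header, checksum field, payload */
    return fold(sum) == 0xffff;
}
```

The helper ignores IP fragmentation, which should not matter for small control segments like these SYNs and FINs.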

Two additional observations:

- This does not happen if I use a port other than 80, so that the traffic goes directly from the client to the server with no eth+nsh encapsulation.

- This does not happen if I disable tx offloading on both the server and the client.

I also include the flows [3] and the ofproto trace [4] for the FIN (79), which is generated by the client, eth+nsh encapsulated and forwarded to the vnf. Whether the packet should be eth+nsh encapsulated is decided in table 101 by setting reg2, which is then checked in table 221. The packet is NSH-encapsulated in table 222 and then Ethernet-encapsulated in table 83. If it is not encapsulated, the packet goes from table 221 back to table 220 and is output there without any further actions.
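The table walk described above can be summarised as a small C model. This is only a sketch. It includes just the tables this message names, treats "table 101 set reg2 to request encapsulation" as a single boolean, and uses hypothetical helper names. The real flows in [3] may visit other tables. Only the encapsulating branch goes through the NSH push, which is where the CHECKSUM_PARTIAL handling discussed earlier in the thread would apply.

```c
#include <assert.h>
#include <stddef.h>

/* Record one visited table in path (hypothetical helper). */
static void visit(int path[], size_t *n, size_t max, int table)
{
    assert(*n < max);
    path[(*n)++] = table;
}

/* Hypothetical model of the pipeline for TCP port 80 traffic.
 * encap != 0 stands for "table 101 set reg2 to request eth+nsh".
 * Returns the number of tables recorded in path. */
static size_t walk_pipeline(int encap, int path[], size_t max)
{
    size_t n = 0;

    visit(path, &n, max, 101);          /* classify and set reg2 */
    visit(path, &n, max, 221);          /* check reg2 */
    if (encap) {
        visit(path, &n, max, 222);      /* push NSH */
        visit(path, &n, max, 83);       /* push Ethernet, on towards the vnf */
    } else {
        visit(path, &n, max, 220);      /* output, no further actions */
    }
    return n;
}
```

For the FIN in [4], walk_pipeline(1, ...) records 101, 221, 222, 83. Non-encapsulated traffic records 101, 221, 220.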

Using OVS 2.9.2 with OVS tree kernel module. Kernel is 4.4.

Am I understanding the problem correctly, in that OVS is responsible for these checksums when offloading is enabled?
Any pointers on how I can debug this further?
Why would just some of the eth+nsh packets exhibit this problem and not all?
Why would these bad packets be ok after retransmissions?

[1] https://filebin.net/8mnypc2qm4vninof/client.pcap?t=b097kh0m
[2] https://filebin.net/8mnypc2qm4vninof/vnf_eth0.pcap?t=b097kh0m
[3] https://hastebin.com/nuhexufaze.sql
[4] https://hastebin.com/yevufanula.http

Thanks for your help,
Jaime.

_______________________________________________
discuss mailing list
discuss at openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss