[ovs-dev] WFP and tunneling and packet fragments

Eitan Eliahu eliahue at vmware.com
Tue Aug 5 12:25:45 UTC 2014


Yes.
Eitan

-----Original Message-----
From: Samuel Ghinet [mailto:sghinet at cloudbasesolutions.com] 
Sent: Tuesday, August 05, 2014 5:12 AM
To: Eitan Eliahu
Cc: dev at openvswitch.org
Subject: RE: WFP and tunneling and packet fragments

Thanks Eitan!

Regarding point 2: do you mean to set the MTU from within the VM?
As I recall, I found no PowerShell cmdlet that changes the MTU of a vNIC.

Sam
________________________________________
From: Eitan Eliahu [eliahue at vmware.com]
Sent: Monday, August 04, 2014 5:17 AM
To: Samuel Ghinet
Subject: RE: WFP and tunneling and packet fragments

Sam,
Here are some answers for your comments:
[1] WFP is used for Rx only and as you mentioned for fragmented packets only.
[2] Setting the VM MTU to accommodate the tunnel header is the correct configuration.
[3] We need to match the external packet in the flow table as other VXLAN packets could be received. (The external port is set to promiscuous mode by the VM switch). (There might be other reasons as well).
Thanks,
Eitan

-----Original Message-----
From: dev [mailto:dev-bounces at openvswitch.org] On Behalf Of Samuel Ghinet
Sent: Sunday, August 03, 2014 11:53 AM
To: dev at openvswitch.org
Subject: [ovs-dev] WFP and tunneling and packet fragments

Hello guys,

I have studied in a bit more detail the part of your code that deals with tunneling and WFP.

A summary of the flow, as I understand it:

ON RECEIVE (from external):
A. If there's a VXLAN encapsulated packet coming from outside, one that is NOT fragmented, the flow is like this:
1. Extract packet info (i.e. flow key)
2. find flow
3. if flow found => out to port X (or to multiple ports)
3.1. else => send to userspace: a flow will be created to handle the VXLAN encapsulated packets that are NOT fragmented (but we'll later need to make a new flow for VXLAN encapsulated packets that are fragmented)

In the case where we have a flow, we output to a port X (which should be the management OS NIC). After the packet is received by the management OS, WFP comes in and calls the registered callout / callback. This decapsulates the VXLAN packet, finds a flow for it, and then executes the actions on the decapsulated packet (e.g. output to port Y).

The problem I see here is that the flow lookup is done twice for the same packet.

B. If there's a VXLAN encapsulated packet coming from outside, one that IS fragmented, the flow is similar:
1. Extract packet info (i.e. flow key)
2. find flow
3. if flow found => out to port X (or to multiple ports)
3.1. else => send to userspace: a flow will be created to handle the VXLAN encapsulated packets that are fragmented (but we'll later need to make a new flow for VXLAN encapsulated packets that are not fragmented)

In the case where we have a flow, we output to a port X (which should be the management OS NIC). After the packet is received by the management OS, WFP comes in, reassembles the fragmented VXLAN packets, and then calls the registered callout / callback. This decapsulates the VXLAN packet, finds a flow for it, and then executes the actions on the decapsulated packet (e.g. output to port Y).

Again, we have two flow lookups for the same packet.

ON SEND (to external / VXLAN):
There are three situations, as I see them:
1. the packet is small, and thus not LSO either => encapsulate and output; all is fine

2. the packet is LSO. The only case where I found this in my tests (as I recall) was when the packet came from the management OS. If LSO is enabled in a VM, then by the time the packet reaches the switch it is already segmented and no longer carries LSO (as NBL info).
Regarding LSO packets coming from the management OS: as I recall, packets can be LSO here.
However, I believe there is no practical case in which we need to do "if in port = management OS => out to VXLAN".
I mean, tunneling is used for outputting packets from VMs only, as I understand it.

3. The packet is not LSO, but packet size + encapsulation overhead > MTU (e.g. packet size = 1500).
Here we have two cases:
3.1. The packet is coming from the management OS: in this case, if we use netsh to lower the MTU below 1500 (i.e. taking into account the maximum encapsulation overhead), then when a packet needs to be encapsulated, the MTU in the driver will still be 1500 but the packet will be, say, 1420 instead of 1500. So it will work fine.
3.2. The packet is coming from a VM: in this case, as I tested, lowering the MTU in the management OS below 1500 did not solve the problem, since the packets coming from that VM still had size = 1500, so after being encapsulated they were too big.

I understand there is no WFP part for sending packets (to external) - and I actually believe there would be no place for WFP on the send path, since WFP callouts are invoked at a higher level in the driver stack than our driver.

So I've got several questions:
1. For receive (from external):
1.1. if we detect that the packet is an encapsulated packet (e.g. VXLAN) and also fragmented, would it not be better to match the flow regardless of the fragment type?
1.2. Could there be any way to avoid the double flow lookup for received encapsulated packets? One way to do this, I'm thinking, would be to defer the flow lookup when the packet is encapsulated and simply output it to the management OS port (that's where it must go anyway), with the flow lookup done only in the WFP callback.
But I'm not sure... do only management OS packets reach the WFP callback, or packets from VMs as well?

2. For send (to external / VXLAN):
2.1. Do you deal with non-LSO 1500-byte packets that arrive from VMs and must be sent over VXLAN?
2.2. I personally believe there is no practical scenario in which packets coming from the management OS are sent to the VXLAN port. If you believe otherwise, please let me know.

Thanks!
Sam
_______________________________________________
dev mailing list
dev at openvswitch.org
http://openvswitch.org/mailman/listinfo/dev
