[ovs-dev] The NET_BUFFER_LIST issue

Eitan Eliahu eliahue at vmware.com
Sat Aug 2 16:15:34 UTC 2014

Hi Samuel,
You are correct and we are well aware of this issue,  in fact  this issue was in our to do list for a long time. The plan is to create an NBL from each NB and to run the new created NBL(s) against the flow table.
If you guys have some free cycles please go ahead and implement it. 

-----Original Message-----
From: dev [mailto:dev-bounces at openvswitch.org] On Behalf Of Samuel Ghinet
Sent: Saturday, August 02, 2014 3:08 AM
To: dev at openvswitch.org
Subject: [ovs-dev] The NET_BUFFER_LIST issue

Hello guys,

While working on the integration between our kernel netlink component and your driver, I had my attention drawn a bit more towards your implementation of the buffer management.
I expect that there was a discussion upon it, but I'd like to highlight a few things myself, hoping that together we'll come up with a solution that will work best for the future.

Namely, in the OvsExtractFlow function, in the code pushed upstream, the first NET_BUFFER of the NET_BUFFER_LIST is used as the basis for the flow key extraction.

I hope you had read this MSDN documentattion:

The relevant issues here, for our project, are:
"Each NET_BUFFER structure that is linked to a NET_BUFFER_LIST structure describes a single Ethernet frame."

"All NET_BUFFER structures that are associated with a NET_BUFFER_LIST structure must have the same Ethernet frame type and IP protocol version (IPv4 or IPv6)."

"All NET_BUFFER structures that are associated with a NET_BUFFER_LIST structure must have the same source and destination MAC addresses."

"If a driver is sending TCP or UDP frames, all of the NET_BUFFER structures that are associated with a NET_BUFFER_LIST structure must be associated with same TCP or UDP connection."


In other words, we have 3 special cases:
a) We've got one NET_BUFFER_LIST, that contains one NET_BUFFER - best case for us, nothing special to do.

b) We've got a TCP or UDP packet split in multiple NET_BUFFER-s, in a NET_BUFFER_LIST.
This is not a very good case for us, because the packet fields that are guaranteed to be the same for all NET_BUFFERs are:
eth, net layer info (transport protocol only), transport layer (source and dest port only), ipv4 info (source and dest ip) and arp info, and ipv6 info.

Whereas the fields that we have NO guarantee that are the same among NET_BUFFER-s, are:
net layer: TOS, TTL, fragment type
transport layer: tcp flags
ipv6 info: flow label.

c) We have something else than TCP / UDP packet (e.g. icmp4, icmp6, etc.):
NONE of these fields are guaranteed to be the same for each NET_BUFFER in the NET_BUFFER_LIST:
net layer: TOS, TTL, fragment type (don't know about mpls, where it fits) transpot layer: icmp type, icmp code (icmp4 and icmp6)
ipv4: src and dest address (they are the same only for UDP and TCP)
arp: perhaps the same problem as above, for ip address
ipv6: src address, dest address, flow label It can also happen (i.e. we have no guarantee otherwise) that we get a NET_BUFFER_LIST with 2 NET_BUFFERs: first is some icmp6, the second contains an ICMP6 Neighbor Discovery. By the current implementation of OvsExtractFlow , we would fail matching a flow for the Neighbor Discovery packet - instead, the flow for the first icmp6 packet would be used for both packets.


As a one line conclusion, given the current implementation of the buffer management mechanism, the flow mechanics can at any time create or match a wrong flow for a given NET_BUFFER.

Our implementation - create a NET_BUFFER_LIST for each existing NET_BUFFER, and wrap it in an OVS_NET_BUFFER - may be less efficient, but it takes into account all these cases, by the fact that it considers each NET_BUFFER in a NET_BUFFER_LIST as a totally different packet.
An improvement for our OVS_NET_BUFFER implementation would've been to have a cache of pre-allocated buffers, so as not to waste time allocating and de-allocating memory for them each time; also to take one NET_BUFFER packet in a NET_BUFFER_LIST as it is, without duplicating it.

I believe this can be considered an architectural issue, and should best be addressed early, rather than to continue to build functionality upon it, and come to change it later - given that the network packets are used in many places in the project.

I am eager to hear any suggestions. As well as, perhaps, how you thought to address this issue.
Samuel Ghinet

dev mailing list
dev at openvswitch.org

More information about the dev mailing list