[ovs-dev] Cloning packets for "action: set field"
ssaurabh at vmware.com
Tue Aug 5 21:50:51 UTC 2014
> -----Original Message-----
> From: Samuel Ghinet [mailto:sghinet at cloudbasesolutions.com]
> Sent: Tuesday, August 05, 2014 5:00 AM
> To: Saurabh Shah
> Cc: dev at openvswitch.org
> Subject: RE: Cloning packets for "action: set field"
> Hello Saurabh,
> I believe there are more complexities associated to this packet cloning.
> Consider for instance the following flow:
> if (conditions on packet) =>
> out_to_port: vm1;
> set ip_dest: X1; out_to_port: vm2
> set ip_dest: X2; out_to_port: vm3
> Given that multiple destinations are set, and only then the packet is
> outputted, then, given the issue with cloning, all packets (that will be
> outputted to vm1, vm2, vm3) will have the same, last destination ip.
For the set case, we actually send out the packet to all the existing output ports before going ahead with the set. This is what the function OvsOutputBeforeSetAction does. So, at any given point we only have one NBL to deal with.
> I believe we cannot actually copy only the headers: if the packet contains a
> single MDL, there is nothing else we can do but to duplicate the packet.
> For the case where the NB has multiple MDLs chained, we must copy the first;
> MDL(s), i.e. the MDL or MDLs that contain the headers that must be copied.
We actually can copy parts of the header (even if they are in a single MDL), in fact our code has the infrastructure to do this. Please take a look at OvsPartialCopyNBL.
> There is also a problem with NBL completion (say we do the MDL copy and
> for the case above, consider:
> out_to_port: vm1; (NB1)
> set ip_dest: X1; out_to_port: vm2 (clone to NB2, create new MDL, store ptr
> to original MDL) set ip_dest: X2; out_to_port: vm3(clone to NB3, create new
> MDL, store ptr to original MDL)
> At completion, we may need to do this:
> NB3: Put back old MDL into original NB, free clone NB3
> NB2: Put back old MDL (but already done before), and free NB2
> NB1: it's the original, so nothing special.
> In order to implement this "but already done before" case, we need to use
> ref counting for NBL context data (which I have seen you do have some use
> of ref counting) However, consider the complexity of restoring an original NB
> when multiple MDLs have been replaced, by multiple clones. The big
> problem here is that the whole MDL link would be broken.
The Buffer Management subsystem is one of the major difference between both the drivers. Since doing a deep copy of all packets is pretty sub-optimal, we (Guolin, in particular) designed the sophisticated buffer management system. Yes, it is intricate at first, but we already have it! (And for a while now. :)) So, all that you have mentioned here has already been implemented. Please take a look at our Buffer Management code & let me know if something is still unclear.
> I strongly believe that building this component with the mind set "optimize
> first, fix bugs later" is not at all a good way to do it. Plus, it would make the
> code more and more intrincate, as we continue to both find new problems
> and we get to have large pieces of code (dealing with NBLs) spread all over.
> I personally believe a better approach would be "start with something
> simpler, which is fully functional even though suboptimal, and then continue
> by building on top of it, improving incrementally for each of various cases".
Quite the contrary. The whole point of implementing the buffer management code was to layout the right infrastructure to help simplify NBL management for all the consumers & avoid doing a major enhancement at a later point. I can go off on a rant explaining our rationales in detail, but to allay your concerns, let me assure you that we don't have a mindset of optimize-over-bugfix. Once the Netlink integration is complete, I expect all of us to work towards closing all critical bugs at a priority.
More information about the dev