[ovs-dev] [PATCH] Create a NBL for each NB when required

Nithin Raju nithin at vmware.com
Tue Sep 9 23:19:58 UTC 2014


Samuel and I discussed about this patch over IRC, and we agreed that it is best to split the patch into two:
1. Fix for handling NBL with mutiple NBs. Any refactoring of OvsStartNBLIngress() can be done as part of this patch.
2. Updating the parsing, packet io and actions code to handle NBs instead of NBLs. This is an optimization/refactoring.

So, we'll wait for the patches.

-- Nithin


On Sep 9, 2014, at 1:05 PM, Samuel Ghinet <sghinet at cloudbasesolutions.com> wrote:

> Thanks Saurabh for clarifications,
> 
> @ Saurabh and Nithin: so you want me to re-make the patches so that #1 would be refactor and #2 would be the bug fix? or for this one time we can go with #1 and #2 merged together?
> 
>> There's some reference to this convention as follows. datapath-windows/CodingStyle is an addendum to the OVS CodingStyle. It says in the top:
>> 
>> datapath-windows/CodingStyle:
>>  9 Most of the coding conventions applicable for the Open vSwitch distribution are
>> 10 applicable to the Windows kernel datapath as well.  There are some exceptions
>> 11 and new guidlines owing to the commonly followed practices in Windows
>> 12 kernel/driver code.  They are noted as follows:
>> 
>> and in OVS' CodingStyle, it says:
>> 441   Pointer declarators bind to the variable name, not the type name.
>> 442 Write "int *x", not "int* x" and definitely not "int * x".
>> 
>> If you want to mention this explicitly in datapath-windows/CodingStyle, please feel free. I am also OK with making 'datapath-windows/CodingStyle' an independent document in itself by duplicating the content from OVS CodingStyle, but personally, I prefer not to document the content.
> Nithin, I really believe it is difficult for one who is not very familiar with the (traditional) ovs CodingStyle to study the ovs CodingStyle, then datapath-windows/CodingStyle, then make comparisons and put together in his mind the resulting coding style.
> I personally believe that making the datapath-windows coding style independent is the best way to go.
> 
>> We should prefer #c in general, since the readability-wise, the name of the function stands out and is not shrouded by parameters. But, like you can see '(sourceVPort->portId == switchContext->externalPortId),' is too long. So, I prefer a style like this:
>> 
>> d)
>> status = OvsCreateAndAddPackets(OVS_DEFAULT_PACKET_QUEUE,
>>             NULL, 0, OVS_PACKET_CMD_MISS, sourceVPort->portNo,
>>             (key.tunKey.dst != 0 ? &key.tunKey : NULL),
>>             curNbl, (sourceVPort->portId == switchContext->externalPortId),
>>             &layers, switchContext, missedPackets, countMissedPackets);
>> 
>> I do recognize that #a and #b are also valid, but I prefer #c and #d. But in #c, the arg #2 onwards, they need to be moved one space to the right to align it with arg #1.
> I personally prefer the d) style (i.e. on the next line, next param goes after 4 spaces) for any function call.
> For "each param aligned with #1", it also looks somewhat strange when you have something like:
> LongTypeName variableName = LongFunctionName(firstParam,
> And the params #2 to #n would have too little space if aligned to param #1.
> 
> And I find it somewhat awkward to combine this style with the "each param aligned with #1" style.
> And every time I want to use d) I am thinking that someone out there might actually strongly prefer "each param aligned with #1" for that particular case.
> 
>> The optimization to use a completion list is only use when the NBLs originate from NDIS. ie. when OvsActionsExecute() is called from OvsStartNBLIngress(). The completion list is passed as an argument to OvsActionsExecute(), and is set in the 'ovsFwdCtx' using 'OvsInitForwardingCtx()'. Whenever, 'ovsFwdCtx' is re-initialized > using a copied NBL, then 'ovsFwdCtx->completionList' is always set to NULL so as to call OvsCompleteNBL() on the copied NBL. As you know, OvsCompleteNBL() triggers the completion of the original NBL that came from NDIS. The existing code is correct in that sense. If you see any discrepancy, please feel free to point > it out. We'll fix it.
> So, technically, in the NDIS Send callback, I should call at every exit point OvsCompleteNBL() only to decrease the reference counter by one, instead of decreasing it in the very beginning and not worry about any missed exit point.
> As for OvsTunnelPortTx, you mean I should use OvsInitForwardingCtx() instead of simply dropping, at the beginning, the reference to the splitNbl (last patch version)?
> 
> Thanks,
> Samuel
> ________________________________________
> From: Nithin Raju [nithin at vmware.com]
> Sent: Tuesday, September 09, 2014 6:36 PM
> To: Samuel Ghinet
> Cc: Alin Serdean; Saurabh Shah; dev at openvswitch.org
> Subject: Re: [ovs-dev] [PATCH] Create a NBL for each NB when required
> 
> hi Sam,
> Pls. find my answers inlined. Pls. feel free to repost the patches for the two different functionalities.
> 
> On Sep 6, 2014, at 2:39 PM, Samuel Ghinet <sghinet at cloudbasesolutions.com>
> wrote:
> 
>> Yes, you were right with the TSO. I'll send a new version of the patch, though I am not sure how it would be easier for you to review.
> 
> Splitting the change into two would be a good idea, esp. since the two chance are not dependent on each other. I do appreciate the effort to optimize the code. Just that the changes are not related. Also, the optimization patch needs more work for the TSO case. Splitting up the patch, patch #1 can go in without any issue, and that I have heard you guys hit very often. So, let's get that out of the way.
> 
> I kind of paid attention to coding style specifically in this change since this the first big patch I was reviewing of yours, and I thought it might be a good idea to establish some practices. I wouldn't really point out all these things in future reviews :) I am not super particular about some of them, and I have indicated that with a "can be" or a "I'm don't feel strongly about it".
> 
>>> minor: NET_BUFFER* => PNET_BUFFER, VOID* => PVOID.
>> I am quite confused here, the CodingStyle says:
>>> It is a common practice to define a pointer type by prefixing the letter
>>> 'P' to a data type.  The same practice can be followed here as well.
>> It sounds as if I can choose NET_BUFFER* if I prefer so. Perhaps an update of the CodingStyle, saying "The same practice should be followed here as well".
>> Also, note that existing code uses both "NET_BUFFER *" and PNET_BUFFER.
>> We also have code like "OvsFlow *flow;" which we cannot put as "POvsFlow flow;"
>> Anyway, I fixed that here. I hope that in files where "NET_BUFFER *" and "NET_BUFFER_LIST *" style is already used, I don't need to change to PNET_BUFFER and PNET_BUFFER_LIST.
> 
> The CodingStyle is more of a guideline here. Hence the language "can be followed", rather than "must be followed". If you prefer to use '*', you are most welcome. Even amongst the developers here at VMware, there's not 100% consistency in prefixing 'P' or using '*' for a pointer. Personally, I have tried to use the 'P' convention recently, since it is closer to writing code in Windows in general. I don't want to impose it though, since there's no correctness issue either way. Pls. feel free to use what you are comfortable.
> 
> I agree that OvsFlow should be updated to OVS_FLOW in accordance to the CodingStyle.
> 
>>> Also, if you are declaring a pointer, the typical way is to put the '*' just before the variable name rather than just after the type name:
>>> ie. VOID *vlanTagValue rather than VOID* vlanTagValue.
>> I did not find it in the coding style of datapath-windows. Pershaps someone should update it. I will fix for the cases you mentioned.
> 
> Thanks for fixing them.
> 
> There's some reference to this convention as follows. datapath-windows/CodingStyle is an addendum to the OVS CodingStyle. It says in the top:
> 
> datapath-windows/CodingStyle:
>  9 Most of the coding conventions applicable for the Open vSwitch distribution are
> 10 applicable to the Windows kernel datapath as well.  There are some exceptions
> 11 and new guidlines owing to the commonly followed practices in Windows
> 12 kernel/driver code.  They are noted as follows:
> 
> and in OVS' CodingStyle, it says:
> 441   Pointer declarators bind to the variable name, not the type name.
> 442 Write "int *x", not "int* x" and definitely not "int * x".
> 
> If you want to mention this explicitly in datapath-windows/CodingStyle, please feel free. I am also OK with making 'datapath-windows/CodingStyle' an independent document in itself by duplicating the content from OVS CodingStyle, but personally, I prefer not to document the content.
> 
>>>> ASSERT(ovsFwdCtx->curNbl->FirstNetBuffer->Next == NULL);
>>> minor: you could use the macros, but I don't feel strongly:
>>> NET_BUFFER_NEXT_NB(NET_BUFFER_LIST_FIRST_NB(ovsFwdCtx->curNbl) == NULL);
>> I usually prefer the macros NET_BUFFER_NEXT_NB and NET_BUFFER_LIST_FIRST_NB, but I don't like combining them. That is, I think it is much cleaner as curNbl->FirstNetBuffer->Next.
> 
> NP. This is fine.
> 
>>> Generally, OvsDoFlowLookupOutput() is used for the second leg of the pipeline after encapsulation. There's an implicit assumptions in the code that a packet after encapsulation should be sent out to a 'PIF bridge'. Hence we wrote this function OvsDoFlowLookupOutput(). This is almost the same as OvsStartNBLIngress(), but unlike OvsStartNBLIngress() is not called from NDIS directly, and OvsDoFlowLookupOutput() ALWAYS works on an encapsulated value.
>> There is also the possibility that we get to encapsulate a VLAN-tagged packet. I am not sure the existing code handles this scenario when encapsulating the packet.
> 
> As I mentioned, OvsDoFlowLookupOutput() works on an encapsulated packet. We cannot have encapsulation (using VXLAN for eg) and also VLAN tagged on the outer packet at the same time. OVS doesn't do that in general. The current code handles this scenario. The actions for such a flows would look something like:
> Flow #1 on br-int:
> 1. <INPORT=VIF>,<inner packet's key> actions=set(..),<OUTPORT=tunnel port>
> 
> Flow #2 on br-pif:
> 2. <INPORT=internal port ie. VTEP>,<outer packet's key> actions=set_vlan<...>,<OUTPORT=physical/external port>
> 
> I am not sure if we can configure flow #2 using OVS. Even if we do, OvsActionsExecute() will do the right thing. Pls. look at the 'case OVS_ACTION_ATTR_PUSH_VLAN:'. We push the VLAN tag in the OOB data of the NBL.
> 
>>> Also, pls use _pCurNbl and _nblList as the parameter names for the macro.
>> Also,
>>> Generally, we have not used '_' prefix for function parameter names. We use them for macros etc, but not for functions. Do you prefer it this way? I am not against it, but I don't think it is necessary. For a macro, it is necessary to avoid conflicts in variable names.
>> Nithin, it would help if such rules would be specified in the CodingStyle, if other people agree with this, of course.
>> I personally do not prefer "_" prefixes for parameters. And I personally do not find it necessary: macros are usually very short, so there is very little possibility of confusion.
>> I see that other macros use the "_" prefix. So I will do the same for my macros.
> 
> Thanks. I'll update the datapath-windows/CodingStyle to indicate this.
> 
>>>> +        NdisAcquireRWLockRead(switchContext->dispatchLock, &lockState,
>>>> +            dispatch);
>> 
>>> minor: alignment of argument #3 can be with alignment of argument #1.
>> Nithin, I have quite seen different variations of style in calling functions. I am a bit confused on the style I should prefer.
>> 
>> Say, which style would you prefer here:
>> a)
>> status = OvsCreateAndAddPackets(OVS_DEFAULT_PACKET_QUEUE,
>>       NULL, 0, OVS_PACKET_CMD_MISS,
>>       sourceVPort->portNo,
>>       (key.tunKey.dst != 0 ? &key.tunKey : NULL),
>>       curNbl,
>>       (sourceVPort->portId == switchContext->externalPortId),
>>       &layers, switchContext,
>>       missedPackets, countMissedPackets);
>> 
>> b)
>> status = OvsCreateAndAddPackets(OVS_DEFAULT_PACKET_QUEUE, NULL, 0,
>>        OVS_PACKET_CMD_MISS, sourceVPort->portNo,
>>        (key.tunKey.dst != 0 ? &key.tunKey : NULL), curNbl,
>>        (sourceVPort->portId == switchContext->externalPortId), &layers,
>>        switchContext, missedPackets, countMissedPackets);
>> 
>> c)
>> status = OvsCreateAndAddPackets(OVS_DEFAULT_PACKET_QUEUE,
>>                               NULL,
>>                               0,
>>                               OVS_PACKET_CMD_MISS,
>>                               sourceVPort->portNo,
>>                               (key.tunKey.dst != 0 ? &key.tunKey : NULL),
>>                               curNbl,
>>                               (sourceVPort->portId == switchContext->externalPortId),
>>                               &layers,
>>                               switchContext,
>>                               missedPackets,
>>                               countMissedPackets);
> 
> 
> We should prefer #c in general, since the readability-wise, the name of the function stands out and is not shrouded by parameters. But, like you can see '(sourceVPort->portId == switchContext->externalPortId),' is too long. So, I prefer a style like this:
> 
> d)
> status = OvsCreateAndAddPackets(OVS_DEFAULT_PACKET_QUEUE,
>             NULL, 0, OVS_PACKET_CMD_MISS, sourceVPort->portNo,
>             (key.tunKey.dst != 0 ? &key.tunKey : NULL),
>             curNbl, (sourceVPort->portId == switchContext->externalPortId),
>             &layers, switchContext, missedPackets, countMissedPackets);
> 
> I do recognize that #a and #b are also valid, but I prefer #c and #d. But in #c, the arg #2 onwards, they need to be moved one space to the right to align it with arg #1.
> 
>> 
>>>> +            InterlockedDecrement((volatile LONG*)&ctx->refCount);
>> ?
>> 
>>> So, why do we decrement the refcount on the original NBL here? XXX
>> Also,
>> 
>>>> +            if (!pNewNbl) {
>>>>            RtlInitUnicodeString(&filterReason,
>>>> -                                     L"Cannot allocate external NBL context.");
>>>> +                    L"Cannot allocate new NBL: partial copy NB to "
>>>> +                    L"multiple NBLs.");
>>>> 
>>>>            OvsStartNBLIngressError(switchContext, curNbl,
>>>>                                    sendCompleteFlags, &filterReason,
>>>>                                    NDIS_STATUS_RESOURCES);
>>> 
>>> Since an external context has been allocated for the NBL using OvsInitExternalNBLContext(), you need to call OvsCompleteNBL(), and then NdisFSendNetBufferListsComplete(). This is nicely abstracted in OvsCompleteNBLIngress(). You can do dropit, and that will take care of it. If you want to report error, you'll probably >have to do something like:
>>> 
>>> OvsCompleteNBL();
>>> OvsStartNBLIngressError();
>>> 
>>> Also, I don't think you should be decrementing the refcount on the NBL directly. You should use OvsCompleteNBL() for it.
>> The problem is a bit more sensitive, since the method used here (i.e. before my modifs came) is not to Complete the packets each as it comes. Instead, they are queued in a completion list.
>> And I had to decrement the refCount of the original when doing a partial copy to multiple, in order for the original to get completed when all the partial copies were completed. Otherwise (if you don't deref), then, on complete (e.g. the NDIS complete callback) will not complete the original, because its refcount will be 1 - this happens with the completion list only.
>> NOTE: OvsCompleteNBLForwardingCtx does NOT call OvsCompleteNBL if a completion list is being used. And when we do multiple nbs -> multiple nbls, we have this exact case.
> 
> The optimization to use a completion list is only use when the NBLs originate from NDIS. ie. when OvsActionsExecute() is called from OvsStartNBLIngress(). The completion list is passed as an argument to OvsActionsExecute(), and is set in the 'ovsFwdCtx' using 'OvsInitForwardingCtx()'. Whenever, 'ovsFwdCtx' is re-initialized using a copied NBL, then 'ovsFwdCtx->completionList' is always set to NULL so as to call OvsCompleteNBL() on the copied NBL. As you know, OvsCompleteNBL() triggers the completion of the original NBL that came from NDIS. The existing code is correct in that sense. If you see any discrepancy, please feel free to point it out. We'll fix it.
> 
>>>> VOID
>>>> -OvsParseTcp(const NET_BUFFER_LIST *packet,
>>>> +OvsParseTcp(const NET_BUFFER *packet,
>>>>     L4Key *flow,
>>>>     POVS_PACKET_HDR_INFO layers)
>>> 
>>> Alignment of parameters #2 onwards is off.
>> Was misaligned before. The same for udp and icmpv6.
>> Anyway, I'll fix it.
> 
> Thank you. Appreciate it.
> 
>> 
>>>> @@ -228,6 +228,7 @@ OvsInjectPacketThroughActions(PNET_BUFFER_LIST pNbl,
>>>> OVS_DATAPATH *datapath = &gOvsSwitchContext->datapath;
>>>> 
>>>> ASSERT(gOvsSwitchContext);
>>>> +    ASSERT(pNbl->FirstNetBuffer->Next == NULL);
>>> 
>>> Just an FYI:
>>> We have the same check later:
>>>   curNb = NET_BUFFER_LIST_FIRST_NB(pNbl); <<<
>>>   ASSERT(curNb->Next == NULL);
>>> 
>>>   NdisAcquireRWLockRead(gOvsSwitchContext->dispatchLock, &lockState, dispatch);
>> I know. I put that assert above, at the 'prologue' of the function to be more visible. I could remove the "ASSERT(curNb->Next == NULL);" which is put below.
> 
> Sounds good. An additional ASSERT doesn't hurt.
> 
>>> You can be assured here also that IP helper won't call you back with the chain of NBs. Each callback returns the NBL that was enqueued. Your assert to check for nb->next == NULL is valid here. WE don't need the loop.
>> The fact that the function is not currently used had confused me.
> 
> Yes, it is a TBD :). I have a patch internally. Let me polish it up and submit it.
> 
>> Regarding OvsSlowPathDecapVxlan:
>>> You can be ASSURED that you'll not get an NBL with multiple NBs here. OvsInjectPacketThroughActions() has already checked for this case.
>> Nevertheless, it makes the code in OvsSlowPathDecapVxlan make it clear that only one NB is expected. Also, there is no telling how the code in the project would be changed in the future, and this additional check could help if one day somebody else calls OvsSlowPathDecapVxlan as well.
> 
> Sounds good. An additional ASSERT doesn't hurt.
> 
> thanks,
> Nithin




More information about the dev mailing list