[ovs-dev] [PATCH] Create a NBL for each NB when required

Samuel Ghinet sghinet at cloudbasesolutions.com
Sat Sep 6 21:39:47 UTC 2014


Nithin,

Thanks a lot for taking the time to review the code!

Yes, you were right about TSO. I'll send a new version of the patch, though I am not sure how to make it easier for you to review.

> minor: NET_BUFFER* => PNET_BUFFER, VOID* => PVOID.
I am quite confused here; the CodingStyle says:
> It is a common practice to define a pointer type by prefixing the letter
> 'P' to a data type.  The same practice can be followed here as well.
It sounds as if I can choose NET_BUFFER* if I prefer. Perhaps the CodingStyle should be updated to say "The same practice should be followed here as well".
Also, note that existing code uses both "NET_BUFFER *" and PNET_BUFFER. 
We also have code like "OvsFlow *flow;", which we cannot write as "POvsFlow flow;" since there is no such pointer typedef.
Anyway, I fixed that here. I hope that in files where "NET_BUFFER *" and "NET_BUFFER_LIST *" style is already used, I don't need to change to PNET_BUFFER and PNET_BUFFER_LIST.
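
For reference, I take the two spellings below to be equivalent (as far as I know, the WDK already provides the P-typedefs for the NDIS types):

    PNET_BUFFER curNb;    /* WDK pointer typedef */
    NET_BUFFER *curNb;    /* same thing, with '*' next to the variable name */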

> Also, if you are declaring a pointer, the typical way is to put the '*' just before the variable name rather than just after the type name:
> ie. VOID *vlanTagValue rather than VOID* vlanTagValue.
I did not find it in the coding style of datapath-windows. Perhaps someone should update it. I will fix the cases you mentioned.

>> ASSERT(ovsFwdCtx->curNbl->FirstNetBuffer->Next == NULL);
> minor: you could use the macros, but I don't feel strongly:
> ASSERT(NET_BUFFER_NEXT_NB(NET_BUFFER_LIST_FIRST_NB(ovsFwdCtx->curNbl)) == NULL);
I usually prefer the macros NET_BUFFER_NEXT_NB and NET_BUFFER_LIST_FIRST_NB, but I don't like combining them. That is, I think it is much cleaner as curNbl->FirstNetBuffer->Next.

> Generally, OvsDoFlowLookupOutput() is used for the second leg of the pipeline after encapsulation. There's an implicit assumption in the code that a packet after encapsulation should be sent out to a 'PIF bridge'. Hence we wrote this function, OvsDoFlowLookupOutput(). This is almost the same as OvsStartNBLIngress(), but unlike OvsStartNBLIngress() it is not called from NDIS directly, and OvsDoFlowLookupOutput() ALWAYS works on an encapsulated packet.
There is also the possibility that we end up encapsulating a VLAN-tagged packet. I am not sure the existing code handles that scenario.

> Also, pls use _pCurNbl and _nblList as the parameter names for the macro.
Also,
> Generally, we have not used '_' prefix for function parameter names. We use them for macros etc, but not for functions. Do you prefer it this way? I am not against it, but I don't think it is necessary. For a macro, it is necessary to avoid conflicts in variable names.
Nithin, it would help if such rules were specified in the CodingStyle, if other people agree with this, of course.
I personally do not prefer "_" prefixes for parameters, and I do not find them necessary: macros are usually very short, so there is very little possibility of confusion.
I see that other macros use the "_" prefix. So I will do the same for my macros.
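
For reference, the kind of conflict the '_' prefix guards against is something like this (a made-up macro, not from the patch):

    #define SWAP_UINT32(a, b)     \
        do {                      \
            UINT32 tmp = (a);     \
            (a) = (b);            \
            (b) = tmp;            \
        } while (0)

    UINT32 tmp = 1, x = 2;
    SWAP_UINT32(tmp, x);  /* expands to "UINT32 tmp = (tmp);" - the macro's
                           * local shadows the caller's 'tmp' and is read
                           * before it is initialized, so the swap is broken.
                           * Naming the local '_tmp' makes such a collision
                           * unlikely. */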

>> +        NdisAcquireRWLockRead(switchContext->dispatchLock, &lockState,
>> +            dispatch);

> minor: alignment of argument #3 can be with alignment of argument #1.
Nithin, I have seen quite a few variations of style in function calls. I am a bit confused about which style I should prefer.

Say, which style would you prefer here:
a) 
status = OvsCreateAndAddPackets(OVS_DEFAULT_PACKET_QUEUE,
        NULL, 0, OVS_PACKET_CMD_MISS,
        sourceVPort->portNo,
        (key.tunKey.dst != 0 ? &key.tunKey : NULL),
        curNbl,
        (sourceVPort->portId == switchContext->externalPortId),
        &layers, switchContext,
        missedPackets, countMissedPackets);

b)
status = OvsCreateAndAddPackets(OVS_DEFAULT_PACKET_QUEUE, NULL, 0,
         OVS_PACKET_CMD_MISS, sourceVPort->portNo,
         (key.tunKey.dst != 0 ? &key.tunKey : NULL), curNbl,
         (sourceVPort->portId == switchContext->externalPortId), &layers,
         switchContext, missedPackets, countMissedPackets);

c)
status = OvsCreateAndAddPackets(OVS_DEFAULT_PACKET_QUEUE,
                                NULL,
                                0,
                                OVS_PACKET_CMD_MISS,
                                sourceVPort->portNo,
                                (key.tunKey.dst != 0 ? &key.tunKey : NULL),
                                curNbl,
                                (sourceVPort->portId == switchContext->externalPortId),
                                &layers,
                                switchContext,
                                missedPackets,
                                countMissedPackets);

> > +            InterlockedDecrement((volatile LONG*)&ctx->refCount);

> So, why do we decrement the refcount on the original NBL here? XXX
Also,

>> +            if (!pNewNbl) {
>>              RtlInitUnicodeString(&filterReason,
>> -                                     L"Cannot allocate external NBL context.");
>> +                    L"Cannot allocate new NBL: partial copy NB to "
>> +                    L"multiple NBLs.");
>>
>>              OvsStartNBLIngressError(switchContext, curNbl,
>>                                      sendCompleteFlags, &filterReason,
>>                                      NDIS_STATUS_RESOURCES);
>
>Since an external context has been allocated for the NBL using OvsInitExternalNBLContext(), you need to call OvsCompleteNBL(), and then NdisFSendNetBufferListsComplete(). This is nicely abstracted in OvsCompleteNBLIngress(). You can do dropit, and that will take care of it. If you want to report error, you'll probably have to do something like:
>
>OvsCompleteNBL();
>OvsStartNBLIngressError();
>
>Also, I don't think you should be decrementing the refcount on the NBL directly. You should use OvsCompleteNBL() for it.
The problem is a bit more subtle, since the approach used here (i.e. before my changes) is not to complete each packet as it comes. Instead, packets are queued in a completion list.
And I had to decrement the refCount of the original NBL when doing a partial copy to multiple NBLs, so that the original gets completed once all the partial copies have been completed. Otherwise (if you don't deref), the completion path (e.g. the NDIS complete callback) will not complete the original, because its refcount will still be 1 - this happens with the completion list only.
NOTE: OvsCompleteNBLForwardingCtx does NOT call OvsCompleteNBL if a completion list is being used, and when we do multiple NBs -> multiple NBLs, we have exactly this case.
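
To spell out the counting as I understand it (a sketch only, with 'n' being the number of NBs in the original NBL):

    ctx = OvsInitExternalNBLContext(switchContext, curNbl, ...);
                                  /* original NBL: refCount = 1           */
    pNewNbl = OvsPartialCopyToMultipleNBLs(switchContext, curNbl, 0, 0, TRUE);
                                  /* each copy references the original:
                                   * refCount = 1 + n                     */
    InterlockedDecrement((volatile LONG *)&ctx->refCount);
                                  /* drop the initial reference:
                                   * refCount = n                         */

Each copy then goes onto the completion list and releases its reference when it is completed, so the original is completed when refCount reaches 0. Without the extra decrement it would stay stuck at 1.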


>> VOID
>> -OvsParseTcp(const NET_BUFFER_LIST *packet,
>> +OvsParseTcp(const NET_BUFFER *packet,
>>       L4Key *flow,
>>       POVS_PACKET_HDR_INFO layers)
>
>Alignment of parameters #2 onwards is off.
It was misaligned before; the same goes for udp and icmpv6. Anyway, I'll fix it.
Anyway, I'll fix it.

>> @@ -228,6 +228,7 @@ OvsInjectPacketThroughActions(PNET_BUFFER_LIST pNbl,
>>  OVS_DATAPATH *datapath = &gOvsSwitchContext->datapath;
>>
>>  ASSERT(gOvsSwitchContext);
>> +    ASSERT(pNbl->FirstNetBuffer->Next == NULL);
>
>Just an FYI:
>We have the same check later:
>     curNb = NET_BUFFER_LIST_FIRST_NB(pNbl); <<<
>     ASSERT(curNb->Next == NULL);
>
>     NdisAcquireRWLockRead(gOvsSwitchContext->dispatchLock, &lockState, dispatch);
I know. I put that assert above, in the 'prologue' of the function, to be more visible. I could remove the "ASSERT(curNb->Next == NULL);" that appears below.

> You can be assured here also that the IP helper won't call you back with a chain of NBs. Each callback returns the NBL that was enqueued. Your assert to check for nb->next == NULL is valid here. We don't need the loop.
The fact that the function is not currently used had confused me.

Regarding OvsSlowPathDecapVxlan:
> You can be ASSURED that you'll not get an NBL with multiple NBs here. OvsInjectPacketThroughActions() has already checked for this case.
Nevertheless, it makes it clear in OvsSlowPathDecapVxlan that only one NB is expected. Also, there is no telling how the code in the project might change in the future, and this additional check could help if somebody else calls OvsSlowPathDecapVxlan one day.

Thanks,
Samuel
________________________________________
From: Nithin Raju [nithin at vmware.com]
Sent: Friday, August 29, 2014 11:10 PM
To: Alin Serdean
Cc: dev at openvswitch.org; Samuel Ghinet
Subject: Re: [ovs-dev] [PATCH] Create a NBL for each NB when required

hi Sam/Alin,
Thanks for working on this.

In general, it would have been better if the two changes:
1. handling NBLs with multiple NBs
2. optimizing the parsing code to handle NBs rather than NBLs

had been split into different reviews. I was not a big fan of splitting changes into multiple reviews, but I am coming around to realizing that it makes them easier to review :)

Anyway, change #1 looks good. Regarding change #2, I am not sure how it would work with a TSOed packet that gets encapsulated. E.g. if a packet is 16k and gets split into 16 packets, and each of them gets encapsulated, all of them are put into the same NBL as different NBs. Do look at OvsDoEncapVxlan() -> OvsTcpSegmentNBL(). Such a packet would be passed through OvsDoFlowLookupOutput() -> Flow_Extract(), and I think your current change might break that code.
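
I.e. after OvsTcpSegmentNBL() the NBL looks roughly like this (sketch):

    curNbl -> NB #1 -> NB #2 -> ... -> NB #16

All the segments hang off the one NBL and share its OOB info, so anything that only looks at NET_BUFFER_LIST_FIRST_NB(curNbl) sees just the first segment.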

Did you get a chance to verify this change with a TSOed packet? If it is working, that is great.

Regardless, I have given comments to the code. Pls. have a look.

On Aug 19, 2014, at 5:31 AM, Alin Serdean <aserdean at cloudbasesolutions.com> wrote:

> ovs/ovs-issues#15
>
> All NET_BUFFERs of a NET_BUFFER_LIST must go through
> the pipeline: extract, find flow, execute. Previously,
> only the first NET_BUFFER of a NET_BUFFER_LIST was going
> through this pipeline, which was erroneous.
>
> OvsPartialCopyToMultipleNBLs is used to make each NET_BUFFER
> have its own NET_BUFFER_LIST.
>
> Some functions that used to take as argument a NET_BUFFER_LIST
> now take as argument a NET_BUFFER. I have also added a few
> ASSERTs where the NET_BUFFER_LIST is expected to have
> only one NET_BUFFER.
>
> Signed-off-by: Samuel Ghinet <sghinet at cloudbasesolutions.com>
> Co-authored-by: Alin Gabriel Serdean <aserdean at cloudbasesolutions.com>
> ---
> datapath-windows/ovsext/OvsActions.c      |  20 ++++-
> datapath-windows/ovsext/OvsBufferMgmt.c   |   8 +-
> datapath-windows/ovsext/OvsBufferMgmt.h   |   5 ++
> datapath-windows/ovsext/OvsChecksum.c     |   3 +-
> datapath-windows/ovsext/OvsFlow.c         |  44 +++++----
> datapath-windows/ovsext/OvsFlow.h         |   6 +-
> datapath-windows/ovsext/OvsPacketIO.c     | 143 +++++++++++++++++++-----------
> datapath-windows/ovsext/OvsPacketParser.c |  24 ++---
> datapath-windows/ovsext/OvsPacketParser.h |  30 ++++---
> datapath-windows/ovsext/OvsTunnel.c       |  12 ++-
> datapath-windows/ovsext/OvsUser.c         |  19 +++-
> datapath-windows/ovsext/OvsVxlan.c        |  37 ++++++--
> 12 files changed, 221 insertions(+), 130 deletions(-)
>
> diff --git a/datapath-windows/ovsext/OvsActions.c b/datapath-windows/ovsext/OvsActions.c
> index 635becc..c5c5116 100644
> --- a/datapath-windows/ovsext/OvsActions.c
> +++ b/datapath-windows/ovsext/OvsActions.c
> @@ -517,6 +517,9 @@ OvsDoFlowLookupOutput(OvsForwardingContext *ovsFwdCtx)
>  OvsFlow *flow;
>  UINT64 hash;
>  NDIS_STATUS status;
> +    VOID* vlanTagValue;
> +    NET_BUFFER* pNb;

minor: NET_BUFFER* => PNET_BUFFER, VOID* => PVOID.

Also, if you are declaring a pointer, the typical way is to put the '*' just before the variable name rather than just after the type name:
ie. VOID *vlanTagValue rather than VOID* vlanTagValue.

> +
>  POVS_VPORT_ENTRY vport =
>      OvsFindVportByPortNo(ovsFwdCtx->switchContext, ovsFwdCtx->srcVportNo);
>  if (vport == NULL || vport->ovsState != OVS_STATE_CONNECTED) {
> @@ -527,12 +530,19 @@ OvsDoFlowLookupOutput(OvsForwardingContext *ovsFwdCtx)
>      return NDIS_STATUS_SUCCESS;
>  }
>  ASSERT(vport->nicState == NdisSwitchNicStateConnected);
> +    ASSERT(ovsFwdCtx->curNbl->FirstNetBuffer->Next == NULL);

So, this WILL not be true for TSOed packets that get encapsulated. E.g. if a packet is 16k and gets split into 16 packets, and each of them gets encapsulated, all of them are put into the same NBL. Do look at OvsDoEncapVxlan() -> OvsTcpSegmentNBL(). If you've updated that logic also, it is fine. XXX.

minor: you could use the macros, but I don't feel strongly:
ASSERT(NET_BUFFER_NEXT_NB(NET_BUFFER_LIST_FIRST_NB(ovsFwdCtx->curNbl)) == NULL);

>  /* Assert that in the Rx direction, key is always setup. */
>  ASSERT(ovsFwdCtx->tunnelRxNic == NULL || ovsFwdCtx->tunKey.dst != 0);
> -    status = OvsExtractFlow(ovsFwdCtx->curNbl, ovsFwdCtx->srcVportNo,
> -                          &key, &ovsFwdCtx->layers, ovsFwdCtx->tunKey.dst != 0 ?
> -                                         &ovsFwdCtx->tunKey : NULL);
> +    vlanTagValue = NET_BUFFER_LIST_INFO(ovsFwdCtx->curNbl,
> +                                        Ieee8021QNetBufferListInfo);

Generally, OvsDoFlowLookupOutput() is used for the second leg of the pipeline after encapsulation. There's an implicit assumption in the code that a packet after encapsulation should be sent out to a 'PIF bridge'. Hence we wrote this function, OvsDoFlowLookupOutput(). This is almost the same as OvsStartNBLIngress(), but unlike OvsStartNBLIngress() it is not called from NDIS directly, and OvsDoFlowLookupOutput() ALWAYS works on an encapsulated packet.

So, I am not sure if we should be extracting the vlanTagValue here.

That said, the implicit assumption of the existence of a 'PIF bridge', and hence the call to OvsDoFlowLookupOutput(), should be broken at some point. Linux does it by sending the packet out to the host IP stack, and then letting the host IP stack call into OVS again if there's a PIF bridge. We'll worry about this in the future.

> +
> +    pNb = NET_BUFFER_LIST_FIRST_NB(ovsFwdCtx->curNbl);
> +
> +    status = OvsExtractFlow(pNb, ovsFwdCtx->srcVportNo,
> +                            &key, &ovsFwdCtx->layers,
> +                            ovsFwdCtx->tunKey.dst != 0 ?
> +                            &ovsFwdCtx->tunKey : NULL, vlanTagValue);
>  if (status != NDIS_STATUS_SUCCESS) {
>      OvsCompleteNBLForwardingCtx(ovsFwdCtx,
>                                  L"OVS-Flow extract failed");
> @@ -542,7 +552,7 @@ OvsDoFlowLookupOutput(OvsForwardingContext *ovsFwdCtx)
>
>  flow = OvsLookupFlow(&ovsFwdCtx->switchContext->datapath, &key, &hash, FALSE);
>  if (flow) {
> -        OvsFlowUsed(flow, ovsFwdCtx->curNbl, &ovsFwdCtx->layers);
> +        OvsFlowUsed(flow, pNb, &ovsFwdCtx->layers);
>      ovsFwdCtx->switchContext->datapath.hits++;
>      status = OvsActionsExecute(ovsFwdCtx->switchContext,
>                               ovsFwdCtx->completionList, ovsFwdCtx->curNbl,
> @@ -1370,6 +1380,8 @@ OvsActionsExecute(POVS_SWITCH_CONTEXT switchContext,
>  PNDIS_SWITCH_FORWARDING_DETAIL_NET_BUFFER_LIST_INFO fwdDetail =
>      NET_BUFFER_LIST_SWITCH_FORWARDING_DETAIL(curNbl);
>
> +    ASSERT(curNbl->FirstNetBuffer->Next == NULL);
> +
>  /* XXX: ASSERT that the flow table lock is held. */
>  status = OvsInitForwardingCtx(&ovsFwdCtx, switchContext, curNbl, portNo,
>                                sendFlags, fwdDetail, completionList,

LG.

Overall, we have enforced that curNbl holds only one NB, and we have made changes to Flow_Extract() to take in a VLAN tag. No big deal.


> diff --git a/datapath-windows/ovsext/OvsBufferMgmt.c b/datapath-windows/ovsext/OvsBufferMgmt.c
> index 8aa8060..c367d2f 100644
> --- a/datapath-windows/ovsext/OvsBufferMgmt.c
> +++ b/datapath-windows/ovsext/OvsBufferMgmt.c
> @@ -863,7 +863,7 @@ OvsPartialCopyToMultipleNBLs(PVOID ovsContext,
>      if (prevNbl == NULL) {
>          firstNbl = newNbl;
>      } else {
> -            NET_BUFFER_LIST_NEXT_NBL(prevNbl) = nbl;
> +            NET_BUFFER_LIST_NEXT_NBL(prevNbl) = newNbl;

good catch!

>          NET_BUFFER_NEXT_NB(prevNb) = nb;
>      }
>      prevNbl = newNbl;
> @@ -1085,7 +1085,7 @@ nblcopy_error:
> * --------------------------------------------------------------------------
> */
> static NDIS_STATUS
> -GetSegmentHeaderInfo(PNET_BUFFER_LIST nbl,
> +GetSegmentHeaderInfo(PNET_BUFFER nb,
>                   const POVS_PACKET_HDR_INFO hdrInfo,
>                   UINT32 *hdrSize, UINT32 *seqNumber)
> {
> @@ -1093,7 +1093,7 @@ GetSegmentHeaderInfo(PNET_BUFFER_LIST nbl,
>  const TCPHdr *tcp;
>
>  /* Parse the orginal Eth/IP/TCP header */
> -    tcp = OvsGetPacketBytes(nbl, sizeof *tcp, hdrInfo->l4Offset, &tcpStorage);
> +    tcp = OvsGetPacketBytes(nb, sizeof *tcp, hdrInfo->l4Offset, &tcpStorage);
>  if (tcp == NULL) {
>      return NDIS_STATUS_FAILURE;
>  }
> @@ -1199,7 +1199,7 @@ OvsTcpSegmentNBL(PVOID ovsContext,
>  ASSERT(NET_BUFFER_NEXT_NB(nb) == NULL);
>
>  /* Figure out the segment header size */
> -    status = GetSegmentHeaderInfo(nbl, hdrInfo, &hdrSize, &seqNumber);
> +    status = GetSegmentHeaderInfo(nb, hdrInfo, &hdrSize, &seqNumber);
>  if (status != NDIS_STATUS_SUCCESS) {
>      OVS_LOG_INFO("Cannot parse NBL header");
>      return NULL;

LG.

> diff --git a/datapath-windows/ovsext/OvsBufferMgmt.h b/datapath-windows/ovsext/OvsBufferMgmt.h
> index 9c00b1b..8e822e0 100644
> --- a/datapath-windows/ovsext/OvsBufferMgmt.h
> +++ b/datapath-windows/ovsext/OvsBufferMgmt.h
> @@ -81,6 +81,11 @@ typedef struct _OVS_NBL_POOL {
> #endif
> } OVS_NBL_POOL, *POVS_NBL_POOL;
>
> +#define OVS_NBL_FOR_EACH(pCurNbl, nblList)         \
> +for (NET_BUFFER_LIST* pCurNbl = nblList;           \
> +    pCurNbl != NULL;                               \
> +    pCurNbl = NET_BUFFER_LIST_NEXT_NBL(pCurNbl))
> +

minor: alignment of 'pCurNbl' should be one space to the right.
Also, pls use _pCurNbl and _nblList as the parameter names for the macro.

> NDIS_STATUS OvsInitBufferPool(PVOID context);
> VOID OvsCleanupBufferPool(PVOID context);
> diff --git a/datapath-windows/ovsext/OvsChecksum.c b/datapath-windows/ovsext/OvsChecksum.c
> index e192373..0c65221 100644
> --- a/datapath-windows/ovsext/OvsChecksum.c
> +++ b/datapath-windows/ovsext/OvsChecksum.c
> @@ -536,7 +536,8 @@ OvsValidateIPChecksum(PNET_BUFFER_LIST curNbl,
>
>  /* Next, check if the NIC did not validate the RX checksum. */
>  if (!csumInfo.Receive.IpChecksumSucceeded) {
> -        ipHdr = OvsGetIp(curNbl, hdrInfo->l3Offset, &ip_storage);
> +        ipHdr = OvsGetIp(NET_BUFFER_LIST_FIRST_NB(curNbl),
> +                         hdrInfo->l3Offset, &ip_storage);
>      if (ipHdr) {
>          ip_storage = *ipHdr;
>          hdrChecksum = ipHdr->check;

LG.

> diff --git a/datapath-windows/ovsext/OvsFlow.c b/datapath-windows/ovsext/OvsFlow.c
> index daa64e0..19f6b2a 100644
> --- a/datapath-windows/ovsext/OvsFlow.c
> +++ b/datapath-windows/ovsext/OvsFlow.c
> @@ -104,7 +104,7 @@ OvsAllocateFlowTable(OVS_DATAPATH *datapath,
>
> /*
> *----------------------------------------------------------------------------
> - *  GetStartAddrNBL --
> + *  GetStartAddrNB --
> *    Get the virtual address of the frame.
> *
> *  Results:
> @@ -112,7 +112,7 @@ OvsAllocateFlowTable(OVS_DATAPATH *datapath,
> *----------------------------------------------------------------------------
> */
> static __inline VOID *
> -GetStartAddrNBL(const NET_BUFFER_LIST *_pNB)
> +GetStartAddrNB(const NET_BUFFER *_pNB)

Generally, we have not used '_' prefix for function parameter names. We use them for macros etc, but not for functions. Do you prefer it this way? I am not against it, but I don't think it is necessary. For a macro, it is necessary to avoid conflicts in variable names.

> {
>  PMDL curMdl;
>  PUINT8 curBuffer;
> @@ -121,21 +121,21 @@ GetStartAddrNBL(const NET_BUFFER_LIST *_pNB)
>  ASSERT(_pNB);
>
>  // Ethernet Header is a guaranteed safe access.
> -    curMdl = (NET_BUFFER_LIST_FIRST_NB(_pNB))->CurrentMdl;
> +    curMdl = (_pNB)->CurrentMdl;
>  curBuffer =  MmGetSystemAddressForMdlSafe(curMdl, LowPagePriority);
>  if (!curBuffer) {
>      return NULL;
>  }
>
>  curHeader = (PEthHdr)
> -    (curBuffer + (NET_BUFFER_LIST_FIRST_NB(_pNB))->CurrentMdlOffset);
> +        (curBuffer + (_pNB)->CurrentMdlOffset);

minor: lines can be joined.

>
>  return (VOID *) curHeader;
> }

LG.


> VOID
> OvsFlowUsed(OvsFlow *flow,
> -            const NET_BUFFER_LIST *packet,
> +            const NET_BUFFER *packet,
>          const POVS_PACKET_HDR_INFO layers)
> {
>  LARGE_INTEGER tickCount;
> @@ -144,7 +144,7 @@ OvsFlowUsed(OvsFlow *flow,
>  flow->used = tickCount.QuadPart * ovsTimeIncrementPerTick;
>  flow->used += ovsUserTimestampDelta;
>  flow->packetCount++;
> -    flow->byteCount += OvsPacketLenNBL(packet);
> +    flow->byteCount += NET_BUFFER_DATA_LENGTH(packet);
>  flow->tcpFlags |= OvsGetTcpFlags(packet, &flow->key, layers);
> }
LG

>
> @@ -191,15 +191,14 @@ DeleteAllFlows(OVS_DATAPATH *datapath)
> *----------------------------------------------------------------------------
> */
> NDIS_STATUS
> -OvsExtractFlow(const NET_BUFFER_LIST *packet,
> +OvsExtractFlow(const NET_BUFFER* pNb,
>             UINT32 inPort,
>             OvsFlowKey *flow,
>             POVS_PACKET_HDR_INFO layers,
> -               OvsIPv4TunnelKey *tunKey)
> +               OvsIPv4TunnelKey *tunKey, VOID* vlanTagValue)

'vlanTagValue' should be on a new line. Also, pls. use PVOID.

> {
>  struct Eth_Header *eth;
>  UINT8 offset = 0;
> -    PVOID vlanTagValue;
>
>  layers->value = 0;
>
> @@ -214,20 +213,19 @@ OvsExtractFlow(const NET_BUFFER_LIST *packet,
>
>  flow->l2.inPort = inPort;
>
> -    if ( OvsPacketLenNBL(packet) < ETH_HEADER_LEN_DIX) {
> +    if (NET_BUFFER_DATA_LENGTH(pNb) < ETH_HEADER_LEN_DIX) {
>      flow->l2.keyLen = OVS_WIN_TUNNEL_KEY_SIZE + 8 - flow->l2.offset;
>      return NDIS_STATUS_SUCCESS;
>  }
>
>  /* Link layer. */
> -    eth = (Eth_Header *)GetStartAddrNBL((NET_BUFFER_LIST *)packet);
> +    eth = (Eth_Header *)GetStartAddrNB(pNb);
>  memcpy(flow->l2.dlSrc, eth->src, ETH_ADDR_LENGTH);
>  memcpy(flow->l2.dlDst, eth->dst, ETH_ADDR_LENGTH);
>
>  /*
>   * vlan_tci.
>   */
> -    vlanTagValue = NET_BUFFER_LIST_INFO(packet, Ieee8021QNetBufferListInfo);
>  if (vlanTagValue) {
>      PNDIS_NET_BUFFER_LIST_8021Q_INFO vlanTag =
>          (PNDIS_NET_BUFFER_LIST_8021Q_INFO)(PVOID *)&vlanTagValue;
> @@ -262,7 +260,7 @@ OvsExtractFlow(const NET_BUFFER_LIST *packet,
>  if (ETH_TYPENOT8023(eth->dix.typeNBO)) {
>      flow->l2.dlType = eth->dix.typeNBO;
>      layers->l3Offset = ETH_HEADER_LEN_DIX + offset;
> -    } else if (OvsPacketLenNBL(packet)  >= ETH_HEADER_LEN_802_3 &&
> +    } else if (NET_BUFFER_DATA_LENGTH(pNb)  >= ETH_HEADER_LEN_802_3 &&
>            eth->e802_3.llc.dsap == 0xaa &&
>            eth->e802_3.llc.ssap == 0xaa &&
>            eth->e802_3.llc.control == ETH_LLC_CONTROL_UFRAME &&
> @@ -285,7 +283,7 @@ OvsExtractFlow(const NET_BUFFER_LIST *packet,
>
>      flow->l2.keyLen += OVS_IP_KEY_SIZE;
>      layers->isIPv4 = 1;
> -        nh = OvsGetIp(packet, layers->l3Offset, &ip_storage);
> +        nh = OvsGetIp(pNb, layers->l3Offset, &ip_storage);
>      if (nh) {
>          layers->l4Offset = layers->l3Offset + nh->ihl * 4;
>
> @@ -309,14 +307,14 @@ OvsExtractFlow(const NET_BUFFER_LIST *packet,
>
>          if (!(nh->frag_off & htons(IP_OFFSET))) {
>              if (ipKey->nwProto == SOCKET_IPPROTO_TCP) {
> -                    OvsParseTcp(packet, &ipKey->l4, layers);
> +                    OvsParseTcp(pNb, &ipKey->l4, layers);
>              } else if (ipKey->nwProto == SOCKET_IPPROTO_UDP) {
> -                    OvsParseUdp(packet, &ipKey->l4, layers);
> +                    OvsParseUdp(pNb, &ipKey->l4, layers);
>              } else if (ipKey->nwProto == SOCKET_IPPROTO_ICMP) {
>                  ICMPHdr icmpStorage;
>                  const ICMPHdr *icmp;
>
> -                    icmp = OvsGetIcmp(packet, layers->l4Offset, &icmpStorage);
> +                    icmp = OvsGetIcmp(pNb, layers->l4Offset, &icmpStorage);
>                  if (icmp) {
>                      ipKey->l4.tpSrc = htons(icmp->type);
>                      ipKey->l4.tpDst = htons(icmp->code);
> @@ -331,7 +329,7 @@ OvsExtractFlow(const NET_BUFFER_LIST *packet,
>  } else if (flow->l2.dlType == htons(ETH_TYPE_IPV6)) {
>      NDIS_STATUS status;
>      flow->l2.keyLen += OVS_IPV6_KEY_SIZE;
> -        status = OvsParseIPv6(packet, flow, layers);
> +        status = OvsParseIPv6(pNb, flow, layers);
>      if (status != NDIS_STATUS_SUCCESS) {
>          memset(&flow->ipv6Key, 0, sizeof (Ipv6Key));
>          return status;
> @@ -342,11 +340,11 @@ OvsExtractFlow(const NET_BUFFER_LIST *packet,
>      flow->ipv6Key.pad = 0;
>
>      if (flow->ipv6Key.nwProto == SOCKET_IPPROTO_TCP) {
> -            OvsParseTcp(packet, &(flow->ipv6Key.l4), layers);
> +            OvsParseTcp(pNb, &(flow->ipv6Key.l4), layers);
>      } else if (flow->ipv6Key.nwProto == SOCKET_IPPROTO_UDP) {
> -            OvsParseUdp(packet, &(flow->ipv6Key.l4), layers);
> +            OvsParseUdp(pNb, &(flow->ipv6Key.l4), layers);
>      } else if (flow->ipv6Key.nwProto == SOCKET_IPPROTO_ICMPV6) {
> -            OvsParseIcmpV6(packet, flow, layers);
> +            OvsParseIcmpV6(pNb, flow, layers);
>          flow->l2.keyLen += (OVS_ICMPV6_KEY_SIZE - OVS_IPV6_KEY_SIZE);
>      }
>  } else if (flow->l2.dlType == htons(ETH_TYPE_ARP)) {
> @@ -357,7 +355,7 @@ OvsExtractFlow(const NET_BUFFER_LIST *packet,
>      ((UINT64 *)arpKey)[1] = 0;
>      ((UINT64 *)arpKey)[2] = 0;
>      flow->l2.keyLen += OVS_ARP_KEY_SIZE;
> -        arp = OvsGetArp(packet, layers->l3Offset, &arpStorage);
> +        arp = OvsGetArp(pNb, layers->l3Offset, &arpStorage);
>      if (arp && arp->ea_hdr.ar_hrd == htons(1) &&
>          arp->ea_hdr.ar_pro == htons(ETH_TYPE_IPV4) &&
>          arp->ea_hdr.ar_hln == ETH_ADDR_LENGTH &&
> @@ -420,9 +418,7 @@ AddFlow(OVS_DATAPATH *datapath, OvsFlow *flow)
>   */
>  KeMemoryBarrier();
>
> -    //KeAcquireSpinLock(&FilterDeviceExtension->NblQueueLock, &oldIrql);
>  InsertTailList(head, &flow->ListEntry);
> -    //KeReleaseSpinLock(&FilterDeviceExtension->NblQueueLock, oldIrql);
>
>  datapath->nFlows++;

LG.

>
> diff --git a/datapath-windows/ovsext/OvsFlow.h b/datapath-windows/ovsext/OvsFlow.h
> index 93368b3..aeec803 100644
> --- a/datapath-windows/ovsext/OvsFlow.h
> +++ b/datapath-windows/ovsext/OvsFlow.h
> @@ -50,13 +50,13 @@ NDIS_STATUS OvsDeleteFlowTable(OVS_DATAPATH *datapath);
> NDIS_STATUS OvsAllocateFlowTable(OVS_DATAPATH *datapath,
>                               POVS_SWITCH_CONTEXT switchContext);
>
> -NDIS_STATUS OvsExtractFlow(const NET_BUFFER_LIST *pkt, UINT32 inPort,
> +NDIS_STATUS OvsExtractFlow(const NET_BUFFER *pNb, UINT32 inPort,
>                         OvsFlowKey *flow, POVS_PACKET_HDR_INFO layers,
> -                           OvsIPv4TunnelKey *tunKey);
> +                           OvsIPv4TunnelKey *tunKey, VOID* vlanTagValue);
> OvsFlow *OvsLookupFlow(OVS_DATAPATH *datapath, const OvsFlowKey *key,
>                     UINT64 *hash, BOOLEAN hashValid);
> UINT64 OvsHashFlow(const OvsFlowKey *key);
> -VOID OvsFlowUsed(OvsFlow *flow, const NET_BUFFER_LIST *pkt,
> +VOID OvsFlowUsed(OvsFlow *flow, const NET_BUFFER *pkt,
>               const POVS_PACKET_HDR_INFO layers);

LG

>
> NTSTATUS OvsDumpFlowIoctl(PVOID inputBuffer, UINT32 inputLength,
> diff --git a/datapath-windows/ovsext/OvsPacketIO.c b/datapath-windows/ovsext/OvsPacketIO.c
> index 39e5703..d6d2e0e 100644
> --- a/datapath-windows/ovsext/OvsPacketIO.c
> +++ b/datapath-windows/ovsext/OvsPacketIO.c
> @@ -211,61 +211,97 @@ OvsStartNBLIngress(POVS_SWITCH_CONTEXT switchContext,
>      OvsFlowKey key;
>      UINT64 hash;
>      PNET_BUFFER curNb;
> +        VOID* vlanTagValue;

s/VOID*/PVOID.

> +        NET_BUFFER_LIST* pNextNbl;

s/NET_BUFFER_LIST*/PNET_BUFFER_LIST

>      nextNbl = curNbl->Next;
>      curNbl->Next = NULL;
>
>      /* Ethernet Header is a guaranteed safe access. */
> -        curNb = NET_BUFFER_LIST_FIRST_NB(curNbl);
> -        if (curNb->Next != NULL) {
> -            /* XXX: This case is not handled yet. */
> -            ASSERT(FALSE);
> +        POVS_BUFFER_CONTEXT ctx;
> +        OvsFlow *flow;
> +        NET_BUFFER_LIST* pNewNbl;
> +        BOOLEAN dropNbl;
> +
> +        fwdDetail = NET_BUFFER_LIST_SWITCH_FORWARDING_DETAIL(curNbl);
> +        sourcePort = fwdDetail->SourcePortId;
> +        sourceIndex = (NDIS_SWITCH_NIC_INDEX)fwdDetail->SourceNicIndex;
> +
> +        /* Take the DispatchLock so none of the VPORTs disconnect while
> +        * we are setting destination ports.
> +        *
> +        * XXX: acquire/release the dispatch lock for a "batch" of packets
> +        * rather than for each packet. */
> +        NdisAcquireRWLockRead(switchContext->dispatchLock, &lockState,
> +            dispatch);

minor: alignment of argument #3 can be with alignment of argument #1.

> +
> +        ctx = OvsInitExternalNBLContext(switchContext, curNbl,
> +            sourcePort == switchContext->externalPortId);
> +        if (ctx == NULL) {
> +            RtlInitUnicodeString(&filterReason,
> +                L"Cannot allocate external NBL context.");
> +
> +            OvsStartNBLIngressError(switchContext, curNbl,
> +                sendCompleteFlags, &filterReason,
> +                NDIS_STATUS_RESOURCES);
> +            NdisReleaseRWLock(switchContext->dispatchLock, &lockState);
> +            continue;
> +        }
> +
> +        vport = OvsFindVportByPortIdAndNicIndex(switchContext, sourcePort,
> +            sourceIndex);

minor: alignment of argument #3 can be with alignment of argument #1.

> +        if (vport == NULL || vport->ovsState != OVS_STATE_CONNECTED) {
> +            RtlInitUnicodeString(&filterReason,
> +                L"OVS-Cannot forward packet from unknown source port");
> +            goto dropit;
>      } else {
> -            POVS_BUFFER_CONTEXT ctx;
> -            OvsFlow *flow;
> -
> -            fwdDetail = NET_BUFFER_LIST_SWITCH_FORWARDING_DETAIL(curNbl);
> -            sourcePort = fwdDetail->SourcePortId;
> -            sourceIndex = (NDIS_SWITCH_NIC_INDEX)fwdDetail->SourceNicIndex;
> -
> -            /* Take the DispatchLock so none of the VPORTs disconnect while
> -             * we are setting destination ports.
> -             *
> -             * XXX: acquire/release the dispatch lock for a "batch" of packets
> -             * rather than for each packet. */
> -            NdisAcquireRWLockRead(switchContext->dispatchLock, &lockState,
> -                                  dispatch);
> -
> -            ctx = OvsInitExternalNBLContext(switchContext, curNbl,
> -                                  sourcePort == switchContext->externalPortId);
> -            if (ctx == NULL) {
> +            portNo = vport->portNo;
> +        }
> +
> +        curNb = NET_BUFFER_LIST_FIRST_NB(curNbl);
> +        vport->stats.rxPackets++;
> +        vport->stats.rxBytes += NET_BUFFER_DATA_LENGTH(curNb);

This stat will have to be incremented for each NB.
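
Something like (just a sketch):

    for (PNET_BUFFER nb = NET_BUFFER_LIST_FIRST_NB(curNbl); nb != NULL;
         nb = NET_BUFFER_NEXT_NB(nb)) {
        vport->stats.rxPackets++;
        vport->stats.rxBytes += NET_BUFFER_DATA_LENGTH(nb);
    }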

> +        vlanTagValue = NET_BUFFER_LIST_INFO(curNbl,
> +                                            Ieee8021QNetBufferListInfo);
> +
> +        if (curNb->Next == NULL)
> +        {
> +            pNewNbl = curNbl;
> +        }

Coding style:

if () {
 //
} else {
 //
}

> +        else
> +        {
> +            InterlockedDecrement((volatile LONG*)&ctx->refCount);

So, why do we decrement the refcount on the original NBL here? XXX

> +            pNewNbl = OvsPartialCopyToMultipleNBLs(switchContext, curNbl,
> +                0, 0, TRUE);

Alignment of argument #3 onwards can be with argument #1.

> +            if (!pNewNbl) {
>              RtlInitUnicodeString(&filterReason,
> -                                     L"Cannot allocate external NBL context.");
> +                    L"Cannot allocate new NBL: partial copy NB to "
> +                    L"multiple NBLs.");
>
>              OvsStartNBLIngressError(switchContext, curNbl,
>                                      sendCompleteFlags, &filterReason,
>                                      NDIS_STATUS_RESOURCES);

Since an external context has been allocated for the NBL using OvsInitExternalNBLContext(), you need to call OvsCompleteNBL(), and then NdisFSendNetBufferListsComplete(). This is nicely abstracted in OvsCompleteNBLIngress(). You can do dropit, and that will take care of it. If you want to report error, you'll probably have to do something like:

OvsCompleteNBL();
OvsStartNBLIngressError();

Also, I don't think you should be decrementing the refcount on the NBL directly. You should use OvsCompleteNBL() for it.


> +
>              NdisReleaseRWLock(switchContext->dispatchLock, &lockState);
>              continue;
>          }
> +        }
>
> -            vport = OvsFindVportByPortIdAndNicIndex(switchContext, sourcePort,
> -                                                    sourceIndex);
> -            if (vport == NULL || vport->ovsState != OVS_STATE_CONNECTED) {
> -                RtlInitUnicodeString(&filterReason,
> -                    L"OVS-Cannot forward packet from unknown source port");
> -                goto dropit;
> -            } else {
> -                portNo = vport->portNo;
> -            }
> -
> -            vport->stats.rxPackets++;
> -            vport->stats.rxBytes += NET_BUFFER_DATA_LENGTH(curNb);
> +        dropNbl = FALSE;
> +        for (NET_BUFFER_LIST* pCurNbl = pNewNbl; pCurNbl != NULL; pCurNbl = pNextNbl)
> +        {

Line is too long. Also, the '{' need not be on a new line.
Something like:

for (PNET_BUFFER_LIST pCurNbl = pNewNbl;
  pCurNbl != NULL; pCurNbl = pNextNbl) {
...
}

> +            pNextNbl = NET_BUFFER_LIST_NEXT_NBL(pCurNbl);
> +            NET_BUFFER_LIST_NEXT_NBL(pCurNbl) = NULL;
>
> -            status = OvsExtractFlow(curNbl, vport->portNo, &key, &layers, NULL);
> +            status = OvsExtractFlow(NET_BUFFER_LIST_FIRST_NB(pCurNbl),
> +                                    vport->portNo, &key, &layers,
> +                                    NULL, vlanTagValue);
>          if (status != NDIS_STATUS_SUCCESS) {
>              RtlInitUnicodeString(&filterReason, L"OVS-Flow extract failed");
> -                goto dropit;
> +                dropNbl = TRUE;

Where are we using 'dropNbl'? In the current patch, it is redundant.

> +                OvsAddPktCompletionList(&completionList, TRUE, sourcePort,
> +                                        pCurNbl, 0, &filterReason);
> +                continue;
>          }
>
>          ASSERT(KeGetCurrentIrql() == DISPATCH_LEVEL);
> @@ -273,16 +309,17 @@ OvsStartNBLIngress(POVS_SWITCH_CONTEXT switchContext,
>
>          flow = OvsLookupFlow(datapath, &key, &hash, FALSE);
>          if (flow) {
> -                OvsFlowUsed(flow, curNbl, &layers);
> +                OvsFlowUsed(flow, NET_BUFFER_LIST_FIRST_NB(pCurNbl), &layers);
>              datapath->hits++;
> +
>              /* If successful, OvsActionsExecute() consumes the NBL.
> -                 * Otherwise, it adds it to the completionList. No need to
> -                 * check the return value. */
> -                OvsActionsExecute(switchContext, &completionList, curNbl,
> -                                portNo, SendFlags, &key, &hash, &layers,
> -                                flow->actions, flow->actionsLen);
> +                * Otherwise, it adds it to the completionList. No need to
> +                * check the return value. */
> +                OvsActionsExecute(switchContext, &completionList, pCurNbl,
> +                                  portNo, SendFlags, &key, &hash, &layers,
> +                                  flow->actions, flow->actionsLen);
> +
>              OvsReleaseDatapath(datapath, &dpLockState);
> -                NdisReleaseRWLock(switchContext->dispatchLock, &lockState);
>              continue;
>          } else {
>              OvsReleaseDatapath(datapath, &dpLockState);
> @@ -293,28 +330,30 @@ OvsStartNBLIngress(POVS_SWITCH_CONTEXT switchContext,
>                                              portNo,
>                                              key.tunKey.dst != 0 ?
>                                              (OvsIPv4TunnelKey *)&key.tunKey :
> -                                                NULL, curNbl,
> +                                                NULL, pCurNbl,
>                                              sourcePort ==
>                                              switchContext->externalPortId,
>                                              &layers, switchContext,
>                                              &missedPackets, &num);
>              if (status == NDIS_STATUS_SUCCESS) {
>                  /* Complete the packet since it was copied to user
> -                     * buffer. */
> +                     * buffer.
> +                     */
>                  RtlInitUnicodeString(&filterReason,
>                      L"OVS-Dropped since packet was copied to userspace");
>              } else {
>                  RtlInitUnicodeString(&filterReason,
>                      L"OVS-Dropped due to failure to queue to userspace");
>              }
> -                goto dropit;
> +                dropNbl = TRUE;
> +                OvsAddPktCompletionList(&completionList, TRUE, sourcePort,
> +                                        pCurNbl, 0, &filterReason);
> +                continue;
>          }
> -
> -dropit:
> -            OvsAddPktCompletionList(&completionList, TRUE, sourcePort, curNbl, 0,
> -                                    &filterReason);
> -            NdisReleaseRWLock(switchContext->dispatchLock, &lockState);
>      }
> +
> +    dropit:
> +        NdisReleaseRWLock(switchContext->dispatchLock, &lockState);

Now that 'dropit' does not really drop packets, we should consider renaming it. Also, since it does not drop packets, there's a leak introduced in:
if (vport == NULL || vport->ovsState != OVS_STATE_CONNECTED) {
 RtlInitUnicodeString(&filterReason,
                      L"OVS-Cannot forward packet from unknown source port");
 goto dropit;
}

You could either fix this by adding another call to OvsAddPktCompletionList(), or add another label like 'dropit' for the inner loop as well. Call the labels dropNbl and dropNb or something like that, or even 'dropit1' and 'dropit2'. Either way is fine.
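
Something like (just a sketch):

            if (vport == NULL || vport->ovsState != OVS_STATE_CONNECTED) {
                RtlInitUnicodeString(&filterReason,
                    L"OVS-Cannot forward packet from unknown source port");
                goto dropNbl;
            }
            ...

    dropNbl:
            OvsAddPktCompletionList(&completionList, TRUE, sourcePort, curNbl,
                                    0, &filterReason);
    dropit:
            NdisReleaseRWLock(switchContext->dispatchLock, &lockState);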


>  }
>
>  /* Queue the missed packets. */
> diff --git a/datapath-windows/ovsext/OvsPacketParser.c b/datapath-windows/ovsext/OvsPacketParser.c
> index 0a93435..0b971a7 100644
> --- a/datapath-windows/ovsext/OvsPacketParser.c
> +++ b/datapath-windows/ovsext/OvsPacketParser.c
> @@ -18,13 +18,12 @@
>
> //XXX consider moving to NdisGetDataBuffer.
> const VOID *
> -OvsGetPacketBytes(const NET_BUFFER_LIST *nbl,
> +OvsGetPacketBytes(const NET_BUFFER *netBuffer,
>                UINT32 len,
>                UINT32 srcOffset,
>                VOID *storage)
> {
>  NDIS_STATUS status = NDIS_STATUS_SUCCESS;
> -    PNET_BUFFER netBuffer = NET_BUFFER_LIST_FIRST_NB(nbl);
>  PMDL currentMdl;
>  BOOLEAN firstMDL = TRUE;
>  ULONG destOffset = 0;
> @@ -83,7 +82,7 @@ OvsGetPacketBytes(const NET_BUFFER_LIST *nbl,
> }

LG

> NDIS_STATUS
> -OvsParseIPv6(const NET_BUFFER_LIST *packet,
> +OvsParseIPv6(const NET_BUFFER *packet,
>        OvsFlowKey *key,
>        POVS_PACKET_HDR_INFO layers)
> {
> @@ -134,7 +133,8 @@ OvsParseIPv6(const NET_BUFFER_LIST *packet,
>          const IPv6ExtHdr *extHdr;
>          UINT8 len;
>
> -            extHdr = OvsGetPacketBytes(packet, sizeof *extHdr, ofs, &extHdrStorage);
> +            extHdr = OvsGetPacketBytes(packet, sizeof *extHdr, ofs,
> +                                       &extHdrStorage);
>          if (!extHdr) {
>              return NDIS_STATUS_FAILURE;
>          }
> @@ -142,7 +142,7 @@ OvsParseIPv6(const NET_BUFFER_LIST *packet,
>          len = extHdr->hdrExtLen;
>          ofs += nextHdr == SOCKET_IPPROTO_AH ? (len + 2) * 4 : (len + 1) * 8;
>          nextHdr = extHdr->nextHeader;
> -            if (OvsPacketLenNBL(packet) < ofs) {
> +            if (NET_BUFFER_DATA_LENGTH(packet) < ofs) {
>              return NDIS_STATUS_FAILURE;
>           }

LG.

>      } else if (nextHdr == SOCKET_IPPROTO_FRAGMENT) {
> @@ -177,7 +177,7 @@ OvsParseIPv6(const NET_BUFFER_LIST *packet,
> }
>
> VOID
> -OvsParseTcp(const NET_BUFFER_LIST *packet,
> +OvsParseTcp(const NET_BUFFER *packet,
>       L4Key *flow,
>       POVS_PACKET_HDR_INFO layers)

Alignment of parameters #2 onwards is off.

> {
> @@ -192,7 +192,7 @@ OvsParseTcp(const NET_BUFFER_LIST *packet,
> }
>
> VOID
> -OvsParseUdp(const NET_BUFFER_LIST *packet,
> +OvsParseUdp(const NET_BUFFER *packet,
>       L4Key *flow,
>       POVS_PACKET_HDR_INFO layers)
Alignment of parameters #2 onwards is off.

>
> {
> @@ -210,7 +210,7 @@ OvsParseUdp(const NET_BUFFER_LIST *packet,
> }
>
> NDIS_STATUS
> -OvsParseIcmpV6(const NET_BUFFER_LIST *packet,
> +OvsParseIcmpV6(const NET_BUFFER *packet,
>          OvsFlowKey *key,
>          POVS_PACKET_HDR_INFO layers)

Alignment of parameters #2 onwards is off.

> {
> @@ -249,7 +249,7 @@ OvsParseIcmpV6(const NET_BUFFER_LIST *packet,
>      }
>      flow->ndTarget = *ndTarget;
>
> -        while ((UINT32)(ofs + 8) <= OvsPacketLenNBL(packet)) {
> +        while ((UINT32)(ofs + 8) <= NET_BUFFER_DATA_LENGTH(packet)) {
>          /*
>           * The minimum size of an option is 8 bytes, which also is
>           * the size of Ethernet link-layer options.
> @@ -258,13 +258,15 @@ OvsParseIcmpV6(const NET_BUFFER_LIST *packet,
>          const IPv6NdOptHdr *ndOpt;
>          UINT16 optLen;
>
> -            ndOpt = OvsGetPacketBytes(packet, sizeof *ndOpt, ofs, &ndOptStorage);
> +            ndOpt = OvsGetPacketBytes(packet, sizeof *ndOpt, ofs,
> +                                      &ndOptStorage);
>          if (!ndOpt) {
>              return NDIS_STATUS_FAILURE;
>          }
>
>          optLen = ndOpt->len * 8;
> -            if (!optLen || (UINT32)(ofs + optLen) >  OvsPacketLenNBL(packet)) {
> +            if (!optLen || (UINT32)(ofs + optLen) >
> +                NET_BUFFER_DATA_LENGTH(packet)) {

LG

>              goto invalid;
>          }
>
> diff --git a/datapath-windows/ovsext/OvsPacketParser.h b/datapath-windows/ovsext/OvsPacketParser.h
> index ab3c613..0928932 100644
> --- a/datapath-windows/ovsext/OvsPacketParser.h
> +++ b/datapath-windows/ovsext/OvsPacketParser.h
> @@ -20,15 +20,15 @@
> #include "precomp.h"
> #include "OvsNetProto.h"
>
> -const VOID* OvsGetPacketBytes(const NET_BUFFER_LIST *_pNB, UINT32 len,
> +const VOID* OvsGetPacketBytes(const NET_BUFFER *_pNB, UINT32 len,

Maybe the '_' is not necessary. I see that the existing code did it as well :) So, not blaming you :)

>                            UINT32 SrcOffset, VOID *storage);
> -NDIS_STATUS OvsParseIPv6(const NET_BUFFER_LIST *packet, OvsFlowKey *key,
> +NDIS_STATUS OvsParseIPv6(const NET_BUFFER *packet, OvsFlowKey *key,
>                      POVS_PACKET_HDR_INFO layers);
> -VOID OvsParseTcp(const NET_BUFFER_LIST *packet, L4Key *flow,
> +VOID OvsParseTcp(const NET_BUFFER *packet, L4Key *flow,
>               POVS_PACKET_HDR_INFO layers);
> -VOID OvsParseUdp(const NET_BUFFER_LIST *packet, L4Key *flow,
> +VOID OvsParseUdp(const NET_BUFFER *packet, L4Key *flow,
>               POVS_PACKET_HDR_INFO layers);
> -NDIS_STATUS OvsParseIcmpV6(const NET_BUFFER_LIST *packet, OvsFlowKey *key,
> +NDIS_STATUS OvsParseIcmpV6(const NET_BUFFER *packet, OvsFlowKey *key,
>                          POVS_PACKET_HDR_INFO layers);

LG.

> static __inline ULONG
> @@ -58,7 +58,7 @@ OvsPacketLenNBL(const NET_BUFFER_LIST *_pNB)
> * which C does not allow.
> */
> static UINT16
> -OvsGetTcpCtl(const NET_BUFFER_LIST *packet, // IN
> +OvsGetTcpCtl(const NET_BUFFER *packet, // IN
>           const POVS_PACKET_HDR_INFO layers) // IN
> {
> #define TCP_CTL_OFS 12                // Offset of "ctl" field in TCP header.
> @@ -74,7 +74,7 @@ OvsGetTcpCtl(const NET_BUFFER_LIST *packet, // IN
>
>
> static UINT8
> -OvsGetTcpFlags(const NET_BUFFER_LIST *packet,    // IN
> +OvsGetTcpFlags(const NET_BUFFER *packet,    // IN
>             const OvsFlowKey *key,   // IN
>             const POVS_PACKET_HDR_INFO layers) // IN
> {
> @@ -88,7 +88,7 @@ OvsGetTcpFlags(const NET_BUFFER_LIST *packet,    // IN
> }
>
> static const EtherArp *
> -OvsGetArp(const NET_BUFFER_LIST *packet,
> +OvsGetArp(const NET_BUFFER *packet,
>        UINT32 ofs,
>        EtherArp *storage)
> {
> @@ -96,14 +96,15 @@ OvsGetArp(const NET_BUFFER_LIST *packet,
> }
>
> static const IPHdr *
> -OvsGetIp(const NET_BUFFER_LIST *packet,
> +OvsGetIp(const NET_BUFFER *packet,
>       UINT32 ofs,
>       IPHdr *storage)
> {
>  const IPHdr *ip = OvsGetPacketBytes(packet, sizeof *ip, ofs, storage);
>  if (ip) {
>      int ipLen = ip->ihl * 4;
> -        if (ipLen >= sizeof *ip && OvsPacketLenNBL(packet) >= ofs + ipLen) {
> +        if (ipLen >= sizeof *ip &&
> +            NET_BUFFER_DATA_LENGTH(packet) >= ofs + ipLen) {
>          return ip;
>      }
>  }
> @@ -111,14 +112,15 @@ OvsGetIp(const NET_BUFFER_LIST *packet,
> }
>
> static const TCPHdr *
> -OvsGetTcp(const NET_BUFFER_LIST *packet,
> +OvsGetTcp(const NET_BUFFER *packet,
>        UINT32 ofs,
>        TCPHdr *storage)
> {
>  const TCPHdr *tcp = OvsGetPacketBytes(packet, sizeof *tcp, ofs, storage);
>  if (tcp) {
>      int tcpLen = tcp->doff * 4;
> -        if (tcpLen >= sizeof *tcp && OvsPacketLenNBL(packet) >= ofs + tcpLen) {
> +        if (tcpLen >= sizeof *tcp &&
> +            NET_BUFFER_DATA_LENGTH(packet) >= ofs + tcpLen) {
>          return tcp;
>      }
>  }
> @@ -126,7 +128,7 @@ OvsGetTcp(const NET_BUFFER_LIST *packet,
> }
>
> static const UDPHdr *
> -OvsGetUdp(const NET_BUFFER_LIST *packet,
> +OvsGetUdp(const NET_BUFFER *packet,
>        UINT32 ofs,
>        UDPHdr *storage)
> {
> @@ -134,7 +136,7 @@ OvsGetUdp(const NET_BUFFER_LIST *packet,
> }
>
> static const ICMPHdr *
> -OvsGetIcmp(const NET_BUFFER_LIST *packet,
> +OvsGetIcmp(const NET_BUFFER *packet,
>         UINT32 ofs,
>         ICMPHdr *storage)
> {

All LG.

> diff --git a/datapath-windows/ovsext/OvsTunnel.c b/datapath-windows/ovsext/OvsTunnel.c
> index b5a369a..9c96aec 100644
> --- a/datapath-windows/ovsext/OvsTunnel.c
> +++ b/datapath-windows/ovsext/OvsTunnel.c
> @@ -228,6 +228,7 @@ OvsInjectPacketThroughActions(PNET_BUFFER_LIST pNbl,
>  OVS_DATAPATH *datapath = &gOvsSwitchContext->datapath;
>
>  ASSERT(gOvsSwitchContext);
> +    ASSERT(pNbl->FirstNetBuffer->Next == NULL);

Just an FYI:
We have the same check later:
     curNb = NET_BUFFER_LIST_FIRST_NB(pNbl); <<<
     ASSERT(curNb->Next == NULL);

     NdisAcquireRWLockRead(gOvsSwitchContext->dispatchLock, &lockState, dispatch);

>
>  /* Fill the tunnel key */
>  status = OvsSlowPathDecapVxlan(pNbl, &tunnelKey);
> @@ -265,6 +266,7 @@ OvsInjectPacketThroughActions(PNET_BUFFER_LIST pNbl,
>      UINT64 hash;
>      PNET_BUFFER curNb;
>      OvsFlow *flow;
> +        VOID* vlanTagValue;

s/VOID */PVOID

>      fwdDetail = NET_BUFFER_LIST_SWITCH_FORWARDING_DETAIL(pNbl);
>
> @@ -295,20 +297,22 @@ OvsInjectPacketThroughActions(PNET_BUFFER_LIST pNbl,
>      ASSERT(vport->ovsType == OVSWIN_VPORT_TYPE_VXLAN);
>
>      portNo = vport->portNo;
> +        vlanTagValue = NET_BUFFER_LIST_INFO(pNbl, Ieee8021QNetBufferListInfo);
>
> -        status = OvsExtractFlow(pNbl, portNo, &key, &layers, &tunnelKey);
> +        status = OvsExtractFlow(curNb, portNo, &key, &layers,
> +                                &tunnelKey, vlanTagValue);
>      if (status != NDIS_STATUS_SUCCESS) {
>          goto unlockAndDrop;
>      }
>
>      flow = OvsLookupFlow(datapath, &key, &hash, FALSE);
>      if (flow) {
> -            OvsFlowUsed(flow, pNbl, &layers);
> +            OvsFlowUsed(flow, curNb, &layers);
>          datapath->hits++;
>
>          OvsActionsExecute(gOvsSwitchContext, &completionList, pNbl,
> -                            portNo, SendFlags, &key, &hash, &layers,
> -                            flow->actions, flow->actionsLen);
> +                              portNo, SendFlags, &key, &hash, &layers,
> +                              flow->actions, flow->actionsLen);

LG

>
>          OvsReleaseDatapath(datapath, &dpLockState);
>      } else {
> diff --git a/datapath-windows/ovsext/OvsUser.c b/datapath-windows/ovsext/OvsUser.c
> index 5093f20..9f717ee 100644
> --- a/datapath-windows/ovsext/OvsUser.c
> +++ b/datapath-windows/ovsext/OvsUser.c
> @@ -315,6 +315,7 @@ OvsExecuteDpIoctl(PVOID inputBuffer,
>  OvsFlowKey key;
>  OVS_PACKET_HDR_INFO layers;
>  POVS_VPORT_ENTRY vport;
> +    VOID* vlanTagValue;

s/VOID*/PVOID

>
>  if (inputLength < sizeof(*execute) || outputLength != 0) {
>      return STATUS_INFO_LENGTH_MISMATCH;
> @@ -360,14 +361,22 @@ OvsExecuteDpIoctl(PVOID inputBuffer,
>      fwdDetail->SourcePortId = NDIS_SWITCH_DEFAULT_PORT_ID;
>      fwdDetail->SourceNicIndex = 0;
>  }
> -    // XXX: Figure out if any of the other members of fwdDetail need to be set.
> +    /*
> +     * XXX: Figure out if any of the other members of fwdDetail need to be
> +     * set.
> +     */
> +
> +    ASSERT(pNbl->FirstNetBuffer->Next == NULL);
>
> -    ndisStatus = OvsExtractFlow(pNbl, fwdDetail->SourcePortId, &key, &layers,
> -                              NULL);
> +    vlanTagValue = NET_BUFFER_LIST_INFO(pNbl, Ieee8021QNetBufferListInfo);

You can be assured that an NBL sent from userspace won't have the vlanTagValue set in the OOB context. We are creating the NBL at the top of the function.

> +
> +    ndisStatus = OvsExtractFlow(NET_BUFFER_LIST_FIRST_NB(pNbl),
> +                                fwdDetail->SourcePortId, &key, &layers, NULL,
> +                                vlanTagValue);
>  if (ndisStatus == NDIS_STATUS_SUCCESS) {
>      ASSERT(KeGetCurrentIrql() == DISPATCH_LEVEL);
>      NdisAcquireRWLockRead(gOvsSwitchContext->dispatchLock, &lockState,
> -                              NDIS_RWL_AT_DISPATCH_LEVEL);
> +            NDIS_RWL_AT_DISPATCH_LEVEL);

I think it is neater if the arguments are aligned the way they were previously.

>      ndisStatus = OvsActionsExecute(gOvsSwitchContext, NULL, pNbl,
>                                     vport ? vport->portNo : 0,
>                                     NDIS_SEND_FLAGS_SWITCH_DESTINATION_GROUP,
> @@ -827,6 +836,8 @@ OvsCreateAndAddPackets(UINT32 queueId,
>  PNET_BUFFER_LIST newNbl = NULL;
>  PNET_BUFFER nb;
>
> +    ASSERT(nbl->FirstNetBuffer->Next == NULL);
> +
>  if (hdrInfo->isTcp) {
>      NDIS_TCP_LARGE_SEND_OFFLOAD_NET_BUFFER_LIST_INFO tsoInfo;
>      UINT32 packetLength;

LG

> diff --git a/datapath-windows/ovsext/OvsVxlan.c b/datapath-windows/ovsext/OvsVxlan.c
> index 63909ae..c50d2d9 100644
> --- a/datapath-windows/ovsext/OvsVxlan.c
> +++ b/datapath-windows/ovsext/OvsVxlan.c
> @@ -116,6 +116,8 @@ OvsDoEncapVxlan(PNET_BUFFER_LIST curNbl,
>  UINT32 headRoom = OvsGetVxlanTunHdrSize();
>  UINT32 packetLength;
>
> +    ASSERT(curNbl->FirstNetBuffer->Next == NULL);
> +
>  /*
>   * XXX: the assumption currently is that the NBL is owned by OVS, and
>   * headroom has already been allocated as part of allocating the NBL and
> @@ -286,13 +288,19 @@ OvsIpHlprCbVxlan(PNET_BUFFER_LIST curNbl,
>  OvsFlowKey key;
>  NDIS_STATUS status;
>  UNREFERENCED_PARAMETER(inPort);
> -
> -    status = OvsExtractFlow(curNbl, inPort, &key, &layers, NULL);
> -    if (result == STATUS_SUCCESS) {
> -        status = OvsDoEncapVxlan(curNbl, tunKey, fwdInfo, &layers,
> +    VOID* vlanTagValue;
> +
> +    vlanTagValue = NET_BUFFER_LIST_INFO(curNbl, Ieee8021QNetBufferListInfo);
> +    for (NET_BUFFER* pNb = NET_BUFFER_LIST_FIRST_NB(curNbl);
> +         pNb != NULL; pNb = NET_BUFFER_NEXT_NB(pNb)) {
> +        status = OvsExtractFlow(pNb, inPort, &key, &layers, NULL,
> +                                vlanTagValue);
> +        if (result == STATUS_SUCCESS) {
> +            status = OvsDoEncapVxlan(curNbl, tunKey, fwdInfo, &layers,
>              (POVS_SWITCH_CONTEXT)cbData1, NULL);
> -    } else {
> -        status = NDIS_STATUS_FAILURE;
> +        } else {
> +            status = NDIS_STATUS_FAILURE;
> +        }

You can be assured here also that the IP helper won't call you back with a chain of NBs. Each callback returns the NBL that was enqueued. Your assert to check for nb->next == NULL is valid here. We don't need the loop.

>  }
>
>  if (status != NDIS_STATUS_SUCCESS) {
> @@ -457,9 +465,16 @@ OvsSlowPathDecapVxlan(const PNET_BUFFER_LIST packet,
>  OVS_PACKET_HDR_INFO layers;
>
>  layers.value = 0;
> +    ASSERT(packet->FirstNetBuffer->Next == NULL);
>
>  do {
> -        nh = OvsGetIp(packet, layers.l3Offset, &ip_storage);
> +        NET_BUFFER* pNb = NET_BUFFER_LIST_FIRST_NB(packet);
> +
> +        /*
> +         * It is an encapsulated packet (UDP), we can use the first NB
> +         * to start the check
> +         */
> +        nh = OvsGetIp(pNb, layers.l3Offset, &ip_storage);

You can be ASSURED that you'll not get an NBL with multiple NBs here. OvsInjectPacketThroughActions() has already checked for this case.

>      if (nh) {
>          layers.l4Offset = layers.l3Offset + nh->ihl * 4;
>      } else {
> @@ -467,7 +482,7 @@ OvsSlowPathDecapVxlan(const PNET_BUFFER_LIST packet,
>      }
>
>      /* make sure it's a VXLAN packet */
> -        udp = OvsGetUdp(packet, layers.l4Offset, &udpStorage);
> +        udp = OvsGetUdp(pNb, layers.l4Offset, &udpStorage);
>      if (udp) {
>          layers.l7Offset = layers.l4Offset + sizeof *udp;
>      } else {
> @@ -477,7 +492,11 @@ OvsSlowPathDecapVxlan(const PNET_BUFFER_LIST packet,
>      /* XXX Should be tested against the dynamic port # in the VXLAN vport */
>      ASSERT(udp->dest == RtlUshortByteSwap(VXLAN_UDP_PORT));
>
> -        VxlanHeader = (VXLANHdr *)OvsGetPacketBytes(packet,
> +        /*
> +         * We may have multiple VXLAN packets here that need to be decapsulated
> +         * For the VXLAN header, we need only the first NET_BUFFER
> +         */
> +        VxlanHeader = (VXLANHdr *)OvsGetPacketBytes(pNb,
>                                                  sizeof(*VxlanHeader),
>                                                  layers.l7Offset,
>                                                  &VxlanHeaderBuffer);

LG


thanks,
Nithin


