[ovs-dev] [PATCH] datapath-windows: Handle NBLs with multiple NBs

Eitan Eliahu eliahue at vmware.com
Mon Sep 15 16:06:04 UTC 2014


"Actually, we should be at DPC level, because OvsProcessOneNb is called by the NDIS Send callback, which is always called (from observation) at DPC level."
This is usually true but we could not count on that as it can be called in a passive IRQL as well.
Eitan

-----Original Message-----
From: Samuel Ghinet [mailto:sghinet at cloudbasesolutions.com] 
Sent: Sunday, September 14, 2014 9:13 PM
To: Nithin Raju
Cc: dev at openvswitch.org; Alin Serdean; Saurabh Shah; Eitan Eliahu; Ankur Sharma
Subject: RE: [PATCH] datapath-windows: Handle NBLs with multiple NBs

Hello Nithin,

Thanks a lot again for the review!
-------------------------
> More than that I was a little concerned about the changes in OvsTunnelTx to split a TSOed NBL (with multiple NBs) into multiple smaller NBLs. I believe this would have a performance impact (when we measure it).

> The reason why we did not call OvsDoFlowLookupOutput() on each NB earlier is that we know all of the NBs have the same L2, L3 and L4 header. The fields that are different in the L3 and L4 headers are not part of the flow key, so there's no reason to do a separate flow lookup and actions execute. It is an optimization we had, and there was no correctness issue with that optimization.

> But, I understand that this does not suit the change you want to make where you want to update the pipeline to process one NB at a time rather than NBL at a time. Can I ask what is the advantage of that change? This change can potentially cause bad performance.
There was an email I had sent about this, long ago.
And the answer is that MS, in its doc, makes very few guarantees about the things shared in common by the NBs in a NBL.
Namely, we do have L2 the same. Yes.
L3? we don't. We have here only the protocol type the same. If the packet is TCP or UDP, we also have src ip and src destination the same, and src ports and dest ports the same. But that's quite it.
There are a lot of fields in the packet info / flow keys regarding L3 and L4 headers that the MS documentation does not give any guarantee about.
We could theoretically assume, for instance, that if we have NBs of a tcp connection, we would generally treat them the same. But practically, we can apply a 'set action' if tcp flags = X (which may be for only one of the NBs in the NBL), or if it is the first ipv4 / ipv6 fragment in a series or not, etc. (which may all be NBs in the same NBL).

An optimized version of the "NBs-> NBLs" would have to do advanced flow match checks, where it knows that say, tcp flags = wildcard, the same with others, and it knows it needs not check flow match for the NBs that follow in the NBL. 
We could have an IRC chat on this, if you think it could use more clarifications.

> > +    atDispatch = NDIS_TEST_SEND_AT_DISPATCH_LEVEL(sendFlags) ?
> > +                    NDIS_RWL_AT_DISPATCH_LEVEL : 0;
> 
> If there's some way to check if the 'switchContext->dispatchLock' is already taken, we should do it. At a minimum, we can ASSERT if we are already at DISPATCH level. I'm implying that the DPC level would have been upgrated to DISPATCH_LEVEL because we have taken the 'switchContext->dispatchLock'.
Actually, we should be at DPC level, because OvsProcessOneNb is called by the NDIS Send callback, which is always called (from observation) at DPC level.

> > +    ASSERT(KeGetCurrentIrql() == DISPATCH_LEVEL);
> 
> Cool, we have this ASSERT. Let's assume that we are at DISPATCH_LEVEL then. We don't need the 'atDispatch' variable.
Actually, I think I should remove this ASSERT instead. Even though, from observation, the NDIS Send callback is always called at dispatch level: they give us a flag, and I think it is better to use that only.

> > +    newNbl = OvsSplitNblByNB(origNbl, switchContext, TRUE);
> > +    if (!newNbl) {
> > +        RtlInitUnicodeString(&filterReason, L"Cannot allocate new NBL: partial"
> > +                             L"copy NB to multiple NBLs.");
> > +        status = NDIS_STATUS_RESOURCES;
> > +        goto Cleanup;
> > +    }
> 
> Jumping to 'Cleanup' here or from code down the line would add 'origNbl' to the completion list, and we'd end up calling OVSCompleteNBl() on it. But, the refCount is already 0, so we can have bad behavior. You can fix it either by not jumping to Cleanup or by preserving the refCount and letting code code flow through.
Actually, we have 3 cases:
1. if OvsSplitNblByNB returns NULL - fixing the refcount decrement to be done after the partial copy will solve the problem 2. if origNbl has one NB: no deref happens.
3. if origNbl has N NBs (N > 1). Then origNbl will have refCount = N (normally, N + 1, 1 for the initial value, but the deref happened, resulting in refCount == N). And this allows for the completion of the children NBLs (i.e. cloned) to also complete the parent (original).

> We generally use all lowercase for the the label. I can update the CodingStyle to make this explicit. What do you think? Having both lower and upper case makes it a little untidy.
I would strongly prefer labels starting with uppercase (like Cleanup).

> > +    /* Queue the missed packets. */
> > +    OvsQueuePackets(OVS_DEFAULT_PACKET_QUEUE, &missedPackets, 
> > + countMissedPackets);
> 
> A small optimization is to call OvsQueuePackets() only if countMissedPackets > 0. Also, the line is longer than 79 characters. You can reduce the length if possible.
Hmm, perhaps it would be better to put the check inside OvsQueuePackets: this would make the code clearer if other callers would like this optimizations.

Thanks,
Sam
________________________________________
From: Nithin Raju [nithin at vmware.com]
Sent: Friday, September 12, 2014 2:41 AM
To: Samuel Ghinet
Cc: dev at openvswitch.org; Alin Serdean; Saurabh Shah; Eitan Eliahu; Ankur Sharma
Subject: Re: [PATCH] datapath-windows: Handle NBLs with multiple NBs

hi Sam,
Thanks for the updating the patch to make it specific to "handle NBLs with multiple NBs".

In general it looks good. There are a few bugs w.r.t NBL completion that I have highlighted.

More than that I was a little concerned about the changes in OvsTunnelTx to split a TSOed NBL (with multiple NBs) into multiple smaller NBLs. I believe this would have a performance impact (when we measure it).

The reason why we did not call OvsDoFlowLookupOutput() on each NB earlier is that we know all of the NBs have the same L2, L3 and L4 header. The fields that are different in the L3 and L4 headers are not part of the flow key, so there's no reason to do a separate flow lookup and actions execute. It is an optimization we had, and there was no correctness issue with that optimization.

But, I understand that this does not suit the change you want to make where you want to update the pipeline to process one NB at a time rather than NBL at a time. Can I ask what is the advantage of that change? This change can potentially cause bad performance.

So, I don't think we should make that change unless it is actually buying us something.

Do, send out another version with the fixes, and I'll take a look.

> @@ -630,12 +631,37 @@ OvsTunnelPortTx(OvsForwardingContext *ovsFwdCtx)
>     OvsClearTunTxCtx(ovsFwdCtx);
>
>     if (status == NDIS_STATUS_SUCCESS) {
> +        PNET_BUFFER_LIST splitNbl, nextNbl;
> +
>         ASSERT(newNbl);
> +
>         OvsCompleteNBLForwardingCtx(ovsFwdCtx,
> -                                    L"Complete after cloning NBL for encapsulation");
> -        ovsFwdCtx->curNbl = newNbl;
> -        status = OvsDoFlowLookupOutput(ovsFwdCtx);
> -        ASSERT(ovsFwdCtx->curNbl == NULL);
> +                                    L"Complete after cloning NBL for "
> +                                    L"encapsulation");
> +
> +        /*
> +        * split NBLs: each NBL must contain only one NB.
> +        * NOTE: if newNbl has only one NB => splitNbl will be newNbl
> +        * NOTE: if we partial copy nbl, we decrement refcount now, so we would
> +        * not need to add an OvsCompleteNBL() call later.
> +        */

Comment should be more like:
 /*
  * split NBLs: each NBL must contain only one NB.
  * NOTE: if newNbl has only one NB => splitNbl will be newNbl ...

> +        splitNbl = OvsSplitNblByNB(newNbl, ovsFwdCtx->switchContext, TRUE);
> +        if (!splitNbl) {
> +            return NDIS_STATUS_RESOURCES;
> +        }
> +
> +        OVS_NBL_FOR_EACH_NEXT(curNbl, splitNbl, nextNbl) {
> +            nextNbl = NET_BUFFER_LIST_NEXT_NBL(curNbl);
> +            NET_BUFFER_LIST_NEXT_NBL(curNbl) = NULL;
> +
> +            ovsFwdCtx->curNbl = curNbl;
> +            status = OvsDoFlowLookupOutput(ovsFwdCtx);
> +            ASSERT(ovsFwdCtx->curNbl == NULL);
> +
> +            if (status != STATUS_SUCCESS) {
> +                break;
> +            }
> +        }

The reason why we did not call OvsDoFlowLookupOutput() on each NB earlier is that we know all of the NBs have the same L2, L3 and L4 header. The fields that are different in the L3 and L4 headers are not part of the flow key, so there's no reason to do a separate flow lookup and actions execute. It is an optimization we had, and there was not correctness issue with that optimization.

But, I understand that this does not suit the change you want to make where you want to update the pipeline to process one NB at a time rather than NBL at a time. Can I ask what is the advantage of that change? This change can potentially cause bad performance.

>     } else {
>         /*
>         * XXX: Temporary freeing of the packet until we register a @@ 
> -1368,6 +1394,8 @@ OvsActionsExecute(POVS_SWITCH_CONTEXT switchContext,
>     PNDIS_SWITCH_FORWARDING_DETAIL_NET_BUFFER_LIST_INFO fwdDetail =
>         NET_BUFFER_LIST_SWITCH_FORWARDING_DETAIL(curNbl);
>
> +    ASSERT(curNbl->FirstNetBuffer->Next == NULL);
> +
>     /* XXX: ASSERT that the flow table lock is held. */
>     status = OvsInitForwardingCtx(&ovsFwdCtx, switchContext, curNbl, portNo,
>                                   sendFlags, fwdDetail, 
> completionList, diff --git a/datapath-windows/ovsext/BufferMgmt.c 
> b/datapath-windows/ovsext/BufferMgmt.c
> index e0377c1..27d1068 100644
> --- a/datapath-windows/ovsext/BufferMgmt.c
> +++ b/datapath-windows/ovsext/BufferMgmt.c
> @@ -863,7 +863,7 @@ OvsPartialCopyToMultipleNBLs(PVOID ovsContext,
>         if (prevNbl == NULL) {
>             firstNbl = newNbl;
>         } else {
> -            NET_BUFFER_LIST_NEXT_NBL(prevNbl) = nbl;
> +            NET_BUFFER_LIST_NEXT_NBL(prevNbl) = newNbl;
>             NET_BUFFER_NEXT_NB(prevNb) = nb;
>         }
>         prevNbl = newNbl;
> @@ -1533,3 +1533,34 @@ OvsGetCtxSourcePortNo(PNET_BUFFER_LIST nbl,
>     *portNo = ctx->srcPortNo;
>     return NDIS_STATUS_SUCCESS;
> }
> +
> +PNET_BUFFER_LIST
> +OvsSplitNblByNB(PNET_BUFFER_LIST nblList,
> +                POVS_SWITCH_CONTEXT switchContext,
> +                BOOLEAN decrementBufRefCount) {
> +    PNET_BUFFER curNb;
> +    PNET_BUFFER_LIST newNblList;
> +    POVS_BUFFER_CONTEXT bufferContext;
> +
> +    curNb = NET_BUFFER_LIST_FIRST_NB(nblList);
> +
> +    if (curNb->Next == NULL) {
> +        return nblList;
> +    }
> +
> +    bufferContext = (POVS_BUFFER_CONTEXT)
> +                     NET_BUFFER_LIST_CONTEXT_DATA_START(nblList);
> +
> +    if (decrementBufRefCount) {
> +        InterlockedDecrement((volatile LONG*)&bufferContext->refCount);
> +    }

It will be a better idea to decrement the refCount after the call to OvsPartialCopyToMultipleNBLs() succeeds. Otherwise, upon failure of OvsPartialCopyToMultipleNBLs(), you'll end up with a refCount of 0 on the original NBL. Calling OVSCompleteNbl() on it would be wrong.

> +
> +    newNblList = OvsPartialCopyToMultipleNBLs(switchContext, nblList, 0, 0,
> +                                              TRUE);
> +    if (!newNblList) {
> +        return NULL;
> +    }
> +
> +    return newNblList;
> +}
> \ No newline at end of file
> diff --git a/datapath-windows/ovsext/BufferMgmt.h 
> b/datapath-windows/ovsext/BufferMgmt.h
> index 915d7f5..a71d100 100644
> --- a/datapath-windows/ovsext/BufferMgmt.h
> +++ b/datapath-windows/ovsext/BufferMgmt.h
> @@ -17,6 +17,10 @@
> #ifndef __BUFFER_MGMT_H_
> #define __BUFFER_MGMT_H_ 1
>
> +#include "precomp.h"
> +
> +typedef struct _OVS_SWITCH_CONTEXT *POVS_SWITCH_CONTEXT;
> +
> #define MEM_ALIGN                       MEMORY_ALLOCATION_ALIGNMENT
> #define MEM_ALIGN_SIZE(_x)  ((MEM_ALIGN - 1 + (_x))/MEM_ALIGN * MEM_ALIGN)
> #define OVS_CTX_MAGIC                   0xabcd
> @@ -81,6 +85,16 @@ typedef struct _OVS_NBL_POOL { #endif } 
> OVS_NBL_POOL, *POVS_NBL_POOL;
>
> +#define OVS_NBL_FOR_EACH(_curNbl, _nblList)       \
> +for (PNET_BUFFER_LIST _curNbl = _nblList;         \
> +     _curNbl != NULL;                             \
> +     _curNbl = NET_BUFFER_LIST_NEXT_NBL(_curNbl))
> +
> +#define OVS_NBL_FOR_EACH_NEXT(_curNbl, _nblList, _nextNbl)   \
> +for (PNET_BUFFER_LIST _curNbl = _nblList;                    \
> +     _curNbl != NULL;                                        \
> +     _curNbl = _nextNbl)
> +
>
> NDIS_STATUS OvsInitBufferPool(PVOID context); VOID 
> OvsCleanupBufferPool(PVOID context); @@ -121,4 +135,9 @@ NDIS_STATUS 
> OvsSetCtxSourcePortNo(PNET_BUFFER_LIST nbl, UINT32 portNo);
>
> NDIS_STATUS OvsGetCtxSourcePortNo(PNET_BUFFER_LIST nbl, UINT32 
> *portNo);
>
> +PNET_BUFFER_LIST
> +OvsSplitNblByNB(PNET_BUFFER_LIST nblList,
> +                POVS_SWITCH_CONTEXT switchContext,
> +                BOOLEAN decrementBufRefCount);
> +
> #endif /* __BUFFER_MGMT_H_ */
> diff --git a/datapath-windows/ovsext/PacketIO.c 
> b/datapath-windows/ovsext/PacketIO.c
> index ac7862d..22e3a06 100644
> --- a/datapath-windows/ovsext/PacketIO.c
> +++ b/datapath-windows/ovsext/PacketIO.c
> @@ -28,6 +28,7 @@
> #include "Flow.h"
> #include "Event.h"
> #include "User.h"
> +#include "BufferMgmt.h"
>
> /* Due to an imported header file */
> #pragma warning( disable:4505 )
> @@ -176,149 +177,204 @@ OvsStartNBLIngressError(POVS_SWITCH_CONTEXT switchContext,
>                                     sendCompleteFlags); }
>
> +/*
> +* 
> +---------------------------------------------------------------------
> +-----
> +*  Processes one NB, which means:
> +*  o) Extract Packet Info (i.e. flow key)
> +*  o) Find flow matching packet info
> +*  o) If a matching flow was found => execute flow's actions
> +*  o) If a matching flow was not found => queue packet to userspace
> +* 
> +---------------------------------------------------------------------
> +-----
> +*/
> +
> static VOID
> -OvsStartNBLIngress(POVS_SWITCH_CONTEXT switchContext,
> -                   PNET_BUFFER_LIST netBufferLists,
> -                   ULONG SendFlags)

Generally, we don't add a new line after the comment block describing the function. I'm not opposed to it if you want to do it. It seems a little unnecessary.

> +OvsProcessOneNb(PNET_BUFFER_LIST curNbl,
> +                POVS_SWITCH_CONTEXT switchContext,
> +                POVS_VPORT_ENTRY sourceVPort,
> +                OvsCompletionList *completionList,
> +                ULONG sendFlags,
> +                PLIST_ENTRY missedPackets,
> +                PUINT32 countMissedPackets)
> {
> -    NDIS_SWITCH_PORT_ID sourcePort = 0;
> -    NDIS_SWITCH_NIC_INDEX sourceIndex = 0;
> -    PNDIS_SWITCH_FORWARDING_DETAIL_NET_BUFFER_LIST_INFO fwdDetail;
> -    PNET_BUFFER_LIST curNbl = NULL, nextNbl = NULL;
> -    ULONG sendCompleteFlags;
> -    UCHAR dispatch;
> -    LOCK_STATE_EX lockState, dpLockState;
>     NDIS_STATUS status;
> +    OvsFlowKey key;
> +    OVS_PACKET_HDR_INFO layers;
>     NDIS_STRING filterReason;
> -    LIST_ENTRY missedPackets;
> -    UINT32 num = 0;
> -    OvsCompletionList completionList;
> +    LOCK_STATE_EX dpLockState;
> +    OvsFlow *flow;
> +    UINT64 hash;
> +    BOOLEAN atDispatch;
> +    POVS_DATAPATH datapath = &(switchContext->datapath);
> +

We can definitely add an ASSERT() here to check if the NBL->Next point is NULL.


> +    atDispatch = NDIS_TEST_SEND_AT_DISPATCH_LEVEL(sendFlags) ?
> +                    NDIS_RWL_AT_DISPATCH_LEVEL : 0;

If there's some way to check if the 'switchContext->dispatchLock' is already taken, we should do it. At a minimum, we can ASSERT if we are already at DISPATCH level. I'm implying that the DPC level would have been upgrated to DISPATCH_LEVEL because we have taken the 'switchContext->dispatchLock'.

> +
> +    /* 1. Extract flow key*/
> +    status = OvsExtractFlow(curNbl, sourceVPort->portNo, &key, 
> + &layers, NULL);
> +
> +    if (status != NDIS_STATUS_SUCCESS) {
> +        RtlInitUnicodeString(&filterReason, L"OVS-Flow extract failed");
> +        OvsAddPktCompletionList(completionList, TRUE, sourceVPort->portId,
> +                                curNbl, 0, &filterReason);
> +        return;
> +    }
>
> -    dispatch = NDIS_TEST_SEND_AT_DISPATCH_LEVEL(SendFlags)?
> -                                            NDIS_RWL_AT_DISPATCH_LEVEL : 0;
> -    sendCompleteFlags = OvsGetSendCompleteFlags(SendFlags);
> -    SendFlags |= NDIS_SEND_FLAGS_SWITCH_DESTINATION_GROUP;
> +    ASSERT(KeGetCurrentIrql() == DISPATCH_LEVEL);

Cool, we have this ASSERT. Let's assume that we are at DISPATCH_LEVEL then. We don't need the 'atDispatch' variable.

> +    OvsAcquireDatapathRead(datapath, &dpLockState, atDispatch);
> +
> +    /* 2. Find flow matching key */
> +    flow = OvsLookupFlow(datapath, &key, &hash, FALSE);
> +    if (flow) {
> +        OvsFlowUsed(flow, curNbl, &layers);
> +        datapath->hits++;
> +
> +        /*
> +        * 2.a) execute actions on packet.
> +        * If successful, OvsActionsExecute() consumes the NBL.
> +        * Otherwise, it adds it to the completionList. No need to
> +        * check the return value.
> +        */
> +        OvsActionsExecute(switchContext, completionList, curNbl,
> +                          sourceVPort->portNo, sendFlags, &key, &hash,
> +                          &layers, flow->actions, flow->actionsLen);
> +
> +        OvsReleaseDatapath(datapath, &dpLockState);
> +        return;
> +    }
>
> -    InitializeListHead(&missedPackets);
> -    OvsInitCompletionList(&completionList, switchContext, sendCompleteFlags);
> +    OvsReleaseDatapath(datapath, &dpLockState);
> +
> +    /* 2.b) no matching flow found => queue packet to userspace */
> +    datapath->misses++;
> +    status = OvsCreateAndAddPackets(OVS_DEFAULT_PACKET_QUEUE,
> +                                    /*user data*/ NULL, /*user data length*/ 0,
> +                                    OVS_PACKET_CMD_MISS, sourceVPort->portNo,
> +                                    (key.tunKey.dst != 0 ? &key.tunKey : NULL),
> +                                    curNbl,
> +                                    (sourceVPort->portId ==
> +                                    switchContext->externalPortId),
> +                                    &layers, switchContext, missedPackets,
> +                                    countMissedPackets);
> +    if (status == NDIS_STATUS_SUCCESS) {
> +        /* Complete the packet since it was copied to user
> +        * buffer.
> +        */
> +        RtlInitUnicodeString(&filterReason,
> +                             L"OVS-Dropped since packet was copied to "
> +                             L"userspace");
> +    } else {
> +        RtlInitUnicodeString(&filterReason,
> +                             L"OVS-Dropped due to failure to queue to "
> +                             L"userspace");
> +    }
>
> -    for (curNbl = netBufferLists; curNbl != NULL; curNbl = nextNbl) {
> -        POVS_VPORT_ENTRY vport;
> -        UINT32 portNo;
> -        OVS_DATAPATH *datapath = &switchContext->datapath;
> -        OVS_PACKET_HDR_INFO layers;
> -        OvsFlowKey key;
> -        UINT64 hash;
> -        PNET_BUFFER curNb;
> +    OvsAddPktCompletionList(completionList, TRUE, sourceVPort->portId, curNbl,
> +                            0, &filterReason); }
>
> -        nextNbl = curNbl->Next;
> -        curNbl->Next = NULL;
> +/*
> +* 
> +---------------------------------------------------------------------
> +-----
> +*  Processes one NBL, which may contain one or more NBs.
> +*  Processing means: if the NBL contains multiple NBs, the NBL is 
> +cloned to
> +*  multiple NBLs (i.e. split), and each NB (wrapped in a new NBL) is 
> +being
> +*  processed separately. If the original NBL contains only one NB, 
> +then only
> +*  one NB is processed.
> +* 
> +---------------------------------------------------------------------
> +-----
> +*/
>
> -        /* Ethernet Header is a guaranteed safe access. */
> -        curNb = NET_BUFFER_LIST_FIRST_NB(curNbl);
> -        if (curNb->Next != NULL) {
> -            /* XXX: This case is not handled yet. */
> -            ASSERT(FALSE);
> -        } else {
> -            POVS_BUFFER_CONTEXT ctx;
> -            OvsFlow *flow;
> -
> -            fwdDetail = NET_BUFFER_LIST_SWITCH_FORWARDING_DETAIL(curNbl);
> -            sourcePort = fwdDetail->SourcePortId;
> -            sourceIndex = (NDIS_SWITCH_NIC_INDEX)fwdDetail->SourceNicIndex;
> -
> -            /* Take the DispatchLock so none of the VPORTs disconnect while
> -             * we are setting destination ports.
> -             *
> -             * XXX: acquire/release the dispatch lock for a "batch" of packets
> -             * rather than for each packet. */
> -            NdisAcquireRWLockRead(switchContext->dispatchLock, &lockState,
> -                                  dispatch);
> -
> -            ctx = OvsInitExternalNBLContext(switchContext, curNbl,
> -                                  sourcePort == switchContext->externalPortId);
> -            if (ctx == NULL) {
> -                RtlInitUnicodeString(&filterReason,
> -                                     L"Cannot allocate external NBL context.");
> -
> -                OvsStartNBLIngressError(switchContext, curNbl,
> -                                        sendCompleteFlags, &filterReason,
> -                                        NDIS_STATUS_RESOURCES);
> -                NdisReleaseRWLock(switchContext->dispatchLock, &lockState);
> -                continue;
> -            }
> -
> -            vport = OvsFindVportByPortIdAndNicIndex(switchContext, sourcePort,
> -                                                    sourceIndex);
> -            if (vport == NULL || vport->ovsState != OVS_STATE_CONNECTED) {
> -                RtlInitUnicodeString(&filterReason,
> -                    L"OVS-Cannot forward packet from unknown source port");
> -                goto dropit;
> -            } else {
> -                portNo = vport->portNo;
> -            }
> -
> -            vport->stats.rxPackets++;
> -            vport->stats.rxBytes += NET_BUFFER_DATA_LENGTH(curNb);
> -
> -            status = OvsExtractFlow(curNbl, vport->portNo, &key, &layers, NULL);
> -            if (status != NDIS_STATUS_SUCCESS) {
> -                RtlInitUnicodeString(&filterReason, L"OVS-Flow extract failed");
> -                goto dropit;
> -            }
> -
> -            ASSERT(KeGetCurrentIrql() == DISPATCH_LEVEL);
> -            OvsAcquireDatapathRead(datapath, &dpLockState, dispatch);
> -
> -            flow = OvsLookupFlow(datapath, &key, &hash, FALSE);
> -            if (flow) {
> -                OvsFlowUsed(flow, curNbl, &layers);
> -                datapath->hits++;
> -                /* If successful, OvsActionsExecute() consumes the NBL.
> -                 * Otherwise, it adds it to the completionList. No need to
> -                 * check the return value. */
> -                OvsActionsExecute(switchContext, &completionList, curNbl,
> -                                portNo, SendFlags, &key, &hash, &layers,
> -                                flow->actions, flow->actionsLen);
> -                OvsReleaseDatapath(datapath, &dpLockState);
> -                NdisReleaseRWLock(switchContext->dispatchLock, &lockState);
> -                continue;
> -            } else {
> -                OvsReleaseDatapath(datapath, &dpLockState);
> -
> -                datapath->misses++;
> -                status = OvsCreateAndAddPackets(OVS_DEFAULT_PACKET_QUEUE,
> -                                                NULL, 0, OVS_PACKET_CMD_MISS,
> -                                                portNo,
> -                                                key.tunKey.dst != 0 ?
> -                                                (OvsIPv4TunnelKey *)&key.tunKey :
> -                                                NULL, curNbl,
> -                                                sourcePort ==
> -                                                switchContext->externalPortId,
> -                                                &layers, switchContext,
> -                                                &missedPackets, &num);
> -                if (status == NDIS_STATUS_SUCCESS) {
> -                    /* Complete the packet since it was copied to user
> -                     * buffer. */
> -                    RtlInitUnicodeString(&filterReason,
> -                        L"OVS-Dropped since packet was copied to userspace");
> -                } else {
> -                    RtlInitUnicodeString(&filterReason,
> -                        L"OVS-Dropped due to failure to queue to userspace");
> -                }
> -                goto dropit;
> -            }
> -
> -dropit:
> -            OvsAddPktCompletionList(&completionList, TRUE, sourcePort, curNbl, 0,
> -                                    &filterReason);
> -            NdisReleaseRWLock(switchContext->dispatchLock, &lockState);
> -        }
> +static VOID
> +OvsProcessOneNbl(PNET_BUFFER_LIST origNbl,
> +                 POVS_SWITCH_CONTEXT switchContext,
> +                 ULONG sendFlags,
> +                 OvsCompletionList *completionList,
> +                 PLIST_ENTRY missedPackets,
> +                 PUINT32 countMissedPackets) {
> +    POVS_VPORT_ENTRY sourceVPort;
> +    PNET_BUFFER_LIST nextNbl, newNbl;
> +    POVS_BUFFER_CONTEXT bufferContext;
> +    LOCK_STATE_EX lockState;
> +    BOOLEAN atDispatch;
> +    NDIS_STATUS status = STATUS_SUCCESS;
> +    NDIS_STRING filterReason;
> +
> +    PNDIS_SWITCH_FORWARDING_DETAIL_NET_BUFFER_LIST_INFO fwdDetail;
> +    NDIS_SWITCH_PORT_ID hvSourcePortId = NDIS_SWITCH_DEFAULT_PORT_ID;
> +    NDIS_SWITCH_NIC_INDEX hvSourceIndex = 
> + NDIS_SWITCH_DEFAULT_NIC_INDEX;

Initialization of 'hvSourcePortId' and 'hvSourceIndex' seems unnecessary.

> +
> +    atDispatch = (NDIS_TEST_SEND_AT_DISPATCH_LEVEL(sendFlags) ?
> +                  NDIS_RWL_AT_DISPATCH_LEVEL : 0);
> +    fwdDetail = NET_BUFFER_LIST_SWITCH_FORWARDING_DETAIL(origNbl);
> +    hvSourcePortId = fwdDetail->SourcePortId;
> +    hvSourceIndex = (NDIS_SWITCH_NIC_INDEX)fwdDetail->SourceNicIndex;
> +
> +    /* create buffer context */
> +    bufferContext = OvsInitExternalNBLContext(switchContext, origNbl,
> +        hvSourcePortId == switchContext->externalPortId);
> +    if (bufferContext == NULL) {
> +        RtlInitUnicodeString(&filterReason,
> +                             L"Cannot allocate external NBL context.");
> +        status = NDIS_STATUS_RESOURCES;
> +        goto Cleanup;

One an NBL that does not have the OVS_BUFFER_CONTEXT allocated for it using OvsInitExternalNBLContext(), we should not be calling OVSCompleteNbl. In other words, we should not be jumping to Cleanup here, since cleanup unconditionally adds the NBL to the completionList and subsequently calls OvsCompleteNbl(). We should complete the NBL to NDIS directly. In the original code, we called OvsStartNBLIngressError() on such NBLs.

>     }
>
> -    /* Queue the missed packets. */
> -    OvsQueuePackets(OVS_DEFAULT_PACKET_QUEUE, &missedPackets, num);
> -    OvsFinalizeCompletionList(&completionList);
> +    /*
> +    * split NBLs: each NBL must contain only one NB.
> +    * NOTE: if origNbl has only one NB => newNbl will be origNbl
> +    */

Comment style should be (align the asterisks) here and in other places:
/*
 * ....
 */

> +    newNbl = OvsSplitNblByNB(origNbl, switchContext, TRUE);
> +    if (!newNbl) {
> +        RtlInitUnicodeString(&filterReason, L"Cannot allocate new NBL: partial"
> +                             L"copy NB to multiple NBLs.");
> +        status = NDIS_STATUS_RESOURCES;
> +        goto Cleanup;
> +    }

Jumping to 'Cleanup' here or from code down the line would add 'origNbl' to the completion list, and we'd end up calling OVSCompleteNBl() on it. But, the refCount is already 0, so we can have bad behavior. You can fix it either by not jumping to Cleanup or by preserving the refCount and letting code code flow through.

> +
> +    /*
> +    * Take the DispatchLock so none of the VPORTs disconnect while
> +    * we are setting hyper-v switch ports as destinations to the NBL
> +    *
> +    * XXX: we may be waiting very long here (given we're at DPC level) if
> +    * there is a lot of network traffic and one of many VMs connects or
> +    * disconnects.
> +    */
> +    NdisAcquireRWLockRead(switchContext->dispatchLock, &lockState, 
> + atDispatch);
> +
> +    sourceVPort = OvsFindVportByPortIdAndNicIndex(switchContext, hvSourcePortId,
> +                                                  hvSourceIndex);
> +    if (sourceVPort == NULL || sourceVPort->ovsState != OVS_STATE_CONNECTED) {
> +        NdisReleaseRWLock(switchContext->dispatchLock, &lockState);
> +
> +        RtlInitUnicodeString(&filterReason, L"OVS-Cannot forward packet from "
> +                             L"unknown source port");
> +        status = NDIS_STATUS_INVALID_PORT;
> +        goto Cleanup;
> +    }
> +
> +    /* update vport stats */
> +    OVS_NBL_FOR_EACH(nbl, newNbl) {
> +        PNET_BUFFER firstNb = NET_BUFFER_LIST_FIRST_NB(nbl);
> +
> +        sourceVPort->stats.rxPackets++;
> +        sourceVPort->stats.rxBytes += NET_BUFFER_DATA_LENGTH(firstNb);
> +    }
> +
> +    /* process each NB: each NB is wrapped in an NBL in the newNbl chain */
> +    OVS_NBL_FOR_EACH_NEXT(curNbl, newNbl, nextNbl) {
> +        nextNbl = NET_BUFFER_LIST_NEXT_NBL(curNbl);
> +        NET_BUFFER_LIST_NEXT_NBL(curNbl) = NULL;
> +
> +        OvsProcessOneNb(curNbl, switchContext, sourceVPort, completionList,
> +                        sendFlags, missedPackets, countMissedPackets);
> +    }

Any reason the two loops cannot be combined?

> +
> +    NdisReleaseRWLock(switchContext->dispatchLock, &lockState);
> +
> +Cleanup:

We generally use all lowercase for the the label. I can update the CodingStyle to make this explicit. What do you think? Having both lower and upper case makes it a little untidy.

> +    if (NT_ERROR(status)) {
> +        OvsAddPktCompletionList(completionList, TRUE, hvSourcePortId, origNbl,
> +                                0, &filterReason);
> +    }
> }
>
>
> @@ -340,9 +396,17 @@ OvsExtSendNBL(NDIS_HANDLE filterModuleContext,
>     POVS_SWITCH_CONTEXT switchContext;
>     switchContext = (POVS_SWITCH_CONTEXT) filterModuleContext;
>
> +    PNET_BUFFER_LIST nextNbl = NULL;
> +    ULONG sendCompleteFlags;
> +    LIST_ENTRY missedPackets;
> +    UINT32 countMissedPackets = 0;
> +    OvsCompletionList completionList;
> +
> +    sendCompleteFlags = OvsGetSendCompleteFlags(sendFlags);
> +    sendFlags |= NDIS_SEND_FLAGS_SWITCH_DESTINATION_GROUP;
> +
>     if (switchContext->dataFlowState == OvsSwitchPaused) {
>         NDIS_STRING filterReason;
> -        ULONG sendCompleteFlags = OvsGetSendCompleteFlags(sendFlags);
>
>         RtlInitUnicodeString(&filterReason,
>                              L"Switch state PAUSED, drop on 
> ingress."); @@ -354,7 +418,21 @@ OvsExtSendNBL(NDIS_HANDLE 
> filterModuleContext,
>
>     ASSERT(switchContext->dataFlowState == OvsSwitchRunning);
>
> -    OvsStartNBLIngress(switchContext, netBufferLists, sendFlags);
> +    InitializeListHead(&missedPackets);
> +    OvsInitCompletionList(&completionList, switchContext, 
> + sendCompleteFlags);
> +
> +    OVS_NBL_FOR_EACH_NEXT(curNbl, netBufferLists, nextNbl) {
> +        nextNbl = curNbl->Next;
> +        curNbl->Next = NULL;
> +
> +        OvsProcessOneNbl(curNbl, switchContext, sendFlags, &completionList,
> +                         &missedPackets, &countMissedPackets);
> +    }
> +
> +    /* Queue the missed packets. */
> +    OvsQueuePackets(OVS_DEFAULT_PACKET_QUEUE, &missedPackets, 
> + countMissedPackets);

A small optimization is to call OvsQueuePackets() only if countMissedPackets > 0. Also, the line is longer than 79 characters. You can reduce the length if possible.

> +    /* Complete all packets that were added to the completion list */
> +    OvsFinalizeCompletionList(&completionList);
> }
>
> static VOID
> @@ -382,7 +460,8 @@ OvsCompleteNBLIngress(POVS_SWITCH_CONTEXT 
> switchContext,
>
>     /* Complete the NBL's that were sent by the upper layer. */
>     if (newList.dropNbl != NULL) {
> -        NdisFSendNetBufferListsComplete(switchContext->NdisFilterHandle, newList.dropNbl,
> +        NdisFSendNetBufferListsComplete(switchContext->NdisFilterHandle,
> +                                        newList.dropNbl,
>                                         sendCompleteFlags);
>     }
> }
> diff --git a/datapath-windows/ovsext/Tunnel.c 
> b/datapath-windows/ovsext/Tunnel.c
> index 2e7da10..d52d788 100644
> --- a/datapath-windows/ovsext/Tunnel.c
> +++ b/datapath-windows/ovsext/Tunnel.c
> @@ -227,6 +227,7 @@ OvsInjectPacketThroughActions(PNET_BUFFER_LIST pNbl,
>     OVS_DATAPATH *datapath = &gOvsSwitchContext->datapath;
>
>     ASSERT(gOvsSwitchContext);
> +    ASSERT(pNbl->FirstNetBuffer->Next == NULL);
>
>     /* Fill the tunnel key */
>     status = OvsSlowPathDecapVxlan(pNbl, &tunnelKey); @@ -306,8 +307,8 
> @@ OvsInjectPacketThroughActions(PNET_BUFFER_LIST pNbl,
>             datapath->hits++;
>
>             OvsActionsExecute(gOvsSwitchContext, &completionList, pNbl,
> -                            portNo, SendFlags, &key, &hash, &layers,
> -                            flow->actions, flow->actionsLen);
> +                              portNo, SendFlags, &key, &hash, &layers,
> +                              flow->actions, flow->actionsLen);
>
>             OvsReleaseDatapath(datapath, &dpLockState);
>         } else {
> diff --git a/datapath-windows/ovsext/User.c 
> b/datapath-windows/ovsext/User.c index 612a4bd..11f7aca 100644
> --- a/datapath-windows/ovsext/User.c
> +++ b/datapath-windows/ovsext/User.c
> @@ -360,10 +360,15 @@ OvsExecuteDpIoctl(PVOID inputBuffer,
>         fwdDetail->SourcePortId = NDIS_SWITCH_DEFAULT_PORT_ID;
>         fwdDetail->SourceNicIndex = 0;
>     }
> -    // XXX: Figure out if any of the other members of fwdDetail need to be set.
> +    /*
> +     * XXX: Figure out if any of the other members of fwdDetail need to be
> +     * set.
> +     */
> +
> +    ASSERT(pNbl->FirstNetBuffer->Next == NULL);
>
>     ndisStatus = OvsExtractFlow(pNbl, fwdDetail->SourcePortId, &key, &layers,
> -                              NULL);
> +                                NULL);
>     if (ndisStatus == NDIS_STATUS_SUCCESS) {
>         ASSERT(KeGetCurrentIrql() == DISPATCH_LEVEL);
>         NdisAcquireRWLockRead(gOvsSwitchContext->dispatchLock, 
> &lockState, @@ -828,6 +833,8 @@ OvsCreateAndAddPackets(UINT32 queueId,
>     PNET_BUFFER_LIST newNbl = NULL;
>     PNET_BUFFER nb;
>
> +    ASSERT(nbl->FirstNetBuffer->Next == NULL);

If we remove the code changes in OvsTunnelTx(), this change won't be necessary.

thanks,
-- Nithin






More information about the dev mailing list