[ovs-dev] [PATCH v2] Handle NBLs with multiple NBs

Nithin Raju nithin at vmware.com
Tue Sep 23 19:15:32 UTC 2014


hi Samuel,
Thanks for the re-spin. I'm afraid, there'll have to be another iteration. I see a refCounting issue when origNbl->FirstNetBuffer->Next != NULL. I have noted that down in the code.

Other comments are very minor.

Thanks again for working on this. I'll be on IRC if you want to chat.

thanks,
-- Nithin

On Sep 23, 2014, at 7:19 AM, Samuel Ghinet <sghinet at cloudbasesolutions.com> wrote:

> All NET_BUFFERs of a NET_BUFFER_LIST must go through the pipeline: extract,
> find flow, execute. Previously, only the first NET_BUFFER of a NET_BUFFER_LIST
> was going through this pipeline, which was erroneous.

Minor: the commit message seems a little inaccurate. One suggestion I have is
we should probably say something like:
'Previously, for NBLs sent from NDIS, we handled only cases where the NBL had a
single NB. Handling NBLs with multiple NBs was not implemented'.


> OvsPartialCopyToMultipleNBLs is used to make each NET_BUFFER have its own
> NET_BUFFER_LIST.
> 
> I have also added a few ASSERTs where the NET_BUFFER_LIST is expected to have
> only one NET_BUFFER."
> 
> By fixing this bug, the function OvsStartNBLIngress was also refactored,
> because the function was too big and intricate, and adding the new code
> necessary for this bug fix made it bigger and more intricate, and also
> more difficult to review.
> 
> The refactor meant the following modifications:
> o) merge OvsExtSendNBL and OvsStartNBLIngress into OvsExtSendNBL.
>   There is no reason to have functionalities in these two functions
>   separate
> o) all the processing for one NBL of the original NBL chain is now processed
>   in a separate function: OvsProcessOneNbl
> o) all the processing for one NB (where the NB is wrapped inside a NBL) is now
>   processed in a separate function: OvsProcessOneNb
> o) OvsSplitNblByNB was created to return either:
>   - the original NBL (if the original NBL has only one NB)
>   or
>   - a cloned / partial copy to multiple NBLs, if the original NBL had more NBs.
> o) clearer variable names
> 
> This patch also fixes a bug in OvsPartialCopyToMultipleNBLs:
> The OvsPartialCopyToMultipleNBLs function creates a new link of NBLs out of one
> NBL: it creates one NBL for each NB in the original NBL. The result is returned
> in firstNbl, which is the first in the linked-list chain of created NBLs.
> 
> This function mistakenly linked the resulting NBLs, mixing the original NBL in
> the chain.
> 
> This patch also addresses this bug, by making the created NBLs be linked
> together correctly.
> 
> Tested:
> - vxlan
> - vlan (using patch ports, as specified in INSTALL.Windows)
> 
> using:
> - ping
> - tcp
> - tcp LSO (tcp segmentation)
> 
> Reported-at: ovs/ovs-issues#15
> Signed-off-by: Samuel Ghinet <sghinet at cloudbasesolutions.com>

Suggestion: you can annotate the github issue #:
https://github.com/openvswitch/ovs-issues/issues/2 (If I am not wrong).

> ---
> datapath-windows/ovsext/Actions.c    |  20 +-
> datapath-windows/ovsext/BufferMgmt.c |  33 +++-
> datapath-windows/ovsext/BufferMgmt.h |  19 ++
> datapath-windows/ovsext/PacketIO.c   | 353 +++++++++++++++++++++--------------
> datapath-windows/ovsext/Tunnel.c     |   5 +-
> datapath-windows/ovsext/User.c       |  15 +-
> datapath-windows/ovsext/Vxlan.c      |   5 +
> 7 files changed, 309 insertions(+), 141 deletions(-)
> 
> diff --git a/datapath-windows/ovsext/Actions.c b/datapath-windows/ovsext/Actions.c
> index 35ebfdf..cfda805 100644
> --- a/datapath-windows/ovsext/Actions.c
> +++ b/datapath-windows/ovsext/Actions.c
> @@ -504,6 +504,11 @@ OvsCompleteNBLForwardingCtx(OvsForwardingContext *ovsFwdCtx,
>  *     XXX: It is assumed that the NBL in 'ovsFwdCtx' is owned by OVS. This is
>  *     until the new buffer management framework is adopted.
>  *
> + *     NOTE: ovsFwdCtx->curNbl is expected to have one or more NBs. This is a
> + *           case that should happen only as result to TCP segmentation, when
> + *           used by tunneling (send to tunnel port) where all the NBs in the
> + *           ovsFwdCtx->curNbl share the same packet info (flow key).

Typo: 'result to' => 'result of'.

Nice comment!

> + *
>  * Side effects:
>  *     The NBL in 'ovsFwdCtx' is consumed.
>  * --------------------------------------------------------------------------
> @@ -597,6 +602,9 @@ OvsTunnelPortTx(OvsForwardingContext *ovsFwdCtx)
>     NDIS_STATUS status = NDIS_STATUS_FAILURE;
>     PNET_BUFFER_LIST newNbl = NULL;
> 
> +    ASSERT(ovsFwdCtx->curNbl->Next == NULL);
> +    ASSERT(ovsFwdCtx->curNbl->FirstNetBuffer->Next == NULL);
> +
>     /*
>      * Setup the source port to be the internal port to as to facilitate the
>      * second OvsLookupFlow.
> @@ -631,8 +639,16 @@ OvsTunnelPortTx(OvsForwardingContext *ovsFwdCtx)
> 
>     if (status == NDIS_STATUS_SUCCESS) {
>         ASSERT(newNbl);
> +
>         OvsCompleteNBLForwardingCtx(ovsFwdCtx,
> -                                    L"Complete after cloning NBL for encapsulation");
> +                                    L"Complete after cloning NBL for "
> +                                    L"encapsulation");
> +
> +        /*
> +         * If TCP segmentation happened, newNbl will have multiple NBs.
> +         * In this case, we need to find matching flow only for the first
> +         * NB - because all NBs will share the same packet info.
> +         */
>         ovsFwdCtx->curNbl = newNbl;
>         status = OvsDoFlowLookupOutput(ovsFwdCtx);
>         ASSERT(ovsFwdCtx->curNbl == NULL);
> @@ -1368,6 +1384,8 @@ OvsActionsExecute(POVS_SWITCH_CONTEXT switchContext,
>     PNDIS_SWITCH_FORWARDING_DETAIL_NET_BUFFER_LIST_INFO fwdDetail =
>         NET_BUFFER_LIST_SWITCH_FORWARDING_DETAIL(curNbl);
> 
> +    ASSERT(curNbl->FirstNetBuffer->Next == NULL);

This will not be true when 'OvsActionsExecute()' is called from 'OvsDoFlowLookupOutput() for reasons you know already. OvsActionsExecute() can be called on an NBL that has multiple NBs after completing TSO in OvsDoEncapVxlan().

> +
>     /* XXX: ASSERT that the flow table lock is held. */
>     status = OvsInitForwardingCtx(&ovsFwdCtx, switchContext, curNbl, portNo,
>                                   sendFlags, fwdDetail, completionList,
> diff --git a/datapath-windows/ovsext/BufferMgmt.c b/datapath-windows/ovsext/BufferMgmt.c
> index e0377c1..341734f 100644
> --- a/datapath-windows/ovsext/BufferMgmt.c
> +++ b/datapath-windows/ovsext/BufferMgmt.c
> @@ -863,7 +863,7 @@ OvsPartialCopyToMultipleNBLs(PVOID ovsContext,
>         if (prevNbl == NULL) {
>             firstNbl = newNbl;
>         } else {
> -            NET_BUFFER_LIST_NEXT_NBL(prevNbl) = nbl;
> +            NET_BUFFER_LIST_NEXT_NBL(prevNbl) = newNbl;

LG

>             NET_BUFFER_NEXT_NB(prevNb) = nb;
>         }
>         prevNbl = newNbl;
> @@ -1533,3 +1533,34 @@ OvsGetCtxSourcePortNo(PNET_BUFFER_LIST nbl,
>     *portNo = ctx->srcPortNo;
>     return NDIS_STATUS_SUCCESS;
> }
> +
> +PNET_BUFFER_LIST
> +OvsSplitNblByNB(PNET_BUFFER_LIST nblList,
> +                POVS_SWITCH_CONTEXT switchContext,
> +                BOOLEAN decrementBufRefCount)

If you could add a comment above the function describing the function behavior, that would be great. I know you have it in the commit message. Duping the comment here would be a good idea so that anyone reading the code does not have to crosscheck on the commit message.


> +{
> +    PNET_BUFFER curNb;
> +    PNET_BUFFER_LIST newNblList;
> +    POVS_BUFFER_CONTEXT bufferContext;
> +
> +    curNb = NET_BUFFER_LIST_FIRST_NB(nblList);
> +
> +    if (curNb->Next == NULL) {
> +        return nblList;
> +    }
> +
> +    bufferContext = (POVS_BUFFER_CONTEXT)
> +                     NET_BUFFER_LIST_CONTEXT_DATA_START(nblList);
> +
> +    newNblList = OvsPartialCopyToMultipleNBLs(switchContext, nblList, 0, 0,
> +                                              TRUE);
> +    if (!newNblList) {
> +        return NULL;
> +    }
> +
> +    if (decrementBufRefCount) {
> +        InterlockedDecrement((volatile LONG*)&bufferContext->refCount);
> +    }

One issue I missed out earlier, is that OvsPartialCopyToMultipleNBLs() would have increased the refCount by as many as the number of NBs you have. Eg. if you have 5 NBs in the NBL,  the refCount value in the source NBL would be 6. I don't know if you had accounted for it.

> +
> +    return newNblList;
> +}
> \ No newline at end of file
> diff --git a/datapath-windows/ovsext/BufferMgmt.h b/datapath-windows/ovsext/BufferMgmt.h
> index 915d7f5..a71d100 100644
> --- a/datapath-windows/ovsext/BufferMgmt.h
> +++ b/datapath-windows/ovsext/BufferMgmt.h
> @@ -17,6 +17,10 @@
> #ifndef __BUFFER_MGMT_H_
> #define __BUFFER_MGMT_H_ 1
> 
> +#include "precomp.h"
> +
> +typedef struct _OVS_SWITCH_CONTEXT *POVS_SWITCH_CONTEXT;
> +
> #define MEM_ALIGN                       MEMORY_ALLOCATION_ALIGNMENT
> #define MEM_ALIGN_SIZE(_x)  ((MEM_ALIGN - 1 + (_x))/MEM_ALIGN * MEM_ALIGN)
> #define OVS_CTX_MAGIC                   0xabcd
> @@ -81,6 +85,16 @@ typedef struct _OVS_NBL_POOL {
> #endif
> } OVS_NBL_POOL, *POVS_NBL_POOL;
> 
> +#define OVS_NBL_FOR_EACH(_curNbl, _nblList)       \
> +for (PNET_BUFFER_LIST _curNbl = _nblList;         \
> +     _curNbl != NULL;                             \
> +     _curNbl = NET_BUFFER_LIST_NEXT_NBL(_curNbl))
> +
> +#define OVS_NBL_FOR_EACH_NEXT(_curNbl, _nblList, _nextNbl)   \
> +for (PNET_BUFFER_LIST _curNbl = _nblList;                    \
> +     _curNbl != NULL;                                        \
> +     _curNbl = _nextNbl)
> +
> 
> NDIS_STATUS OvsInitBufferPool(PVOID context);
> VOID OvsCleanupBufferPool(PVOID context);
> @@ -121,4 +135,9 @@ NDIS_STATUS OvsSetCtxSourcePortNo(PNET_BUFFER_LIST nbl, UINT32 portNo);
> 
> NDIS_STATUS OvsGetCtxSourcePortNo(PNET_BUFFER_LIST nbl, UINT32 *portNo);
> 
> +PNET_BUFFER_LIST
> +OvsSplitNblByNB(PNET_BUFFER_LIST nblList,
> +                POVS_SWITCH_CONTEXT switchContext,
> +                BOOLEAN decrementBufRefCount);

A typical function declaration (not definition) would put the function name and the return type on the same line. I don't really feel very strongly about this though, but suggest doing it to keep the code consistent:

Eg.:
PNET_BUFFER_LIST OvsSplitNblByNB(PNET_BUFFER_LIST nblList,
                                 POVS_SWITCH_CONTEXT switchContext,
                                 BOOLEAN decrementBufRefCount);


> +
> #endif /* __BUFFER_MGMT_H_ */
> diff --git a/datapath-windows/ovsext/PacketIO.c b/datapath-windows/ovsext/PacketIO.c
> index ac7862d..488ce2b 100644
> --- a/datapath-windows/ovsext/PacketIO.c
> +++ b/datapath-windows/ovsext/PacketIO.c
> @@ -28,6 +28,7 @@
> #include "Flow.h"
> #include "Event.h"
> #include "User.h"
> +#include "BufferMgmt.h"
> 
> /* Due to an imported header file */
> #pragma warning( disable:4505 )
> @@ -176,149 +177,207 @@ OvsStartNBLIngressError(POVS_SWITCH_CONTEXT switchContext,
>                                     sendCompleteFlags);
> }
> 
> +/*
> + * --------------------------------------------------------------------------
> + *  Processes one NB, which means:
> + *  o) Extract Packet Info (i.e. flow key)
> + *  o) Find flow matching packet info
> + *  o) If a matching flow was found => execute flow's actions
> + *  o) If a matching flow was not found => queue packet to userspace
> + * --------------------------------------------------------------------------
> +*/
> static VOID
> -OvsStartNBLIngress(POVS_SWITCH_CONTEXT switchContext,
> -                   PNET_BUFFER_LIST netBufferLists,
> -                   ULONG SendFlags)
> +OvsProcessOneNb(PNET_BUFFER_LIST curNbl,
> +                POVS_SWITCH_CONTEXT switchContext,
> +                POVS_VPORT_ENTRY sourceVPort,
> +                OvsCompletionList *completionList,
> +                ULONG sendFlags,
> +                PLIST_ENTRY missedPackets,
> +                PUINT32 countMissedPackets)
> {
> -    NDIS_SWITCH_PORT_ID sourcePort = 0;
> -    NDIS_SWITCH_NIC_INDEX sourceIndex = 0;
> -    PNDIS_SWITCH_FORWARDING_DETAIL_NET_BUFFER_LIST_INFO fwdDetail;
> -    PNET_BUFFER_LIST curNbl = NULL, nextNbl = NULL;
> -    ULONG sendCompleteFlags;
> -    UCHAR dispatch;
> -    LOCK_STATE_EX lockState, dpLockState;
>     NDIS_STATUS status;
> +    OvsFlowKey key;
> +    OVS_PACKET_HDR_INFO layers;
>     NDIS_STRING filterReason;
> -    LIST_ENTRY missedPackets;
> -    UINT32 num = 0;
> -    OvsCompletionList completionList;
> +    LOCK_STATE_EX dpLockState;
> +    OvsFlow *flow;
> +    UINT64 hash;
> +    POVS_DATAPATH datapath = &(switchContext->datapath);
> 
> -    dispatch = NDIS_TEST_SEND_AT_DISPATCH_LEVEL(SendFlags)?
> -                                            NDIS_RWL_AT_DISPATCH_LEVEL : 0;
> -    sendCompleteFlags = OvsGetSendCompleteFlags(SendFlags);
> -    SendFlags |= NDIS_SEND_FLAGS_SWITCH_DESTINATION_GROUP;
> +    ASSERT(NET_BUFFER_LIST_NEXT_NBL(curNbl) == NULL);
> +    ASSERT(curNbl->FirstNetBuffer->Next == NULL);
> 
> -    InitializeListHead(&missedPackets);
> -    OvsInitCompletionList(&completionList, switchContext, sendCompleteFlags);
> +    /* 1. Extract flow key*/
> +    status = OvsExtractFlow(curNbl, sourceVPort->portNo, &key, &layers, NULL);
> 
> -    for (curNbl = netBufferLists; curNbl != NULL; curNbl = nextNbl) {
> -        POVS_VPORT_ENTRY vport;
> -        UINT32 portNo;
> -        OVS_DATAPATH *datapath = &switchContext->datapath;
> -        OVS_PACKET_HDR_INFO layers;
> -        OvsFlowKey key;
> -        UINT64 hash;
> -        PNET_BUFFER curNb;
> +    if (status != NDIS_STATUS_SUCCESS) {
> +        RtlInitUnicodeString(&filterReason, L"OVS-Flow extract failed");
> +        OvsAddPktCompletionList(completionList, TRUE, sourceVPort->portId,
> +                                curNbl, 0, &filterReason);
> +        return;
> +    }
> 
> -        nextNbl = curNbl->Next;
> -        curNbl->Next = NULL;
> +    /*
> +     * We must be at dispatch level, because the caller locked
> +     * switchContext->dispatchLock.
> +     */
> +    ASSERT(KeGetCurrentIrql() == DISPATCH_LEVEL);
> +    OvsAcquireDatapathRead(datapath, &dpLockState, TRUE);
> +
> +    /* 2. Find flow matching key */
> +    flow = OvsLookupFlow(datapath, &key, &hash, FALSE);
> +    if (flow) {
> +        OvsFlowUsed(flow, curNbl, &layers);
> +        datapath->hits++;
> +
> +        /*
> +         * 2.a) execute actions on packet.
> +         * If successful, OvsActionsExecute() consumes the NBL.
> +         * Otherwise, it adds it to the completionList. No need to
> +         * check the return value.
> +        */
> +        OvsActionsExecute(switchContext, completionList, curNbl,
> +                          sourceVPort->portNo, sendFlags, &key, &hash,
> +                          &layers, flow->actions, flow->actionsLen);
> +
> +        OvsReleaseDatapath(datapath, &dpLockState);
> +        return;
> +    }
> 
> -        /* Ethernet Header is a guaranteed safe access. */
> -        curNb = NET_BUFFER_LIST_FIRST_NB(curNbl);
> -        if (curNb->Next != NULL) {
> -            /* XXX: This case is not handled yet. */
> -            ASSERT(FALSE);
> -        } else {
> -            POVS_BUFFER_CONTEXT ctx;
> -            OvsFlow *flow;
> -
> -            fwdDetail = NET_BUFFER_LIST_SWITCH_FORWARDING_DETAIL(curNbl);
> -            sourcePort = fwdDetail->SourcePortId;
> -            sourceIndex = (NDIS_SWITCH_NIC_INDEX)fwdDetail->SourceNicIndex;
> -
> -            /* Take the DispatchLock so none of the VPORTs disconnect while
> -             * we are setting destination ports.
> -             *
> -             * XXX: acquire/release the dispatch lock for a "batch" of packets
> -             * rather than for each packet. */
> -            NdisAcquireRWLockRead(switchContext->dispatchLock, &lockState,
> -                                  dispatch);
> -
> -            ctx = OvsInitExternalNBLContext(switchContext, curNbl,
> -                                  sourcePort == switchContext->externalPortId);
> -            if (ctx == NULL) {
> -                RtlInitUnicodeString(&filterReason,
> -                                     L"Cannot allocate external NBL context.");
> -
> -                OvsStartNBLIngressError(switchContext, curNbl,
> -                                        sendCompleteFlags, &filterReason,
> -                                        NDIS_STATUS_RESOURCES);
> -                NdisReleaseRWLock(switchContext->dispatchLock, &lockState);
> -                continue;
> -            }
> -
> -            vport = OvsFindVportByPortIdAndNicIndex(switchContext, sourcePort,
> -                                                    sourceIndex);
> -            if (vport == NULL || vport->ovsState != OVS_STATE_CONNECTED) {
> -                RtlInitUnicodeString(&filterReason,
> -                    L"OVS-Cannot forward packet from unknown source port");
> -                goto dropit;
> -            } else {
> -                portNo = vport->portNo;
> -            }
> -
> -            vport->stats.rxPackets++;
> -            vport->stats.rxBytes += NET_BUFFER_DATA_LENGTH(curNb);
> -
> -            status = OvsExtractFlow(curNbl, vport->portNo, &key, &layers, NULL);
> -            if (status != NDIS_STATUS_SUCCESS) {
> -                RtlInitUnicodeString(&filterReason, L"OVS-Flow extract failed");
> -                goto dropit;
> -            }
> -
> -            ASSERT(KeGetCurrentIrql() == DISPATCH_LEVEL);
> -            OvsAcquireDatapathRead(datapath, &dpLockState, dispatch);
> -
> -            flow = OvsLookupFlow(datapath, &key, &hash, FALSE);
> -            if (flow) {
> -                OvsFlowUsed(flow, curNbl, &layers);
> -                datapath->hits++;
> -                /* If successful, OvsActionsExecute() consumes the NBL.
> -                 * Otherwise, it adds it to the completionList. No need to
> -                 * check the return value. */
> -                OvsActionsExecute(switchContext, &completionList, curNbl,
> -                                portNo, SendFlags, &key, &hash, &layers,
> -                                flow->actions, flow->actionsLen);
> -                OvsReleaseDatapath(datapath, &dpLockState);
> -                NdisReleaseRWLock(switchContext->dispatchLock, &lockState);
> -                continue;
> -            } else {
> -                OvsReleaseDatapath(datapath, &dpLockState);
> -
> -                datapath->misses++;
> -                status = OvsCreateAndAddPackets(OVS_DEFAULT_PACKET_QUEUE,
> -                                                NULL, 0, OVS_PACKET_CMD_MISS,
> -                                                portNo,
> -                                                key.tunKey.dst != 0 ?
> -                                                (OvsIPv4TunnelKey *)&key.tunKey :
> -                                                NULL, curNbl,
> -                                                sourcePort ==
> -                                                switchContext->externalPortId,
> -                                                &layers, switchContext,
> -                                                &missedPackets, &num);
> -                if (status == NDIS_STATUS_SUCCESS) {
> -                    /* Complete the packet since it was copied to user
> -                     * buffer. */
> -                    RtlInitUnicodeString(&filterReason,
> -                        L"OVS-Dropped since packet was copied to userspace");
> -                } else {
> -                    RtlInitUnicodeString(&filterReason,
> -                        L"OVS-Dropped due to failure to queue to userspace");
> -                }
> -                goto dropit;
> -            }
> -
> -dropit:
> -            OvsAddPktCompletionList(&completionList, TRUE, sourcePort, curNbl, 0,
> -                                    &filterReason);
> -            NdisReleaseRWLock(switchContext->dispatchLock, &lockState);
> -        }
> +    OvsReleaseDatapath(datapath, &dpLockState);
> +
> +    /* 2.b) no matching flow found => queue packet to userspace */
> +    datapath->misses++;
> +    status = OvsCreateAndAddPackets(OVS_DEFAULT_PACKET_QUEUE,
> +                                    /*user data*/ NULL, /*user data length*/ 0,
> +                                    OVS_PACKET_CMD_MISS, sourceVPort->portNo,
> +                                    (key.tunKey.dst != 0 ? &key.tunKey : NULL),
> +                                    curNbl,
> +                                    (sourceVPort->portId ==
> +                                    switchContext->externalPortId),
> +                                    &layers, switchContext, missedPackets,
> +                                    countMissedPackets);
> +    if (status == NDIS_STATUS_SUCCESS) {
> +        /* Complete the packet since it was copied to user buffer. */
> +        RtlInitUnicodeString(&filterReason,
> +                             L"OVS-Dropped since packet was copied to "
> +                             L"userspace");
> +    } else {
> +        RtlInitUnicodeString(&filterReason,
> +                             L"OVS-Dropped due to failure to queue to "
> +                             L"userspace");
>     }
> 
> -    /* Queue the missed packets. */
> -    OvsQueuePackets(OVS_DEFAULT_PACKET_QUEUE, &missedPackets, num);
> -    OvsFinalizeCompletionList(&completionList);
> +    OvsAddPktCompletionList(completionList, TRUE, sourceVPort->portId, curNbl,
> +                            0, &filterReason);
> +}
> +
> +/*
> + * --------------------------------------------------------------------------
> + *  Processes one NBL, which may contain one or more NBs.
> + *  Processing means: if the NBL contains multiple NBs, the NBL is cloned to
> + *  multiple NBLs (i.e. split), and each NB (wrapped in a new NBL) is being
> + *  processed separately. If the original NBL contains only one NB, then only
> + *  one NB is processed.
> + * --------------------------------------------------------------------------
> +*/
> +static VOID
> +OvsProcessOneNbl(PNET_BUFFER_LIST origNbl,
> +                 POVS_SWITCH_CONTEXT switchContext,
> +                 ULONG sendFlags,
> +                 OvsCompletionList *completionList,
> +                 PLIST_ENTRY missedPackets,
> +                 PUINT32 countMissedPackets)
> +{
> +    POVS_VPORT_ENTRY sourceVPort;
> +    PNET_BUFFER_LIST nextNbl, newNbl;
> +    POVS_BUFFER_CONTEXT bufferContext;
> +    LOCK_STATE_EX lockState;
> +    BOOLEAN atDispatch;
> +    NDIS_STATUS status = STATUS_SUCCESS;
> +    NDIS_STRING filterReason;
> +    ULONG completeFlags = OvsGetSendCompleteFlags(sendFlags);
> +
> +    PNDIS_SWITCH_FORWARDING_DETAIL_NET_BUFFER_LIST_INFO fwdDetail;
> +    NDIS_SWITCH_PORT_ID hvSourcePortId;
> +    NDIS_SWITCH_NIC_INDEX hvSourceIndex;
> +
> +    atDispatch = (NDIS_TEST_SEND_AT_DISPATCH_LEVEL(sendFlags) ?
> +                  NDIS_RWL_AT_DISPATCH_LEVEL : 0);
> +    fwdDetail = NET_BUFFER_LIST_SWITCH_FORWARDING_DETAIL(origNbl);
> +    hvSourcePortId = fwdDetail->SourcePortId;
> +    hvSourceIndex = (NDIS_SWITCH_NIC_INDEX)fwdDetail->SourceNicIndex;
> +
> +    /* create buffer context */
> +    bufferContext = OvsInitExternalNBLContext(switchContext, origNbl,
> +        hvSourcePortId == switchContext->externalPortId);
> +    if (bufferContext == NULL) {
> +        RtlInitUnicodeString(&filterReason,
> +                             L"Cannot allocate external NBL context.");
> +        status = NDIS_STATUS_RESOURCES;
> +
> +        OvsStartNBLIngressError(switchContext, origNbl, completeFlags,
> +                                &filterReason,
> +                                status);
> +        return;
> +    }
> +
> +    /*
> +     * split NBLs: each NBL must contain only one NB.
> +     * NOTE: if origNbl has only one NB => newNbl will be origNbl
> +    */
> +    newNbl = OvsSplitNblByNB(origNbl, switchContext, TRUE);
> +    if (!newNbl) {
> +        RtlInitUnicodeString(&filterReason, L"Cannot allocate new NBL: partial"
> +                             L"copy NB to multiple NBLs.");
> +        status = NDIS_STATUS_RESOURCES;
> +        goto Cleanup;
> +    }

We could add an ASSERT here:
ASSERT(origNbl == newNbl || origNbl->FirstNetBuffer->Next != NULL);

> +
> +    /*
> +     * Take the DispatchLock so none of the VPORTs disconnect while
> +     * we are setting hyper-v switch ports as destinations to the NBL
> +     *
> +     * XXX: we may be waiting very long here (given we're at DPC level) if
> +     * there is a lot of network traffic and one of many VMs connects or
> +     * disconnects.
> +    */
> +    NdisAcquireRWLockRead(switchContext->dispatchLock, &lockState, atDispatch);
> +
> +    sourceVPort = OvsFindVportByPortIdAndNicIndex(switchContext, hvSourcePortId,
> +                                                  hvSourceIndex);
> +    if (sourceVPort == NULL || sourceVPort->ovsState != OVS_STATE_CONNECTED) {
> +        NdisReleaseRWLock(switchContext->dispatchLock, &lockState);
> +
> +        RtlInitUnicodeString(&filterReason, L"OVS-Cannot forward packet from "
> +                             L"unknown source port");
> +        status = NDIS_STATUS_INVALID_PORT;

If we hit this case, and origNbl->FirstNetBuffer->Next == NULL, the code is doing the right thing since the refCount on origNbl has not been decremented by 'OvsSplitNblByNB()'. However, if origNbl->FirstNetBuffer->Next != NULL, the refCount on 'origNbl' has been decremented, and I see 2 issues:
1. We'll end up calling OvsAddPktCompletionList(), OvsFinalizeCompletionList(), OvsCompleteNBLIngress(), OvsCompleteNBL() on 'origNbl'. This is wrong.
2. Further, 'nextNbl' that was created by OvsSplitNblByNB(), but not being freed using OvsCompleteNBL().

One solution basically is to preserve the refCount on the original, and call OvsCompleteNBL() on 'newNbl'. You'll have to do something like:
OVS_NBL_FOR_EACH_NEXT() {
    origNbl = OvsCompleteNBL(nextNbl);
    if (origNbl) {
        goto Cleanup;
    }
}

> +        goto Cleanup;
> +    }
> +
> +    /* process each NB: each NB is wrapped in an NBL in the newNbl chain */
> +    OVS_NBL_FOR_EACH_NEXT(curNbl, newNbl, nextNbl) {
> +        nextNbl = NET_BUFFER_LIST_NEXT_NBL(curNbl);
> +        NET_BUFFER_LIST_NEXT_NBL(curNbl) = NULL;
> +
> +        PNET_BUFFER firstNb = NET_BUFFER_LIST_FIRST_NB(curNbl);
> +
> +        /* update vport stats */
> +        sourceVPort->stats.rxPackets++;
> +        sourceVPort->stats.rxBytes += NET_BUFFER_DATA_LENGTH(firstNb);
> +
> +        /* process NB */
> +        OvsProcessOneNb(curNbl, switchContext, sourceVPort, completionList,
> +                        sendFlags, missedPackets, countMissedPackets);
> +    }
> +
> +    NdisReleaseRWLock(switchContext->dispatchLock, &lockState);
> +
> +Cleanup:
> +    if (NT_ERROR(status)) {
> +        OvsAddPktCompletionList(completionList, TRUE, hvSourcePortId, origNbl,
> +                                0, &filterReason);
> +    }
> }
> 
> 
> @@ -340,9 +399,17 @@ OvsExtSendNBL(NDIS_HANDLE filterModuleContext,
>     POVS_SWITCH_CONTEXT switchContext;
>     switchContext = (POVS_SWITCH_CONTEXT) filterModuleContext;
> 
> +    PNET_BUFFER_LIST nextNbl = NULL;
> +    ULONG sendCompleteFlags;
> +    LIST_ENTRY missedPackets;
> +    UINT32 countMissedPackets = 0;
> +    OvsCompletionList completionList;
> +
> +    sendCompleteFlags = OvsGetSendCompleteFlags(sendFlags);
> +    sendFlags |= NDIS_SEND_FLAGS_SWITCH_DESTINATION_GROUP;
> +
>     if (switchContext->dataFlowState == OvsSwitchPaused) {
>         NDIS_STRING filterReason;
> -        ULONG sendCompleteFlags = OvsGetSendCompleteFlags(sendFlags);
> 
>         RtlInitUnicodeString(&filterReason,
>                              L"Switch state PAUSED, drop on ingress.");
> @@ -354,7 +421,22 @@ OvsExtSendNBL(NDIS_HANDLE filterModuleContext,
> 
>     ASSERT(switchContext->dataFlowState == OvsSwitchRunning);
> 
> -    OvsStartNBLIngress(switchContext, netBufferLists, sendFlags);
> +    InitializeListHead(&missedPackets);
> +    OvsInitCompletionList(&completionList, switchContext, sendCompleteFlags);
> +
> +    OVS_NBL_FOR_EACH_NEXT(curNbl, netBufferLists, nextNbl) {
> +        nextNbl = curNbl->Next;
> +        curNbl->Next = NULL;
> +
> +        OvsProcessOneNbl(curNbl, switchContext, sendFlags, &completionList,
> +                         &missedPackets, &countMissedPackets);
> +    }
> +
> +    /* Queue the missed packets. */
> +    OvsQueuePackets(OVS_DEFAULT_PACKET_QUEUE, &missedPackets,
> +                    countMissedPackets);
> +    /* Complete all packets that were added to the completion list */
> +    OvsFinalizeCompletionList(&completionList);
> }
> 
> static VOID
> @@ -382,7 +464,8 @@ OvsCompleteNBLIngress(POVS_SWITCH_CONTEXT switchContext,
> 
>     /* Complete the NBL's that were sent by the upper layer. */
>     if (newList.dropNbl != NULL) {
> -        NdisFSendNetBufferListsComplete(switchContext->NdisFilterHandle, newList.dropNbl,
> +        NdisFSendNetBufferListsComplete(switchContext->NdisFilterHandle,
> +                                        newList.dropNbl,
>                                         sendCompleteFlags);
>     }
> }
> diff --git a/datapath-windows/ovsext/Tunnel.c b/datapath-windows/ovsext/Tunnel.c
> index 2e7da10..d52d788 100644
> --- a/datapath-windows/ovsext/Tunnel.c
> +++ b/datapath-windows/ovsext/Tunnel.c
> @@ -227,6 +227,7 @@ OvsInjectPacketThroughActions(PNET_BUFFER_LIST pNbl,
>     OVS_DATAPATH *datapath = &gOvsSwitchContext->datapath;
> 
>     ASSERT(gOvsSwitchContext);
> +    ASSERT(pNbl->FirstNetBuffer->Next == NULL);
> 
>     /* Fill the tunnel key */
>     status = OvsSlowPathDecapVxlan(pNbl, &tunnelKey);
> @@ -306,8 +307,8 @@ OvsInjectPacketThroughActions(PNET_BUFFER_LIST pNbl,
>             datapath->hits++;
> 
>             OvsActionsExecute(gOvsSwitchContext, &completionList, pNbl,
> -                            portNo, SendFlags, &key, &hash, &layers,
> -                            flow->actions, flow->actionsLen);
> +                              portNo, SendFlags, &key, &hash, &layers,
> +                              flow->actions, flow->actionsLen);
> 
>             OvsReleaseDatapath(datapath, &dpLockState);
>         } else {
> diff --git a/datapath-windows/ovsext/User.c b/datapath-windows/ovsext/User.c
> index 612a4bd..358706e 100644
> --- a/datapath-windows/ovsext/User.c
> +++ b/datapath-windows/ovsext/User.c
> @@ -360,10 +360,15 @@ OvsExecuteDpIoctl(PVOID inputBuffer,
>         fwdDetail->SourcePortId = NDIS_SWITCH_DEFAULT_PORT_ID;
>         fwdDetail->SourceNicIndex = 0;
>     }
> -    // XXX: Figure out if any of the other members of fwdDetail need to be set.
> +    /*
> +     * XXX: Figure out if any of the other members of fwdDetail need to be
> +     * set.
> +     */
> +
> +    ASSERT(pNbl->FirstNetBuffer->Next == NULL);
> 
>     ndisStatus = OvsExtractFlow(pNbl, fwdDetail->SourcePortId, &key, &layers,
> -                              NULL);
> +                                NULL);
>     if (ndisStatus == NDIS_STATUS_SUCCESS) {
>         ASSERT(KeGetCurrentIrql() == DISPATCH_LEVEL);
>         NdisAcquireRWLockRead(gOvsSwitchContext->dispatchLock, &lockState,
> @@ -763,6 +768,11 @@ OvsQueuePackets(UINT32 queueId,
> 
>     OVS_LOG_LOUD("Enter: queueId %u, numELems: %u",
>                   queueId, numElems);
> +
> +    if (numElems == 0) {
> +        return;
> +    }
> +
>     if (queue == NULL) {
>         goto cleanup;
>     }
> @@ -808,6 +818,7 @@ cleanup:
>  *
>  *  This function would fragment packet if needed, and queue
>  *  each segment to user space.
> + *  The NBL is expected to have one or more NBs.
>  *----------------------------------------------------------------------------
>  */
> NTSTATUS
> diff --git a/datapath-windows/ovsext/Vxlan.c b/datapath-windows/ovsext/Vxlan.c
> index 3a1291c..4cdbfe4 100644
> --- a/datapath-windows/ovsext/Vxlan.c
> +++ b/datapath-windows/ovsext/Vxlan.c
> @@ -115,6 +115,8 @@ OvsDoEncapVxlan(PNET_BUFFER_LIST curNbl,
>     UINT32 headRoom = OvsGetVxlanTunHdrSize();
>     UINT32 packetLength;
> 
> +    ASSERT(curNbl->FirstNetBuffer->Next == NULL);
> +
>     /*
>      * XXX: the assumption currently is that the NBL is owned by OVS, and
>      * headroom has already been allocated as part of allocating the NBL and
> @@ -286,6 +288,8 @@ OvsIpHlprCbVxlan(PNET_BUFFER_LIST curNbl,
>     NDIS_STATUS status;
>     UNREFERENCED_PARAMETER(inPort);
> 
> +    ASSERT(curNbl->FirstNetBuffer->Next == NULL);
> +
>     status = OvsExtractFlow(curNbl, inPort, &key, &layers, NULL);
>     if (result == STATUS_SUCCESS) {
>         status = OvsDoEncapVxlan(curNbl, tunKey, fwdInfo, &layers,
> @@ -456,6 +460,7 @@ OvsSlowPathDecapVxlan(const PNET_BUFFER_LIST packet,
>     OVS_PACKET_HDR_INFO layers;
> 
>     layers.value = 0;
> +    ASSERT(packet->FirstNetBuffer->Next == NULL);
> 
>     do {
>         nh = OvsGetIp(packet, layers.l3Offset, &ip_storage);
> -- 
> 1.8.3.msysgit.0
> 

Thanks,
-- Nithin




More information about the dev mailing list