[ovs-dev] [PATCH v3] datapath-windows Event read handler

Eitan Eliahu eliahue at vmware.com
Fri Sep 26 17:34:15 UTC 2014


Thanks for the review Nithin.
" new blank line at EOF."
Thanks, will remove.

" Also, I am not very convinced that we cannot reuse the READ_IOCTL, but I am ok with your approach provided the end-to-end (including userspace code) looks simpler overall. Maybe including the userspace code would make things clearer"
I considered to user a single READ ioctl for all the three types of the socket {Dump, Event, Packet}. However, I'm quite convinced that maintaining explicit socket type would make the code much clearer and modular than trying to infer the socket type using indirect instance variables.

" Do we need to call it OVS_VPORT_CMD_NOTIFY? If we can call it with a generic name such as OVS_CTRL_CMD_NOTIFY, we can make this as part of the command extensions in OvsDpInterfaceExt.h. That way we cam avoid sneaking in commands in the standard interface to the datapath."
In fact this is a Vport command. The only difference is that this command is sent from kernel to usersapce rather than vice versa (please refer to the Linux datapath) . As we have a single function table for botth directions (user->kernel and kernel->user) we need to add it as an additional command. 
We could add this command to the generic Windows family but logically that won't make sense as it is a Vport command.


"The handler function does not seem to be specific to the VPORT event.  Also, I don't see this function listed as a handler for any of the commands. Is that coming in a subsequent patch?"
I didn't want to stub the Vport handlers as there is another pending patch from Sam which has the implementation. I will add the handler once the that patch is applied.

". Maybe including the userspace code would make things clearer."
I almost done with the user mode counterpart. Will post a patch soon.

"The other values in the OVS_MESSAGE are not being set. Is it not required? At the minimum, the dpif index should be set I think."
This is not a real NL message. The only usage it for it is to get the handler from the handler table. The dpif index (as we have only one) is assigned datapath object in the driver.

" Do we want to include this as part of the VPORT_FAMILY? IMO, this should be the OVS_WIN_NL_CTRL_FAMILY_ID since event subscribe etc commands are on the OVS_WIN_CONTROL_FAMILY family."
Actually Events are Vport commands sent from Kernel to usersapce. The Event notification which is totally unrelated belongs to the the OVS_WIN_NL_CTRL_FAMILY_ID

" If there's a failure, there's a chance that *replyLen will be set to a value > 0. This might be OK, but most of the other code sets *replyLen to a value > 0, only when there's success.:
Will do.

" Cool, you have this enforcement. We reuse this and demultiplex the READ_IOCTL."
This is the reason we need to have explicit socket type :-)

Thanks!
Eitan.

Other comments are minor.

-----Original Message-----
From: Nithin Raju 
Sent: Friday, September 26, 2014 8:57 AM
To: Eitan Eliahu
Cc: <dev at openvswitch.org>
Subject: Re: [ovs-dev] [PATCH v3] datapath-windows Event read handler

Eitan,
I get the following error when I try to apply this patch. Looks like there's a new line in build-aux/extract-odp-netlink-windows-dp-h.

---
[nithin at pa-dbc1122 openvswitch-review]$ git am < ../tmp/ovs-dev-v3-datapath-windows-Event-read-handler.patch
Applying: datapath-windows Event read handler
/dbc/pa-dbc1122/nithin/nithin-ovs-int-1/openvswitch-review/.git/rebase-apply/patch:13: new blank line at EOF.
+
warning: 1 line adds whitespace errors.
---

Also, I am not very convinced that we cannot reuse the READ_IOCTL, but I am ok with your approach provided the end-to-end (including userspace code) looks simpler overall. Maybe including the userspace code would make things clearer.

Other comments are minor.

Thanks for working on this esp. when not everything is ironed out :)

On Sep 26, 2014, at 5:03 AM, Eitan Eliahu <eliahue at vmware.com>
 wrote:

> The Read event handler is executed when user mode issues a socket 
> receive on an MC socket associated with the event queue. A new IOCTL 
> READ command is used to differentiate between transaction based and packet miss sockets.
> An entry for the handler will be added once the Vport family 
> implementation checked in.
> User mode code for setting the socket type will follow
> 
> Signed-off-by: Eitan Eliahu <eliahue at vmware.com>
> ---
> build-aux/extract-odp-netlink-windows-dp-h   |   4 +
> datapath-windows/include/OvsDpInterfaceExt.h |  10 +-
> datapath-windows/ovsext/Datapath.c           | 150 ++++++++++++++++++++++++++-
> datapath-windows/ovsext/Event.c              |  42 ++++++++
> datapath-windows/ovsext/Event.h              |   2 +
> 5 files changed, 205 insertions(+), 3 deletions(-)
> 
> diff --git a/build-aux/extract-odp-netlink-windows-dp-h 
> b/build-aux/extract-odp-netlink-windows-dp-h
> index 70514cb..82d95f1 100755
> --- a/build-aux/extract-odp-netlink-windows-dp-h
> +++ b/build-aux/extract-odp-netlink-windows-dp-h
> @@ -23,3 +23,7 @@ s,#include <linux/if_ether\.h>,\n#ifndef 
> ETH_ADDR_LEN \
> 
> # Use OVS's own ETH_ADDR_LEN instead of Linux-specific ETH_ALEN.
> s/ETH_ALEN/ETH_ADDR_LEN/
> +
> +s/OVS_VPORT_CMD_GET,/OVS_VPORT_CMD_GET,\
> +	OVS_VPORT_CMD_NOTIFY,/

Do we need to call it OVS_VPORT_CMD_NOTIFY? If we can call it with a generic name such as OVS_CTRL_CMD_NOTIFY, we can make this as part of the command extensions in OvsDpInterfaceExt.h. That way we cam avoid sneaking in commands in the standard interface to the datapath.


> +
> diff --git a/datapath-windows/include/OvsDpInterfaceExt.h 
> b/datapath-windows/include/OvsDpInterfaceExt.h
> index 64fd20e..be1e49f 100644
> --- a/datapath-windows/include/OvsDpInterfaceExt.h
> +++ b/datapath-windows/include/OvsDpInterfaceExt.h
> @@ -34,11 +34,17 @@
> #define OVS_IOCTL_READ \
>     CTL_CODE (OVS_IOCTL_DEVICE_TYPE, OVS_IOCTL_START + 0x0, METHOD_OUT_DIRECT,\
>               FILE_READ_ACCESS)
> +#define OVS_IOCTL_READ_PACKET \
> +    CTL_CODE (OVS_IOCTL_DEVICE_TYPE, OVS_IOCTL_START + 0x1, METHOD_OUT_DIRECT, \
> +              FILE_READ_ACCESS)
> +#define OVS_IOCTL_READ_EVENT \
> +    CTL_CODE (OVS_IOCTL_DEVICE_TYPE, OVS_IOCTL_START + 0x2, METHOD_OUT_DIRECT, \
> +              FILE_READ_ACCESS)
> #define OVS_IOCTL_WRITE \
> -    CTL_CODE (OVS_IOCTL_DEVICE_TYPE, OVS_IOCTL_START + 0x1, METHOD_IN_DIRECT,\
> +    CTL_CODE (OVS_IOCTL_DEVICE_TYPE, OVS_IOCTL_START + 0x3, 
> + METHOD_IN_DIRECT,\
>               FILE_READ_ACCESS)
> #define OVS_IOCTL_TRANSACT \
> -    CTL_CODE (OVS_IOCTL_DEVICE_TYPE, OVS_IOCTL_START + 0x2, METHOD_OUT_DIRECT,\
> +    CTL_CODE (OVS_IOCTL_DEVICE_TYPE, OVS_IOCTL_START + 0x4, 
> + METHOD_OUT_DIRECT,\
>               FILE_WRITE_ACCESS)
> 
> /*
> diff --git a/datapath-windows/ovsext/Datapath.c 
> b/datapath-windows/ovsext/Datapath.c
> index 0dfdd57..3102e66 100644
> --- a/datapath-windows/ovsext/Datapath.c
> +++ b/datapath-windows/ovsext/Datapath.c
> @@ -99,7 +99,8 @@ static NetlinkCmdHandler OvsGetPidCmdHandler,
>                          OvsGetDpCmdHandler,
>                          OvsPendEventCmdHandler,
>                          OvsSubscribeEventCmdHandler,
> -                         OvsSetDpCmdHandler;
> +                         OvsSetDpCmdHandler,
> +                         OvsReadEventCmdHandler;

The handler function does not seem to be specific to the VPORT event.

Also, I don't see this function listed as a handler for any of the commands. Is that coming in a subsequent patch?

> 
> static NTSTATUS HandleGetDpTransaction(POVS_USER_PARAMS_CONTEXT usrParamsCtx,
>                                        UINT32 *replyLen); @@ -620,6 
> +621,29 @@ OvsDeviceControl(PDEVICE_OBJECT deviceObject,
>         devOp = OVS_TRANSACTION_DEV_OP;
>         break;
> 
> +    case OVS_IOCTL_READ_EVENT:
> +        /* This IOCTL is used to read events */
> +        if (outputBufferLen != 0) {
> +            status = MapIrpOutputBuffer(irp, outputBufferLen,
> +                                        sizeof *ovsMsg, &outputBuffer);
> +            if (status != STATUS_SUCCESS) {
> +                goto done;
> +            }
> +            ASSERT(outputBuffer);
> +        } else {
> +            status = STATUS_NDIS_INVALID_LENGTH;
> +            goto done;
> +        }
> +        inputBuffer = NULL;
> +        inputBufferLen = 0;
> +
> +        ovsMsg = &ovsMsgReadOp;
> +        ovsMsg->nlMsg.nlmsgType = OVS_WIN_NL_VPORT_FAMILY_ID;
> +        /* An "artificial" command so we can use NL family function table*/
> +        ovsMsg->genlMsg.cmd = OVS_VPORT_CMD_NOTIFY;

The other values in the OVS_MESSAGE are not being set. Is it not required? At the minimum, the dpif index should be set I think.

> +        devOp = OVS_READ_DEV_OP;
> +        break;
> +

IMO, we should be able to read events using the OVS_IOCTL_READ operation itself. I had implemented this for transactions using a write-read model, but eventually did not send it out since we ended up using TRANSACTION ioctl. As you can expect, the code looked something like this in the READ_IOCTL:

        /* Check if there are any transaction replies queued up. */
        if (!IsListEmpty(&instance->transactionReply)) {
            /* Code to Send up the reply. */
            goto done;
        }

        /* Check we are in the middle of a dump operation. */
        if (instance->dumpState.ovsMsg == NULL) {
        }

As you know, events, packets and dump operations are mutually exclusive. It is straight forward to have some enforcements such as:
1. A descriptor that has subscribed to OVS_CTRL_CMD_MC_SUBSCRIBE_REQ cannot start a dump operation.
2. While in the middle of a dump operation, a descriptor cannot call into: OVS_CTRL_CMD_MC_SUBSCRIBE_REQ.
3. A descriptor that has subscribed to OVS_CTRL_CMD_MC_SUBSCRIBE_REQ cannot receive packets.
4. A descriptor that has subscribed to receiving packets, cannot also have an event queue associated with it using OVS_CTRL_CMD_MC_SUBSCRIBE_REQ.

All this said, I am ok with the changes you have made provided the end-to-end (including userspace code) looks simpler overall. Maybe including the userspace code would make things clearer.

>     case OVS_IOCTL_READ:
>         /* Output buffer is mandatory. */
>         if (outputBufferLen != 0) {
> @@ -924,6 +948,7 @@ OvsPendEventCmdHandler(POVS_USER_PARAMS_CONTEXT usrParamsCtx,
>     return status;
> }
> 
> +
> /*
>  * 
> ----------------------------------------------------------------------
> ----
>  *  Handler for the subscription for the event queue @@ -1214,4 
> +1239,127 @@ MapIrpOutputBuffer(PIRP irp,
>     return STATUS_SUCCESS;
> }
> 
> +/*
> + * 
> +---------------------------------------------------------------------
> +-----
> + * Utility function to fill up information about the state of a port 
> +in a reply
> + * to* userspace.
> + * Assumes that 'gOvsCtrlLock' lock is acquired.
> + * 
> +---------------------------------------------------------------------
> +-----
> + */
> +static NTSTATUS
> +OvsPortFillInfo(POVS_USER_PARAMS_CONTEXT usrParamsCtx,
> +                POVS_EVENT_ENTRY eventEntry,
> +                PNL_BUFFER nlBuf)
> +{
> +    NTSTATUS status;
> +    BOOLEAN rc;
> +    OVS_MESSAGE msgOutTmp;
> +    PNL_MSG_HDR nlMsg;
> +    POVS_VPORT_ENTRY vport;
> +
> +    ASSERT(NlBufAt(nlBuf, 0, 0) != 0 && nlBuf->bufRemLen >= sizeof 
> + msgOutTmp);
> +
> +    msgOutTmp.nlMsg.nlmsgType = OVS_WIN_NL_VPORT_FAMILY_ID;

Do we want to include this as part of the VPORT_FAMILY? IMO, this should be the OVS_WIN_NL_CTRL_FAMILY_ID since event subscribe etc commands are on the OVS_WIN_CONTROL_FAMILY family.

> +    msgOutTmp.nlMsg.nlmsgFlags = 0;  /* XXX: ? */
> +    msgOutTmp.nlMsg.nlmsgSeq = 0; /* driver intiated messages should 
> + have */

The comment seems incomplete.

> +    msgOutTmp.nlMsg.nlmsgPid = usrParamsCtx->ovsInstance->pid;
> +
> +    msgOutTmp.genlMsg.version = nlVportFamilyOps.version;
> +    msgOutTmp.genlMsg.reserved = 0;
> +
> +    /* we don't have netdev yet, treat link up/down a adding/removing a port*/
> +    if (eventEntry->status & (OVS_EVENT_LINK_UP | OVS_EVENT_CONNECT)) {
> +        msgOutTmp.genlMsg.cmd = OVS_VPORT_CMD_NEW;
> +    } else if (eventEntry->status &
> +             (OVS_EVENT_LINK_DOWN | OVS_EVENT_DISCONNECT)) {
> +        msgOutTmp.genlMsg.cmd = OVS_VPORT_CMD_DEL;
> +    } else {
> +        ASSERT(FALSE);
> +        return STATUS_UNSUCCESSFUL;
> +    }

I am not sure how the end behavior would be here if userspace sees a OVS_VPORT_CMD_DEL, but sees the vport in vport dump. I agree that once we have netdev, userspace can have more useful information at hand to detect changes. Until then, I think we should just ignore LINK_UP and LINK_DOWN and not treat them as CMD_NEW and CMD_DEL respectively. Esp. given that userspace has no way to detect the link state today. In other words, we should just post OVS_EVENT_CONNECT as OVS_VPORT_CMD_NEW, and OVS_EVENT_DISCONNECT as OVS_VPORT_CMD_DEL and leave it at that. OVS_EVENT_LINK_UP and OVS_EVENT_LINK_DOWN can be ignored, and no events should be posted.

> +    msgOutTmp.ovsHdr.dp_ifindex = gOvsSwitchContext->dpNo;
> +
> +    rc = NlMsgPutHead(nlBuf, (PCHAR)&msgOutTmp, sizeof msgOutTmp);
> +    if (!rc) {
> +        status = STATUS_INVALID_BUFFER_SIZE;
> +        goto cleanup;
> +    }
> +
> +    vport = OvsFindVportByPortNo(gOvsSwitchContext, eventEntry->portNo);
> +    if (!vport) {
> +        status = STATUS_DEVICE_DOES_NOT_EXIST;
> +        goto cleanup;
> +    }
> +
> +    rc = NlMsgPutTailU32(nlBuf, OVS_VPORT_ATTR_PORT_NO, eventEntry->portNo) ||
> +         NlMsgPutTailU32(nlBuf, OVS_VPORT_ATTR_TYPE, vport->ovsType) ||
> +         NlMsgPutTailString(nlBuf, OVS_VPORT_ATTR_NAME, vport->ovsName);
> +    if (!rc) {
> +        status = STATUS_INVALID_BUFFER_SIZE;
> +        goto cleanup;
> +    }
> +
> +    /* XXXX Should we add the port stats attributes?*/
> +    nlMsg = (PNL_MSG_HDR)NlBufAt(nlBuf, 0, 0);
> +    nlMsg->nlmsgLen = NlBufSize(nlBuf);
> +    status = STATUS_SUCCESS;
> +
> +cleanup:
> +    return status;
> +}
> +
> +
> +/*
> + * 
> +---------------------------------------------------------------------
> +-----
> + * Handler for reading events from the driver event queue. This 
> +handler is
> + * executed when user modes issues a socket receive on a socket 
> +assocaited
> + * with the MC group for events.
> + * XXX user mode should read multiple events in one system call
> + * 
> +---------------------------------------------------------------------
> +-----
> + */
> +static NTSTATUS
> +OvsReadEventCmdHandler(POVS_USER_PARAMS_CONTEXT usrParamsCtx,
> +                       UINT32 *replyLen) { #ifdef DBG
> +    POVS_MESSAGE msgOut = (POVS_MESSAGE)usrParamsCtx->outputBuffer;
> +    POVS_OPEN_INSTANCE instance =
> +        (POVS_OPEN_INSTANCE)usrParamsCtx->ovsInstance;
> +#endif
> +    NL_BUFFER nlBuf;
> +    NTSTATUS status;
> +    OVS_EVENT_ENTRY eventEntry;
> +
> +    ASSERT(usrParamsCtx->devOp == OVS_READ_DEV_OP);
> +
> +    /* Should never read events with a dump socket */
> +    ASSERT(instance->dumpState.ovsMsg == NULL);

Cool, you have this enforcement. We reuse this and demultiplex the READ_IOCTL.

> +
> +    /* Must have an event queue */
> +    ASSERT(instance->eventQueue != NULL);
> +
> +    /* Output buffer has been validated while validating read dev op. */
> +    ASSERT(msgOut != NULL && usrParamsCtx->outputLength >= sizeof 
> + *msgOut);
> +
> +    NlBufInit(&nlBuf, usrParamsCtx->outputBuffer, 
> + usrParamsCtx->outputLength);
> +
> +    OvsAcquireCtrlLock();
> +    if (!gOvsSwitchContext) {
> +        status = STATUS_SUCCESS;
> +        goto cleanup;
> +    }
> +
> +    /* remove an event entry from the event queue */
> +    status = OvsRemoveEventEntry(usrParamsCtx->ovsInstance, &eventEntry);
> +    if (status != STATUS_SUCCESS) {
> +        goto cleanup;
> +    }
> +
> +    status = OvsPortFillInfo(usrParamsCtx, &eventEntry, &nlBuf);
> +
> +cleanup:
> +    OvsReleaseCtrlLock();
> +    *replyLen = NlBufSize(&nlBuf);;

If there's a failure, there's a chance that *replyLen will be set to a value > 0. This might be OK, but most of the other code sets *replyLen to a value > 0, only when there's success.


> +    return status;
> +}
> #endif /* OVS_USE_NL_INTERFACE */
> diff --git a/datapath-windows/ovsext/Event.c 
> b/datapath-windows/ovsext/Event.c index fec3485..467771d 100644
> --- a/datapath-windows/ovsext/Event.c
> +++ b/datapath-windows/ovsext/Event.c
> @@ -292,6 +292,7 @@ done_event_subscribe:
>     return status;
> }
> 
> +#if defined OVS_USE_NL_INTERFACE && OVS_USE_NL_INTERFACE == 0
> /*
>  * 
> ----------------------------------------------------------------------
> ----
>  * Poll event queued in the event queue. always synchronous.
> @@ -376,6 +377,7 @@ event_poll_done:
>     OVS_LOG_TRACE("Exit: numEventPolled: %d", numEntry);
>     return STATUS_SUCCESS;
> }
> +#endif /* OVS_USE_NL_INTERFACE */
> 
> 
> /*
> @@ -494,3 +496,43 @@ OvsWaitEventIoctl(PIRP irp,
>     OVS_LOG_TRACE("Exit: return status: %#x", status);
>     return status;
> }
> +
> +/*
> + 
> +*--------------------------------------------------------------------
> +------
> + * Poll event queued in the event queue.always synchronous.
> + *
> + * Results:
> + *     STATUS_SUCCESS event was dequeued
> + *     STATUS_UNSUCCESSFUL the queue is empty.
> + * 
> +---------------------------------------------------------------------
> +-----
> + */
> +NTSTATUS
> +OvsRemoveEventEntry(POVS_OPEN_INSTANCE instance,
> +                    POVS_EVENT_ENTRY entry) {
> +    NTSTATUS status = STATUS_UNSUCCESSFUL;
> +    POVS_EVENT_QUEUE queue;
> +    POVS_EVENT_QUEUE_ELEM elem;
> +
> +    OvsAcquireEventQueueLock();
> +
> +    queue = (POVS_EVENT_QUEUE)instance->eventQueue;
> +
> +    if (queue == NULL) {
> +        ASSERT(queue);
> +        goto remove_event_done;
> +    }
> +
> +    if (queue->numElems) {
> +        elem = (POVS_EVENT_QUEUE_ELEM)RemoveHeadList(&queue->elemList);
> +        entry->portNo = elem->portNo;
> +        entry->status = elem->status;
> +        OvsFreeMemory(elem);
> +        queue->numElems--;
> +        status = STATUS_SUCCESS;
> +    }
> +
> +remove_event_done:
> +    OvsReleaseEventQueueLock();
> +    return status;
> +}
> diff --git a/datapath-windows/ovsext/Event.h 
> b/datapath-windows/ovsext/Event.h index f4801b9..a43a0bb 100644
> --- a/datapath-windows/ovsext/Event.h
> +++ b/datapath-windows/ovsext/Event.h
> @@ -47,4 +47,6 @@ NTSTATUS OvsPollEventIoctl(PFILE_OBJECT fileObject, PVOID inputBuffer,
>                            UINT32 outputLength, UINT32 *replyLen); 
> NTSTATUS OvsWaitEventIoctl(PIRP irp, PFILE_OBJECT fileObject,
>                            PVOID inputBuffer, UINT32 inputLength);
> +NTSTATUS OvsRemoveEventEntry(PVOID instance, POVS_EVENT_ENTRY entry);
> +
> #endif /* __EVENT_H_ */
> --
> 1.9.4.msysgit.0
> 
> _______________________________________________
> dev mailing list
> dev at openvswitch.org
> https://urldefense.proofpoint.com/v1/url?u=http://openvswitch.org/mail
> man/listinfo/dev&k=oIvRg1%2BdGAgOoM1BIlLLqw%3D%3D%0A&r=ubrOpIWavCMqX4l
> 4j1LEVpTfDj%2FD5Qyn8KCoJIBGvzo%3D%0A&m=D32NoB29antmSYUS0H7v1fJSyW72eHV
> WreqXKIjegmM%3D%0A&s=2f507d5bb59ce07b57300f7259baa0d2159b4015c2fbb8167
> 7205ddbadb85ed1

Thanks,
-- Nithin




More information about the dev mailing list