[ovs-dev] [PATCH v2] datapath-windows: Netlink command: vport dump

Samuel Ghinet sghinet at cloudbasesolutions.com
Tue Sep 23 16:33:38 UTC 2014


Hello Nithin,

Thanks for the review!

> 2. We'll have to get rid of the OvsDumpVportIoctl() sometime. We need not do it in this patch itself.
I think the best time to remove it is when the netlink part is done. We could add some #ifdef so people know it's not used when the netlink device is used.

> I think we should avoid initializing 'nlBuffer' each time and initialize it only once in the caller. Each call to OvsCreateMsgFromVport should continue from the offset it was left off. The caller of OvsCreateMsgFromVport() should be responsible for writing out the OVS_MESSAGE, and OvsCreateMsgFromVport should only write out the VPORT data.
I think you do not understand the functionality here: the dump next operation yields a reply with the very next vport.
this code:
instance->dumpState.index[0] = outBucket;
instance->dumpState.index[1] = outIndex;

updates the dump state: we've got an array of lists, so we need an index into the array (outBucket) and an index into the list therein (outIndex). One index is not enough.
The code does update the indexes correctly.

> The caller of OvsCreateMsgFromVport() should be responsible for writing out the OVS_MESSAGE,
I do not agree with you on OvsCreateMsgFromVport: the name implies that its job is to build an OVS_MESSAGE out of a given vport.

> I think we should avoid initializing 'nlBuffer' each time and initialize it only once in the caller
The dump next operation yields a message with the very next vport. nlBuffer gets initialized maximum 1 time a dump next call.
nlBuffer is being initialized by OvsCreateMsgFromVport; if no vport is found (or we've finished with the last one), nlBuffer will not be initialized.

> > +    NlMsgPutTailUnspec(&nlBuffer, OVS_VPORT_ATTR_STATS, (PCHAR)&vportStats,
> > +              sizeof(OVS_VPORT_FULL_STATS));
> 
> How are you sure that there's enough space in the buffer. NlMsgPutTailUnspec() might return FALSE. This needs to be translated into "end of this dump-next operation".
Yes, checks for the return values of NlMsgPutTailXxx need to be added.

> > +        return STATUS_SUCCESS;
> > +    }
> > +
> > +    msgOut->ovsHdr.dp_ifindex = gOvsSwitchContext->dpNo;
> 
> Shouldn't this come from BuildMsgOutFromMsgIn()?
No. The name states that it builds a msg out from a msg in. The dp_ifindex is not part of a msgin. Besides, it would not make the BuildMsgOutFromMsgIn function generic, because messages such as a netlink error message lack dp_ifindex.

> 2. Also, the format of the message would look something like:
> OVS_MESSAGE - VPORT1 - VPORT2 - VPORT3, etc
The format actually looks like this:
OVS_MESSAGE - VPORT1 (dump next: msg1)
OVS_MESSAGE - VPORT2 (dump next: msg2)
OVS_MESSAGE - VPORT3 (dump next: msg3)

> Why are we breaking out of the loop here?
We are breaking the loop whenever we found a vport as "next". Because we reply with maximum one vport at a time.

> There's one more issue:
> Lets say, we added some ports but there's buffer space left. In such a case, 'i' would be 'OVS_MAX_VPORT_ARRAY_SIZE' - basically, we'd have looked through the entire hash table, but found no ports. In such a case, you negating all the data added in the output buffer by setting *replyLen = 0. The FreeUserDumpState() is fine.
I don't understand... is there an issue or it's fine?

If there is no vport to dump next, this line will be negated:
*replyLen = msgOut->nlMsg.nlmsgLen;
by making *replyLen be 0, so that the userspace knows it's dump done.
I could put an if there, but the point is that we need to set *reply = 0 to let the userspace know that we're at dump done.

> 1. You need to handle the return value of OvsCreateMsgFromVport().
What do you want to be done if a dump next for a given vport failed: cancel all the dump operation (set *replyLen = 0), or try our luck with the next vport (ignore current vport dump next)?

Btw, I've just remembered: time ago we discussed that the index(es) will be managed by the userspace.
The current method (having dump state preserved in the "instance" object in the kernel) makes the dump state be handled by the kernel only, instead.
Should we change our methods back to the original design, or you think it's ok to continue the way we did so far?

Thanks,
Sam

________________________________________
From: Nithin Raju [nithin at vmware.com]
Sent: Friday, September 19, 2014 2:21 AM
To: Samuel Ghinet
Cc: dev at openvswitch.org; Eitan Eliahu; Saurabh Shah; Ankur Sharma; Alin Serdean
Subject: Re: [PATCH v2] datapath-windows: Netlink command: vport dump

hi Samuel,
Thanks for sending out the changes.

Some high level comments are:
1. You might have to rebase your change when Eitan's change gets checked in. He has coalesced the declarations of the command handlers.
2. We'll have to get rid of the OvsDumpVportIoctl() sometime. We need not do it in this patch itself.

I have noted the comments inline. Pls. have a look.

On Sep 17, 2014, at 6:08 AM, Samuel Ghinet <sghinet at cloudbasesolutions.com> wrote:
> /* Netlink vport family. */
> +NETLINK_CMD nlVportFamilyCmdOps[] = {
> +    { OVS_VPORT_CMD_GET, OvsGetVportCmdHandler,
> +    OVS_WRITE_DEV_OP | OVS_READ_DEV_OP, FALSE }

I am 100% sure which is the correct approach. But other commands, have aligned OVS_WRITE_DEV_OP with OVS_VPORT_CMD_GET.

Is the ValidateDpIndex FALSE? I'd think it is TRUE. Pls. see the following code in dpif-linux.c
dpif_linux_port_dump_start__(const struct dpif_linux *dpif,
                             struct nl_dump *dump)
[...]
    request.cmd = OVS_VPORT_CMD_GET;
    request.dp_ifindex = dpif->dp_ifindex;


> +/* Netlink vport family. */
> /* XXX: Add commands here. */

You can get rid of the two lines above. The formatting followed is:

/* Netlink XYS family. */
<Command definitions>
<Family definition>


> @@ -966,6 +976,210 @@ OvsSetupDumpStart(POVS_USER_PARAMS_CONTEXT usrParamsCtx)
> }
>
> /*
> +* XXX: When "dump done" and "error" netlink types will be implemented, we will
> +* need a few more BuildMsgOut functions, which would ultimately call a generic
> +* BuildMsgOutFromMsgIn.
> +*/
> +
> +static VOID
> +BuildMsgOutFromMsgIn(POVS_MESSAGE msgIn, POVS_MESSAGE msgOut, UINT16 flags)
> +{
> +    msgOut->nlMsg.nlmsgType = msgIn->nlMsg.nlmsgType;
> +    msgOut->nlMsg.nlmsgFlags = flags;
> +    msgOut->nlMsg.nlmsgSeq = msgIn->nlMsg.nlmsgSeq;
> +    msgOut->nlMsg.nlmsgPid = msgIn->nlMsg.nlmsgPid;
> +    msgOut->nlMsg.nlmsgLen = sizeof *msgOut;
> +
> +    msgOut->genlMsg.cmd = msgIn->genlMsg.cmd;
> +    msgOut->genlMsg.version = nlDatapathFamilyOps.version;
> +    msgOut->genlMsg.reserved = 0;
> +}
> +
> +static NTSTATUS
> +OvsCreateMsgFromVport(POVS_VPORT_ENTRY vport,
> +                      POVS_MESSAGE msgOut,
> +                      UINT32 maxPayloadSize)
> +{
> +    NL_BUFFER nlBuffer;
> +    OVS_VPORT_FULL_STATS vportStats;
> +
> +    NlBufInit(&nlBuffer, (PCHAR)msgOut + sizeof (*msgOut), maxPayloadSize);

I think we should avoid initializing 'nlBuffer' each time and initialize it only once in the caller. Each call to OvsCreateMsgFromVport should continue from the offset it was left off. The caller of OvsCreateMsgFromVport() should be responsible for writing out the OVS_MESSAGE, and OvsCreateMsgFromVport should only write out the VPORT data.

> +
> +    NlMsgPutTailU32(&nlBuffer, OVS_VPORT_ATTR_PORT_NO, vport->portNo);
> +    NlMsgPutTailU32(&nlBuffer, OVS_VPORT_ATTR_TYPE, vport->ovsType);
> +
> +    NlMsgPutTailString(&nlBuffer, OVS_VPORT_ATTR_NAME, vport->ovsName);
> +
> +    /*
> +    * XXX: when we implement OVS_DP_ATTR_USER_FEATURES in datapath,
> +    * we'll need to check the OVS_DP_F_VPORT_PIDS flag: if it is set,
> +    * it means we have an array of pids, instead of a single pid.
> +    * ATM we assume we have one pid only.
> +    */

Multiple line comments should be formatted as:
/*
 *
 */

and not

/*
*
*
*/

s/ATM/Currently,


> +
> +    NlMsgPutTailU32(&nlBuffer, OVS_VPORT_ATTR_UPCALL_PID, vport->upcallPid);
> +
> +    /*stats*/
> +    vportStats.rxPackets = vport->stats.rxPackets;
> +    vportStats.rxBytes = vport->stats.rxBytes;
> +    vportStats.txPackets = vport->stats.txPackets;
> +    vportStats.txBytes = vport->stats.txBytes;
> +    vportStats.rxErrors = vport->errStats.rxErrors;
> +    vportStats.txErrors = vport->errStats.txErrors;
> +    vportStats.rxDropped = vport->errStats.rxDropped;
> +    vportStats.txDropped = vport->errStats.txDropped;
> +
> +    NlMsgPutTailUnspec(&nlBuffer, OVS_VPORT_ATTR_STATS, (PCHAR)&vportStats,
> +              sizeof(OVS_VPORT_FULL_STATS));

How are you sure that there's enough space in the buffer. NlMsgPutTailUnspec() might return FALSE. This needs to be translated into "end of this dump-next operation".

> +
> +    /*
> +    * XXX: when vxlan udp dest port becomes configurable, we will also need
> +    * to add vport options
> +    */
> +
> +    msgOut->nlMsg.nlmsgLen = NlBufSize(&nlBuffer) + sizeof(OVS_MESSAGE);

I think it is best of the caller of OvsCreateMsgFromVport() wrote out the OVS_MESSAGE like what you are doing, and here you can just do. It is best if the NlBufSize() includes the OVS_MESSAGE.
msgOut->nlMsg.nlmsgLen = NlBufSize(&nlBuffer);


> +
> +    return STATUS_SUCCESS;
> +}
> +
> +static NTSTATUS
> +OvsGetVportDumpNext(POVS_USER_PARAMS_CONTEXT usrParamsCtx,
> +                    UINT32 *replyLen)
> +{
> +    POVS_MESSAGE msgIn;
> +    POVS_MESSAGE msgOut = (POVS_MESSAGE)usrParamsCtx->outputBuffer;
> +    POVS_OPEN_INSTANCE instance =
> +        (POVS_OPEN_INSTANCE)usrParamsCtx->ovsInstance;
> +    LOCK_STATE_EX lockState;
> +    UINT32 i = 0;
> +
> +    /*
> +    * XXX: this function shares some code with other dump command(s).
> +    * In the future, we will need to refactor the dump functions
> +    */
> +
> +    ASSERT(usrParamsCtx->devOp == OVS_READ_DEV_OP);
> +
> +    if (instance->dumpState.ovsMsg == NULL) {
> +        ASSERT(FALSE);
> +        return STATUS_INVALID_DEVICE_STATE;
> +    }
> +
> +    /* Output buffer has been validated while validating read dev op. */
> +    ASSERT(msgOut != NULL && usrParamsCtx->outputLength >= sizeof *msgOut);
> +
> +    msgIn = instance->dumpState.ovsMsg;
> +    BuildMsgOutFromMsgIn(msgIn, msgOut, NLM_F_MULTI);
> +
> +    OvsAcquireCtrlLock();
> +    if (!gOvsSwitchContext) {
> +        /* Treat this as a dump done. */
> +        OvsReleaseCtrlLock();
> +        *replyLen = 0;

We don't need to set *replyLen to 0, since it is initialized to 0 in OvsDeviceControl(). I know there are other occurances of *replyLen = 0. We should get rid of them too at some point.

> +        FreeUserDumpState(instance);


This is an interesting situation that gOvsSwitchContext is NULL while the dump state is setup. This would imply that the dump state was setup while gOvsSwitchContext was != NULL, and before the dump operation could complete gOvsSwitchContext became NULL. The best way to handle such a case is to clear up the dump state in the place where we set gOvsSwitchContext to NULL (in the NDIS detach handler). But, I am fine with this code for now.


> +        return STATUS_SUCCESS;
> +    }
> +
> +    msgOut->ovsHdr.dp_ifindex = gOvsSwitchContext->dpNo;

Shouldn't this come from BuildMsgOutFromMsgIn()?

> +
> +    /*
> +    * XXX: when we implement OVS_DP_ATTR_USER_FEATURES in datapath,
> +    * we'll need to check the OVS_DP_F_VPORT_PIDS flag: if it is set,
> +    * it means we have an array of pids, instead of a single pid.
> +    * ATM we assume we have one pid only.
> +    */
> +
> +    NdisAcquireRWLockRead(gOvsSwitchContext->dispatchLock, &lockState, 0);
> +
> +    if (gOvsSwitchContext->numVports > 0) {
> +        /* inBucket: the bucket, used for lookup */
> +        UINT32 inBucket = instance->dumpState.index[0];
> +        /* inIndex: index within the given bucket, used for lookup */
> +        UINT32 inIndex = instance->dumpState.index[1];
> +        /* the bucket to be used for the next dump operation */
> +        UINT32 outBucket = 0;
> +        /* the index within the outBucket to be used for the next dump */
> +        UINT32 outIndex = 0;
> +        UINT32 maxSize = usrParamsCtx->outputLength;
> +
> +        for (i = inBucket; i < OVS_MAX_VPORT_ARRAY_SIZE; i++) {
> +            PLIST_ENTRY head, link;
> +            head = &(gOvsSwitchContext->portHashArray[i]);
> +            POVS_VPORT_ENTRY vport = NULL;
> +
> +            outIndex = 0;
> +            LIST_FORALL(head, link) {
> +
> +                /*
> +                * if one or more dumps were previously done on this same bucket,
> +                * inIndex will be > 0, so we'll need to reply with the
> +                * inIndex + 1 vport from the bucket.
> +                */
> +                if (outIndex >= inIndex) {
> +                    vport = CONTAINING_RECORD(link, OVS_VPORT_ENTRY, portLink);
> +                    OvsCreateMsgFromVport(vport, msgOut, maxSize);

There's something missing here:
1. You need to handle the return value of OvsCreateMsgFromVport().
2. Also, the format of the message would look something like:
OVS_MESSAGE - VPORT1 - VPORT2 - VPORT3, etc

After writing out the OVS_MESSAGE using BuildMsgOutFromMsgIn(), and VPORT1, I don't see you incrementing the offset when you call into OvsCreateMsgFromVport(). When OvsCreateMsgFromVport() gets called for VPORT2, you are again doing a NlMsgPutTailUnspec() and start writing at the same offset as VPORT1 which will overwrite the data for VPORT1. Also, you should be decrementing the value of 'maxSize' each time you send write out a VPORT. For that matter, 'maxSize' should be initialized with
maxSize = usrParamsCtx->outputLength - sizeof *msgOut.


> +                    ++outIndex;
> +                    break;

Why are we breaking out of the loop here? Consider the following example, where there are 5 ports in the bucket, and 'outIndex' and 'inIndex' are 0. We'll hit the if block, and get into it. After adding a port, we'll break out of the loop. We need to continue till all the 5 ports are added. Basically, you want to do something like:

LIST_FORALL(head, link) {
    if (outIndex < inIndex) {
        /* Skip over all the ports till we get to 'inIndex'. */
        /* Break out of the loop to store outIndex and outBucket if there's no space in outBuffer. */
    } else {
        /* Add the port at outIndex */
    }
    outindex++;
}



> +                }
> +
> +                ++outIndex;
> +            }
> +
> +            if (vport) {
> +                break;
> +            }
> +
> +            /*
> +            * if no vport was found above, check the next bucket, beginning
> +            * with the first (i.e. index 0) elem from within that bucket
> +            */
> +            inIndex = 0;
> +        }
> +
> +        outBucket = i;
> +
> +        /* XXX: what about NLMSG_DONE (as msg type)? */
> +        instance->dumpState.index[0] = outBucket;
> +        instance->dumpState.index[1] = outIndex;
> +    }
> +
> +    NdisReleaseRWLock(gOvsSwitchContext->dispatchLock, &lockState);
> +
> +    OvsReleaseCtrlLock();
> +
> +    *replyLen = msgOut->nlMsg.nlmsgLen;
> +
> +    /* if i is OVS_MAX_VPORT_ARRAY_SIZE => no vport was found => dump done*/
> +    if (i == OVS_MAX_VPORT_ARRAY_SIZE) {
> +        *replyLen = 0;
> +        /* Free up the dump state, since there's no more data to continue. */
> +        FreeUserDumpState(instance);
> +    }

For the case where there are no ports, it could be optimized as:

if (gOvsSwitchContext->numVports > 0) {

} else {
    /* Free up the dump state, since there's no more data to continue. */
    FreeUserDumpState(instance);
    < release locks >
    < return >
}

There's one more issue:
Lets say, we added some ports but there's buffer space left. In such a case, 'i' would be 'OVS_MAX_VPORT_ARRAY_SIZE' - basically, we'd have looked through the entire hash table, but found no ports. In such a case, you negating all the data added in the output buffer by setting *replyLen = 0. The FreeUserDumpState() is fine.


> +
> +    return STATUS_SUCCESS;
> +}
> +
> +/*
> +* --------------------------------------------------------------------------
> +*  Handler for the get vport command. The function handles the initial call to
> +*  setup the dump state, as well as subsequent calls to continue dumping data.
> +* --------------------------------------------------------------------------
> +*/
> +static NTSTATUS
> +OvsGetVportCmdHandler(POVS_USER_PARAMS_CONTEXT usrParamsCtx,
> +                      UINT32 *replyLen)
> +{
> +    if (usrParamsCtx->devOp == OVS_WRITE_DEV_OP) {
> +        *replyLen = 0;

We don't need explicit '*replyLen = 0'.



More information about the dev mailing list