[ovs-dev] Design notes for provisioning Netlink interface from the OVS Windows driver (Switch extension)

Samuel Ghinet sghinet at cloudbasesolutions.com
Fri Aug 8 15:27:39 UTC 2014


Hello Eitan,

[QUOTE]Transaction based DPIF primitives are mapped into synchronous  device I/O control system calls.
The NL reply would be returned in the output buffer of the IOCTL parameter.[/QUOTE]
I am still confused. In the design document you spoke about "nl_sock_transact_multiple", which could be implemented as ReadFile and WriteFile, and said that "these DPIF commands are mapped to nl_sock_transact NL interface to nl_sock_transact_multiple."
Do you mean we will no longer use nl_sock_transact_multiple in userspace for these DPIF transactions?

[QUOTE]>You mean, whenever, say, a Flow dump request is issued, in one reply to give back all flows?
Not necessarily. I meant that the driver does not have to maintain the state of the dump command.
Each dump command sent down to the driver would be self-contained. [/QUOTE]
We currently have this in our implementation. The only thing 'left' is that we provide the entire output buffer for the dump at once, and userspace can read from it sequentially. Unless there is a reason for the kernel to write to userspace sequentially and wait for userspace to read, I think the way we have this one is ok.

[QUOTE]Yes, these are OVS events that are placed in a custom queue.
There is a single Operating System event associated with the global socket which collects all OVS events.
It will be triggered through a completion of a pending I/O request in the driver.[/QUOTE]
I used to be a bit confused by your implementation in OvsEvent and OvsUser. Perhaps this discussion will
clarify things a bit more. :)
Ok, so we'll hold OVERLAPPED structs in the kernel as events. What kind of IRP requests would be returned as "pending" in the kernel? Requests coming in as "nl_sock_recv()" on the multicast groups?
Will there be multiple multicast groups used? Or will all multicast operations queue events on the same event queue, with all the events read from the same part of the userspace code?

How exactly are events queued by the kernel associated with the userspace? I mean, how do you register a "nic connected" event so that when an event happens, you know you need to update userspace data for a nic, not do something else. Would there be IDs stored in the OvsEvent structs that would specify what kind of events they are? Would we also need context data associated with these events?

[QUOTE]>However, I think we need to take into account the situation where the userspace might be providing a smaller buffer than it is the total to read. Also, I think the "dump" mechanism requires it.
I (want) to assume that each transaction is self-contained which means that the driver should not maintain a state of the transaction. Since, we will be using an IOCTL for that transaction the user mode buffer length will be specified in the command itself.
All Write/Read dump pairs are replaced with a single IOCTL call.[/QUOTE]
That still did not answer my question :)
You mean to use a very large read buffer, so that you would be able to read everything in one single operation? I am more concerned here about flow dumps, because you may not know whether you need a 1024-byte buffer, a 10240-byte buffer, a 102400-byte buffer, etc.
So I do not see how a DeviceIoControl operation could do both the 'write' and the 'read' part for the dump.
If you pass to the DeviceIoControl a buffer length = 8000, and the flow dump reply buffer is 32000 bytes, you need to do additional reads AND maintain state in the kernel (e.g. offset in the kernel read buffer).

[QUOTE]As I understand, transactions and dump (as used for DPIF) are not really socket operations per se.[/QUOTE]
They are file / device operations.

[QUOTE]o) I believe we shouldn't use the netlink overhead (nlmsghdr, genlmsghdr, attributes) when not needed (say, when registering a KEVENT notification), and, if we choose not to use netlink protocol always, we may need a way to differentiate between netlink and non-netlink requests.
Possible, as phase for optimization[/QUOTE]
Not necessarily: if we can make a clear separation in code between netlink and non-netlink kernel-to-userspace communication, not using netlink where we don't need it might save us some development and maintenance effort, both in kernel and in userspace. Otherwise we would need to turn the non-netlink messages of the (Windows) userspace code into netlink messages.

Sam
________________________________________
From: Eitan Eliahu [eliahue at vmware.com]
Sent: Thursday, August 07, 2014 8:57 PM
To: Alin Serdean; dev at openvswitch.org; Rajiv Krishnamurthy; Ben Pfaff; Kaushik Guha; Ben Pfaff; Justin Pettit; Nithin Raju; Ankur Sharma; Samuel Ghinet; Linda Sun; Keith Amidon
Subject: RE: Design notes for provisioning Netlink interface from the OVS Windows driver (Switch extension)

Hi Alin, yes, we want to exercise the interface while OVS is running. For example, we would like to dump the flow table while it is not empty.

On the other issue (NBL with multiple NBs, Github issue #10), I think we need to talk about how to support it. Now that you came across this issue, we even know how to reproduce this case :-)
Thanks,
Eitan

-----Original Message-----
From: Alin Serdean [mailto:aserdean at cloudbasesolutions.com]
Sent: Thursday, August 07, 2014 10:50 AM
To: Eitan Eliahu; dev at openvswitch.org; Rajiv Krishnamurthy; Ben Pfaff; Kaushik Guha; Ben Pfaff; Justin Pettit; Nithin Raju; Ankur Sharma; Samuel Ghinet; Linda Sun; Keith Amidon
Subject: RE: Design notes for provisioning Netlink interface from the OVS Windows driver (Switch extension)

Hi Eitan,

Do you have any particular reason to support both devices at the start, instead of focusing on the Netlink interface?

On the patches: progress is a bit slower than expected; I spent a bit too much time on the issue https://github.com/openvswitch/ovs-issues/issues/10 but I may have an idea which I would like to talk about in the next meeting. I plan to work over the weekend though, so we can be one step closer to our goal :).

Alin.

-----Original Message-----
From: Eitan Eliahu [mailto:eliahue at vmware.com]
Sent: Thursday, August 7, 2014 3:19 AM
To: Alin Serdean; dev at openvswitch.org; Rajiv Krishnamurthy; Ben Pfaff; Kaushik Guha; Ben Pfaff; Justin Pettit; Nithin Raju; Ankur Sharma; Samuel Ghinet; Linda Sun; Keith Amidon
Subject: RE: Design notes for provisioning Netlink interface from the OVS Windows driver (Switch extension)


Hi Alin,
The driver which is currently checked in (the original one) supports the DPIF interface through a device object registered with the system. This driver works with a private version of user mode OVS (i.e. dpif-windows.c). The secondary device would be a second device object which supports the Netlink interface. For the initial development phase, both devices will be instantiated and registered in the system. Thus, we could bring up all transaction and dump based DPIF commands over the Netlink device while the system is up and running.

For clarity, let's call the "original device" the "DPIF device" and the "secondary device" the "Netlink device".
Eitan

-----Original Message-----
From: Alin Serdean [mailto:aserdean at cloudbasesolutions.com]
Sent: Wednesday, August 06, 2014 4:28 PM
To: Eitan Eliahu; dev at openvswitch.org; Rajiv Krishnamurthy; Ben Pfaff; Kaushik Guha; Ben Pfaff; Justin Pettit; Nithin Raju; Ankur Sharma; Samuel Ghinet; Linda Sun; Keith Amidon
Subject: RE: Design notes for provisioning Netlink interface from the OVS Windows driver (Switch extension)

Hi Eitan,

> C. Implementation work flow:
> The driver creates a device object which provides a NetLink interface
> for user mode processes. During the development phase this device is created in addition to the existing DPIF device. (This means that the bring-up of the NL based user mode can be done on a live kernel with resident DPs, ports and flows) All transaction
> and dump based DPIF functions could be developed and brought up when the NL device is a secondary device (ovs-dpctl show and dump XXX should work). After the initial phase is completed (i.e. all transaction and dump based DPIF primitives are implemented), the original device interface will be removed and the packet and
> event propagation path will be brought up (driven by vswitch.exe)

Could you please explain a bit more what the original/secondary devices mean?

Ty!
Alin.


