[ovs-dev] Design notes for provisioning Netlink interface from the OVS Windows driver (Switch extension)

Eitan Eliahu eliahue at vmware.com
Wed Aug 6 21:19:50 UTC 2014

Sure Alin, we would be glad to add details or to clarify any issue.

-----Original Message-----
From: Alin Serdean [mailto:aserdean at cloudbasesolutions.com] 
Sent: Wednesday, August 06, 2014 2:12 PM
To: Nithin Raju; Eitan Eliahu
Cc: dev at openvswitch.org; Rajiv Krishnamurthy; Ben Pfaff; Kaushik Guha; Ben Pfaff; Justin Pettit; Ankur Sharma; Samuel Ghinet; Linda Sun; Keith Amidon
Subject: RE: Design notes for provisioning Netlink interface from the OVS Windows driver (Switch extension)

Thank you very much for writing the document in such a short time.

I will keep it in mind when I write the patches for dpif. If I have further questions, may I get back to you, Eitan?


-----Original Message-----
From: Nithin Raju [mailto:nithin at vmware.com]
Sent: Wednesday, August 6, 2014 10:58 PM
To: Eitan Eliahu
Cc: dev at openvswitch.org; Rajiv Krishnamurthy; Alin Serdean; Ben Pfaff; Kaushik Guha; Ben Pfaff; Justin Pettit; Ankur Sharma; Samuel Ghinet; Linda Sun; Keith Amidon
Subject: Re: Design notes for provisioning Netlink interface from the OVS Windows driver (Switch extension)

Thanks so much for writing this up. This should clarify the questions that the folks had during the IRC meeting.

Please feel free to send out a write-up if you have anything to discuss regarding the changes in dpif-linux.c. Otherwise, if you can clean up dpif-linux.c and submit it with the changes/interface that worked with the Cloudbase kernel implementation, that would also be a major step forward.

We can then take up how to change dpif-linux.c to fit the (efficient) I/O model that Eitan has described.


On Aug 6, 2014, at 11:15 AM, Eitan Eliahu <eliahue at vmware.com> wrote:

> Hello all,
> Here is a summary of our initial design. Not all areas are covered, so we would be glad to discuss anything listed here and any other code/features we could leverage.
> Thanks!
> Eitan
> A. Objectives:
> [1] Create a Netlink (NL) driver interface for Windows which interoperates with
>     the OVS NL user mode code.
> [2] User mode code should be mostly cross-platform, with minimal changes to
>     support Windows-specific OS calls.
> [3] The driver should not have to maintain state or resources for transactions
>     or dumps.
> [4] Reduce the number of system calls: user mode NL code should use a device
>     IOCTL system call to send an NL command and to receive the associated NL
>     reply in the same system call, whenever possible (*).
> [5] An event may be associated with an NL socket I/O request to signal the
>     completion of an outstanding receive operation on the socket.
>     (For simplicity, a single outstanding I/O request could be associated with
>     a socket for signaling purposes.)
> (*) We assume that multiple NL transactions for the same socket can never be
>     interleaved.
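Objectives [3] and [4] can be illustrated with a minimal, portable sketch. It assumes a simplified message header (`struct nl_hdr`, a hypothetical stand-in for `struct nlmsghdr`) and calls a hypothetical driver handler directly where real user mode code would issue one `DeviceIoControl()` with both an input and an output buffer. The dump half assumes, beyond what the notes specify, that the resume position travels in every request, which is one way for the driver to keep no per-dump state. All function and table names here are illustrative, not OVS APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified message header; real OVS code uses
 * struct nlmsghdr from <linux/netlink.h>. */
struct nl_hdr {
    uint32_t len;    /* total message length, header included */
    uint16_t type;   /* command */
    uint16_t flags;
    uint32_t seq;    /* sequence number, echoed back in the reply */
    uint32_t pid;    /* PID of the requesting socket */
};

/* Hypothetical driver handler for a combined "transact" IOCTL: it parses
 * the request in the input buffer and writes the reply to the output
 * buffer in the same call, so no state survives between calls.
 * Returns the reply length, or 0 on a short input or output buffer. */
size_t driver_transact_ioctl(const void *in, size_t in_len,
                             void *out, size_t out_len)
{
    struct nl_hdr req, reply;

    assert(in != NULL && out != NULL);
    if (in_len < sizeof req || out_len < sizeof reply) {
        return 0;
    }
    memcpy(&req, in, sizeof req);
    reply.len = sizeof reply;
    reply.type = req.type;
    reply.flags = 0;
    reply.seq = req.seq;
    reply.pid = req.pid;
    memcpy(out, &reply, sizeof reply);
    return sizeof reply;
}

/* User mode side: one call sends the command and returns its reply.
 * On Windows this would be a single DeviceIoControl() call; here the
 * handler is invoked directly. Returns 0 on success, -1 on failure. */
int sketch_transact(uint16_t cmd, uint32_t seq, uint32_t pid,
                    struct nl_hdr *reply)
{
    struct nl_hdr req = { sizeof(struct nl_hdr), cmd, 0, seq, pid };

    if (driver_transact_ioctl(&req, sizeof req, reply, sizeof *reply)
        != sizeof *reply) {
        return -1;
    }
    return reply->seq == seq ? 0 : -1;
}

/* Hypothetical stateless dump: the cursor is carried in each request,
 * so the driver allocates nothing per dump. */
static const uint32_t flow_table[] = { 100, 200, 300 };
#define N_FLOWS (sizeof flow_table / sizeof flow_table[0])

/* Returns 1 and fills *entry and *next if an entry exists at 'cursor';
 * returns 0 once the dump is complete. */
int driver_dump_ioctl(uint32_t cursor, uint32_t *entry, uint32_t *next)
{
    if (cursor >= N_FLOWS) {
        return 0;
    }
    *entry = flow_table[cursor];
    *next = cursor + 1;
    return 1;
}
```

Because the reply is produced inside the same call that carried the request, the driver never has to park a reply waiting for a later read, which is what lets it stay stateless for transactions.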
> B. Netlink operation types:
> There are four types of interactions carried out by processes through the NL layer:
> [1] Transaction-based DPIF primitives: these DPIF commands are mapped to the
>     nl_sock_transact() NL interface, which in turn calls
>     nl_sock_transact_multiple(). A transaction-based command creates an ad hoc
>     socket and submits a synchronous device I/O to the driver. The driver
>     constructs the NL reply and copies it to the output buffer of the IRP
>     representing the I/O transaction.
>     (Transaction-based commands can be brought up and exercised through the
>     ovs-dpctl command in parallel with the existing DPIF device.)
> [2] State-aware DPIF dump commands: port and flow dumps call the following NL
>     interfaces:
>     a) nl_dump_start()
>     b) nl_dump_next()
>     c) nl_dump_done()
>     With the exception of nl_dump_start(), these NL primitives are based on a
>     synchronous IOCTL system call rather than Write/Read. Thus, the driver
>     does not have to maintain any outstanding dump request, nor does it need
>     to allocate any resources for it.
> [3] UpCall port/PID/unicast socket:
>     The driver maintains a per-socket queue for all packets which have no
>     matching flow in the flow table. The socket has a single overlapped (event)
>     structure, which is signaled through the completion of a pending I/O
>     request sent by user mode on subscription (similar to the current
>     implementation). When dpif_recv_wait() is called, the event associated with
>     the pending I/O request is passed to poll_fd_wait_event() in order to wake
>     the thread which polls the port queue.
>     dpif_recv() calls nl_sock_recv(), which in turn drains the queue maintained
>     by the kernel synchronously (through an ioctl system call). The overlapped
>     structure is rearmed when the recv_set DPIF callback function is called.
> [4] Event notification / NL multicast subscription:
>     Events (such as port addition/deletion or link up/down) are propagated from
>     the kernel to user mode through the subscription of a socket to a multicast
>     group (nl_sock_join_mcgroup()) and a synchronous receive (nl_sock_recv())
>     for retrieving the events. The driver maintains a single event queue for
>     all events. Similar to the UpCall mechanism, a user mode process keeps an
>     outstanding I/O request in the driver, which is completed whenever a new
>     event is generated. The event associated with the overlapped structure of
>     the socket is passed to poll_fd_wait_event() whenever the
>     dpif_port_poll_wait() callback function is called. dpif_port_poll() drains
>     the event queue by calling nl_sock_recv().
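The signaling pattern in [3] (and, with one shared queue, in [4]) can be sketched as follows. This is a minimal model under stated assumptions: a fixed-size ring stands in for the driver's per-socket queue, and two booleans stand in for the pending IRP and its OVERLAPPED event, whereas a real driver would complete an IRP and user mode would pass the event handle to poll_fd_wait_event(). The names `sock_queue`, `sq_arm()`, `sq_enqueue()` and `sq_recv()` are hypothetical. One behavior goes beyond the notes: if data is already queued when the request is re-armed, the request completes at once, so a wakeup is not lost.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_CAP 8

/* Hypothetical per-socket state held by the driver. */
struct sock_queue {
    int pkts[QUEUE_CAP];   /* queued upcalls (stand-ins for packet buffers) */
    size_t head;           /* index of the oldest queued entry */
    size_t count;          /* number of queued entries */
    bool pending_irp;      /* user mode has an outstanding I/O request */
    bool event_signaled;   /* stands in for the OVERLAPPED event */
};

void sq_init(struct sock_queue *q)
{
    q->head = 0;
    q->count = 0;
    q->pending_irp = false;
    q->event_signaled = false;
}

/* User mode (re)arms the socket by posting one I/O request, as on
 * subscription and from recv_set. If data is already queued, the request
 * completes immediately so the wakeup is not lost. */
void sq_arm(struct sock_queue *q)
{
    if (q->count > 0) {
        q->pending_irp = false;
        q->event_signaled = true;
    } else {
        q->pending_irp = true;
        q->event_signaled = false;
    }
}

/* Driver: queue a packet that missed the flow table and complete the
 * pending request, if any, which signals the event. Returns false (drop)
 * when the queue is full. */
bool sq_enqueue(struct sock_queue *q, int pkt)
{
    if (q->count == QUEUE_CAP) {
        return false;
    }
    q->pkts[(q->head + q->count) % QUEUE_CAP] = pkt;
    q->count++;
    if (q->pending_irp) {
        q->pending_irp = false;
        q->event_signaled = true;
    }
    return true;
}

/* User mode: synchronous drain, one message per call (one ioctl each in
 * the design). Returns false when the queue is empty. */
bool sq_recv(struct sock_queue *q, int *pkt)
{
    assert(pkt != NULL);
    if (q->count == 0) {
        return false;
    }
    *pkt = q->pkts[q->head];
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
    return true;
}
```

In use, the handler thread waits on the event, calls `sq_recv()` until it returns false, then calls `sq_arm()` again, mirroring the dpif_recv_wait() / dpif_recv() / recv_set cycle described above.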
> C. Implementation workflow:
> The driver creates a device object which provides a Netlink interface
> for user mode processes. During the development phase this device is
> created in addition to the existing DPIF device. (This means that the
> bring-up of the NL-based user mode can be done on a live kernel with
> resident DPs, ports and flows.) All transaction and dump based DPIF
> functions can be developed and brought up while the NL device is a
> secondary device (ovs-dpctl show and dump XXX should work). After the
> initial phase is completed (i.e. all transaction and dump based DPIF
> primitives are implemented), the original device interface will be
> removed and the packet and event propagation path will be brought up
> (driven by ovs-vswitchd.exe).
> [1] Socket creation
>     Since PIDs should be allocated on a system-wide basis and be unique across
>     all processes, the kernel assigns the PID for a newly created socket. A new
>     IOCTL command, OVS_GET_PID, returns the PID to the user mode client to be
>     associated with the socket.
> [2] Detailed description
>     nl_sock_transact_multiple() calls into a series of nl_sock_send__() and
>     nl_sock_recv__() calls. These can be implemented using ReadFile() and
>     WriteFile(), or with an ioctl modeled on a transaction which does both the
>     read and the write. One caveat is that nl_sock_transact_multiple() might
>     have to be modified to issue alternating nl_sock_send__() and
>     nl_sock_recv__() calls, rather than doing all the sends first and then all
>     the receives. This is because Windows may not preserve message boundaries
>     when we do the receive.
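The boundary concern in [2] can be illustrated with a minimal sketch that models the channel as a plain byte stream, an assumption made only to show the failure mode. Each mock send makes the "driver" append a 4-byte reply carrying the request's sequence number. When all sends are issued first, a single receive returns every reply merged into one buffer; alternating send and receive keeps each receive to exactly one reply. `struct stream`, `stream_send()`, `stream_recv()`, `transact_interleaved()` and `transact_batched_first_recv()` are hypothetical names, not OVS functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical byte-stream channel: replies are appended back to back and
 * a receive returns as many bytes as fit, so message boundaries are lost. */
struct stream {
    uint8_t buf[256];
    size_t len;
};

/* Mock send: the "driver" immediately appends a reply holding 'seq'. */
void stream_send(struct stream *s, uint32_t seq)
{
    assert(s->len + sizeof seq <= sizeof s->buf);
    memcpy(s->buf + s->len, &seq, sizeof seq);
    s->len += sizeof seq;
}

/* Mock receive: copies up to 'cap' bytes into 'out' and consumes them. */
size_t stream_recv(struct stream *s, void *out, size_t cap)
{
    size_t n = s->len < cap ? s->len : cap;

    memcpy(out, s->buf, n);
    memmove(s->buf, s->buf + n, s->len - n);
    s->len -= n;
    return n;
}

/* Alternating send/receive: each receive sees exactly one reply.
 * Returns how many replies matched their request's sequence number. */
size_t transact_interleaved(struct stream *s, const uint32_t *seqs, size_t n)
{
    uint8_t rbuf[64];
    size_t ok = 0;

    for (size_t i = 0; i < n; i++) {
        uint32_t seq;
        size_t got;

        stream_send(s, seqs[i]);
        got = stream_recv(s, rbuf, sizeof rbuf);
        if (got == sizeof seq) {
            memcpy(&seq, rbuf, sizeof seq);
            if (seq == seqs[i]) {
                ok++;
            }
        }
    }
    return ok;
}

/* Batched variant (all sends, then receive): returns the byte count of
 * the first receive, which holds every reply merged together. */
size_t transact_batched_first_recv(struct stream *s, const uint32_t *seqs,
                                   size_t n)
{
    uint8_t rbuf[64];

    for (size_t i = 0; i < n; i++) {
        stream_send(s, seqs[i]);
    }
    return stream_recv(s, rbuf, sizeof rbuf);
}
```

With three requests, the interleaved loop matches all three replies, while the first batched receive returns 12 bytes holding all three replies at once, which a caller expecting one message per receive would misparse.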
