[ovs-dev] [RFC PATCH] create vxlan device using rtnetlink interface

Wed Apr 20 18:38:31 UTC 2016

On Wed, Apr 20, 2016 at 4:00 AM, Thadeu Lima de Souza Cascardo
<cascardo at redhat.com> wrote:
> On Tue, Apr 19, 2016 at 04:25:32PM -0700, Jesse Gross wrote:
>> On Mon, Apr 18, 2016 at 2:57 AM, Thadeu Lima de Souza Cascardo
>> <cascardo at redhat.com> wrote:
>> > On Fri, Apr 15, 2016 at 08:36:51PM -0700, Jesse Gross wrote:
>> >> On Fri, Apr 15, 2016 at 2:30 PM, Thadeu Lima de Souza Cascardo
>> >> <cascardo at redhat.com> wrote:
>> >> > The other problem is during port_del, that since we don't have a netdev, the
>> >> > dpif_port type would not be vxlan, and deciding whether to remove it or not
>> >> > could not be made. In fact, this is one of the main reasons why this is an RFC.
>> >> >
>> >> > The solution here is to decide on the type by the device's name, and there is
>> >> > even a function for this, though it looks up for the netdev's already created,
>> >> > those that probably come from the database. However, when OVS starts, it removes
>> >> > ports from the datapath, and that information is not available, and may not even
>> >> > be. In fact, since the dpif_port has a different name because it's a vport, it
>> >> > won't match any netdev at all. So, I did the opposite of getting the type from
>> >> > the dpif_port name, ie, if it contains vxlan_sys, it's of vxlan type. Even if we
>> >> > make this more strict and use the port, we could still be in conflict with a
>> >> > vxlan device created by the user with such name. This should have been one patch
>> >> > by itself.
>> >>
>> >> What about using the driver name exposed through ethtool? I believe
>> >> that all of the tunnel and internal devices expose this in a
>> >> consistent way - i.e. a VXLAN device can be queried and get back the
>> >> string "vxlan". Any driver other than the ones that we recognize can
>> >> be assumed to be OVS_VPORT_TYPE_NETDEV.
>> >>
>> >
>> > I don't see how this address the case when the user adds a vxlan interface
>> > created by the system. Like:
>> >
>> > ip link add vxlan_sys_4789 type vxlan id 10 remote 192.168.123.1 dstport 4789
>> > ovs-vsctl add-port br0 vxlan_sys_4789
>> >
>> > Its driver would be vxlan. That goes to vxlan0 too.
>>
>> I think we can check the combination of device type and the options
>> that are set on it (the same as what we need to do after we create the
>> device). If all of those match then it doesn't matter whether we added
>> it previously or the user added it - the device will work exactly the
>> same either way. If there isn't a match then we should bail out
>> without adding the port. This is pretty similar to the behavior of the
>> current compat code in the kernel (in that case if a port already
>> exists with the same name, it just aborts).
>>
>
> As I pointed out in another message, vxlan_sys* is a reserved name, if
> any port like that is added to the database, it's going to be rejected
> by vswitchd. The same goes for other vports. So, I think it's sufficient
> to check this name, then set the type appropriately, and depending on
> the type, try to remove the interface. So vxlan0 is not going to be
> removed, but vxlan_sys_4789 will.
>
>> Two unresolved issues in my mind:
>>  * How do we handle cleaning up ports (including in cases where
>> userspace crashes)? In the kernel case, the port will stick around
>> until either the module is removed or the port is explicitly deleted.
>>  * How do we handle upgrade where the new version of OVS needs options
>> that are different from the previous version? This is basically a
>> special version of the port being created by someone else but we
>> should be able to handle the differences. Depending on how good a job
>> we do at cleaning up the port, this might not be an issue.
>
> OVS already does a good job at cleaning up ports when it starts. In
> open_dpif_backer, any port not belonging to init_ofp_ports will be
> removed from the datapath. As I implemented the code, those vxlan_sys
> devices are going to be deleted as well after removal from the datapath,
> and only if they were present in the datapath in the first place. That
> means the port won't stick around in the case of a crash, and neither in
> the case of upgrades.

Thanks for that analysis. I agree with you that this looks safe. I'm
glad - there are fewer corner cases than I was expecting.

One minor comment that I noticed on the patch itself - I don't know if
the port mapping functions are handling IPsec variants of tunnels
correctly in all situations (or maybe are handling them by accident).
Since both IPsec and non-IPsec ports share the same dpif_port name, we
might get the wrong class back and then it seems like we don't handle
the IPsec class types when converting between names and types, which
could itself be an independent bug. Can you take a look?