[ovs-dev] [PATCH v6] datapath: Add support for VXLAN tunnels to Open vSwitch
Kyle Mestery (kmestery)
kmestery at cisco.com
Wed Dec 5 16:49:07 UTC 2012
Thanks, I agree with all your comments below, I've addressed them all,
re-tested, and will be sending a new patch now (with the correct ordering
of commit messages vs. notes at the top). Thanks for the continued reviews
of this series Jesse!
Kyle
On Dec 4, 2012, at 4:52 PM, Jesse Gross <jesse at nicira.com> wrote:
> On Thu, Nov 29, 2012 at 2:46 PM, Kyle Mestery <kmestery at cisco.com> wrote:
>> Note: v6 of this patch addresses additional comments from
>> Jesse. Specifically, I've removed the hash table of
>> port/socket mappings and moved to a linked list. I've also
>> cleaned up the code around this quite a bit.
>>
>> Note: v5 of this patch addresses comments from Jesse
>> and Chris.
>>
>> Note: v4 of this patch removes VXLAN over IPSEC support,
>> per an offline conversation with Jesse.
>>
>> Actual commit message below.
>> ---
>> Add support for VXLAN tunnels to Open vSwitch. Add support
>> for setting the destination UDP port on a per-port basis.
>> This is done by adding a "dst_port" parameter to the port
>> configuration. This is only applicable currently to VXLAN
>> tunnels.
>>
>> Please note this currently does not implement any sort of multicast
>> learning. With this patch, VXLAN tunnels must be configured similar
>> to GRE tunnels (e.g. point to point). A subsequent patch will implement
>> a VXLAN control plane in userspace to handle multicast learning.
>>
>> This patch set is based on one posted by Ben Pfaff on Oct. 12, 2011
>> to the ovs-dev mailing list:
>>
>> http://openvswitch.org/pipermail/dev/2011-October/012051.html
>>
>> The patch has been maintained, updated, and freshened by me and a
>> version of it is available at the following github repository:
>>
>> https://github.com/mestery/ovs-vxlan/tree/vxlan
>>
>> I've tested this patch with multiple VXLAN tunnels between hosts
>> using different UDP port numbers. Performance is on par (though
>> slightly faster) than comparable GRE tunnels.
>>
>> See the following IETF draft for additional information about VXLAN:
>> http://tools.ietf.org/html/draft-mahalingam-dutt-dcops-vxlan-02
>>
>> Signed-off-by: Kyle Mestery <kmestery at cisco.com>
>
> The git style is actually the reverse of the commit message and the
> comments (the commit message goes on top). If you do that then git
> will automatically chose the right pieces when it is applied.
>
> I received a sparse warning:
> CHECK /home/jgross/openvswitch/datapath/linux/vport-vxlan.c
> /home/jgross/openvswitch/datapath/linux/vport-vxlan.c:88:1: warning:
> symbol 'vxlan_ports' was not declared. Should it be static?
>
>> diff --git a/datapath/vport-vxlan.c b/datapath/vport-vxlan.c
>> new file mode 100644
>> index 0000000..b382bb5
>> --- /dev/null
>> +++ b/datapath/vport-vxlan.c
>> +/**
>> + * struct if_names - List of names
>> + * @list: list element.
>> + * @name: The name of the interface.
>> + */
>> +struct if_name {
>> + struct list_head list;
>> + char name[IFNAMSIZ];
>> +};
>
> I'm not quite sure why we need to be able to look up by name. I see
> it is used in vxlan_set_options() to find the socket to release but I
> think we could just do it by port number as is done in
> vxlan_tnl_destroy().
>
>> +/**
>> + * struct vxlan_port - Keeps track of open UDP ports
>> + * @list: list element.
>> + * @port: The UDP port number.
>> + * @net: The net namespace this port is assosciated with.
>> + * @if_names: List of names this port is assosciated with.
>> + * @socket: The socket created for this port number.
>> + * @count: How many ports are using this socket/port.
>> + */
>> +struct vxlan_port {
>> + struct list_head list;
>> + u16 port;
>> + struct net *net;
>
> Since we have the socket, you could save a little space and just use
> sock_net(vxlan_rcv_socket->sk) instead here.
>
>> +static struct vxlan_port *vxlan_port_exists(struct net *net, u16 port)
>> +{
>> + struct vxlan_port *vxlan_port;
>> +
>> + list_for_each_entry(vxlan_port, &vxlan_ports, list) {
>> + if (vxlan_port->port == port && vxlan_port->net == net)
>
> You could use net_eq() here (the benefit of these different accessors
> is they compile out if network namespaces aren't configured).
>
>> +static struct sk_buff *vxlan_build_header(const struct vport *vport,
>> + const struct tnl_mutable_config *mutable,
>> + struct dst_entry *dst,
>> + struct sk_buff *skb,
>> + int tunnel_hlen)
>> +{
>> + struct udphdr *udph = udp_hdr(skb);
>> + struct vxlanhdr *vxh = (struct vxlanhdr *)(udph + 1);
>> + const struct ovs_key_ipv4_tunnel *tun_key = OVS_CB(skb)->tun_key;
>> + __be64 out_key;
>> + u32 flags;
>> +
>> + tnl_get_param(mutable, tun_key, &flags, &out_key);
>> +
>> + udph->dest = htons(mutable->dst_port);
>
> It would be nice to store this in network byte order rather than doing
> the swap every time.
>
>> +static int vxlan_set_options(struct vport *vport, struct nlattr *options)
>> +{
>> + int err;
>> + const char *vname = vport->ops->get_name(vport);
>> + struct net *net = ovs_dp_get_net(vport->dp);
>> + struct tnl_vport *tnl_vport = tnl_vport_priv(vport);
>> + struct tnl_mutable_config *config;
>> + struct vxlan_port *old_port = NULL;
>> + struct vxlan_port *vxlan_port = NULL;
>> +
>> + config = rtnl_dereference(tnl_vport->mutable);
>> +
>> + old_port = find_vxlan_port_by_name(net, vname);
>> + if (old_port && old_port->port == config->dst_port) {
>> + /* Release old socket */
>> + vxlan_tunnel_release(old_port);
>> + }
>> +
>> + err = vxlan_tunnel_setup(net, vname, options, &vxlan_port);
>> + if (err)
>> + goto out;
>> +
>> + err = ovs_tnl_set_options(vport, options);
>> +
>> + if (err)
>> + vxlan_tunnel_release(vxlan_port);
>
> I think we really want to make releasing the old socket the last thing
> that we do. Otherwise, if part of the operation fails then we'll be
> left in an inconsistent state without a way to rollback. Also, if an
> option other than the destination port is being changed then there
> will be a blip in traffic while we create and destroy the port. In
> this case, the refcounting it makes it easy to avoid both of the
> problems.
>
>> +static void vxlan_tnl_destroy(struct vport *vport)
>> +{
>> + struct vxlan_port *vxlan_port;
>> + struct tnl_vport *tnl_vport = tnl_vport_priv(vport);
>> + struct tnl_mutable_config *config;
>> +
>> + config = rtnl_dereference(tnl_vport->mutable);
>> +
>> + vxlan_port = vxlan_port_exists(ovs_dp_get_net(vport->dp),
>> + config->dst_port);
>> + if (!vxlan_port)
>> + goto out;
>
> It almost seems better to not try to handle the NULL case here. It
> should never happen so it's good to know if it does.
>
>> +static struct vport *vxlan_tnl_create(const struct vport_parms *parms)
>> +{
>> + int err;
>> + struct vport *vport;
>> + struct vxlan_port *vxlan_port = NULL;
>> +
>> + err = vxlan_tunnel_setup(ovs_dp_get_net(parms->dp), parms->name,
>> + parms->options, &vxlan_port);
>> + if (err) {
>> + vxlan_tunnel_release(vxlan_port);
>
> Is it necessary to call release at this point? It looks like setup
> should clean everything up if it fails (and worse it could cause us to
> dereference a NULL pointer).
>
>> + vport = ovs_tnl_create(parms, &ovs_vxlan_vport_ops, &ovs_vxlan_tnl_ops);
>> +
>> + if (!vport) {
>> + vxlan_tunnel_release(vxlan_port);
>> + }
>
> The usual kernel convention is drop the braces in this case.
More information about the dev
mailing list