[ovs-dev] [PATCH v6] datapath: Add support for VXLAN tunnels to Open vSwitch

Kyle Mestery (kmestery) kmestery at cisco.com
Wed Dec 5 16:49:07 UTC 2012


Thanks, I agree with all your comments below, I've addressed them all,
re-tested, and will be sending a new patch now (with the correct ordering
of commit messages vs. notes at the top). Thanks for the continued reviews
of this series Jesse!

Kyle

On Dec 4, 2012, at 4:52 PM, Jesse Gross <jesse at nicira.com> wrote:
> On Thu, Nov 29, 2012 at 2:46 PM, Kyle Mestery <kmestery at cisco.com> wrote:
>> Note: v6 of this patch addresses additional comments from
>> Jesse. Specifically, I've removed the hash table of
>> port/socket mappings and moved to a linked list. I've also
>> cleaned up the code around this quite a bit.
>> 
>> Note: v5 of this patch addresses comments from Jesse
>> and Chris.
>> 
>> Note: v4 of this patch removes VXLAN over IPSEC support,
>> per an offline conversation with Jesse.
>> 
>> Actual commit message below.
>> ---
>> Add support for VXLAN tunnels to Open vSwitch. Add support
>> for setting the destination UDP port on a per-port basis.
>> This is done by adding a "dst_port" parameter to the port
>> configuration. This is only applicable currently to VXLAN
>> tunnels.
>> 
>> Please note this currently does not implement any sort of multicast
>> learning. With this patch, VXLAN tunnels must be configured similar
>> to GRE tunnels (e.g. point to point). A subsequent patch will implement
>> a VXLAN control plane in userspace to handle multicast learning.
>> 
>> This patch set is based on one posted by Ben Pfaff on Oct. 12, 2011
>> to the ovs-dev mailing list:
>> 
>> http://openvswitch.org/pipermail/dev/2011-October/012051.html
>> 
>> The patch has been maintained, updated, and freshened by me and a
>> version of it is available at the following github repository:
>> 
>> https://github.com/mestery/ovs-vxlan/tree/vxlan
>> 
>> I've tested this patch with multiple VXLAN tunnels between hosts
>> using different UDP port numbers. Performance is on par (though
>> slightly faster) than comparable GRE tunnels.
>> 
>> See the following IETF draft for additional information about VXLAN:
>> http://tools.ietf.org/html/draft-mahalingam-dutt-dcops-vxlan-02
>> 
>> Signed-off-by: Kyle Mestery <kmestery at cisco.com>
> 
> The git style is actually the reverse of the commit message and the
> comments (the commit message goes on top).  If you do that then git
> will automatically chose the right pieces when it is applied.
> 
> I received a sparse warning:
>  CHECK   /home/jgross/openvswitch/datapath/linux/vport-vxlan.c
> /home/jgross/openvswitch/datapath/linux/vport-vxlan.c:88:1: warning:
> symbol 'vxlan_ports' was not declared. Should it be static?
> 
>> diff --git a/datapath/vport-vxlan.c b/datapath/vport-vxlan.c
>> new file mode 100644
>> index 0000000..b382bb5
>> --- /dev/null
>> +++ b/datapath/vport-vxlan.c
>> +/**
>> + * struct if_names - List of names
>> + * @list: list element.
>> + * @name: The name of the interface.
>> + */
>> +struct if_name {
>> +       struct list_head list;
>> +       char name[IFNAMSIZ];
>> +};
> 
> I'm not quite sure why we need to be able to look up by name.  I see
> it is used in vxlan_set_options() to find the socket to release but I
> think we could just do it by port number as is done in
> vxlan_tnl_destroy().
> 
>> +/**
>> + * struct vxlan_port - Keeps track of open UDP ports
>> + * @list: list element.
>> + * @port: The UDP port number.
>> + * @net: The net namespace this port is assosciated with.
>> + * @if_names: List of names this port is assosciated with.
>> + * @socket: The socket created for this port number.
>> + * @count: How many ports are using this socket/port.
>> + */
>> +struct vxlan_port {
>> +       struct list_head list;
>> +       u16 port;
>> +       struct net *net;
> 
> Since we have the socket, you could save a little space and just use
> sock_net(vxlan_rcv_socket->sk) instead here.
> 
>> +static struct vxlan_port *vxlan_port_exists(struct net *net, u16 port)
>> +{
>> +       struct vxlan_port *vxlan_port;
>> +
>> +       list_for_each_entry(vxlan_port, &vxlan_ports, list) {
>> +               if (vxlan_port->port == port && vxlan_port->net == net)
> 
> You could use net_eq() here (the benefit of these different accessors
> is they compile out if network namespaces aren't configured).
> 
>> +static struct sk_buff *vxlan_build_header(const struct vport *vport,
>> +                                         const struct tnl_mutable_config *mutable,
>> +                                         struct dst_entry *dst,
>> +                                         struct sk_buff *skb,
>> +                                         int tunnel_hlen)
>> +{
>> +       struct udphdr *udph = udp_hdr(skb);
>> +       struct vxlanhdr *vxh = (struct vxlanhdr *)(udph + 1);
>> +       const struct ovs_key_ipv4_tunnel *tun_key = OVS_CB(skb)->tun_key;
>> +       __be64 out_key;
>> +       u32 flags;
>> +
>> +       tnl_get_param(mutable, tun_key, &flags, &out_key);
>> +
>> +       udph->dest = htons(mutable->dst_port);
> 
> It would be nice to store this in network byte order rather than doing
> the swap every time.
> 
>> +static int vxlan_set_options(struct vport *vport, struct nlattr *options)
>> +{
>> +       int err;
>> +       const char *vname = vport->ops->get_name(vport);
>> +       struct net *net = ovs_dp_get_net(vport->dp);
>> +       struct tnl_vport *tnl_vport = tnl_vport_priv(vport);
>> +       struct tnl_mutable_config *config;
>> +       struct vxlan_port *old_port = NULL;
>> +       struct vxlan_port *vxlan_port = NULL;
>> +
>> +       config = rtnl_dereference(tnl_vport->mutable);
>> +
>> +       old_port = find_vxlan_port_by_name(net, vname);
>> +       if (old_port && old_port->port == config->dst_port) {
>> +               /* Release old socket */
>> +               vxlan_tunnel_release(old_port);
>> +       }
>> +
>> +       err = vxlan_tunnel_setup(net, vname, options, &vxlan_port);
>> +       if (err)
>> +               goto out;
>> +
>> +       err = ovs_tnl_set_options(vport, options);
>> +
>> +       if (err)
>> +               vxlan_tunnel_release(vxlan_port);
> 
> I think we really want to make releasing the old socket the last thing
> that we do.  Otherwise, if part of the operation fails then we'll be
> left in an inconsistent state without a way to rollback.  Also, if an
> option other than the destination port is being changed then there
> will be a blip in traffic while we create and destroy the port.  In
> this case, the refcounting it makes it easy to avoid both of the
> problems.
> 
>> +static void vxlan_tnl_destroy(struct vport *vport)
>> +{
>> +       struct vxlan_port *vxlan_port;
>> +       struct tnl_vport *tnl_vport = tnl_vport_priv(vport);
>> +       struct tnl_mutable_config *config;
>> +
>> +       config = rtnl_dereference(tnl_vport->mutable);
>> +
>> +       vxlan_port = vxlan_port_exists(ovs_dp_get_net(vport->dp),
>> +                                        config->dst_port);
>> +       if (!vxlan_port)
>> +               goto out;
> 
> It almost seems better to not try to handle the NULL case here.  It
> should never happen so it's good to know if it does.
> 
>> +static struct vport *vxlan_tnl_create(const struct vport_parms *parms)
>> +{
>> +       int err;
>> +       struct vport *vport;
>> +       struct vxlan_port *vxlan_port = NULL;
>> +
>> +       err = vxlan_tunnel_setup(ovs_dp_get_net(parms->dp), parms->name,
>> +                                               parms->options, &vxlan_port);
>> +       if (err) {
>> +               vxlan_tunnel_release(vxlan_port);
> 
> Is it necessary to call release at this point?  It looks like setup
> should clean everything up if it fails (and worse it could cause us to
> dereference a NULL pointer).
> 
>> +       vport = ovs_tnl_create(parms, &ovs_vxlan_vport_ops, &ovs_vxlan_tnl_ops);
>> +
>> +       if (!vport) {
>> +               vxlan_tunnel_release(vxlan_port);
>> +       }
> 
> The usual kernel convention is drop the braces in this case.





More information about the dev mailing list