[ovs-dev] [netlink v4 49/52] datapath: Convert ODP_DP_* commands to use AF_NETLINK socket layer.

Ben Pfaff blp at nicira.com
Mon Jan 24 21:53:43 UTC 2011


On Sun, Jan 23, 2011 at 03:55:14PM -0800, Jesse Gross wrote:
> On Sat, Jan 22, 2011 at 4:28 PM, Ben Pfaff <blp at nicira.com> wrote:
> > On Fri, Jan 21, 2011 at 06:38:21PM -0800, Jesse Gross wrote:
> >> On Tue, Jan 11, 2011 at 9:50 PM, Ben Pfaff <blp at nicira.com> wrote:
> >> > diff --git a/datapath/datapath.c b/datapath/datapath.c
> >> > index 5fea1af..5d4e1f2 100644
> >> > --- a/datapath/datapath.c
> >> > +++ b/datapath/datapath.c
> >> > +static int odpdc_new(struct sk_buff *skb, struct genl_info *info)
> >> [...]
> >> > - ? ? ? return 0;
> >> > + ? ? ? return genlmsg_reply(reply, info);
> >>
> >> Should we set an error on the socket if the reply/message creation
> >> fails? ?That's what we (and the bridge) do if new port notification
> >> fail.
> >
> > I think you are talking about this code in dp_ifinfo_notify():
> >
> > ? ? ? ? ? ?rtnl_notify(skb, &init_net, 0, RTNLGRP_LINK, NULL, GFP_KERNEL);
> > ? ? ? ? ? ?return;
> > ? ?errout:
> > ? ? ? ? ? ?if (err < 0)
> > ? ? ? ? ? ? ? ? ? ?rtnl_set_sk_err(&init_net, RTNLGRP_LINK, err);
> >
> > The need for this is specific to multicasting notifications. ?If you
> > chase down the tree of functions called by genlmsg_reply(), you end up
> > in netlink_attachskb(), which calls netlink_overrun() if the socket
> > buffer is full, and netlink_overrun() in turn sets the socket error code
> > to -ENOBUFS.
> 
> Actually, I think that multicasting sets the same error codes on
> sockets: af_netlink.c:do_one_broadcast() similarly calls
> netlink_overrun() if it runs into a problem, which is where
> rtnl_notify() eventually ends up.
> 
> Setting the socket error code by hand is only important if the message
> creation fails before the message is actually sent: memory allocation
> failure for example.  In that case we need to set an error so all
> multicast listeners know that they missed something, which is what is
> happening in the code above.

Thanks for clearing that up.  I hadn't noticed that subtlety before.

> In this case, though since the reply is unicast we can just directly
> return an error code, which is what we are already doing.  So we're
> fine for now but will need this when we do multicast.  Should we do it
> for upcalls, the one place where we currently do multicast?

Yes, I think that we should.  The fix for this (incremental appended)
required some intrusive new compat code.  Do you see a better way to do
it?

(I noticed that <linux/jhash.h> was missing when compiling for Xen
5.5.0.)

> >> It's also not really clear to me when we should send a reply. ?We have:
> >> * New objects unicast a response back.
> >> * Flows unicast a response on deletion.
> >> * Ports multicast notifications on creation and deletion as the bridge
> >> does through RTNL.
> >
> > Yes, this needs to be made consistent. ?I believe that the "correct" way
> > to do it is that each type of object should multicast its status on
> > creation, deletion, or change. ?(I think that the goal is supposed to be
> > to allow userspace to accurately track the changes in a table over time;
> > that's essentially what libnl allows one to do.) ?This is on my to-do
> > list, along with using sk_buff frags for long sets of flow actions. ?It
> > shouldn't be too difficult; I was just tired when I wrote this code.
> 
> OK, that makes sense to me.  Should we also keep the existing RTNL
> notifications?  It feels like compatibility code.

These are needed for brcompat.

diff --git a/datapath/datapath.c b/datapath/datapath.c
index 0011477..2512323 100644
--- a/datapath/datapath.c
+++ b/datapath/datapath.c
@@ -17,6 +17,7 @@
 #include <linux/if_vlan.h>
 #include <linux/in.h>
 #include <linux/ip.h>
+#include <linux/jhash.h>
 #include <linux/delay.h>
 #include <linux/time.h>
 #include <linux/etherdevice.h>
@@ -412,8 +413,10 @@ static int queue_control_packets(struct datapath *dp, struct sk_buff *skb,
 			len += nla_total_size(upcall_info->actions_len);
 
 		user_skb = genlmsg_new(len, GFP_ATOMIC);
-		if (!user_skb)
+		if (!user_skb) {
+			netlink_set_err(INIT_NET_GENL_SOCK, 0, group, -ENOBUFS);
 			goto err_kfree_skbs;
+		}
 
 		upcall = genlmsg_put(user_skb, 0, 0, &dp_packet_genl_family, 0, upcall_info->cmd);
 		upcall->dp_idx = dp->dp_idx;
diff --git a/datapath/linux-2.6/Modules.mk b/datapath/linux-2.6/Modules.mk
index 2c544cf..8286cc4 100644
--- a/datapath/linux-2.6/Modules.mk
+++ b/datapath/linux-2.6/Modules.mk
@@ -48,6 +48,7 @@ openvswitch_headers += \
 	linux-2.6/compat-2.6/include/net/dst.h \
 	linux-2.6/compat-2.6/include/net/genetlink.h \
 	linux-2.6/compat-2.6/include/net/ip.h \
+	linux-2.6/compat-2.6/include/net/net_namespace.h \
 	linux-2.6/compat-2.6/include/net/netlink.h \
 	linux-2.6/compat-2.6/include/net/protocol.h \
 	linux-2.6/compat-2.6/include/net/route.h \
diff --git a/datapath/linux-2.6/compat-2.6/include/net/genetlink.h b/datapath/linux-2.6/compat-2.6/include/net/genetlink.h
index 06a14c8..f5bff63 100644
--- a/datapath/linux-2.6/compat-2.6/include/net/genetlink.h
+++ b/datapath/linux-2.6/compat-2.6/include/net/genetlink.h
@@ -4,6 +4,7 @@
 
 #include <linux/netlink.h>
 #include_next <net/genetlink.h>
+#include <net/net_namespace.h>
 
 #include <linux/version.h>
 #if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,23)
diff --git a/datapath/linux-2.6/compat-2.6/include/net/net_namespace.h b/datapath/linux-2.6/compat-2.6/include/net/net_namespace.h
new file mode 100644
index 0000000..9ce9fcd
--- /dev/null
+++ b/datapath/linux-2.6/compat-2.6/include/net/net_namespace.h
@@ -0,0 +1,15 @@
+#ifndef __NET_NET_NAMESPACE_WRAPPER_H
+#define __NET_NET_NAMESPACE_WRAPPER_H 1
+
+#if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,24)
+/* <net/net_namespace.h> exists, go ahead and include it. */
+#include_next <net/net_namespace.h>
+#endif
+
+#if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,32)
+#define INIT_NET_GENL_SOCK init_net.genl_sock
+#else
+#define INIT_NET_GENL_SOCK genl_sock
+#endif
+
+#endif /* net/net_namespace.h wrapper */




More information about the dev mailing list