[ovs-dev] Fail to netdev_open internal iface with error "File exists"

Eelco Chaudron echaudro at redhat.com
Wed Jul 5 15:03:06 UTC 2017


On 23/06/17 10:37, Eelco Chaudron wrote:
> On 06/23/2017 02:12 AM, Ben Pfaff wrote:
>> On Thu, Jun 22, 2017 at 04:04:44PM -0300, Flavio Leitner wrote:
>>> On Thu, Jun 22, 2017 at 01:04:59AM +0800, Huanle Han wrote:
>>>> Hi,all
>>>>
>>>> I get this problem with latest(dbd8112) branch-2.7 code on my Ubuntu.
>>>> root at ubuntu:/var/log/# ovs-vsctl show
>>>> adf2ea99-0c53-4180-914f-7dadaa71302b
>>>>      Bridge test
>>>>          Port test
>>>>              Interface test
>>>>                  type: internal
>>>>      Bridge "manage"
>>>>          Port "manage"
>>>>              Interface "manage"
>>>>                  type: internal
>>>>                  error: "could not open network device manage (File 
>>>> exists)"
>>>>          Port "veth0"
>>>>              Interface "veth0"
>>>>          Port "eth0"
>>>>              Interface "eth0"
>>>>      ovs_version: "2.7.0"
>>>>
>>>> How to reproduce:
>>>> 1. add bridge "manage", up and add ip on it
>>>> 2. restart ovs-vswitchd
>>>> 3. "ovs-vsctl show" displays error message.
>>>>
>>>> The reason:
>>>> In following "netdev_open" call on ovs-vswitchd start, input "type" 
>>>> is NULL
>>>> and "manage" is opened as a "system" netdev_class iface incorrectly.
>>>>
>>>> #0  netdev_open (name=0x7fffffffe2bc "manage", type=0x0,
>>>> netdevp=0x7fffffffc3b0) at ../lib/netdev.c:396
>>>> #1  0x000000000052c492 in get_src_addr (ip6_dst=0x7fffffffe2ac,
>>>> output_bridge=0x7fffffffe2bc "manage", psrc=0x8f3490) at
>>>> ../lib/ovs-router.c:141
>>>> #2  0x000000000052c85d in ovs_router_insert__ (priority=104 'h',
>>>> ip6_dst=0x7fffffffe29c, plen=104 'h', output_bridge=0x7fffffffe2bc
>>>> "manage", gw=0x7fffffffe2ac) at ../lib/ovs-router.c:202
>>>> #3  0x000000000052c980 in ovs_router_insert (ip_dst=0x7fffffffe29c,
>>>> plen=104 'h', output_bridge=0x7fffffffe2bc "manage", 
>>>> gw=0x7fffffffe2ac) at
>>>> ../lib/ovs-router.c:228
>>>> #4  0x000000000058f63a in route_table_handle_msg 
>>>> (change=0x7fffffffe290) at
>>>> ../lib/route-table.c:295
>>>> #5  0x000000000058f1da in route_table_reset () at 
>>>> ../lib/route-table.c:174
>>>> #6  0x000000000058f034 in route_table_init () at 
>>>> ../lib/route-table.c:110
>>>> #7  0x0000000000495838 in dp_initialize () at ../lib/dpif.c:126
>>>> #8  0x0000000000495b40 in dp_enumerate_types (types=0x7fffffffe3a0) at
>>>> ../lib/dpif.c:244
>>>> #9  0x000000000042eb1c in enumerate_types (types=0x7fffffffe3a0) at
>>>> ../ofproto/ofproto-dpif.c:267
>>>> #10 0x000000000041b81c in ofproto_enumerate_types 
>>>> (types=0x7fffffffe3a0) at
>>>> ../ofproto/ofproto.c:432
>>>> #11 0x000000000040df1e in bridge_run__ () at ../vswitchd/bridge.c:3020
>>>> #12 0x000000000040e196 in bridge_run () at ../vswitchd/bridge.c:3082
>>>> #13 0x00000000004138ef in main (argc=1, argv=0x7fffffffe578) at
>>>> ../vswitchd/ovs-vswitchd.c:119
>>>>
>>>> After then, ovs fails to netdev_open "manage" with type == "internal".
>>>> "File exists" error is reported.
>>>> I think commit d3b8f50(netdev: Fix netdev_open() to adhere to class 
>>>> type if
>>>> given) introduces this problem. It need be improved.
>>> Your analysis is correct.  One solution is to wait vswitchd to
>>> configure first and only then enable ovs-route.  The problem is that
>>> more modules might use netdev_open() and it doesn't sound like a good
>>> idea to have a control per module.
>>>
>>> Another option is to map the device's info to a class instead of using
>>> "system", so we could use internal classes for vports, for instance.
>>> However, that doesn't guarantee it will match with what is configured
>>> in the DB.
>> This sounds like the kind of problem that I expected the following
>> commit might cause.
>>
>>      commit d3b8f5052292b3ba9084ffed097e90b87f2950f5
>>      Author: Eelco Chaudron <echaudro at redhat.com>
>>      Date:   Thu Jun 1 14:38:09 2017 +0200
>>
>>          netdev: Fix netdev_open() to adhere to class type if given
>>
>> If we can't fix it somehow, we might need to revert.
>
> Reverting the above patch does not solve the real issue here. If we 
> revert we
> donot get the error, but the wrong class is used, hence the wrong 
> callbacks
> get called.
>
> The main issue is with netdev_open() being called with type = NULL before
> the interface is actually configured in the system. We could track these
> "auto" generated interfaces, and once netdev_open() gets called with a
> valid type reconfigure (re-create) it. netdev_remove() could work here
> but not sure if its safe due to reference counting.
>
Some background info; I see a lot of netdev_open() with a NULL type, but 
they just grep some data and closed it again. I found that the tunnel 
code is keeping a reference to the netdev (for the IP assigned) and 
hence it is not going away before the "real" interface opens it.

The change below is based on tracking the "classless" opened devices, 
and once they get opened with a "real" class, re-create them. I did some 
basic testing and it seem to work fine, also went over related code and 
could not find any corner case.

So please take a peek, and if it all makes sense I can send out an 
official patch.

Thanks,

Eelco


diff --git a/lib/netdev-provider.h b/lib/netdev-provider.h
index 3c3c181..b3c57d5 100644
--- a/lib/netdev-provider.h
+++ b/lib/netdev-provider.h
@@ -45,6 +45,10 @@ struct netdev {
      const struct netdev_class *netdev_class; /* Functions to control
                                                  this device. */

+    /* If this is 'true' the user did not specify a netdev_class when
+     * opening this device, and therefore got assigned to the "system" 
class */
+    bool auto_classified;
+
      /* A sequence number which indicates changes in one of 'netdev''s
       * properties.   It must be nonzero so that users have a value which
       * they may use as a reset when tracking 'netdev'.
diff --git a/lib/netdev.c b/lib/netdev.c
index a7840a8..89afa71 100644
--- a/lib/netdev.c
+++ b/lib/netdev.c
@@ -361,7 +361,7 @@ netdev_open(const char *name, const char *type, 
struct netdev **netdevp)
      OVS_EXCLUDED(netdev_mutex)
  {
      struct netdev *netdev;
-    int error;
+    int error = 0;

      if (!name[0]) {
          /* Reject empty names.  This saves the providers having to do 
this.  At
@@ -375,6 +375,29 @@ netdev_open(const char *name, const char *type, 
struct netdev **netdevp)

      ovs_mutex_lock(&netdev_mutex);
      netdev = shash_find_data(&netdev_shash, name);
+
+    if (netdev &&
+        type && type[0] && strcmp(type, netdev->netdev_class->type)) {
+
+        if (netdev->auto_classified) {
+            /* If this device was first created without a 
classification type,
+             * for example due to routing or tunneling code, and they 
keep a
+             * reference, a "classified" call to open will fail. In 
this case
+             * we remove the classless device, and re-add it below. We 
remove
+             * the netdev from the shash, and change the sequence, so 
owners of
+             * the old classless device can release/cleanup. */
+            if (netdev->node) {
+                shash_delete(&netdev_shash, netdev->node);
+                netdev->node = NULL;
+                netdev_change_seq_changed(netdev);
+            }
+
+            netdev = NULL;
+        } else {
+            error = EEXIST;
+        }
+    }
+
      if (!netdev) {
          struct netdev_registered_class *rc;

@@ -384,6 +407,7 @@ netdev_open(const char *name, const char *type, 
struct netdev **netdevp)
              if (netdev) {
                  memset(netdev, 0, sizeof *netdev);
                  netdev->netdev_class = rc->class;
+                netdev->auto_classified = type && type[0] ? false : true;
                  netdev->name = xstrdup(name);
                  netdev->change_seq = 1;
                  netdev->reconfigure_seq = seq_create();
@@ -416,10 +440,6 @@ netdev_open(const char *name, const char *type, 
struct netdev **netdevp)
                        name, type);
              error = EAFNOSUPPORT;
          }
-    } else if (type && type[0] && strcmp(type, 
netdev->netdev_class->type)) {
-        error = EEXIST;
-    } else {
-        error = 0;
      }

      if (!error) {












More information about the dev mailing list