[ovs-discuss] Failed to add ovs bridge

fukaige fukaige at huawei.com
Fri May 12 01:34:46 UTC 2017


I am not using STP or RSTP. I looked at the bug fix you mentioned, and it seems to be unrelated to my problem.
Maybe there is a race condition in deleting the netdev from netdev_shash, but I cannot figure
it out right now.

The probability of occurrence is very low: I have hit this only three times in two months.
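
A minimal standalone sketch of the suspected failure mode follows. It is not OVS code: toy_netdev, toy_table, toy_open and toy_close are hypothetical names that only model a name-keyed, reference-counted table like netdev_shash. It assumes that opening an existing name reuses the stored entry and ignores the requested type, which is what the gdb trace below appears to show but is not verified against the source here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a netdev; NOT the real struct netdev. */
struct toy_netdev {
    char name[16];
    const char *type;   /* e.g. "internal" or "system" */
    int ref_cnt;        /* number of outstanding references */
    int in_use;         /* nonzero while the slot is occupied */
};

#define TOY_MAX 8
static struct toy_netdev toy_table[TOY_MAX];   /* models netdev_shash */

/* Look up an entry by name; returns NULL if there is none. */
static struct toy_netdev *
toy_find(const char *name)
{
    for (size_t i = 0; i < TOY_MAX; i++) {
        if (toy_table[i].in_use && !strcmp(toy_table[i].name, name)) {
            return &toy_table[i];
        }
    }
    return NULL;
}

/* Open by name.  ASSUMPTION: an existing entry is reused as-is and the
 * requested type is ignored.  Returns NULL if the table is full. */
static struct toy_netdev *
toy_open(const char *name, const char *type)
{
    struct toy_netdev *dev = toy_find(name);

    if (!dev) {
        for (size_t i = 0; i < TOY_MAX && !dev; i++) {
            if (!toy_table[i].in_use) {
                dev = &toy_table[i];
            }
        }
        if (!dev) {
            return NULL;
        }
        strncpy(dev->name, name, sizeof dev->name - 1);
        dev->name[sizeof dev->name - 1] = '\0';
        dev->type = type;
        dev->ref_cnt = 0;
        dev->in_use = 1;
    }
    dev->ref_cnt++;
    return dev;
}

/* Drop one reference; the entry leaves the table only at zero. */
static void
toy_close(struct toy_netdev *dev)
{
    if (dev && --dev->ref_cnt == 0) {
        dev->in_use = 0;
    }
}
```

In this model, if "br-eth" is opened twice as "system" but closed only once, the entry stays in the table, and a later toy_open("br-eth", "internal") returns the stale "system" entry. A leaked reference of that kind would explain both the wrong request.type and why only the name br-eth is affected.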

> -----Original Message-----
> From: Ben Pfaff [mailto:blp at ovn.org]
> Sent: Thursday, May 11, 2017 9:32 PM
> To: fukaige
> Cc: ovs-discuss at openvswitch.org; joe at ovn.org
> Subject: Re: Failed to add ovs bridge
> 
> Are you using STP or RSTP?  There's a bug fix related to them on branch-2.5.
> 
> On Thu, May 11, 2017 at 11:11:02AM +0000, fukaige wrote:
> > Hi all,
> >
> > Occasionally, I get an error when creating a bridge using “ovs-vsctl add-br br-eth”:
> >
> >
> > ovs-vsctl: Error detected while setting up 'br-eth'.  See ovs-vswitchd log for details.
> >
> >
> > The ovs-vswitchd log is below:
> >
> > 2017-05-11T03:45:25.293Z|00026|ofproto_dpif|INFO|system@ovs-system: Datapath supports recirculation
> > 2017-05-11T03:45:25.293Z|00027|ofproto_dpif|INFO|system@ovs-system: MPLS label stack length probed as 1
> > 2017-05-11T03:45:25.293Z|00028|ofproto_dpif|INFO|system@ovs-system: Datapath supports unique flow ids
> > 2017-05-11T03:45:25.293Z|00029|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_state
> > 2017-05-11T03:45:25.293Z|00030|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_zone
> > 2017-05-11T03:45:25.293Z|00031|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_mark
> > 2017-05-11T03:45:25.293Z|00032|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_label
> > 2017-05-11T03:45:25.364Z|00001|ofproto_dpif_upcall(handler226)|INFO|received packet on unassociated datapath port 0
> > 2017-05-11T03:45:25.368Z|00033|netdev_linux|WARN|ethtool command ETHTOOL_GFLAGS on network device br-eth failed: No such device
> > 2017-05-11T03:45:25.368Z|00034|dpif|WARN|system@ovs-system: failed to add br-eth as port: No such device
> > 2017-05-11T03:45:25.368Z|00035|bridge|INFO|bridge br-eth: using datapath ID 00002a51cf9f2841
> > 2017-05-11T03:45:25.368Z|00036|connmgr|INFO|br-eth: added service controller "punix:/var/run/openvswitch/br-eth.mgmt"
> >
> > Then I deleted br-eth and tried to add it again, but I still got the same error as above.
> > However, a bridge with a name other than br-eth can be created successfully.
> >
> > Some clues:
> >
> > 1.       As far as I know, the type of port br-eth is internal, so there should be no way to
> > get into netdev_linux_ethtool_set_flag(). But the log shows that request.type is wrong:
> > request.type has the value OVS_VPORT_TYPE_NETDEV instead of OVS_VPORT_TYPE_INTERNAL.
> >
> > static int
> > dpif_netlink_port_add__(struct dpif_netlink *dpif, struct netdev *netdev,
> >                         odp_port_t *port_nop)
> >     OVS_REQ_WRLOCK(dpif->upcall_lock)
> > {
> >     ……
> >
> >     if (request.type == OVS_VPORT_TYPE_NETDEV) {
> > #ifdef _WIN32
> >         /* XXX : Map appropiate Windows handle */
> > #else
> >         netdev_linux_ethtool_set_flag(netdev, ETH_FLAG_LRO, "LRO", false);
> > #endif
> >     }
> >
> >     ……
> > }
> >
> >
> > 2.       I debugged ovs-vswitchd with gdb and found that a netdev with the
> > same name had not been deleted (lib/netdev.c:netdev_open):
> > netdev_open (name=0xffff6000d6b0 "br-int", type=0x52ca80 "internal", netdevp=0xfffffc20fab8, netdevp@entry=0xfffffc20fb28)
> >     at lib/netdev.c:354
> > 354  {
> > (gdb) n
> > 358      netdev_initialize();
> > (gdb)
> > 360      ovs_mutex_lock(&netdev_class_mutex);
> > (gdb)
> > 361      ovs_mutex_lock(&netdev_mutex);
> > (gdb)
> > 360      ovs_mutex_lock(&netdev_class_mutex);
> > (gdb)
> > 361      ovs_mutex_lock(&netdev_mutex);
> > (gdb)
> > 362      netdev = shash_find_data(&netdev_shash, name);
> > (gdb)
> > 363      if (!netdev) {
> > (gdb) print netdev->name
> > $1 = 0x47852e0 "br-int"
> > (gdb) print netdev->refcnt
> > There is no member named refcnt.
> > (gdb) n
> > 405          netdev->ref_cnt++;
> > (gdb) print netdev->ref_cnt
> > $2 = 2
> > (gdb) n
> > 406          *netdevp = netdev;
> > (gdb) print netdev->ref_cnt
> > $3 = 3
> >
> > Something must be going wrong when the bridge is deleted, but I cannot find
> > a way to reproduce it or work out why the netdev was not deleted correctly.
> > Can anyone offer suggestions for reproducing the error or solving it?
> >
> > Note:
> > ovs version: 2.5.2
> > kernel version: 4.1
> >

