[ovs-dev] openvswitch-2.4 possible bug in hmap_remove

Justin Pettit jpettit at nicira.com
Sat Oct 3 16:45:18 UTC 2015


It's still difficult to read.  Can you send it in plain text?

--Justin


> On Oct 2, 2015, at 11:22 PM, Richurov Kes <kesri1234 at rediffmail.com> wrote:
> 
> Hi,
> My earlier mail to ovs-dev had formatting errors, so I am resending it.
>
> I'm a student at Antwerp University D' Sciences and Technology and we are trying to bring up openvswitch-2.4 in our project environment. We are running into a crash that is affecting our research work. Here are some details:
>
> Stack trace:
>
> (gdb) bt
> #0  hmap_remove (xcfg=0x2289520, xport=0x23874a0) at lib/hmap.h:245
> #1  xlate_xport_remove (xcfg=0x2289520, xport=0x23874a0)
>     at ofproto/ofproto-dpif-xlate.c:1318
> #2  0x000000000043b6fc in xlate_xbridge_remove (xcfg=0x2289520,
>     xbridge=0x2386c90) at ofproto/ofproto-dpif-xlate.c:1142
> #3  0x000000000043ba0a in xlate_xcfg_free ()
>     at ofproto/ofproto-dpif-xlate.c:1091
> #4  xlate_txn_commit () at ofproto/ofproto-dpif-xlate.c:1050
> #5  0x0000000000426dbb in type_run (type=<value optimized out>)
>     at ofproto/ofproto-dpif.c:731
> #6  0x000000000041a322 in ofproto_type_run (datapath_type=0x2285f00 "system")
>     at ofproto/ofproto.c:1740
> #7  0x000000000040812d in bridge_run__ () at vswitchd/bridge.c:2928
> #8  0x00000000004113a5 in bridge_run () at vswitchd/bridge.c:3000
> #9  0x00000000004121ad in main (argc=10, argv=0x7fff37462dc8)
>     at vswitchd/ovs-vswitchd.c:131
>
> We inspected some frames. In frame 1, at line 1318, xcfg->xports looks valid; note the value of xport->hmap_node. All of the data in xport->ofport also appears to be valid.
>
> (gdb) frame 1
> #1  xlate_xport_remove (xcfg=0x2289520, xport=0x23874a0)
>     at ofproto/ofproto-dpif-xlate.c:1318
> 1318        hmap_remove(&xcfg->xports, &xport->hmap_node);
> (gdb) list
> 1313        }
> 1314
> 1315        clear_skb_priorities(xport);
> 1316        hmap_destroy(&xport->skb_priorities);
> 1317
> 1318        hmap_remove(&xcfg->xports, &xport->hmap_node);      <<<<<
> 1319        hmap_remove(&xport->xbridge->xports, &xport->ofp_node);
> 1320
> 1321        netdev_close(xport->netdev);
> (gdb) p &xport->hmap_node
> $5 = (struct hmap_node *) 0x23874a0
> (gdb) p xport->hmap_node
> $6 = {hash = 7, next = 0x0}
> (gdb) p xcfg->xports
> $11 = {buckets = 0x2387620, one = 0x0, mask = 31, n = 45}
> (gdb) p xport->ofport
> $12 = (struct ofport_dpif *) 0x2232160
> (gdb) p *xport->ofport
> $13 = {odp_port_node = {hash = 0, next = 0x0}, up = {hmap_node = {
>       hash = 186588107, next = 0x0}, ofproto = 0x21b6f20, netdev = 0x2231f90,
>     pp = {port_no = 12, hw_addr = "\362\366\a\035\275D",
>       name = "taf2fe2592c\000\000\000\000", config = 0,
>       state = OFPUTIL_PS_STP_LISTEN, curr = 0, advertised = 0, supported = 0,
>       peer = 0, curr_speed = 0, max_speed = 0}, ofp_port = 12, change_seq = 3,
>     created = 1260028, mtu = 0, op_vendor_data = 0x22322a0, rte_node = {
>       hash = 186588107, next = 0x0}}, ofpd_vendor_data = 0x21e03d0,
>   odp_port = 9, bundle = 0x0, bundle_node = {prev = 0x0, next = 0x0},
>   cfm = 0x0, bfd = 0x0, lldp = 0x0, may_enable = true, is_tunnel = true,
>   is_layer3 = false, carrier_seq = 0, peer = 0x0, stp_port = 0x0,
>   stp_state = STP_DISABLED, stp_state_entered = 0, rstp_port = 0x0,
>   rstp_state = RSTP_DISABLED, qdscp = 0x0, n_qdscp = 0, realdev_ofp_port = 0,
>   vlandev_vid = 0}
>
> However, peculiarly, in frame 0 we find that xport->hmap_node is in the wrong hash map bucket, and as a result we crash. See below:
>
> (gdb) frame 0
> #0  hmap_remove (xcfg=0x2289520, xport=0x23874a0) at lib/hmap.h:245
> 245         while (*bucket != node) {
> (gdb) list
> 240      * hmap_shrink() directly if desired. */
> 241     static inline void
> 242     hmap_remove(struct hmap *hmap, struct hmap_node *node)
> 243     {
> 244         struct hmap_node **bucket = &hmap->buckets[node->hash & hmap->mask];
> 245         while (*bucket != node) {
> 246             bucket = &(*bucket)->next;
> 247         }
> 248         *bucket = node->next;
> 249         hmap->n--;
> (gdb) p node->hash & hmap->mask
> $7 = 7
> (gdb) p hmap->buckets[7]
> $8 = (struct hmap_node *) 0x0
> (gdb) p hmap->buckets[6]
> $15 = (struct hmap_node *) 0x23874a0
>
> This makes us wonder whether there is a bug in the openvswitch-2.4 hash map library that computes the wrong bucket address during hashing. No particular sequence of events reproduces the bug, but it often occurs while flows are being downloaded from the controller, or when ports are added or deleted with ovs-vsctl.
>
> Is this a known problem with openvswitch-2.4? If so, is there a patch available, or is it fixed in newer releases or on the master branch? Please help.
>
> Regards,
> Richurov
> _______________________________________________
> dev mailing list
> dev at openvswitch.org
> http://openvswitch.org/mailman/listinfo/dev
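
For readers following the gdb output above: the node's stored hash selects bucket 7 (7 & 31), bucket 7 is empty, and the node is actually reachable from bucket 6, so the loop at lib/hmap.h:245 dereferences a NULL bucket head. The sketch below is a toy model of that lookup, not the OVS code; the names toy_node, toy_hmap, toy_insert, toy_remove_checked and toy_find_bucket are all hypothetical. It assumes, purely for illustration, that the stored hash no longer matches the bucket the node was inserted into. The thread does not establish how that happened.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for the OVS types; names and layout are illustrative only. */
struct toy_node {
    size_t hash;
    struct toy_node *next;
};

struct toy_hmap {
    struct toy_node **buckets;
    size_t mask;                /* Number of buckets minus one. */
    size_t n;                   /* Number of nodes in the map. */
};

/* Inserts 'node' at the head of the bucket selected by 'hash & mask'. */
void
toy_insert(struct toy_hmap *hmap, struct toy_node *node, size_t hash)
{
    struct toy_node **bucket = &hmap->buckets[hash & hmap->mask];

    /* The mask must be a power of two minus one. */
    assert((hmap->mask & (hmap->mask + 1)) == 0);
    node->hash = hash;
    node->next = *bucket;
    *bucket = node;
    hmap->n++;
}

/* Defensive variant of removal: stops at the end of the chain instead of
 * dereferencing NULL.  Returns 1 if 'node' was removed, 0 if it was not
 * found in the bucket that its stored hash selects. */
int
toy_remove_checked(struct toy_hmap *hmap, struct toy_node *node)
{
    struct toy_node **bucket = &hmap->buckets[node->hash & hmap->mask];

    while (*bucket != NULL && *bucket != node) {
        bucket = &(*bucket)->next;
    }
    if (*bucket == NULL) {
        return 0;
    }
    *bucket = node->next;
    hmap->n--;
    return 1;
}

/* Scans every bucket and returns the index of the one that contains 'node',
 * or -1 if the node is not in the map at all. */
long
toy_find_bucket(const struct toy_hmap *hmap, const struct toy_node *node)
{
    for (size_t i = 0; i <= hmap->mask; i++) {
        for (const struct toy_node *p = hmap->buckets[i]; p; p = p->next) {
            if (p == node) {
                return (long) i;
            }
        }
    }
    return -1;
}
```

Inserting a node with hash 6 into a 32-bucket toy map and then overwriting its hash field with 7 reproduces the shape of the dump: toy_find_bucket() reports bucket 6, while toy_remove_checked() looks in bucket 7 and returns 0 where the unchecked loop would crash. A scan like toy_find_bucket() is only a debugging aid; it says where the node is, not why its hash and bucket disagree.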



