[ovs-discuss] OVS bridges in docker containers segfault when dpdkvhostuser port is added.

Alan Kayahan hsykay at gmail.com
Fri Nov 2 18:01:23 UTC 2018


Thanks for the responses, all.

@Ian
1GB page size, 8 pages in total. OVS is launched without the
dpdk-socket-mem option, so each instance should take the default 1GB.
When the first switch starts, the free hugepage count drops to 7. When
I launch the second, I'd expect it to drop to 6, but it crashes instead.
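
For reference, this is how I check the counts, and how I would pin the
allocation explicitly (the dpdk-socket-mem value below is just what I
believe matches the default; I haven't verified it yet):

    grep -i huge /proc/meminfo
    cat /sys/kernel/mm/hugepages/hugepages-1048576kB/free_hugepages
    ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket-mem="1024"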

My DPDK apps create their hugepage files in /dev/hugepages with the
names I specify. I am assuming ovs-vswitchd is responsible for naming
the OVS hugepage files in /dev/hugepages. I believe it wouldn't be a
problem (I will test this and report back) if both DPDK bridges were
managed by the same ovs-vswitchd service. But in the containerized
scenario, two ovs-vswitchd services are accessing the same
/dev/hugepages path. Don't you think this would be a problem? Or is it
the openvswitch kernel module that is in charge of hugepage
coordination?
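
If I've read the DPDK EAL docs correctly, each process can be given
its own hugepage file prefix with --file-prefix, which in OVS should
be passable through other_config:dpdk-extra. Something like this is
what I intend to try (the prefix names here are made up):

    # in br1's container
    ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-extra="--file-prefix=ovs-br1"
    # in br3's container
    ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-extra="--file-prefix=ovs-br3"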

@Ben
Will retrieve the info you requested as soon as I eliminate a couple
of other possible causes.

Alan
On Wed, Oct 31, 2018 at 11:00 AM Stokes, Ian <ian.stokes at intel.com> wrote:
>
> > On Thu, Oct 25, 2018 at 09:51:38PM +0200, Alan Kayahan wrote:
> > > Hello,
> > >
> > > I have 3 OVS bridges on the same host, connected to each other as
> > > br1<->br2<->br3. br1 and br3 are connected to the docker container cA
> > > via dpdkvhostuser ports (I know the type is deprecated, but the app
> > > only works this way). The DPDK app running in cA generates packets,
> > > which traverse bridges br1->br2->br3 and end up back at the DPDK app.
> > > This setup works fine.
> > >
> > > Now I am trying to put each OVS bridge into its respective docker
> > > container. I connect the containers with veth pairs, then add the veth
> > > ports to the bridges. Next, I add a dpdkvhostuser port named SRC to
> > > br1, so far so good. The moment I add a dpdkvhostuser port named SNK
> > > to br3, ovs-vswitchd services in br1's and br3's containers segfault.
> > > Following are the backtraces from each:
>
> What version of OVS and DPDK are you using?
>
> > >
> > > ------------------br1's container---------------
> > >
> > > [Thread debugging using libthread_db enabled]
> > > Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
> > > Core was generated by `ovs-vswitchd
> > > unix:/usr/local/var/run/openvswitch/db.sock -vconsole:emer -vsyslo'.
> > > Program terminated with signal SIGSEGV, Segmentation fault.
> > > #0  0x00005608fa0f321b in netdev_rxq_recv (rx=0x7ff13c34ee80,
> > >     batch=batch at entry=0x7ff1bbb4d890) at lib/netdev.c:702
> > > 702    retval = rx->netdev->netdev_class->rxq_recv(rx, batch);
> > > [Current thread is 1 (Thread 0x7ff1bbb4e700 (LWP 376))]
> > > (gdb) bt
> > > #0  0x00005608fa0f321b in netdev_rxq_recv (rx=0x7ff13c34ee80,
> > >     batch=batch at entry=0x7ff1bbb4d890) at lib/netdev.c:702
> > > #1  0x00005608fa0cce65 in dp_netdev_process_rxq_port (
> > >     pmd=pmd at entry=0x7ff1bbb4f010, rxq=0x5608fb651be0, port_no=1)
> > >     at lib/dpif-netdev.c:3279
> > > #2  0x00005608fa0cd296 in pmd_thread_main (f_=<optimized out>)
> > >     at lib/dpif-netdev.c:4145
> > > #3  0x00005608fa14a836 in ovsthread_wrapper (aux_=<optimized out>)
> > >     at lib/ovs-thread.c:348
> > > #4  0x00007ff1c52517fc in start_thread (arg=0x7ff1bbb4e700)
> > >     at pthread_create.c:465
> > > #5  0x00007ff1c4815b5f in clone ()
> > >     at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
> > >
> > > ------------------br3's container---------------
> > >
> > > [Thread debugging using libthread_db enabled]
> > > Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
> > > Core was generated by `ovs-vswitchd
> > > unix:/usr/local/var/run/openvswitch/db.sock -vconsole:emer -vsyslo'.
> > > Program terminated with signal SIGSEGV, Segmentation fault.
> > > #0  0x000055c517e3abcb in rte_mempool_free_memchunks ()
> > > [Current thread is 1 (Thread 0x7f202351f300 (LWP 647))]
> > > (gdb) bt
> > > #0  0x000055c517e3abcb in rte_mempool_free_memchunks ()
> > > #1  0x000055c517e3ad46 in rte_mempool_free.part ()
> > > #2  0x000055c518218b78 in dpdk_mp_free (mp=0x7f603fe66a00)
> > >     at lib/netdev-dpdk.c:599
> > > #3  0x000055c518218ff0 in dpdk_mp_free (mp=<optimized out>)
> > >     at lib/netdev-dpdk.c:593
> > > #4  netdev_dpdk_mempool_configure (dev=0x7f1f7ffeac00) at
> > > lib/netdev-dpdk.c:629
> > > #5  0x000055c51821a98d in dpdk_vhost_reconfigure_helper (dev=0x7f1f7ffeac00)
> > >     at lib/netdev-dpdk.c:3599
> > > #6  0x000055c51821ac8b in netdev_dpdk_vhost_reconfigure (netdev=0x7f1f7ffebcc0)
> > >     at lib/netdev-dpdk.c:3624
> > > #7  0x000055c51813fe6b in port_reconfigure (port=0x55c51a4522a0)
> > >     at lib/dpif-netdev.c:3341
> > > #8  reconfigure_datapath (dp=dp at entry=0x55c51a46efc0) at
> > > lib/dpif-netdev.c:3822
> > > #9  0x000055c5181403e8 in do_add_port (dp=dp at entry=0x55c51a46efc0,
> > >     devname=devname at entry=0x55c51a456520 "SNK",
> > >     type=0x55c51834f7bd "dpdkvhostuser", port_no=port_no at entry=1)
> > >     at lib/dpif-netdev.c:1584
> > > #10 0x000055c51814059b in dpif_netdev_port_add (dpif=<optimized out>,
> > >     netdev=0x7f1f7ffebcc0, port_nop=0x7fffb4eef68c) at
> > > lib/dpif-netdev.c:1610
> > > #11 0x000055c5181469be in dpif_port_add (dpif=0x55c51a469350,
> > >     netdev=netdev at entry=0x7f1f7ffebcc0,
> > >     port_nop=port_nop at entry=0x7fffb4eef6ec)
> > >     at lib/dpif.c:579
> > > #12 0x000055c5180f9f28 in port_add (ofproto_=0x55c51a464ee0,
> > > netdev=0x7f1f7ffebcc0) at ofproto/ofproto-dpif.c:3645
> > > #13 0x000055c5180ecafe in ofproto_port_add (ofproto=0x55c51a464ee0,
> > > netdev=0x7f1f7ffebcc0, ofp_portp=ofp_portp at entry=0x7fffb4eef7e8) at
> > > ofproto/ofproto.c:1999
> > > #14 0x000055c5180d97e6 in iface_do_create (errp=0x7fffb4eef7f8,
> > > netdevp=0x7fffb4eef7f0, ofp_portp=0x7fffb4eef7e8,
> > > iface_cfg=0x55c51a46d590, br=0x55c51a4415b0)
> > >     at vswitchd/bridge.c:1799
> > > #15 iface_create (port_cfg=0x55c51a46e210, iface_cfg=0x55c51a46d590,
> > > br=0x55c51a4415b0) at vswitchd/bridge.c:1837
> > > #16 bridge_add_ports__ (br=br at entry=0x55c51a4415b0,
> > > wanted_ports=wanted_ports at entry=0x55c51a441690,
> > > with_requested_port=with_requested_port at entry=true) at
> > > vswitchd/bridge.c:931
> > > #17 0x000055c5180db87a in bridge_add_ports
> > > (wanted_ports=0x55c51a441690, br=0x55c51a4415b0) at
> > > vswitchd/bridge.c:942
> > > #18 bridge_reconfigure (ovs_cfg=ovs_cfg at entry=0x55c51a46ea80) at
> > > vswitchd/bridge.c:661
> > > #19 0x000055c5180df989 in bridge_run () at vswitchd/bridge.c:3016
> > > #20 0x000055c517dbc535 in main (argc=<optimized out>, argv=<optimized
> > > out>) at vswitchd/ovs-vswitchd.c:120
> > >
> > > Note that /dev/hugepages of the host is shared with all containers. I
> > > have a feeling that br3 is overwriting the hugepage file of br1. Any
> > > ideas?
>
> How much huge page memory have you allocated on the system, and how much do you allocate when launching OVS-DPDK in each container?
>
> Can you confirm with "cat /proc/meminfo"?
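>
> For 1G pages, the lines of interest would look something like this
> (the values here are illustrative, not taken from your system):
>
>     HugePages_Total:       8
>     HugePages_Free:        7
>     Hugepagesize:    1048576 kB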
>
> Ian
> >
> > It does look like some kind of bad pointer, since
> > rx->netdev->netdev_class->rxq_recv shouldn't segfault.  Is there a way
> > you can rerun with Valgrind or Address Sanitizer?

