[ovs-discuss] ovs crash when running traffic from VM to VM over DPDK and vhostuser
yby.developer at yahoo.com
Mon May 2 17:46:35 UTC 2016
DPDK dev indicated the crash was resolved in 16.04, and indeed when running with dpdk 16.04 and latest ovs from git, and removing "mrg_rxbuf=off" from qemu's "-device virtio-net-pci", the crash is no longer observed. However, we are witnessing ovs getting stuck:
2016-05-02T17:26:18.804Z|00111|ovs_rcu|WARN|blocked 1000 ms waiting for pmd145 to quiesce
2016-05-02T17:26:19.805Z|00112|ovs_rcu|WARN|blocked 2001 ms waiting for pmd145 to quiesce
2016-05-02T17:26:21.804Z|00113|ovs_rcu|WARN|blocked 4000 ms waiting for pmd145 to quiesce
2016-05-02T17:26:25.805Z|00114|ovs_rcu|WARN|blocked 8001 ms waiting for pmd145 to quiesce
2016-05-02T17:26:33.805Z|00115|ovs_rcu|WARN|blocked 16001 ms waiting for pmd145 to quiesce
2016-05-02T17:26:49.805Z|00116|ovs_rcu|WARN|blocked 32001 ms waiting for pmd145 to quiesce
2016-05-02T17:27:14.354Z|00072|ovs_rcu(vhost_thread2)|WARN|blocked 128000 ms waiting for pmd145 to quiesce
2016-05-02T17:27:15.841Z|00008|ovs_rcu(urcu3)|WARN|blocked 128001 ms waiting for pmd145 to quiesce
2016-05-02T17:27:21.805Z|00117|ovs_rcu|WARN|blocked 64000 ms waiting for pmd145 to quiesce
2016-05-02T17:28:25.804Z|00118|ovs_rcu|WARN|blocked 128000 ms waiting for pmd145 to quiesce
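For anyone triaging a similar log, here is a minimal Python sketch that pulls the "blocked ... waiting for ... to quiesce" warnings out of ovs-vswitchd log lines and reports the longest wait per reporting thread. The regex is inferred only from the lines shown above, the helper names (parse_blocked, longest_block) are hypothetical, and labelling lines without a thread name as "main" is an assumption.

```python
import re

# Pattern inferred from the ovs_rcu warnings quoted in this thread only;
# it is not taken from the OVS sources and may miss other message variants.
LOG_RE = re.compile(
    r"^(?P<ts>[^|]+)\|(?P<seq>\d+)\|ovs_rcu(?:\((?P<thread>[^)]+)\))?\|WARN\|"
    r"blocked (?P<ms>\d+) ms waiting for (?P<target>\S+) to quiesce$"
)


def parse_blocked(lines):
    """Return (reporting_thread, blocked_on, milliseconds) for each matching line."""
    events = []
    for line in lines:
        m = LOG_RE.match(line.strip())
        if m:
            # Assumption: a line with no "(thread)" suffix comes from the main thread.
            thread = m.group("thread") or "main"
            events.append((thread, m.group("target"), int(m.group("ms"))))
    return events


def longest_block(lines):
    """Map (reporting_thread, blocked_on) to the longest wait seen, in ms."""
    worst = {}
    for thread, target, ms in parse_blocked(lines):
        key = (thread, target)
        worst[key] = max(worst.get(key, 0), ms)
    return worst


# Four of the lines from the log above, used as sample input.
SAMPLE = [
    "2016-05-02T17:26:49.805Z|00116|ovs_rcu|WARN|blocked 32001 ms waiting for pmd145 to quiesce",
    "2016-05-02T17:27:14.354Z|00072|ovs_rcu(vhost_thread2)|WARN|blocked 128000 ms waiting for pmd145 to quiesce",
    "2016-05-02T17:27:15.841Z|00008|ovs_rcu(urcu3)|WARN|blocked 128001 ms waiting for pmd145 to quiesce",
    "2016-05-02T17:28:25.804Z|00118|ovs_rcu|WARN|blocked 128000 ms waiting for pmd145 to quiesce",
]

if __name__ == "__main__":
    for (thread, target), ms in sorted(longest_block(SAMPLE).items()):
        print(f"{thread} waited up to {ms} ms for {target}")
```

In the log above every reporting thread is waiting on the same target, pmd145, so that PMD thread is the one to inspect.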
We observed the same ovs-stuck behavior in the ovs 2.5.0 release (with mrg_rxbuf=off), and we thought this issue was resolved in the latest ovs 2.5.0 branch, but when removing the "mrg_rxbuf=off" we started observing ovs getting stuck again.
Are you familiar with this issue?
On Tuesday, 5 April 2016 10:26 AM, "Loftus, Ciara" <ciara.loftus at intel.com> wrote:
Since the segmentation fault is occurring in the DPDK vhost code, it might be a good idea to post this information to the dev at dpdk.org mailing list where you might be able to get more feedback on the root cause.
> • What you did that make the problem appear.
> We have an openstack kilo setup. it has 3 controllers and 3 computes. 1 of
> the controllers runs an ODL, which manages the OVS on each compute host.
> The compute hosts are running an hlinux OS, which is HPE's Debian8-based
> each host has 2 numa nodes, each with 12 cores (24 Hyper Threaded). each
> numa with 64GB.
> We patched neutron to create vhostuser ports (which are not available in
> stable kilo), in order to work with dpdk and achieve the highest possible
> throughput.
> OVS was running with "-c 4" and pmd-core-mask 0x38. all these cores were
> nova was configured with vcpu_pin_set=6-11, and the flavor had 6 vCPUs.
> flavor had 16 1GB huge pages, backed up by real 1GB huge pages in host.
> We then ran a DPDK-based traffic generator inside 2 VMs, each sending
> directly to the other VM's MAC and IP.
> • What you expected to happen.
> We expected traffic to flow.
> • What actually happened.
> OVS crashed (in dpdk code). Attached BT.
> • The Open vSwitch version number (as output by ovs-vswitchd --version)
> root at BASE-CCP-CPN-N0001-NETCLM:~# ovs-vswitchd --version
> ovs-vswitchd (Open vSwitch) 2.5.0
> Compiled Apr 4 2016 08:51:09
> • Any local patches or changes you have applied (if any).
> applied ce179f1163f947fe8dc5afa35a2cdd0756bb53a0
> The following are also handy sometimes:
> • The kernel version on which Open vSwitch is running (from /proc/version)
> and the distribution and version number of your OS (e.g. "Centos 5.0").
> root at BASE-CCP-CPN-N0001-NETCLM:~# cat /proc/version
> Linux version 3.14.48-1-amd64-hlinux (pbuilder at build) (gcc version 4.9.2
> (Debian 4.9.2-10) ) #hlinux1 SMP Thu Aug 6 16:02:22 UTC 2015
> • If you have Open vSwitch configured to connect to an OpenFlow controller,
> the output of ovs-ofctl show <bridge> for each <bridge> configured in the
> vswitchd configuration database.
> We are using odl. attached outputs.
> • A fix or workaround, if you have one
> We disabled mrg_rxbuf (mrg_rxbuf=off) in qemu
> We can supply more info if necessary, like our exact build process etc.
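The CPU layout in the report (OVS started with "-c 4", a PMD mask of 0x38, and nova's vcpu_pin_set=6-11) can be sanity-checked with a small Python sketch. It assumes "-c 4" is the DPDK EAL hexadecimal core mask and that the report's "pmd-core-mask" refers to the setting OVS documents as pmd-cpu-mask. The helper names are hypothetical, and the range parser handles only comma-separated values and ranges, not "^" exclusions.

```python
def cores_in_mask(mask):
    """Return the core IDs whose bits are set in an integer CPU mask."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


def parse_cpu_range(spec):
    """Expand a spec such as "6-11" or "1,3-5" into a sorted list of core IDs.

    Simplification: "^" exclusions are not handled.
    """
    cores = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            low, high = part.split("-", 1)
            cores.update(range(int(low), int(high) + 1))
        else:
            cores.add(int(part))
    return sorted(cores)


# Values taken from the report above.
dpdk_lcores = cores_in_mask(0x4)      # from "-c 4", read as a hex mask (assumption)
pmd_cores = cores_in_mask(0x38)       # PMD mask 0x38
vm_cores = parse_cpu_range("6-11")    # vcpu_pin_set=6-11

if __name__ == "__main__":
    print("DPDK lcore mask cores:", dpdk_lcores)
    print("PMD cores:", pmd_cores)
    print("VM vCPU cores:", vm_cores)
    print("PMD/VM overlap:", sorted(set(pmd_cores) & set(vm_cores)))
```

Under those assumptions the three sets do not overlap, so PMD threads and guest vCPUs are not competing for the same cores in this setup.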