[ovs-discuss] Starting OVS+DPDK kills VM using vhostuserclient interface

Wittling, Mark (CCI-Atlanta) Mark.Wittling at cox.com
Wed May 20 13:35:37 UTC 2020


Update after doing another test:

This morning, I made a switch: I unloaded the vfio kernel modules and loaded the uio modules instead.

So we now look like this:
Network devices using DPDK-compatible driver
============================================
0000:01:00.0 '82571EB/82571GB Gigabit Ethernet Controller D0/D1 (copper applications) 105e' drv=uio_pci_generic unused=e1000e
0000:01:00.1 '82571EB/82571GB Gigabit Ethernet Controller D0/D1 (copper applications) 105e' drv=uio_pci_generic unused=e1000e

Network devices using kernel driver
===================================
0000:00:19.0 'Ethernet Connection I217-LM 153a' if=em1 drv=e1000e unused=uio_pci_generic
0000:03:00.0 '82571EB/82571GB Gigabit Ethernet Controller D0/D1 (copper applications) 105e' if=p1p1 drv=e1000e unused=uio_pci_generic
0000:03:00.1 '82571EB/82571GB Gigabit Ethernet Controller D0/D1 (copper applications) 105e' if=p1p2 drv=e1000e unused=uio_pci_generic
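
For the record, the rebind was done with the standard dpdk-devbind workflow, roughly like this (PCI addresses as shown in the status output above):

modprobe uio_pci_generic
dpdk-devbind --bind=uio_pci_generic 0000:01:00.0
dpdk-devbind --bind=uio_pci_generic 0000:01:00.1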

This did produce a subtle change in the OpenVSwitch log: the dpdk|ERR|EAL: Error enabling MSI-X interrupts for fd 27 message disappeared.

But the /tmp/sock0 socket error remains, and when OpenVSwitch is started, the VM exits with return code 1.

2020-05-20T13:27:43.157Z|00001|vlog|INFO|opened log file /var/log/openvswitch/ovs-vswitchd.log
2020-05-20T13:27:43.160Z|00002|ovs_numa|INFO|Discovered 4 CPU cores on NUMA node 0
2020-05-20T13:27:43.160Z|00003|ovs_numa|INFO|Discovered 1 NUMA nodes and 4 CPU cores
2020-05-20T13:27:43.160Z|00004|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connecting...
2020-05-20T13:27:43.160Z|00005|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connected
2020-05-20T13:27:43.162Z|00006|dpdk|INFO|Using DPDK 17.11.10
2020-05-20T13:27:43.162Z|00007|dpdk|INFO|DPDK Enabled - initializing...
2020-05-20T13:27:43.162Z|00008|dpdk|INFO|No vhost-sock-dir provided - defaulting to /var/run/openvswitch
2020-05-20T13:27:43.162Z|00009|dpdk|INFO|IOMMU support for vhost-user-client disabled.
2020-05-20T13:27:43.162Z|00010|dpdk|INFO|Per port memory for DPDK devices disabled.
2020-05-20T13:27:43.162Z|00011|dpdk|INFO|EAL ARGS: ovs-vswitchd -c 0x2 --socket-mem 1024
2020-05-20T13:27:43.162Z|00012|dpdk|INFO|EAL: Detected 4 lcore(s)
2020-05-20T13:27:43.164Z|00013|dpdk|INFO|EAL: Probing VFIO support...
2020-05-20T13:27:43.561Z|00014|dpdk|INFO|EAL: PCI device 0000:00:19.0 on NUMA socket -1
2020-05-20T13:27:43.561Z|00015|dpdk|WARN|EAL:   Invalid NUMA socket, default to 0
2020-05-20T13:27:43.561Z|00016|dpdk|INFO|EAL:   probe driver: 8086:153a net_e1000_em
2020-05-20T13:27:43.561Z|00017|dpdk|INFO|EAL: PCI device 0000:01:00.0 on NUMA socket -1
2020-05-20T13:27:43.561Z|00018|dpdk|WARN|EAL:   Invalid NUMA socket, default to 0
2020-05-20T13:27:43.561Z|00019|dpdk|INFO|EAL:   probe driver: 8086:105e net_e1000_em
2020-05-20T13:27:43.814Z|00020|dpdk|INFO|EAL: PCI device 0000:01:00.1 on NUMA socket -1
2020-05-20T13:27:43.815Z|00021|dpdk|WARN|EAL:   Invalid NUMA socket, default to 0
2020-05-20T13:27:43.815Z|00022|dpdk|INFO|EAL:   probe driver: 8086:105e net_e1000_em
2020-05-20T13:27:43.962Z|00023|dpdk|INFO|EAL: PCI device 0000:03:00.0 on NUMA socket -1
2020-05-20T13:27:43.962Z|00024|dpdk|WARN|EAL:   Invalid NUMA socket, default to 0
2020-05-20T13:27:43.962Z|00025|dpdk|INFO|EAL:   probe driver: 8086:105e net_e1000_em
2020-05-20T13:27:43.962Z|00026|dpdk|INFO|EAL: PCI device 0000:03:00.1 on NUMA socket -1
2020-05-20T13:27:43.962Z|00027|dpdk|WARN|EAL:   Invalid NUMA socket, default to 0
2020-05-20T13:27:43.962Z|00028|dpdk|INFO|EAL:   probe driver: 8086:105e net_e1000_em
2020-05-20T13:27:43.962Z|00029|dpdk|INFO|DPDK pdump packet capture enabled
2020-05-20T13:27:43.965Z|00030|dpdk|INFO|DPDK Enabled - initialized
2020-05-20T13:27:43.966Z|00031|dpif_netlink|INFO|The kernel module does not support meters.
2020-05-20T13:27:43.969Z|00032|ofproto_dpif|INFO|system at ovs-system: Datapath supports recirculation
2020-05-20T13:27:43.969Z|00033|ofproto_dpif|INFO|system at ovs-system: VLAN header stack length probed as 2
2020-05-20T13:27:43.969Z|00034|ofproto_dpif|INFO|system at ovs-system: MPLS label stack length probed as 1
2020-05-20T13:27:43.969Z|00035|ofproto_dpif|INFO|system at ovs-system: Datapath supports truncate action
2020-05-20T13:27:43.969Z|00036|ofproto_dpif|INFO|system at ovs-system: Datapath supports unique flow ids
2020-05-20T13:27:43.969Z|00037|ofproto_dpif|INFO|system at ovs-system: Datapath does not support clone action
2020-05-20T13:27:43.970Z|00038|ofproto_dpif|INFO|system at ovs-system: Max sample nesting level probed as 10
2020-05-20T13:27:43.970Z|00039|ofproto_dpif|INFO|system at ovs-system: Datapath supports eventmask in conntrack action
2020-05-20T13:27:43.970Z|00040|ofproto_dpif|INFO|system at ovs-system: Datapath supports ct_clear action
2020-05-20T13:27:43.970Z|00041|ofproto_dpif|INFO|system at ovs-system: Max dp_hash algorithm probed to be 0
2020-05-20T13:27:43.970Z|00042|ofproto_dpif|INFO|system at ovs-system: Datapath supports ct_state
2020-05-20T13:27:43.970Z|00043|ofproto_dpif|INFO|system at ovs-system: Datapath supports ct_zone
2020-05-20T13:27:43.970Z|00044|ofproto_dpif|INFO|system at ovs-system: Datapath supports ct_mark
2020-05-20T13:27:43.970Z|00045|ofproto_dpif|INFO|system at ovs-system: Datapath supports ct_label
2020-05-20T13:27:43.970Z|00046|ofproto_dpif|INFO|system at ovs-system: Datapath supports ct_state_nat
2020-05-20T13:27:43.970Z|00047|ofproto_dpif|INFO|system at ovs-system: Datapath supports ct_orig_tuple
2020-05-20T13:27:43.970Z|00048|ofproto_dpif|INFO|system at ovs-system: Datapath supports ct_orig_tuple6
2020-05-20T13:27:43.972Z|00049|dpif_netlink|INFO|dpif_netlink_meter_transact OVS_METER_CMD_SET failed
2020-05-20T13:27:43.972Z|00050|dpif_netlink|INFO|dpif_netlink_meter_transact OVS_METER_CMD_SET failed
2020-05-20T13:27:43.972Z|00051|dpif_netlink|INFO|dpif_netlink_meter_transact get failed
2020-05-20T13:27:43.972Z|00052|dpif_netlink|INFO|The kernel module has a broken meter implementation.
2020-05-20T13:27:43.976Z|00053|ofproto_dpif|INFO|netdev at ovs-netdev: Datapath supports recirculation
2020-05-20T13:27:43.976Z|00054|ofproto_dpif|INFO|netdev at ovs-netdev: VLAN header stack length probed as 1
2020-05-20T13:27:43.976Z|00055|ofproto_dpif|INFO|netdev at ovs-netdev: MPLS label stack length probed as 3
2020-05-20T13:27:43.976Z|00056|ofproto_dpif|INFO|netdev at ovs-netdev: Datapath supports truncate action
2020-05-20T13:27:43.976Z|00057|ofproto_dpif|INFO|netdev at ovs-netdev: Datapath supports unique flow ids
2020-05-20T13:27:43.976Z|00058|ofproto_dpif|INFO|netdev at ovs-netdev: Datapath supports clone action
2020-05-20T13:27:43.976Z|00059|ofproto_dpif|INFO|netdev at ovs-netdev: Max sample nesting level probed as 10
2020-05-20T13:27:43.976Z|00060|ofproto_dpif|INFO|netdev at ovs-netdev: Datapath supports eventmask in conntrack action
2020-05-20T13:27:43.976Z|00061|ofproto_dpif|INFO|netdev at ovs-netdev: Datapath supports ct_clear action
2020-05-20T13:27:43.976Z|00062|ofproto_dpif|INFO|netdev at ovs-netdev: Max dp_hash algorithm probed to be 1
2020-05-20T13:27:43.976Z|00063|ofproto_dpif|INFO|netdev at ovs-netdev: Datapath supports ct_state
2020-05-20T13:27:43.976Z|00064|ofproto_dpif|INFO|netdev at ovs-netdev: Datapath supports ct_zone
2020-05-20T13:27:43.976Z|00065|ofproto_dpif|INFO|netdev at ovs-netdev: Datapath supports ct_mark
2020-05-20T13:27:43.976Z|00066|ofproto_dpif|INFO|netdev at ovs-netdev: Datapath supports ct_label
2020-05-20T13:27:43.976Z|00067|ofproto_dpif|INFO|netdev at ovs-netdev: Datapath supports ct_state_nat
2020-05-20T13:27:43.976Z|00068|ofproto_dpif|INFO|netdev at ovs-netdev: Datapath supports ct_orig_tuple
2020-05-20T13:27:43.976Z|00069|ofproto_dpif|INFO|netdev at ovs-netdev: Datapath supports ct_orig_tuple6
2020-05-20T13:27:43.978Z|00070|bridge|INFO|bridge br-prv: added interface br-prv on port 65534
2020-05-20T13:27:43.978Z|00071|bridge|INFO|bridge br-ex: added interface br-ex on port 65534
2020-05-20T13:27:43.981Z|00072|dpif_netdev|INFO|PMD thread on numa_id: 0, core id:  1 created.
2020-05-20T13:27:43.981Z|00073|dpif_netdev|INFO|There are 1 pmd threads on numa node 0
2020-05-20T13:27:44.430Z|00074|netdev_dpdk|INFO|Port 0: 00:1b:21:c5:72:04
2020-05-20T13:27:44.430Z|00075|dpif_netdev|INFO|Core 1 on numa node 0 assigned port 'dpdk0' rx queue 0 (measured processing cycles 0).
2020-05-20T13:27:44.525Z|00076|bridge|INFO|bridge br-testdpdk: added interface dpdk0 on port 1
2020-05-20T13:27:44.527Z|00077|dpdk|INFO|VHOST_CONFIG: vhost-user client: socket created, fd: 62
2020-05-20T13:27:44.527Z|00078|netdev_dpdk|INFO|vHost User device 'vhost0' created in 'client' mode, using client socket '/tmp/sock0'
2020-05-20T13:27:44.528Z|00079|dpdk|INFO|VHOST_CONFIG: new device, handle is 0
2020-05-20T13:27:44.528Z|00080|dpif_netdev|INFO|Core 1 on numa node 0 assigned port 'dpdk0' rx queue 0 (measured processing cycles 0).
2020-05-20T13:27:44.528Z|00081|dpif_netdev|INFO|Core 1 on numa node 0 assigned port 'vhost0' rx queue 0 (measured processing cycles 0).
2020-05-20T13:27:44.528Z|00082|bridge|INFO|bridge br-testdpdk: added interface vhost0 on port 4
2020-05-20T13:27:44.528Z|00083|dpif_netdev|INFO|Core 1 on numa node 0 assigned port 'dpdk0' rx queue 0 (measured processing cycles 0).
2020-05-20T13:27:44.528Z|00084|dpif_netdev|INFO|Core 1 on numa node 0 assigned port 'vhost0' rx queue 0 (measured processing cycles 0).
2020-05-20T13:27:44.528Z|00085|bridge|INFO|bridge br-testdpdk: added interface br-testdpdk on port 65534
2020-05-20T13:27:44.532Z|00086|bridge|INFO|bridge br-prv: using datapath ID 0000febc0b21c447
2020-05-20T13:27:44.532Z|00087|connmgr|INFO|br-prv: added service controller "punix:/var/run/openvswitch/br-prv.mgmt"
2020-05-20T13:27:44.536Z|00088|bridge|INFO|bridge br-ex: using datapath ID 00003e2a0124144d
2020-05-20T13:27:44.536Z|00089|connmgr|INFO|br-ex: added service controller "punix:/var/run/openvswitch/br-ex.mgmt"
2020-05-20T13:27:44.536Z|00001|dpdk|ERR|VHOST_CONFIG: recvmsg failed
2020-05-20T13:27:44.536Z|00002|dpdk|INFO|VHOST_CONFIG: vhost peer closed
2020-05-20T13:27:44.536Z|00003|dpdk|INFO|VHOST_CONFIG: vhost-user client: socket created, fd: 62
2020-05-20T13:27:44.536Z|00004|dpdk|WARN|VHOST_CONFIG: failed to connect to /tmp/sock0: Connection refused
2020-05-20T13:27:44.536Z|00005|dpdk|INFO|VHOST_CONFIG: /tmp/sock0: reconnecting...
2020-05-20T13:27:44.543Z|00090|bridge|INFO|bridge br-testdpdk: using datapath ID 0000001b21c57204
2020-05-20T13:27:44.543Z|00091|connmgr|INFO|br-testdpdk: added service controller "punix:/var/run/openvswitch/br-testdpdk.mgmt"
2020-05-20T13:27:44.543Z|00092|timeval|WARN|Unreasonably long 1382ms poll interval (853ms user, 415ms system)
2020-05-20T13:27:44.543Z|00093|timeval|WARN|faults: 1338193 minor, 0 major
2020-05-20T13:27:44.543Z|00094|timeval|WARN|disk: 0 reads, 16 writes
2020-05-20T13:27:44.543Z|00095|timeval|WARN|context switches: 10 voluntary, 1050 involuntary
2020-05-20T13:27:44.543Z|00096|coverage|INFO|Event coverage, avg rate over last: 5 seconds, last minute, last hour,  hash=862cf660:
2020-05-20T13:27:44.543Z|00097|coverage|INFO|cmap_expand                0.0/sec     0.000/sec        0.0000/sec   total: 52
2020-05-20T13:27:44.543Z|00098|coverage|INFO|cmap_shrink                0.0/sec     0.000/sec        0.0000/sec   total: 24
2020-05-20T13:27:44.543Z|00099|coverage|INFO|dpif_port_add              0.0/sec     0.000/sec        0.0000/sec   total: 3
2020-05-20T13:27:44.543Z|00100|coverage|INFO|dpif_flow_flush            0.0/sec     0.000/sec        0.0000/sec   total: 5
2020-05-20T13:27:44.543Z|00101|coverage|INFO|dpif_flow_get              0.0/sec     0.000/sec        0.0000/sec   total: 42
2020-05-20T13:27:44.543Z|00102|coverage|INFO|dpif_flow_put              0.0/sec     0.000/sec        0.0000/sec   total: 45
2020-05-20T13:27:44.543Z|00103|coverage|INFO|dpif_flow_del              0.0/sec     0.000/sec        0.0000/sec   total: 42
2020-05-20T13:27:44.543Z|00104|coverage|INFO|dpif_execute               0.0/sec     0.000/sec        0.0000/sec   total: 8
2020-05-20T13:27:44.543Z|00105|coverage|INFO|flow_extract               0.0/sec     0.000/sec        0.0000/sec   total: 6
2020-05-20T13:27:44.543Z|00106|coverage|INFO|miniflow_malloc            0.0/sec     0.000/sec        0.0000/sec   total: 54
2020-05-20T13:27:44.543Z|00107|coverage|INFO|hmap_expand                0.0/sec     0.000/sec        0.0000/sec   total: 463
2020-05-20T13:27:44.543Z|00108|coverage|INFO|hmap_shrink                0.0/sec     0.000/sec        0.0000/sec   total: 2
2020-05-20T13:27:44.543Z|00109|coverage|INFO|netdev_get_stats           0.0/sec     0.000/sec        0.0000/sec   total: 5
2020-05-20T13:27:44.543Z|00110|coverage|INFO|poll_create_node           0.0/sec     0.000/sec        0.0000/sec   total: 37
2020-05-20T13:27:44.543Z|00111|coverage|INFO|poll_zero_timeout          0.0/sec     0.000/sec        0.0000/sec   total: 1
2020-05-20T13:27:44.543Z|00112|coverage|INFO|seq_change                 0.0/sec     0.000/sec        0.0000/sec   total: 1718
2020-05-20T13:27:44.543Z|00113|coverage|INFO|pstream_open               0.0/sec     0.000/sec        0.0000/sec   total: 7
2020-05-20T13:27:44.543Z|00114|coverage|INFO|stream_open                0.0/sec     0.000/sec        0.0000/sec   total: 1
2020-05-20T13:27:44.543Z|00115|coverage|INFO|util_xalloc                0.0/sec     0.000/sec        0.0000/sec   total: 10285
2020-05-20T13:27:44.543Z|00116|coverage|INFO|netdev_set_policing        0.0/sec     0.000/sec        0.0000/sec   total: 3
2020-05-20T13:27:44.543Z|00117|coverage|INFO|netdev_get_ethtool         0.0/sec     0.000/sec        0.0000/sec   total: 2
2020-05-20T13:27:44.543Z|00118|coverage|INFO|netlink_received           0.0/sec     0.000/sec        0.0000/sec   total: 164
2020-05-20T13:27:44.543Z|00119|coverage|INFO|netlink_recv_jumbo         0.0/sec     0.000/sec        0.0000/sec   total: 34
2020-05-20T13:27:44.543Z|00120|coverage|INFO|netlink_sent               0.0/sec     0.000/sec        0.0000/sec   total: 204
2020-05-20T13:27:44.543Z|00121|coverage|INFO|ofproto_flush              0.0/sec     0.000/sec        0.0000/sec   total: 3
2020-05-20T13:27:44.543Z|00122|coverage|INFO|ofproto_update_port        0.0/sec     0.000/sec        0.0000/sec   total: 5
2020-05-20T13:27:44.543Z|00123|coverage|INFO|rev_flow_table             0.0/sec     0.000/sec        0.0000/sec   total: 1
2020-05-20T13:27:44.543Z|00124|coverage|INFO|bridge_reconfigure         0.0/sec     0.000/sec        0.0000/sec   total: 1
2020-05-20T13:27:44.543Z|00125|coverage|INFO|78 events never hit
2020-05-20T13:27:44.595Z|00126|bridge|INFO|ovs-vswitchd (Open vSwitch) 2.10.2
2020-05-20T13:27:53.473Z|00127|memory|INFO|143616 kB peak resident set size after 10.3 seconds
2020-05-20T13:27:53.473Z|00128|memory|INFO|handlers:2 ports:5 revalidators:2 rules:15
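
For completeness, the EAL arguments shown above (-c 0x2 --socket-mem 1024) come from the usual Open_vSwitch other_config keys, set along these lines:

ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-lcore-mask=0x2
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket-mem=1024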


Regards,
Mark Wittling
NFV Cloud Operations
Cox Communications Inc
CTECH A08-150D
6305-A Peachtree Dunwoody Road, Atlanta GA 30328
1-770-849-9696

From: Wittling, Mark (CCI-Atlanta)
Sent: Tuesday, May 19, 2020 3:13 PM
To: ovs-discuss at openvswitch.org
Subject: Starting OVS+DPDK kills VM using vhostuserclient interface

Earlier, I had posted for some guidance in building and deploying rpms for DPDK and OVS.
The versions are (I'm not married to these versions):

  *   DPDK 17.11.10
  *   OVS 2.10.2

Thanks to some help from Guru Shetty, I was able to tweak the spec files and get some rpms built and installed.

What I am trying to do now is some VM testing with vhostuserclient interfaces. It appears I am having an issue with this, and I was wondering if anyone can take a look and give me some feedback. Below I will point out what seems to be working, and the things I could use some guidance on in my troubleshooting.

Two PCI devices are bound to the vfio-pci driver, overriding the e1000e driver:

[root at maschinen dpdk]# dpdk-devbind --status

Network devices using DPDK-compatible driver
============================================
0000:01:00.0 '82571EB/82571GB Gigabit Ethernet Controller D0/D1 (copper applications) 105e' drv=vfio-pci unused=e1000e
0000:01:00.1 '82571EB/82571GB Gigabit Ethernet Controller D0/D1 (copper applications) 105e' drv=vfio-pci unused=e1000e

Network devices using kernel driver
===================================
0000:00:19.0 'Ethernet Connection I217-LM 153a' if=em1 drv=e1000e unused=vfio-pci
0000:03:00.0 '82571EB/82571GB Gigabit Ethernet Controller D0/D1 (copper applications) 105e' if=p1p1 drv=e1000e unused=vfio-pci
0000:03:00.1 '82571EB/82571GB Gigabit Ethernet Controller D0/D1 (copper applications) 105e' if=p1p2 drv=e1000e unused=vfio-pci

I have added a vhostuserclient interface to the bridge that has the DPDK NIC, and that looks like it works okay.

[root at testbox x86_64]# ovs-vsctl show
2d46de50-e5b8-47be-84b4-a7e85ce29526
    Bridge br-testdpdk
        Port br-testdpdk
            Interface br-testdpdk
                type: internal
        Port "dpdk0"
            Interface "dpdk0"
                type: dpdk
                options: {dpdk-devargs="0000:01:00.0"}
        Port "vhost0"
            Interface "vhost0"
                type: dpdkvhostuserclient
                options: {vhost-server-path="/tmp/sock0"}
    Bridge br-prv
        Port br-prv
            Interface br-prv
                type: internal
    Bridge br-ex
        Port br-ex
            Interface br-ex
                type: internal
    ovs_version: "2.10.2"

A script (init-vm.sh) tries to spin up a VM:

  *   We are using vhostuserclient mode, so the VM should bind to /tmp/sock0 as the server, while OVS should connect as a client.

/usr/libexec/qemu-kvm -cpu host -enable-kvm -m 4096M -object memory-backend-file,id=mem,size=4096M,mem-path=/mnt/hugepages,share=on -numa node,memdev=mem -mem-prealloc  -drive file=/opt/images/${VMNAME} -chardev socket,id=char0,path=/tmp/sock0,server -netdev type=vhost-user,id=default,chardev=char0,vhostforce -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=vhost0,mrg_rxbuf=off -nographic


  *   The "id=default" - I am pretty sure that is supposed to be my virtual network that has been created. I am using the network called default.
  *   [root at testbox src]# virsh net-list
  *   Name                 State      Autostart     Persistent
  *   ----------------------------------------------------------
  *   default              active     yes           yes
  *   routed-22            active     yes           yes
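
For reference, QEMU wires vhost-user through a chain of matching ids: the -chardev id is referenced by the -netdev's chardev= option, and the -netdev id is what the -device's netdev= property must name (it is a local QEMU identifier, unrelated to libvirt network names). A minimal sketch of that chain, using a hypothetical id "net0":

-chardev socket,id=char0,path=/tmp/sock0,server \
-netdev type=vhost-user,id=net0,chardev=char0,vhostforce \
-device virtio-net-pci,netdev=net0,mac=00:00:00:00:00:01,mrg_rxbuf=off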

So, if the VM is the server, it makes sense to start that up first so it can bind to the socket.

[root at maschinen src]# ./init-vm.sh
qemu-kvm: -chardev socket,id=char0,path=/tmp/sock0,server: info: QEMU waiting for connection on: disconnected:unix:/tmp/sock0,server
Looks good - it appears to be listening. Nothing printed in syslog.

Let's start OVS now with "systemctl start openvswitch" (OpenVSwitch is configured to run as root).

After starting OVS, the VM we were running stops, and when I check the return code, it is 1, indicating it abended.
[root at testbox src]# ./init-vm.sh
qemu-kvm: -chardev socket,id=char0,path=/tmp/sock0,server: info: QEMU waiting for connection on: disconnected:unix:/tmp/sock0,server
qemu-kvm: -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=vhost0,mrg_rxbuf=off: drive with bus=0, unit=0 (index=0) exists
[root at maschinen src]# echo $?
1
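
Two quick sanity checks on the socket itself that may help anyone reproducing this (plain shell, nothing OVS-specific):

ls -l /tmp/sock0     # confirm the unix socket exists and check ownership
ss -x | grep sock0   # see whether anything is bound to or connected on it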

So, not knowing why the VM launch failed, I look in syslog for some more information. I see this:
May 19 14:38:40 maschinen ovs-vswitchd[5044]: ovs|00078|dpdk|ERR|EAL: Error enabling MSI-X interrupts for fd 27
May 19 14:38:40 maschinen ovs-vswitchd[5044]: ovs|00001|dpdk|ERR|VHOST_CONFIG: recvmsg failed
May 19 14:38:40 maschinen ovs-ctl: Starting ovs-vswitchd [  OK  ]
May 19 14:38:40 maschinen ovs-vsctl: ovs|00001|vsctl|INFO|Called as ovs-vsctl --no-wait set Open_vSwitch . external-ids:hostname=maschinen
May 19 14:38:40 maschinen ovs-ctl: Enabling remote OVSDB managers [  OK  ]
May 19 14:38:40 maschinen systemd: Started Open vSwitch Forwarding Unit.
May 19 14:38:40 maschinen systemd: Starting Open vSwitch...
May 19 14:38:40 maschinen systemd: Started Open vSwitch.

That's not a lot of information, so we take a look at the OVS log. I see a couple of things here that COULD be (or seem to be) a root cause:

2020-05-19T18:38:39.521Z|00001|vlog|INFO|opened log file /var/log/openvswitch/ovs-vswitchd.log
2020-05-19T18:38:39.525Z|00002|ovs_numa|INFO|Discovered 4 CPU cores on NUMA node 0
2020-05-19T18:38:39.525Z|00003|ovs_numa|INFO|Discovered 1 NUMA nodes and 4 CPU cores
2020-05-19T18:38:39.525Z|00004|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connecting...
2020-05-19T18:38:39.525Z|00005|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connected
2020-05-19T18:38:39.527Z|00006|dpdk|INFO|Using DPDK 17.11.10
2020-05-19T18:38:39.527Z|00007|dpdk|INFO|DPDK Enabled - initializing...
2020-05-19T18:38:39.527Z|00008|dpdk|INFO|No vhost-sock-dir provided - defaulting to /var/run/openvswitch
2020-05-19T18:38:39.527Z|00009|dpdk|INFO|IOMMU support for vhost-user-client disabled.
2020-05-19T18:38:39.527Z|00010|dpdk|INFO|Per port memory for DPDK devices disabled.
2020-05-19T18:38:39.527Z|00011|dpdk|INFO|EAL ARGS: ovs-vswitchd -c 0x2 --socket-mem 1024
2020-05-19T18:38:39.527Z|00012|dpdk|INFO|EAL: Detected 4 lcore(s)
2020-05-19T18:38:39.529Z|00013|dpdk|INFO|EAL: Probing VFIO support...
2020-05-19T18:38:39.529Z|00014|dpdk|INFO|EAL: VFIO support initialized
2020-05-19T18:38:39.934Z|00015|dpdk|INFO|EAL: PCI device 0000:00:19.0 on NUMA socket -1
2020-05-19T18:38:39.934Z|00016|dpdk|WARN|EAL:   Invalid NUMA socket, default to 0
2020-05-19T18:38:39.934Z|00017|dpdk|INFO|EAL:   probe driver: 8086:153a net_e1000_em
2020-05-19T18:38:39.934Z|00018|dpdk|INFO|EAL: PCI device 0000:01:00.0 on NUMA socket -1
2020-05-19T18:38:39.934Z|00019|dpdk|WARN|EAL:   Invalid NUMA socket, default to 0
2020-05-19T18:38:39.934Z|00020|dpdk|INFO|EAL:   probe driver: 8086:105e net_e1000_em
2020-05-19T18:38:39.934Z|00021|dpdk|INFO|EAL:   using IOMMU type 1 (Type 1)
2020-05-19T18:38:39.978Z|00022|dpdk|INFO|EAL: Ignore mapping IO port bar(2)
2020-05-19T18:38:40.148Z|00023|dpdk|INFO|EAL: PCI device 0000:01:00.1 on NUMA socket -1
2020-05-19T18:38:40.148Z|00024|dpdk|WARN|EAL:   Invalid NUMA socket, default to 0
2020-05-19T18:38:40.148Z|00025|dpdk|INFO|EAL:   probe driver: 8086:105e net_e1000_em
2020-05-19T18:38:40.170Z|00026|dpdk|INFO|EAL: Ignore mapping IO port bar(2)
2020-05-19T18:38:40.340Z|00027|dpdk|INFO|EAL: PCI device 0000:03:00.0 on NUMA socket -1
2020-05-19T18:38:40.340Z|00028|dpdk|WARN|EAL:   Invalid NUMA socket, default to 0
2020-05-19T18:38:40.340Z|00029|dpdk|INFO|EAL:   probe driver: 8086:105e net_e1000_em
2020-05-19T18:38:40.340Z|00030|dpdk|INFO|EAL: PCI device 0000:03:00.1 on NUMA socket -1
2020-05-19T18:38:40.340Z|00031|dpdk|WARN|EAL:   Invalid NUMA socket, default to 0
2020-05-19T18:38:40.340Z|00032|dpdk|INFO|EAL:   probe driver: 8086:105e net_e1000_em
2020-05-19T18:38:40.340Z|00033|dpdk|INFO|DPDK pdump packet capture enabled
2020-05-19T18:38:40.342Z|00034|dpdk|INFO|DPDK Enabled - initialized
2020-05-19T18:38:40.343Z|00035|dpif_netlink|INFO|The kernel module does not support meters.
2020-05-19T18:38:40.346Z|00036|ofproto_dpif|INFO|system at ovs-system: Datapath supports recirculation
2020-05-19T18:38:40.347Z|00037|ofproto_dpif|INFO|system at ovs-system: VLAN header stack length probed as 2
2020-05-19T18:38:40.347Z|00038|ofproto_dpif|INFO|system at ovs-system: MPLS label stack length probed as 1
2020-05-19T18:38:40.347Z|00039|ofproto_dpif|INFO|system at ovs-system: Datapath supports truncate action
2020-05-19T18:38:40.347Z|00040|ofproto_dpif|INFO|system at ovs-system: Datapath supports unique flow ids
2020-05-19T18:38:40.347Z|00041|ofproto_dpif|INFO|system at ovs-system: Datapath does not support clone action
2020-05-19T18:38:40.347Z|00042|ofproto_dpif|INFO|system at ovs-system: Max sample nesting level probed as 10
2020-05-19T18:38:40.347Z|00043|ofproto_dpif|INFO|system at ovs-system: Datapath supports eventmask in conntrack action
2020-05-19T18:38:40.347Z|00044|ofproto_dpif|INFO|system at ovs-system: Datapath supports ct_clear action
2020-05-19T18:38:40.347Z|00045|ofproto_dpif|INFO|system at ovs-system: Max dp_hash algorithm probed to be 0
2020-05-19T18:38:40.347Z|00046|ofproto_dpif|INFO|system at ovs-system: Datapath supports ct_state
2020-05-19T18:38:40.347Z|00047|ofproto_dpif|INFO|system at ovs-system: Datapath supports ct_zone
2020-05-19T18:38:40.347Z|00048|ofproto_dpif|INFO|system at ovs-system: Datapath supports ct_mark
2020-05-19T18:38:40.347Z|00049|ofproto_dpif|INFO|system at ovs-system: Datapath supports ct_label
2020-05-19T18:38:40.347Z|00050|ofproto_dpif|INFO|system at ovs-system: Datapath supports ct_state_nat
2020-05-19T18:38:40.347Z|00051|ofproto_dpif|INFO|system at ovs-system: Datapath supports ct_orig_tuple
2020-05-19T18:38:40.347Z|00052|ofproto_dpif|INFO|system at ovs-system: Datapath supports ct_orig_tuple6
2020-05-19T18:38:40.349Z|00053|dpif_netlink|INFO|dpif_netlink_meter_transact OVS_METER_CMD_SET failed
2020-05-19T18:38:40.349Z|00054|dpif_netlink|INFO|dpif_netlink_meter_transact OVS_METER_CMD_SET failed
2020-05-19T18:38:40.349Z|00055|dpif_netlink|INFO|dpif_netlink_meter_transact get failed
2020-05-19T18:38:40.349Z|00056|dpif_netlink|INFO|The kernel module has a broken meter implementation.
2020-05-19T18:38:40.353Z|00057|ofproto_dpif|INFO|netdev at ovs-netdev: Datapath supports recirculation
2020-05-19T18:38:40.353Z|00058|ofproto_dpif|INFO|netdev at ovs-netdev: VLAN header stack length probed as 1
2020-05-19T18:38:40.353Z|00059|ofproto_dpif|INFO|netdev at ovs-netdev: MPLS label stack length probed as 3
2020-05-19T18:38:40.353Z|00060|ofproto_dpif|INFO|netdev at ovs-netdev: Datapath supports truncate action
2020-05-19T18:38:40.353Z|00061|ofproto_dpif|INFO|netdev at ovs-netdev: Datapath supports unique flow ids
2020-05-19T18:38:40.353Z|00062|ofproto_dpif|INFO|netdev at ovs-netdev: Datapath supports clone action
2020-05-19T18:38:40.353Z|00063|ofproto_dpif|INFO|netdev at ovs-netdev: Max sample nesting level probed as 10
2020-05-19T18:38:40.353Z|00064|ofproto_dpif|INFO|netdev at ovs-netdev: Datapath supports eventmask in conntrack action
2020-05-19T18:38:40.353Z|00065|ofproto_dpif|INFO|netdev at ovs-netdev: Datapath supports ct_clear action
2020-05-19T18:38:40.353Z|00066|ofproto_dpif|INFO|netdev at ovs-netdev: Max dp_hash algorithm probed to be 1
2020-05-19T18:38:40.353Z|00067|ofproto_dpif|INFO|netdev at ovs-netdev: Datapath supports ct_state
2020-05-19T18:38:40.353Z|00068|ofproto_dpif|INFO|netdev at ovs-netdev: Datapath supports ct_zone
2020-05-19T18:38:40.353Z|00069|ofproto_dpif|INFO|netdev at ovs-netdev: Datapath supports ct_mark
2020-05-19T18:38:40.353Z|00070|ofproto_dpif|INFO|netdev at ovs-netdev: Datapath supports ct_label
2020-05-19T18:38:40.353Z|00071|ofproto_dpif|INFO|netdev at ovs-netdev: Datapath supports ct_state_nat
2020-05-19T18:38:40.353Z|00072|ofproto_dpif|INFO|netdev at ovs-netdev: Datapath supports ct_orig_tuple
2020-05-19T18:38:40.353Z|00073|ofproto_dpif|INFO|netdev at ovs-netdev: Datapath supports ct_orig_tuple6
2020-05-19T18:38:40.356Z|00074|bridge|INFO|bridge br-prv: added interface br-prv on port 65534
2020-05-19T18:38:40.356Z|00075|bridge|INFO|bridge br-ex: added interface br-ex on port 65534
2020-05-19T18:38:40.358Z|00076|dpif_netdev|INFO|PMD thread on numa_id: 0, core id:  1 created.
2020-05-19T18:38:40.358Z|00077|dpif_netdev|INFO|There are 1 pmd threads on numa node 0
2020-05-19T18:38:40.808Z|00078|dpdk|ERR|EAL: Error enabling MSI-X interrupts for fd 27
2020-05-19T18:38:40.808Z|00079|netdev_dpdk|INFO|Port 0: 00:1b:21:c5:72:04
2020-05-19T18:38:40.808Z|00080|dpif_netdev|INFO|Core 1 on numa node 0 assigned port 'dpdk0' rx queue 0 (measured processing cycles 0).
2020-05-19T18:38:40.903Z|00081|bridge|INFO|bridge br-testdpdk: added interface dpdk0 on port 1
2020-05-19T18:38:40.905Z|00082|dpdk|INFO|VHOST_CONFIG: vhost-user client: socket created, fd: 66
2020-05-19T18:38:40.905Z|00083|netdev_dpdk|INFO|vHost User device 'vhost0' created in 'client' mode, using client socket '/tmp/sock0'
2020-05-19T18:38:40.906Z|00084|dpdk|INFO|VHOST_CONFIG: new device, handle is 0
2020-05-19T18:38:40.906Z|00085|dpif_netdev|INFO|Core 1 on numa node 0 assigned port 'dpdk0' rx queue 0 (measured processing cycles 0).
2020-05-19T18:38:40.906Z|00086|dpif_netdev|INFO|Core 1 on numa node 0 assigned port 'vhost0' rx queue 0 (measured processing cycles 0).
2020-05-19T18:38:40.906Z|00087|bridge|INFO|bridge br-testdpdk: added interface vhost0 on port 4
2020-05-19T18:38:40.906Z|00088|dpif_netdev|INFO|Core 1 on numa node 0 assigned port 'dpdk0' rx queue 0 (measured processing cycles 0).
2020-05-19T18:38:40.906Z|00089|dpif_netdev|INFO|Core 1 on numa node 0 assigned port 'vhost0' rx queue 0 (measured processing cycles 0).
2020-05-19T18:38:40.906Z|00090|bridge|INFO|bridge br-testdpdk: added interface br-testdpdk on port 65534
2020-05-19T18:38:40.906Z|00091|bridge|INFO|bridge br-prv: using datapath ID 0000febc0b21c447
2020-05-19T18:38:40.906Z|00092|connmgr|INFO|br-prv: added service controller "punix:/var/run/openvswitch/br-prv.mgmt"
2020-05-19T18:38:40.910Z|00093|bridge|INFO|bridge br-ex: using datapath ID 00003e2a0124144d
2020-05-19T18:38:40.910Z|00094|connmgr|INFO|br-ex: added service controller "punix:/var/run/openvswitch/br-ex.mgmt"
2020-05-19T18:38:40.910Z|00001|dpdk|ERR|VHOST_CONFIG: recvmsg failed
2020-05-19T18:38:40.910Z|00002|dpdk|INFO|VHOST_CONFIG: vhost peer closed
2020-05-19T18:38:40.910Z|00003|dpdk|INFO|VHOST_CONFIG: vhost-user client: socket created, fd: 66
2020-05-19T18:38:40.910Z|00004|dpdk|WARN|VHOST_CONFIG: failed to connect to /tmp/sock0: Connection refused
2020-05-19T18:38:40.910Z|00005|dpdk|INFO|VHOST_CONFIG: /tmp/sock0: reconnecting...
2020-05-19T18:38:40.917Z|00095|bridge|INFO|bridge br-testdpdk: using datapath ID 0000001b21c57204
2020-05-19T18:38:40.917Z|00096|connmgr|INFO|br-testdpdk: added service controller "punix:/var/run/openvswitch/br-testdpdk.mgmt"
2020-05-19T18:38:40.917Z|00097|timeval|WARN|Unreasonably long 1391ms poll interval (747ms user, 448ms system)
2020-05-19T18:38:40.917Z|00098|timeval|WARN|faults: 1340271 minor, 0 major
2020-05-19T18:38:40.917Z|00099|timeval|WARN|disk: 0 reads, 16 writes
2020-05-19T18:38:40.917Z|00100|timeval|WARN|context switches: 17 voluntary, 1049 involuntary
2020-05-19T18:38:40.917Z|00101|coverage|INFO|Event coverage, avg rate over last: 5 seconds, last minute, last hour,  hash=862cf660:
2020-05-19T18:38:40.917Z|00102|coverage|INFO|cmap_expand                0.0/sec     0.000/sec        0.0000/sec   total: 52
2020-05-19T18:38:40.917Z|00103|coverage|INFO|cmap_shrink                0.0/sec     0.000/sec        0.0000/sec   total: 24
2020-05-19T18:38:40.917Z|00104|coverage|INFO|dpif_port_add              0.0/sec     0.000/sec        0.0000/sec   total: 3
2020-05-19T18:38:40.917Z|00105|coverage|INFO|dpif_flow_flush            0.0/sec     0.000/sec        0.0000/sec   total: 5
2020-05-19T18:38:40.917Z|00106|coverage|INFO|dpif_flow_get              0.0/sec     0.000/sec        0.0000/sec   total: 42
2020-05-19T18:38:40.917Z|00107|coverage|INFO|dpif_flow_put              0.0/sec     0.000/sec        0.0000/sec   total: 45
2020-05-19T18:38:40.917Z|00108|coverage|INFO|dpif_flow_del              0.0/sec     0.000/sec        0.0000/sec   total: 42
2020-05-19T18:38:40.917Z|00109|coverage|INFO|dpif_execute               0.0/sec     0.000/sec        0.0000/sec   total: 8
2020-05-19T18:38:40.917Z|00110|coverage|INFO|flow_extract               0.0/sec     0.000/sec        0.0000/sec   total: 6
2020-05-19T18:38:40.917Z|00111|coverage|INFO|miniflow_malloc            0.0/sec     0.000/sec        0.0000/sec   total: 54
2020-05-19T18:38:40.917Z|00112|coverage|INFO|hmap_expand                0.0/sec     0.000/sec        0.0000/sec   total: 463
2020-05-19T18:38:40.917Z|00113|coverage|INFO|hmap_shrink                0.0/sec     0.000/sec        0.0000/sec   total: 2
2020-05-19T18:38:40.917Z|00114|coverage|INFO|netdev_get_stats           0.0/sec     0.000/sec        0.0000/sec   total: 5
2020-05-19T18:38:40.917Z|00115|coverage|INFO|poll_create_node           0.0/sec     0.000/sec        0.0000/sec   total: 37
2020-05-19T18:38:40.917Z|00116|coverage|INFO|poll_zero_timeout          0.0/sec     0.000/sec        0.0000/sec   total: 1
2020-05-19T18:38:40.917Z|00117|coverage|INFO|seq_change                 0.0/sec     0.000/sec        0.0000/sec   total: 1718
2020-05-19T18:38:40.917Z|00118|coverage|INFO|pstream_open               0.0/sec     0.000/sec        0.0000/sec   total: 7
2020-05-19T18:38:40.917Z|00119|coverage|INFO|stream_open                0.0/sec     0.000/sec        0.0000/sec   total: 1
2020-05-19T18:38:40.917Z|00120|coverage|INFO|util_xalloc                0.0/sec     0.000/sec        0.0000/sec   total: 10295
2020-05-19T18:38:40.917Z|00121|coverage|INFO|netdev_set_policing        0.0/sec     0.000/sec        0.0000/sec   total: 3
2020-05-19T18:38:40.917Z|00122|coverage|INFO|netdev_get_ethtool         0.0/sec     0.000/sec        0.0000/sec   total: 2
2020-05-19T18:38:40.917Z|00123|coverage|INFO|netlink_received           0.0/sec     0.000/sec        0.0000/sec   total: 164
2020-05-19T18:38:40.917Z|00124|coverage|INFO|netlink_recv_jumbo         0.0/sec     0.000/sec        0.0000/sec   total: 34
2020-05-19T18:38:40.917Z|00125|coverage|INFO|netlink_sent               0.0/sec     0.000/sec        0.0000/sec   total: 204
2020-05-19T18:38:40.917Z|00126|coverage|INFO|ofproto_flush              0.0/sec     0.000/sec        0.0000/sec   total: 3
2020-05-19T18:38:40.917Z|00127|coverage|INFO|ofproto_update_port        0.0/sec     0.000/sec        0.0000/sec   total: 5
2020-05-19T18:38:40.917Z|00128|coverage|INFO|rev_flow_table             0.0/sec     0.000/sec        0.0000/sec   total: 1
2020-05-19T18:38:40.917Z|00129|coverage|INFO|bridge_reconfigure         0.0/sec     0.000/sec        0.0000/sec   total: 1
2020-05-19T18:38:40.917Z|00130|coverage|INFO|78 events never hit
2020-05-19T18:38:40.920Z|00131|bridge|INFO|ovs-vswitchd (Open vSwitch) 2.10.2
2020-05-19T18:38:49.850Z|00132|memory|INFO|151824 kB peak resident set size after 10.3 seconds
2020-05-19T18:38:49.850Z|00133|memory|INFO|handlers:2 ports:5 revalidators:2 rules:15

Is the problem here the error related to enabling MSI-X interrupts?

Or is this some kind of socket error?
The "socket created, fd66" I presume that is a client connection socket. Because OVS should not be the server if we are running vhostuserclient mode.

I presume the Connection Refused is because the VM has died when OVS comes up.
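
(If more detail is needed than the single recvmsg error, the relevant OVS log modules can be turned up at runtime, e.g.:)

ovs-appctl vlog/set dpdk:file:dbg
ovs-appctl vlog/set netdev_dpdk:file:dbg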

Any help on this would be appreciated!

Mark

Regards,
Mark Wittling
NFV Cloud Operations
Cox Communications Inc
CTECH A08-150D
6305-A Peachtree Dunwoody Road, Atlanta GA 30328
1-770-849-9696
