[ovs-discuss] OVS+DPDK issue with mellanox ConnectX3-Pro cards

Aaron Conole aconole at redhat.com
Mon Jul 11 20:50:48 UTC 2016


Hi John,

Can you try the latest openvswitch master branch and confirm whether
this issue is still present?  The way DPDK is initialized has changed
since 2.5 (the change is on master only; no release includes it yet),
so it would be good to know whether the problem reproduces there.

Thanks,
Aaron

John Phillips <john.phillips5 at hpe.com> writes:

> Hi. I am testing a build of openvswitch with DPDK that we package for
> our Debian Linux distribution as 'openvswitch-switch-dpdk'. It is the
> normal Debian package, with ovs-vswitchd selected through the Debian
> alternatives system (not too important). We are trying to support the
> Intel Niantic and the Mellanox ConnectX3-Pro. We have seen no issues
> with the Niantic. With the Mellanox card, however, ovs-vswitchd fails
> to add the DPDK ports when it is started from its init script (the
> standard init script in the debian/ directory). I get this:
>
> 4f412dee-e2e5-42e5-be7e-dbee94c42652
>     Bridge "br0"
>         Port "br0"
>             Interface "br0"
>                 type: internal
>         Port "dpdk0"
>             Interface "dpdk0"
>                 type: dpdk
>                 error: "could not open network device dpdk0 (Cannot
> allocate memory)"
>         Port "dpdk1"
>             Interface "dpdk1"
>                 type: dpdk
>                 error: "could not open network device dpdk1 (Cannot
> allocate memory)"
>     ovs_version: "2.5.1"
>
>
>
> There wasn't anything particularly enlightening in the syslog:
>
> 2016-07-11T19:28:38.783Z|00015|dpdk|INFO|Interface dpdk1 txq(0) setup
> error: Cannot allocate memory
> 2016-07-11T19:28:38.783Z|00016|dpdk|ERR|Interface dpdk1(rxq:1 txq:1)
> configure error: Cannot allocate memory
> 2016-07-11T19:28:38.783Z|00017|bridge|WARN|could not open network
> device dpdk1 (Cannot allocate memory)
> 2016-07-11T19:28:38.784Z|00018|bridge|INFO|bridge br0: added interface
> br0 on port 65534
> 2016-07-11T19:28:38.795Z|00019|dpdk|INFO|Interface dpdk0 txq(0) setup
> error: Cannot allocate memory
> 2016-07-11T19:28:38.795Z|00020|dpdk|ERR|Interface dpdk0(rxq:1 txq:1)
> configure error: Cannot allocate memory
> 2016-07-11T19:28:38.795Z|00021|bridge|WARN|could not open network
> device dpdk0 (Cannot allocate memory)
> 2016-07-11T19:28:38.795Z|00022|bridge|INFO|bridge br0: using datapath
> ID 000036b6cbb99b41
> 2016-07-11T19:28:38.795Z|00023|connmgr|INFO|br0: added service
> controller "punix:/var/run/openvswitch/br0.mgmt"
> 2016-07-11T19:28:38.888Z|00024|dpdk|INFO|Interface dpdk1 txq(0) setup
> error: Cannot allocate memory
> 2016-07-11T19:28:38.888Z|00025|dpdk|ERR|Interface dpdk1(rxq:1 txq:1)
> configure error: Cannot allocate memory
> 2016-07-11T19:28:38.888Z|00026|bridge|WARN|could not open network
> device dpdk1 (Cannot allocate memory)
> 2016-07-11T19:28:38.899Z|00027|dpdk|INFO|Interface dpdk0 txq(0) setup
> error: Cannot allocate memory
> 2016-07-11T19:28:38.899Z|00028|dpdk|ERR|Interface dpdk0(rxq:1 txq:1)
> configure error: Cannot allocate memory
> 2016-07-11T19:28:38.899Z|00029|bridge|WARN|could not open network
> device dpdk0 (Cannot allocate memory)
> 2016-07-11T19:28:38.902Z|00030|bridge|INFO|ovs-vswitchd (Open vSwitch) 2.5.1
> 2016-07-11T19:28:43.767Z|00031|memory|INFO|247496 kB peak resident set
> size after 10.2 seconds
> 2016-07-11T19:28:43.767Z|00032|memory|INFO|handlers:17 ports:1
> revalidators:7 rules:5
>
> This error doesn't occur with the same versions of ovs/dpdk compiled
> and run as in INSTALL.DPDK.md. However as I will explain later there
> is a difference between the way you run it when testing according to
> INSTALL.DPDK.md and doing distribution-type testing.
>
> Since this does not occur with niantic I looked for mellanox log
> errors (I compiled the PMD with the DBG option):
>
> # journalctl --full | grep -i mlx
> Jul 11 13:27:28 bl460gen9-04 kernel: mlx_compat: module verification
> failed: signature and/or required key missing - tainting kernel
> Jul 11 13:27:28 bl460gen9-04 kernel: mlx4_core: Mellanox ConnectX core
> driver v3.3-1.0.0 (31 May 2016)
> Jul 11 13:27:28 bl460gen9-04 kernel: mlx4_core: Initializing 0000:09:00.0
> Jul 11 13:27:28 bl460gen9-04 kernel: mlx4_core: device is working in
> RoCE mode: Roce V1
> Jul 11 13:27:28 bl460gen9-04 kernel: mlx4_core: gid_type 1 for UD QPs
> is not supported by the devicegid_type 0 was chosen instead
> Jul 11 13:27:28 bl460gen9-04 kernel: mlx4_core: UD QP Gid type is: V1
> Jul 11 13:27:28 bl460gen9-04 kernel: mlx4_core 0000:09:00.0: PCIe link
> speed is 8.0GT/s, device supports 8.0GT/s
> Jul 11 13:27:28 bl460gen9-04 kernel: mlx4_core 0000:09:00.0: PCIe link
> width is x8, device supports x8
> Jul 11 13:27:28 bl460gen9-04 kernel: mlx4_en: Mellanox ConnectX HCA
> Ethernet driver v3.3-1.0.0 (31 May 2016)
> Jul 11 13:27:28 bl460gen9-04 kernel: mlx4_en 0000:09:00.0: Activating port:1
> Jul 11 13:27:28 bl460gen9-04 kernel: mlx4_en: 0000:09:00.0: Port 1:
> Using 256 TX rings
> Jul 11 13:27:28 bl460gen9-04 kernel: mlx4_en: 0000:09:00.0: Port 1:
> Using 16 RX rings
> Jul 11 13:27:28 bl460gen9-04 kernel: mlx4_en: 0000:09:00.0: Port 1:
> frag:0 - size:1522 prefix:0 stride:1536
> Jul 11 13:27:28 bl460gen9-04 kernel: mlx4_en: 0000:09:00.0: Port 1:
> Initializing port
> Jul 11 13:27:28 bl460gen9-04 kernel: mlx4_en 0000:09:00.0: registered
> PHC clock
> Jul 11 13:27:28 bl460gen9-04 kernel: mlx4_en 0000:09:00.0: Activating port:2
> Jul 11 13:27:28 bl460gen9-04 kernel: mlx4_en: 0000:09:00.0: Port 2:
> Using 256 TX rings
> Jul 11 13:27:28 bl460gen9-04 kernel: mlx4_en: 0000:09:00.0: Port 2:
> Using 16 RX rings
> Jul 11 13:27:28 bl460gen9-04 kernel: mlx4_en: 0000:09:00.0: Port 2:
> frag:0 - size:1522 prefix:0 stride:1536
> Jul 11 13:27:28 bl460gen9-04 kernel: mlx4_en: 0000:09:00.0: Port 2:
> Initializing port
> Jul 11 13:27:28 bl460gen9-04 kernel: mlx4_core 0000:09:00.0 eth5:
> renamed from eth3
> Jul 11 13:27:28 bl460gen9-04 kernel: mlx4_core 0000:09:00.0 eth4:
> renamed from eth2
> Jul 11 13:27:28 bl460gen9-04 logger[930]: openibd: start(): Detected
> 'mlx4_core' loaded with 'log_num_mgm_entry_size=-10' instead of
> 'log_num_mgm_entry_size=-7' as configured in '', calling stop...
> Jul 11 13:27:28 bl460gen9-04 kernel: mlx4_en 0000:09:00.0: removed PHC
> Jul 11 13:27:31 bl460gen9-04 kernel: mlx4_core: Mellanox ConnectX core
> driver v3.3-1.0.0 (31 May 2016)
> Jul 11 13:27:31 bl460gen9-04 kernel: mlx4_core: Initializing 0000:09:00.0
> Jul 11 13:27:36 bl460gen9-04 kernel: mlx4_core: device is working in
> RoCE mode: Roce V1
> Jul 11 13:27:36 bl460gen9-04 kernel: mlx4_core: gid_type 1 for UD QPs
> is not supported by the devicegid_type 0 was chosen instead
> Jul 11 13:27:36 bl460gen9-04 kernel: mlx4_core: UD QP Gid type is: V1
> Jul 11 13:27:37 bl460gen9-04 kernel: mlx4_core 0000:09:00.0: PCIe link
> speed is 8.0GT/s, device supports 8.0GT/s
> Jul 11 13:27:37 bl460gen9-04 kernel: mlx4_core 0000:09:00.0: PCIe link
> width is x8, device supports x8
> Jul 11 13:27:37 bl460gen9-04 kernel: <mlx4_ib> mlx4_ib_add: mlx4_ib:
> Mellanox ConnectX InfiniBand driver v3.3-1.0.0 (31 May 2016)
> Jul 11 13:27:37 bl460gen9-04 kernel: mlx4_core 0000:09:00.0:
> mlx4_ib_add: allocated counter index 1 for port 1
> Jul 11 13:27:37 bl460gen9-04 kernel: mlx4_core 0000:09:00.0:
> mlx4_ib_add: allocated counter index 3 for port 2
> Jul 11 13:27:37 bl460gen9-04 kernel: mlx4_en: Mellanox ConnectX HCA
> Ethernet driver v3.3-1.0.0 (31 May 2016)
> Jul 11 13:27:37 bl460gen9-04 kernel: mlx4_en 0000:09:00.0: Activating port:1
> Jul 11 13:27:37 bl460gen9-04 kernel: mlx4_en: 0000:09:00.0: Port 1:
> Using 256 TX rings
> Jul 11 13:27:37 bl460gen9-04 kernel: mlx4_en: 0000:09:00.0: Port 1:
> Using 16 RX rings
> Jul 11 13:27:37 bl460gen9-04 kernel: mlx4_en: 0000:09:00.0: Port 1:
> frag:0 - size:1522 prefix:0 stride:1536
> Jul 11 13:27:37 bl460gen9-04 kernel: mlx4_en: 0000:09:00.0: Port 1:
> Initializing port
> Jul 11 13:27:37 bl460gen9-04 kernel: mlx4_en 0000:09:00.0: registered
> PHC clock
> Jul 11 13:27:37 bl460gen9-04 kernel: mlx4_en 0000:09:00.0: Activating port:2
> Jul 11 13:27:37 bl460gen9-04 kernel: mlx4_core 0000:09:00.0 eth4:
> renamed from eth0
> Jul 11 13:27:38 bl460gen9-04 kernel: mlx4_en: 0000:09:00.0: Port 2:
> Using 256 TX rings
> Jul 11 13:27:38 bl460gen9-04 kernel: mlx4_en: 0000:09:00.0: Port 2:
> Using 16 RX rings
> Jul 11 13:27:38 bl460gen9-04 kernel: mlx4_en: 0000:09:00.0: Port 2:
> frag:0 - size:1522 prefix:0 stride:1536
> Jul 11 13:27:38 bl460gen9-04 kernel: mlx4_en: 0000:09:00.0: Port 2:
> Initializing port
> Jul 11 13:27:38 bl460gen9-04 kernel: mlx4_core 0000:09:00.0 eth5:
> renamed from eth0
> Jul 11 13:27:38 bl460gen9-04 kernel: mlx4_en: eth4: Link Up
> Jul 11 13:27:59 bl460gen9-04 logger[1527]: openibd: Set node_desc for
> mlx4_0: bl460gen9-04 HCA-1
> Jul 11 13:28:38 bl460gen9-04 ovs-vswitchd[2004]: EAL: probe driver:
> 15b3:1007 librte_pmd_mlx4
> Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]: EAL:   probe
> driver: 15b3:1007 librte_pmd_mlx4
> Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
> /build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5430: mlx4_pci_devinit():
> using driver device index 0
> Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
> /build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5452: mlx4_pci_devinit():
> checking device "mlx4_0"
> Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
> /build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5463: mlx4_pci_devinit():
> PCI information matches, using device "mlx4_0" (VF: false)
> Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
> /build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5483: mlx4_pci_devinit():
> device opened
> Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
> /build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5486: mlx4_pci_devinit(): 2
> port(s) detected
> Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
> /build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5508: mlx4_pci_devinit():
> using port 1 (00000001)
> Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
> /build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5530: mlx4_pci_devinit():
> port 1 is not active: "down" (1)
> Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
> /build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5583: mlx4_pci_devinit():
> device flags: IBV_DEVICE_QPG IBV_DEVICE_RSS
> Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
> /build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5586: mlx4_pci_devinit():
> maximum RSS indirection table size: 256
> Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
> /build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5595: mlx4_pci_devinit():
> checksum offloading is supported
> Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
> /build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5600: mlx4_pci_devinit(): L2
> tunnel checksum offloads are supported
> Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
> /build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5641: mlx4_pci_devinit():
> port 1 MAC address is 24:be:05:c0:d2:a0
> Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
> /build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5655: mlx4_pci_devinit():
> port 1 ifname is "eth4"
> Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
> /build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5662: mlx4_pci_devinit():
> port 1 MTU is 1500
> Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
> /build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5721: mlx4_pci_devinit():
> forcing Ethernet interface up
> Jul 11 13:28:38 bl460gen9-04 kernel: mlx4_en: eth4: frag:0 - size:1522
> prefix:0 stride:1536
> Jul 11 13:28:38 bl460gen9-04 kernel: mlx4_en: eth4: Setting RSS
> context tunnel type to RSS on inner headers
> Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
> /build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5508: mlx4_pci_devinit():
> using port 2 (00000002)
> Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
> /build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5530: mlx4_pci_devinit():
> port 2 is not active: "down" (1)
> Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
> /build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5583: mlx4_pci_devinit():
> device flags: IBV_DEVICE_QPG IBV_DEVICE_RSS
> Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
> /build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5586: mlx4_pci_devinit():
> maximum RSS indirection table size: 256
> Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
> /build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5595: mlx4_pci_devinit():
> checksum offloading is supported
> Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
> /build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5600: mlx4_pci_devinit(): L2
> tunnel checksum offloads are supported
> Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
> /build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5641: mlx4_pci_devinit():
> port 2 MAC address is 24:be:05:c0:d2:a8
> Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
> /build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5655: mlx4_pci_devinit():
> port 2 ifname is "eth5"
> Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
> /build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5662: mlx4_pci_devinit():
> port 2 MTU is 1500
> Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
> /build/dpdk-16.04/drivers/net/mlx4/mlx4.c:5721: mlx4_pci_devinit():
> forcing Ethernet interface up
> Jul 11 13:28:38 bl460gen9-04 kernel: mlx4_en: eth5: frag:0 - size:1522
> prefix:0 stride:1536
> Jul 11 13:28:38 bl460gen9-04 kernel: mlx4_en: eth5: Setting RSS
> context tunnel type to RSS on inner headers
> Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
> /build/dpdk-16.04/drivers/net/mlx4/mlx4.c:732: dev_configure():
> 0x840248: TX queues number update: 0 -> 1
> Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
> /build/dpdk-16.04/drivers/net/mlx4/mlx4.c:747: dev_configure():
> 0x840248: RX queues number update: 0 -> 1
> Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
> /build/dpdk-16.04/drivers/net/mlx4/mlx4.c:1992: mlx4_tx_queue_setup():
> 0x840248: configuring queue 0 for 2048 descriptors
> Jul 11 13:28:38 bl460gen9-04 kernel: Modules linked in: tun
> openvswitch nf_defrag_ipv6 nf_conntrack libcrc32c crc32c_generic nfsd
> auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc rdma_ucm(OE)
> ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_uverbs(OE)
> ib_umad(OE) mlx5_ib(OE) mlx5_core(OE) inet_lro mlx4_ib(OE) ib_sa(OE)
> mlx4_en(OE) ib_mad(OE) ptp ib_core(OE) ib_addr(OE) ib_netlink(OE)
> pps_core mlx4_core(OE) mlx_compat(OE) x86_pkg_temp_thermal
> intel_powerclamp coretemp kvm_intel vfat fat kvm iTCO_wdt irqbypass
> iTCO_vendor_support crc32_pclmul hmac drbg ansi_cprng aesni_intel
> aes_x86_64 lrw gf128mul glue_helper ablk_helper mgag200 cryptd ttm
> pcspkr drm_kms_helper evdev drm sb_edac i2c_algo_bit edac_core
> fb_sys_fops syscopyarea sysfillrect sysimgblt lpc_ich i2c_core
> mfd_core hpilo hpwdt ioatdma dca wmi ipmi_si ipmi_msghandler
> Jul 11 13:28:38 bl460gen9-04 kernel:  pcc_cpufreq acpi_cpufreq
> processor acpi_power_meter button knem(OE) autofs4 ext4 crc16 mbcache
> jbd2 usb_storage hid_generic usbhid hid sd_mod sg crc32c_intel
> xhci_pci hpsa uhci_hcd ehci_pci xhci_hcd ehci_hcd scsi_transport_sas
> scsi_mod usbcore be2net usb_common [last unloaded: mlx_compat]
> Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
> /build/dpdk-16.04/drivers/net/mlx4/mlx4.c:1838: txq_setup(): 0x840248:
> CQ creation failure: Cannot allocate memory
> Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
> /build/dpdk-16.04/drivers/net/mlx4/mlx4.c:1103: txq_cleanup():
> cleaning up 0x7ffc112649e0
> Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
> /build/dpdk-16.04/drivers/net/mlx4/mlx4.c:1057: txq_free_elts():
> 0x7ffc112649e0: freeing WRs
> Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
> /build/dpdk-16.04/drivers/net/mlx4/mlx4.c:732: dev_configure():
> 0x83c200: TX queues number update: 0 -> 1
> Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
> /build/dpdk-16.04/drivers/net/mlx4/mlx4.c:747: dev_configure():
> 0x83c200: RX queues number update: 0 -> 1
> Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
> /build/dpdk-16.04/drivers/net/mlx4/mlx4.c:1992: mlx4_tx_queue_setup():
> 0x83c200: configuring queue 0 for 2048 descriptors
> Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
> /build/dpdk-16.04/drivers/net/mlx4/mlx4.c:1838: txq_setup(): 0x83c200:
> CQ creation failure: Cannot allocate memory
> Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
> /build/dpdk-16.04/drivers/net/mlx4/mlx4.c:1103: txq_cleanup():
> cleaning up 0x7ffc112649e0
> Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
> /build/dpdk-16.04/drivers/net/mlx4/mlx4.c:1057: txq_free_elts():
> 0x7ffc112649e0: freeing WRs
> Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
> /build/dpdk-16.04/drivers/net/mlx4/mlx4.c:1992: mlx4_tx_queue_setup():
> 0x840248: configuring queue 0 for 2048 descriptors
> Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
> /build/dpdk-16.04/drivers/net/mlx4/mlx4.c:1838: txq_setup(): 0x840248:
> CQ creation failure: Cannot allocate memory
> Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
> /build/dpdk-16.04/drivers/net/mlx4/mlx4.c:1103: txq_cleanup():
> cleaning up 0x7ffc112649e0
> Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
> /build/dpdk-16.04/drivers/net/mlx4/mlx4.c:1057: txq_free_elts():
> 0x7ffc112649e0: freeing WRs
> Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
> /build/dpdk-16.04/drivers/net/mlx4/mlx4.c:1992: mlx4_tx_queue_setup():
> 0x83c200: configuring queue 0 for 2048 descriptors
> Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
> /build/dpdk-16.04/drivers/net/mlx4/mlx4.c:1838: txq_setup(): 0x83c200:
> CQ creation failure: Cannot allocate memory
> Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
> /build/dpdk-16.04/drivers/net/mlx4/mlx4.c:1103: txq_cleanup():
> cleaning up 0x7ffc112649e0
> Jul 11 13:28:38 bl460gen9-04 openvswitch-switch[1682]:
> /build/dpdk-16.04/drivers/net/mlx4/mlx4.c:1057: txq_free_elts():
> 0x7ffc112649e0: freeing WRs
>
>
> This is after rebooting the system. The kicker is that if I launch
> ovs-vswitchd manually from the shell without --detach:
>
> # ovs-vswitchd --dpdk -c 0x3 -- unix:/var/run/openvswitch/db.sock
> -vconsole:emer -vsyslog:err -vfile:info --mlockall --no-chdir
> --log-file=/var/log/openvswitch/ovs-vswitchd.log
> --pidfile=/var/run/openvswitch/ovs-vswitchd.pid
>
> I get no errors. This is the exact same binary, and the command line
> is copied from `ps -ef | grep ovs-vswitchd` after a failed run, minus
> the '--monitor --detach' options. I have a happy bridge, at least in
> the sense that the logs show nothing bad, ovs-vsctl show has no
> 'error:' field, and there is no error from:
>
> # ovs-vsctl add-port br0 dpdkN -- set interface dpdkN type=dpdk
> # ovs-vsctl show
> 4f412dee-e2e5-42e5-be7e-dbee94c42652
>     Bridge "br0"
>         Port "br0"
>             Interface "br0"
>                 type: internal
>         Port "dpdk0"
>             Interface "dpdk0"
>                 type: dpdk
>         Port "dpdk1"
>             Interface "dpdk1"
>                 type: dpdk
>     ovs_version: "2.5.1"
>
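
When the same binary behaves differently under an init script and under
an interactive shell, one environmental difference that may be worth
ruling out is the process's resource limits. The sketch below is an
assumption-driven diagnostic, not something the logs above establish:
it reads RLIMIT_MEMLOCK with getrlimit() so the value can be compared
across the two launch paths. The helper names format_limit() and
describe_memlock_limit() are hypothetical.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Formats one limit value; RLIM_INFINITY is printed as "unlimited". */
void
format_limit(rlim_t v, char *buf, size_t len)
{
    if (v == RLIM_INFINITY) {
        snprintf(buf, len, "unlimited");
    } else {
        snprintf(buf, len, "%llu", (unsigned long long) v);
    }
}

/* Writes "soft=<n> hard=<n>" for RLIMIT_MEMLOCK into buf.
 * Returns 0 on success, -1 if getrlimit() fails. */
int
describe_memlock_limit(char *buf, size_t len)
{
    struct rlimit rl;
    char soft[32], hard[32];

    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0) {
        return -1;
    }
    format_limit(rl.rlim_cur, soft, sizeof soft);
    format_limit(rl.rlim_max, hard, sizeof hard);
    snprintf(buf, len, "soft=%s hard=%s", soft, hard);
    return 0;
}
```

Logging the string from describe_memlock_limit() early in a process
started each way would show whether the two environments differ;
identical values would rule this factor out.
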
> So I looked into vswitchd/ovs-vswitchd.c and thought that perhaps the
> issue is that daemonizing after rte_eal_init() kills the child threads
> that rte_eal_init() spawned (?), so I made the following patch:
>
> Index: openvswitch/vswitchd/ovs-vswitchd.c
> ===================================================================
> --- openvswitch.orig/vswitchd/ovs-vswitchd.c
> +++ openvswitch/vswitchd/ovs-vswitchd.c
> @@ -58,6 +58,16 @@ static bool want_mlockall;
>
>  static unixctl_cb_func ovs_vswitchd_exit;
>
> +#define DPDK_OPTS_SIZ 2048
> +/*
> + * variables/function for saving DPDK options off of the command line,
> + * to run dpdk_init _after_ daemonize is called.
> + */
> +char *dpdk_argv[DPDK_OPTS_SIZ];
> +int dpdk_argc;
> +static int save_dpdk_opts(int argc, char **argv);
> +
> +
>  static char *parse_options(int argc, char *argv[], char **unixctl_path);
>  OVS_NO_RETURN static void usage(void);
>
> @@ -71,7 +81,8 @@ main(int argc, char *argv[])
>      int retval;
>
>      set_program_name(argv[0]);
> -    retval = dpdk_init(argc,argv);
> +
> +    retval = save_dpdk_opts(argc, argv);
>      if (retval < 0) {
>          return retval;
>      }
> @@ -97,6 +108,12 @@ main(int argc, char *argv[])
>  #endif
>      }
>
> +    retval = dpdk_init(dpdk_argc, dpdk_argv);
> +    if (retval < 0) {
> +        return retval;
> +    }
> +
> +
>      retval = unixctl_server_create(unixctl_path, &unixctl);
>      if (retval) {
>          exit(EXIT_FAILURE);
> @@ -140,6 +157,38 @@ main(int argc, char *argv[])
>      return 0;
>  }
>
> +
> +static int
> +save_dpdk_opts(int argc, char *argv[])
> +{
> +    int i=0;
> +
> +    memset(dpdk_argv, 0, DPDK_OPTS_SIZ*sizeof(char *));
> +    dpdk_argc=0;
> +
> +    if (argc < 2 || strcmp(argv[1], "--dpdk"))
> +        return 0;
> +
> +    dpdk_argv[0] = argv[0];
> +    dpdk_argc++;
> +
> +    for (i = 1; i < argc && i < DPDK_OPTS_SIZ; i++) {
> +        if (!strcmp(argv[i], "--")) {
> +            break;
> +        }
> +        dpdk_argv[i] = argv[i];
> +        dpdk_argc++;
> +    }
> +
> +    if (i < 2) {
> +        return -1;
> +    }
> +
> +    argv[i] = argv[0];
> +
> +    return i;
> +}
> +
>  static char *
>  parse_options(int argc, char *argv[], char **unixctl_pathp)
>  {
>
>
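
The option-saving step in the patch above can be sketched in isolation.
What follows is a minimal standalone variant, not the patch itself:
split_dpdk_args() and its out_cap parameter are hypothetical names, and
the sketch makes the argc check, the output bound, and the
missing-terminator case explicit.

```c
#include <string.h>

/* Copies argv[0] plus every argument before the first "--" into out[].
 * Returns the number of entries copied, 0 if argv[1] is not "--dpdk",
 * or -1 if no "--" terminator is found or the options do not fit. */
int
split_dpdk_args(int argc, char *argv[], char *out[], int out_cap)
{
    int i;

    if (argc < 2 || strcmp(argv[1], "--dpdk") != 0) {
        return 0;
    }
    for (i = 0; i < argc; i++) {
        if (i > 0 && strcmp(argv[i], "--") == 0) {
            return i;           /* out[0..i-1] now hold the DPDK options */
        }
        if (i >= out_cap) {
            return -1;          /* more options than out[] can hold */
        }
        out[i] = argv[i];
    }
    return -1;                  /* "--dpdk" was never terminated by "--" */
}
```

For a command line such as `ovs-vswitchd --dpdk -c 0x3 -- unix:...`,
the function returns 4 and out[] holds the program name followed by
`--dpdk`, `-c` and `0x3`, which is the slice that would be handed to
the deferred initialization call.
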
> And it miraculously caused the error to go away, i.e. the ports stay
> after reboots. Normally, if I launch ovs-vswitchd without --detach,
> get a good bridge with dpdk{0,1} ports and reboot, I get the above
> error state again. I have no idea why this might occur. The dpdk
> apps all seem to work fine with the Mellanox card, albeit with a very
> noticeable lag as they add the ports in (subjective) comparison with
> the Niantic card. Other than the fact that the patch works, I can't
> find any better evidence to substantiate my hypothesis that
> daemonizing after rte_eal_init is causing the problem, so it is
> currently just a best guess.
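
The hypothesis above rests on a general POSIX property: fork()
duplicates only the calling thread, so threads started before a
daemonizing fork do not exist in the child. The sketch below
illustrates that property in isolation. It does not exercise DPDK or
the mlx4 PMD, and it does not show that this is what breaks CQ creation
here; ticker(), counter_advances() and worker_survives_fork() are
hypothetical names.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static atomic_long ticks;

/* Worker: bumps the counter forever, standing in for a helper thread
 * started by an initialization routine before the process forks. */
void *
ticker(void *arg)
{
    (void) arg;
    for (;;) {
        atomic_fetch_add(&ticks, 1);
        usleep(1000);
    }
    return NULL;
}

/* Returns 1 if the counter advances over roughly 50 ms in this process. */
int
counter_advances(void)
{
    long before = atomic_load(&ticks);

    usleep(50 * 1000);
    return atomic_load(&ticks) != before;
}

/* Starts the worker, forks, and reports whether the counter still
 * advances in the child. Returns 0 or 1, or -1 on error. */
int
worker_survives_fork(void)
{
    pthread_t tid;
    pid_t pid;
    int status;

    if (pthread_create(&tid, NULL, ticker, NULL) != 0) {
        return -1;
    }
    pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        /* Child: only the thread that called fork() exists here. */
        _exit(counter_advances());
    }
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status)) {
        return -1;
    }
    return WEXITSTATUS(status);
}
```

worker_survives_fork() is expected to return 0: the counter stops
advancing in the child while the worker keeps running in the parent.
Whether a library tolerates losing its threads, or any other per-process
state, across such a fork has to be checked against that library.
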
>
>
> Thanks,
>   John


