[ovs-discuss] segmentation fault when adding a VF in DPDK to a switch

Stokes, Ian ian.stokes at intel.com
Thu Mar 8 12:11:24 UTC 2018


Hi Riccardo,

So the issue you are seeing is not related to the VSI queue setup as previously mentioned. It appears to be specific to using the i350 VF with OVS.

Specifically, OVS attempts to set the MTU of a given device (in this case the SRIOV VF from an i350 interface known as igbvf) with the call

diag = rte_eth_dev_set_mtu(dev->port_id, dev->mtu);

Currently igbvf devices do not support rte_eth_dev_set_mtu. This is confirmed by examining the igbvf operations in DPDK (note mtu_set = 0x0 below):

$1 = {dev_configure = 0x5326e1 <igbvf_dev_configure>, dev_start = 0x532772 <igbvf_dev_start>, dev_stop = 0x532980 <igbvf_dev_stop>, dev_set_link_up = 0x0, dev_set_link_down = 0x0, dev_close = 0x532a34 <igbvf_dev_close>,  dev_reset = 0x0, link_update = 0x530bf1 <eth_igb_link_update>, promiscuous_enable = 0x532abe <igbvf_promiscuous_enable>, promiscuous_disable = 0x532aee <igbvf_promiscuous_disable>,
  allmulticast_enable = 0x532b47 <igbvf_allmulticast_enable>, allmulticast_disable = 0x532b8d <igbvf_allmulticast_disable>, mac_addr_remove = 0x0, mac_addr_add = 0x0, mac_addr_set = 0x532e4d <igbvf_default_mac_addr_set>,  set_mc_addr_list = 0x535ca2 <eth_igb_set_mc_addr_list>, mtu_set = 0x0, stats_get = 0x5303e5 <eth_igbvf_stats_get>, stats_reset = 0x530488 <eth_igbvf_stats_reset>, xstats_get = 0x53030f <eth_igbvf_xstats_get>,  xstats_reset = 0x530488 <eth_igbvf_stats_reset>, xstats_get_names = 0x530299 <eth_igbvf_xstats_get_names>, queue_stats_mapping_set = 0x0, dev_infos_get = 0x5309cf <eth_igbvf_infos_get>, rxq_info_get = 0x53f4b2 <igb_rxq_info_get>,  txq_info_get = 0x53f53f <igb_txq_info_get>, fw_version_get = 0x0, dev_supported_ptypes_get = 0x53099b <eth_igb_supported_ptypes_get>, vlan_filter_set = 0x532d4b <igbvf_vlan_filter_set>, vlan_tpid_set = 0x0,  vlan_strip_queue_set = 0x0, vlan_offload_set = 0x0, vlan_pvid_set = 0x0, rx_queue_start = 0x0, rx_queue_stop = 0x0, tx_queue_start = 0x0, tx_queue_stop = 0x0, rx_queue_setup = 0x53c42c <eth_igb_rx_queue_setup>,  rx_queue_release = 0x53c39f <eth_igb_rx_queue_release>, rx_queue_count = 0x0, rx_descriptor_done = 0x0, rx_descriptor_status = 0x0, tx_descriptor_status = 0x0, rx_queue_intr_enable = 0x0, rx_queue_intr_disable = 0x0,  tx_queue_setup = 0x53bbb1 <eth_igb_tx_queue_setup>, tx_queue_release = 0x53b42c <eth_igb_tx_queue_release>, tx_done_cleanup = 0x0, dev_led_on = 0x0, dev_led_off = 0x0, flow_ctrl_get = 0x0, flow_ctrl_set = 0x0,  priority_flow_ctrl_set = 0x0, uc_hash_table_set = 0x0, uc_all_hash_table_set = 0x0, mirror_rule_set = 0x0, mirror_rule_reset = 0x0, udp_tunnel_port_add = 0x0, udp_tunnel_port_del = 0x0, l2_tunnel_eth_type_conf = 0x0,  l2_tunnel_offload_set = 0x0, set_queue_rate_limit = 0x0, rss_hash_update = 0x0, rss_hash_conf_get = 0x0, reta_update = 0x0, reta_query = 0x0, get_reg = 0x536a60 <igbvf_get_regs>, get_eeprom_length = 0x0, get_eeprom = 0x0,  set_eeprom = 0x0, filter_ctrl = 0x0, 
get_dcb_info = 0x0, timesync_enable = 0x0, timesync_disable = 0x0, timesync_read_rx_timestamp = 0x0, timesync_read_tx_timestamp = 0x0, timesync_adjust_time = 0x0, timesync_read_time = 0x0,  timesync_write_time = 0x0, xstats_get_by_id = 0x0, xstats_get_names_by_id = 0x0, tm_ops_get = 0x0, mtr_ops_get = 0x0, pool_ops_supported = 0x0}
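
To make the consequence of mtu_set = 0x0 concrete, here is a minimal, self-contained sketch of an operations table whose generic entry point reports -ENOTSUP when the driver leaves a callback unset. All mock_* names are hypothetical and only model the behaviour described above; this is not the DPDK source.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, cut-down model of a driver operations table. */
struct mock_dev;
typedef int (*mock_mtu_set_t)(struct mock_dev *dev, uint16_t mtu);

struct mock_dev_ops {
    mock_mtu_set_t mtu_set;   /* NULL when the driver lacks the feature. */
};

struct mock_dev {
    const struct mock_dev_ops *ops;
    uint16_t mtu;
};

/* Driver callback for a device that does support MTU changes. */
static int mock_mtu_set_impl(struct mock_dev *dev, uint16_t mtu)
{
    dev->mtu = mtu;
    return 0;
}

/* Generic entry point: dispatch through the table, or report -ENOTSUP
 * when the driver did not fill in the callback. */
static int mock_dev_set_mtu(struct mock_dev *dev, uint16_t mtu)
{
    if (dev->ops->mtu_set == NULL) {
        return -ENOTSUP;
    }
    return dev->ops->mtu_set(dev, mtu);
}

/* One table with the callback (like i40evf here), one without (like igbvf). */
static const struct mock_dev_ops mock_ops_with_mtu = { .mtu_set = mock_mtu_set_impl };
static const struct mock_dev_ops mock_ops_without_mtu = { .mtu_set = NULL };
```

A caller that treats every non-zero return as fatal will therefore fail device setup on any driver that omits the callback, which is the situation with igbvf in this thread.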

I confirmed this by also testing an i40evf device and in that case the mtu_set function is supported so I didn’t see the error/segfault you reported.

Support for rte_eth_dev_set_mtu was introduced in commit 67fe6d635193761439f791e48652acfd60076cfb. The purpose was to set the MTU on the physical device so that a call to rte_eth_dev_get_mtu() would correctly report the MTU set for the port. The best solution here would be for DPDK to add rte_eth_dev_set_mtu() support for igbvf functions. However, this would have to land in a future release, which means you would be blocked until then.

For testing purposes you could re-introduce the previous method of setting the MTU for the device (note this is compile tested only):

+
+    /*
+     * Need to enable hw_strip_crc specifically for SRIOV devices.
+     */
+    conf.rxmode.hw_strip_crc = 1;
+
     /* A device may report more queues than it makes available (this has
      * been observed for Intel xl710, which reserves some of them for
      * SRIOV):  rte_eth_*_queue_setup will fail if a queue is not
@@ -718,11 +724,25 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int n_rxq, int n_txq)
         }

         diag = rte_eth_dev_set_mtu(dev->port_id, dev->mtu);
-        if (diag) {
+        if (diag && (diag != -ENOTSUP)) {
             VLOG_ERR("Interface %s MTU (%d) setup error: %s",
                     dev->up.name, dev->mtu, rte_strerror(-diag));
             break;
         }
+        else {
+            /* Some devices do not support rte_eth_dev_set_mtu e.g. igbvf.
+             * If the operation is not supported attempt to set MTU manually
+             * in the devices configuration.
+             */
+            if (dev->mtu > ETHER_MTU) {
+                conf.rxmode.jumbo_frame = 1;
+                conf.rxmode.max_rx_pkt_len = dev->max_packet_len;
+            } else {
+                conf.rxmode.jumbo_frame = 0;
+                conf.rxmode.max_rx_pkt_len = 0;
+            }
+            diag = 0;
+        }

This is not a pretty approach but might enable you to test further with the igbvf function in the meantime.
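
The fallback branch in the patch above boils down to a small decision on the requested MTU. The sketch below isolates that decision so it can be checked on its own. The sketch_* names are hypothetical, and the frame length is assumed to be MTU plus Ethernet header and CRC; OVS uses its own precomputed dev->max_packet_len, which may account for additional overhead.

```c
#include <stdint.h>

/* Hypothetical names; the constants are the standard Ethernet sizes. */
#define SKETCH_ETHER_MTU      1500u  /* default Ethernet payload size */
#define SKETCH_ETHER_HDR_LEN  14u    /* dst MAC + src MAC + ethertype */
#define SKETCH_ETHER_CRC_LEN  4u     /* frame check sequence */

/* Cut-down stand-in for the rxmode fields touched by the patch. */
struct sketch_rxmode {
    uint32_t max_rx_pkt_len;
    unsigned jumbo_frame;
};

/* Derive the receive-side settings from the MTU alone, for devices
 * where the MTU cannot be set through a dedicated driver call. */
static struct sketch_rxmode sketch_rxmode_for_mtu(uint32_t mtu)
{
    struct sketch_rxmode rx = { 0, 0 };

    if (mtu > SKETCH_ETHER_MTU) {
        /* Jumbo frames: enable the flag and size the receive limit. */
        rx.jumbo_frame = 1;
        rx.max_rx_pkt_len = mtu + SKETCH_ETHER_HDR_LEN + SKETCH_ETHER_CRC_LEN;
    }
    /* Otherwise both fields stay 0, as in the else-branch of the patch. */
    return rx;
}
```

For example, sketch_rxmode_for_mtu(1500) leaves both fields at 0, while sketch_rxmode_for_mtu(9000) sets jumbo_frame and a receive limit of 9018 bytes under the assumption stated above.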

As regards the segfault, in this case it was due to the error code returned when rte_eth_dev_set_mtu is not supported. The error code returned by DPDK is -ENOTSUP (ENOTSUP is 95 in this case). This value is returned to flag a failure in the OVS device setup during reconfiguration; however, the error value is checked at the dpif_netdev layer in the port_reconfigure() function as below:

    if (err && (err != EOPNOTSUPP)) {
        VLOG_ERR("Failed to set interface %s new configuration",
                 netdev_get_name(netdev));
        return err;
    }

Note that err in this case carries the ENOTSUP code, and here ENOTSUP has the same numeric value as EOPNOTSUPP. The check above is intended to distinguish a netdev device that does not provide a reconfigure() function from one that hit an error during reconfiguration. In this case the error returned during reconfiguration is indistinguishable from the "reconfigure not supported" value, so it is ignored. This leads to the port being polled at a later stage even though its setup failed, resulting in the segfault. You could remove the (err != EOPNOTSUPP) check, but that would introduce error messages in the case where a reconfigure method is not supported. With the changes for setting the MTU outlined above, however, this should be avoided because the returned error is set to 0 when setting the MTU is not supported.
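
The collision described above can be shown in isolation. This is a minimal sketch with hypothetical helper names; it only models the caller-side condition quoted earlier and is not the OVS source.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the caller-side check quoted above:
 * returns true when the error would be logged and propagated. */
static bool reconfigure_error_is_reported(int err)
{
    return err && err != EOPNOTSUPP;
}

/* True when the platform gives both names the same number, as Linux does. */
static bool enotsup_aliases_eopnotsupp(void)
{
    return ENOTSUP == EOPNOTSUPP;
}
```

On a platform where enotsup_aliases_eopnotsupp() returns true, reconfigure_error_is_reported(ENOTSUP) returns false, so a real driver failure is treated the same as "this netdev has no reconfigure method". An unrelated error such as EINVAL is still reported.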

I’ll have to look a little bit closer as regards the next steps here. There is definitely some work needed around error reporting during queue setup in OVS DPDK as well as some work for DPDK itself to enable this case.

Hope this helps.

Regards
Ian


From: scaricaposta at gmail.com [mailto:scaricaposta at gmail.com] On Behalf Of Riccardo Ravaioli
Sent: Wednesday, March 7, 2018 3:20 PM
To: Stokes, Ian <ian.stokes at intel.com>
Cc: ovs-discuss at openvswitch.org
Subject: Re: [ovs-discuss] segmentation fault when adding a VF in DPDK to a switch

Hi Ian,
Thanks a lot for your patch. I applied your modifications and re-ran the setup described in the original post. While the CRC-related error message has disappeared, openvswitch still crashes (with no gdb running!):

# tail ovs-vswitchd.log
2018-03-07T14:58:53.311Z|00215|dpdk|INFO|EAL: PCI device 0000:05:10.1 on NUMA socket 0
2018-03-07T14:58:53.311Z|00216|dpdk|INFO|EAL:   probe driver: 8086:1520 net_e1000_igb_vf
2018-03-07T14:58:53.518Z|00217|dpdk|INFO|PMD: eth_igbvf_dev_init():     VF MAC address not assigned by Host PF
2018-03-07T14:58:53.518Z|00218|dpdk|INFO|PMD: eth_igbvf_dev_init():     Assign randomly generated MAC address fe:00:a5:78:49:2a
2018-03-07T14:58:53.518Z|00219|netdev_dpdk|INFO|Device '0000:05:10.1' attached to DPDK
2018-03-07T14:58:53.518Z|00220|netdev_dpdk|ERR|Interface 2.switch1 MTU (1500) setup error: Operation not supported
2018-03-07T14:58:53.518Z|00221|netdev_dpdk|ERR|Interface 2.switch1(rxq:1 txq:1) configure error: Operation not supported
2018-03-07T14:58:53.884Z|00002|daemon_unix(monitor)|ERR|1 crashes: pid 4171 died, killed (Segmentation fault), core dumped, restarting

# lspci -s 0000:05:10.1 -v
05:10.1 Ethernet controller: Intel Corporation I350 Ethernet Controller Virtual Function (rev 01)
    Flags: bus master, fast devsel, latency 0
    [virtual] Memory at fbea0000 (64-bit, prefetchable) [size=16K]
    [virtual] Memory at fbe80000 (64-bit, prefetchable) [size=16K]
    Capabilities: [70] MSI-X: Enable- Count=3 Masked-
    Capabilities: [a0] Express Endpoint, MSI 00
    Capabilities: [100] Advanced Error Reporting
    Capabilities: [150] Alternative Routing-ID Interpretation (ARI)
    Capabilities: [1a0] Transaction Processing Hints
    Capabilities: [1d0] Access Control Services
    Kernel driver in use: vfio-pci

This is indeed the second issue you mentioned in a previous post. Can we get past this with a workaround?
Thanks again!
Riccardo


On 7 March 2018 at 14:41, Stokes, Ian <ian.stokes at intel.com> wrote:
Hi Riccardo,

After some more time looking at the issue, you could do something like the below to enable CRC stripping for the interface (note I haven’t fully validated this).

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index af9843a..28d7d1e 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -700,6 +700,12 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int n_rxq, int n_txq)

     conf.rxmode.hw_ip_checksum = (dev->hw_ol_features &
                                   NETDEV_RX_CHECKSUM_OFFLOAD) != 0;
+
+    /*
+     * Need to enable hw_strip_crc specifically for SRIOV devices.
+     */
+    conf.rxmode.hw_strip_crc = 1;
+

On my system this at least got past the configuration error when adding the SR-IOV VF port, and I was able to pass traffic through the port in a simple VF-to-phy port setup. As I’ve only completed minor validation on this, maybe you could give it a shot and see if it works on your setup.

With regards to the VSI queue error I mentioned in previous posts: with some more investigation I found it only occurred when running the setup of SR-IOV VFs in DPDK under GDB. I was able to reproduce the same issue in the DPDK l2fwd sample app, so it is not specific to OVS. As long as you are not running OVS under GDB during the SR-IOV setup it should be OK. I’ll need to look at this in a little more detail when I have time, but for the moment it shouldn’t block you.

Hope this helps,

Regards
Ian



From: ovs-discuss-bounces at openvswitch.org [mailto:ovs-discuss-bounces at openvswitch.org] On Behalf Of Stokes, Ian
Sent: Thursday, February 1, 2018 11:52 AM
To: riccardoravaioli at gmail.com
Cc: ovs-discuss at openvswitch.org
Subject: Re: [ovs-discuss] segmentation fault when adding a VF in DPDK to a switch

Hi Riccardo,

Apologies for the delay. Unfortunately with the OVS 2.9 release I haven’t had much time to look at this further.

At the very least I think work needs to be done for dpdk.c and netdev-dpdk.c to enable configuration of VFs specifically (to account for the HW_CRC and VSI queue configurations).

There would also be a task to ensure the work required for enabling a VF on the i40e driver would also cover enabling a VF for the ixgbe driver. In DPDK it’s been the case in the past that driver implementations for different NIC devices can differ.

This could be looked at in the OVS 2.10 development cycle at some point. I can post an update here when there is progress.

Thanks
Ian

From: scaricaposta at gmail.com [mailto:scaricaposta at gmail.com] On Behalf Of Riccardo Ravaioli
Sent: Thursday, January 25, 2018 4:35 PM
To: Stokes, Ian <ian.stokes at intel.com>
Cc: ovs-discuss at openvswitch.org
Subject: Re: [ovs-discuss] segmentation fault when adding a VF in DPDK to a switch

Hi Ian,
Thanks for looking into the issue. Anything new?
Thanks a lot!
Riccardo

On 11 January 2018 at 23:50, Stokes, Ian <ian.stokes at intel.com> wrote:
Hi Riccardo,

Thanks for reporting the issue and providing the steps to reproduce.

I was able to reproduce this with an i40e VF using igb_uio.

In short, it seems there is currently no support for ixgbe and i40e VF devices in OVS with DPDK.

There are two issues at play here. The first is the configuration error when creating and starting the VF in DPDK; the second is the segfault in OVS.

The configuration of the VF fails (for the i40e device at least) because DPDK expects the HW_CRC stripping flag to be enabled in the device configuration for VFs. In your logs you will see an error reporting this. By default this seems to be disabled for VFs in OVS.

Looking in the DPDK code, this is confirmed by the following check in i40evf_dev_configure(), which code execution hits:

    /* For non-DPDK PF drivers, VF has no ability to disable HW
     * CRC strip, and is implicitly enabled by the PF.
     */
    if (!conf->rxmode.hw_strip_crc) {
            vf = I40EVF_DEV_PRIVATE_TO_VF(dev->data->dev_private);
            if ((vf->version_major == VIRTCHNL_VERSION_MAJOR) &&
                (vf->version_minor <= VIRTCHNL_VERSION_MINOR)) {
                    /* Peer is running non-DPDK PF driver. */
                    PMD_INIT_LOG(ERR, "VF can't disable HW CRC Strip");
                    return -EINVAL;
            }
    }

Out of interest, I manually enabled HW_CRC in the device configuration in the OVS code for testing purposes. Although this allows the queue configuration to succeed, the VF later fails to start due to an issue with VSI queue mapping when DPDK attempts to start the device. I’ll have to take another look to see what exactly is going wrong here; I suspect more configuration is needed for VFs than for PFs.

The segmentation fault happens because of the error occurring in the dpdk_eth_dev_queue_setup() function. This is a separate issue and unrelated to VFs. I have seen failures in this area cause segmentation faults in OVS before, so it’s an area that needs to be looked at again to handle DPDK errors properly IMO.

I hope this answers your question and I’ll follow up once I have a little more info on how to enable the VF functionality.

Thanks
Ian



From: ovs-discuss-bounces at openvswitch.org [mailto:ovs-discuss-bounces at openvswitch.org] On Behalf Of Riccardo Ravaioli
Sent: Thursday, January 11, 2018 10:27 AM
To: ovs-discuss at openvswitch.org
Subject: Re: [ovs-discuss] segmentation fault when adding a VF in DPDK to a switch

Here are the steps to reproduce the issue:
1. Create one Virtual Function (VF) on a physical interface that supports SR-IOV (in my case it's an Intel i350 interface):
$ echo 1 > /sys/class/net/eth10/device/sriov_numvfs
2. Look up its PCI address, for example with dpdk-devbind.py:
$ dpdk-devbind.py --status-dev net
0000:05:10.3 'I350 Ethernet Controller Virtual Function 1520' if=eth11 drv=igbvf unused=igb_uio,vfio-pci,uio_pci_generic
3. Bind the VF to a DPDK-compatible driver. I'll use vfio-pci, but igb_uio too will reproduce the issue:
$ dpdk-devbind.py --bind=vfio-pci 0000:05:10.3
4. Create an OVS bridge and set its datapath type to netdev:
$ ovs-vsctl add-br br0 -- set bridge br0 datapath_type=netdev
5. Add the VF to the bridge as a DPDK interface:
$ ovs-vsctl add-port br0 dpdk-p0 -- set Interface dpdk-p0 type=dpdk options:dpdk-devargs=0000:05:10.3
6. Now ovs-vswitchd.log reports that OVS repeatedly crashes (segmentation fault) and restarts itself, in a loop:
2018-01-11T09:28:28.338Z|00139|dpdk|INFO|EAL: PCI device 0000:05:10.3 on NUMA socket 0
2018-01-11T09:28:28.338Z|00140|dpdk|INFO|EAL:   probe driver: 8086:1520 net_e1000_igb_vf
2018-01-11T09:28:28.338Z|00141|dpdk|INFO|EAL:   using IOMMU type 1 (Type 1)
2018-01-11T09:28:28.560Z|00142|dpdk|INFO|PMD: eth_igbvf_dev_init():     VF MAC address not assigned by Host PF
2018-01-11T09:28:28.560Z|00143|dpdk|INFO|PMD: eth_igbvf_dev_init():     Assign randomly generated MAC address c6:13:67:7b:31:6b
2018-01-11T09:28:28.560Z|00144|netdev_dpdk|INFO|Device '0000:05:10.3' attached to DPDK
2018-01-11T09:28:28.563Z|00145|dpif_netdev|INFO|PMD thread on numa_id: 0, core id:  3 created.
2018-01-11T09:28:28.566Z|00146|dpif_netdev|INFO|PMD thread on numa_id: 0, core id:  2 created.
2018-01-11T09:28:28.566Z|00147|dpif_netdev|INFO|There are 2 pmd threads on numa node 0
2018-01-11T09:28:28.646Z|00148|dpdk|INFO|PMD: igbvf_dev_configure(): VF can't disable HW CRC Strip
2018-01-11T09:28:28.646Z|00149|netdev_dpdk|ERR|Interface dpdk-p0 MTU (1500) setup error: Operation not supported
2018-01-11T09:28:28.646Z|00150|netdev_dpdk|ERR|Interface dpdk-p0(rxq:1 txq:1) configure error: Operation not supported
2018-01-11T09:28:29.062Z|00002|daemon_unix(monitor)|ERR|1 crashes: pid 2494 died, killed (Segmentation fault), core dumped, restarting
7. Removing the VF from the bridge stops this behaviour:
$ ovs-vsctl del-port br0 dpdk-p0

The same happens if I restart openvswitch between steps 4 and 5 and let it initialize itself with the list of DPDK devices, instead of hotplugging them at runtime, as described above.
Riccardo


On 11 January 2018 at 01:27, Riccardo Ravaioli <riccardoravaioli at gmail.com> wrote:
Hi,

I was going through the openvswitch+dpdk tutorial and wanted to add a virtual function (VF) to a bridge as a dpdk interface.

I can bind the VF to the vfio-pci driver successfully with dpdk-devbind.py, but as soon as I add the interface to an OVS bridge (in netdev mode), openvswitch crashes with a segmentation fault and continuously tries to restart itself.

I'm running openvswitch 2.8.1 and dpdk 17.11 on Debian jessie with Linux kernel 4.6.

Is this a known problem? Is there a fix?
I have the same issue with VFs bound to igb_uio, whereas with real physical interfaces it works just fine.

Here are the relevant lines from ovs-vswitchd.log:

2018-01-10T15:53:26.949Z|00157|dpdk|INFO|PMD: igbvf_dev_configure(): VF can't disable HW CRC Strip
2018-01-10T15:53:26.949Z|00158|netdev_dpdk|ERR|Interface 0.extra2 MTU (1500) setup error: Operation not supported
2018-01-10T15:53:26.949Z|00159|netdev_dpdk|ERR|Interface 0.extra2(rxq:1 txq:1) configure error: Operation not supported
2018-01-10T15:53:27.333Z|00066|daemon_unix(monitor)|ERR|fork child died before signaling startup (killed (Segmentation fault))
2018-01-10T15:53:27.333Z|00067|daemon_unix(monitor)|WARN|23 crashes: pid 21413 died, killed (Segmentation fault), core dumped, waiting until 10 seconds since last restart
2018-01-10T15:53:33.333Z|00068|daemon_unix(monitor)|ERR|23 crashes: pid 21413 died, killed (Segmentation fault), core dumped, restarting
Thanks!

Riccardo


