[ovs-dev] [PATCH v6] Configurable Link State Change (LSC) detection mode

Stokes, Ian ian.stokes at intel.com
Thu Apr 5 13:17:02 UTC 2018


> On 03/27/2018 04:07 PM, Stokes, Ian wrote:
> >> On 27.03.2018 13:19, Stokes, Ian wrote:
> >>>> It is possible to change LSC detection mode to polling or interrupt
> >>>> mode for DPDK interfaces. The default is polling mode. To set
> >>>> interrupt mode, option dpdk-lsc-interrupt has to be set to true.
> >>>>
> >>>> In polling mode more processor time is needed, since the OVS
> >>>> repeatedly reads the link state with a short period. It can lead to
> >>>> packet loss for certain systems.
> >>>>
> >>>> In interrupt mode the hardware itself triggers an interrupt when
> >>>> link state change happens, so less processing time needs for the OVS.
> >>>>
> >>>> For detailed description and usage see the dpdk install
> documentation.
> >> Could you, please, better describe why we need this change?
> >> Because we're not removing the polling thread. OVS will still poll
> >> the link states periodically. This config option has no effect on that
> side.
> >> Also, link state polling in OVS uses 'rte_eth_link_get_nowait()'
> >> function which will be called in both cases and should not wait for
> >> hardware reply in any implementation.
> 
> rte_eth_link_get_nowait() on Intel XL710 could take an excessive time to
> respond. The following patch, https://dpdk.org/ml/archives/dev/2018-
> March/092156.html is taking care of it from a DPDK side.
> 
> There might be other drivers that also take a long time, hence this patch
> might still be useful in the future.
> 
> > I believe it was related to a case where bonded mode in active back
> > was causing packet drops due to the frequency that the LSC was being
> > polled. Using interrupt based approach alleviated the issue. (I'm open
> > to correction on this :))
> >
> > @Robert/Eelco You may be able to provide some more light here and
> whether the patches below in DPDK resolve the issue?
> >
> This long delay can be an issue in bonding mode, as the links checks for
> bonding interfaces is holding the RW lock in bond_run(). This same lock is
> taken in the PMD thread when calling the bond_check_admissibility() for
> upcall traffic.
> >> There was recent bug fix for intel NICs that fixes waiting of an
> >> admin queue on link state requests despite of 'no_wait' flag:
> >>      http://dpdk.org/ml/archives/dev/2018-March/092156.html
> >> Will this fix your target case?
> >>
> >> So, the difference of execution time of 'rte_eth_link_get_nowait()'
> >> with enabled and disabled interrupts should be not so significant.
> >> Do you have performance measurements? Measurement with above fix
> applied?
> I do not have delay numbers but I know that we were no longer seeing
> dropped traffic compared to other NICs under the same load with upcall
> traffic present.
> >>> Thanks for working on this Robert.
> >>>
> >>> I've completed some testing including the case where LSC is not
> >> supported, in which case the port will remain in a down state and
> >> fail rx/tx traffic. This behavior conforms to the netdev_reconfigure
> >> expectations in the fail case so that's ok.
> >>
> >> I'm not sure if this is acceptable. For example, we're not failing
> >> reconfiguration in case of issues with number of queues. We're trying
> >> different numbers until we have working configuration.
> >> Maybe we need the same fall-back mechanism in case of not supported
> >> LSC interrupts? (MTU setup errors are really uncommon unlike LSC
> interrupts'
> >> support in PMDs).
> > Thanks for raising this Ilya.
> >
> > I thought of this as well. I'd like to see a fall back to the PMD but
> didn’t see how it could be done in a clean way.
> >
> > Unfortunately rte_eth_dev_configure() returns -EINVAL when lsc mode is
> requested but not supported.
> >
> > It doesn't give us a clue if the error is related to lsc mode as it
> could also relate to a number of other configure issues such as
> nb_rxq/nb_txq/portid etc.
> >
> > It would be better if we could query the device via ethdev api to see if
> it supports lsc interrupt mode but that’s not available currently.

> Maybe a DPDK patch before we continue?

It's hard to say. As part of the call to rte_eth_dev_configure() there is a check specifically to see if lsc interrupt is supported with the following.

	/* Check that the device supports requested interrupts */
	if ((dev_conf->intr_conf.lsc == 1) &&
		(!(dev->data->dev_flags & RTE_ETH_DEV_INTR_LSC))) {
			RTE_PMD_DEBUG_TRACE("driver %s does not support lsc\n",
					dev->device->driver->name);
			return -EINVAL;
	}

Even if a patch was submitted to extend the API in DPDK to allow this check specifically, I feel it will be the same code as above, just in a new function name. The check would remain in rte_eth_dev_configure() anyway.

We could do something similar in OVS for now to allow PMD fallback where it's not supported.

Ian

> > The only way I have seen lsc support queries in DPDK is by querying the
> device data itself which doesn't look great, this was discussed in an
> earlier thread I think for this patch.
> >
> > The alternative could be to introduce a generic retry for
> rte_eth_dev_configure() when configure fails but with lsc configured to
> PMD instead but I'd prefer to see an indication that the lsc mode was the
> cause.
> >
> > If we have a cleaner way to fall back to PMD that’s ok by me but I think
> we have to allow for a possible failure in during a reconfigure and follow
> the prescribed behavior as per netdev_reconfigure() which this code
> currently does.
> >
> >>> I'm a bit late to the thread but I have a few other comments below.
> >>>
> >>> I'd like to get this patch in the next pull request if possible so
> >>> I'd
> >> appreciate if others can give any comments on the patch also.
> >>> Thanks
> >>> Ian
> >>>
> >>>> Signed-off-by: Robert Mulik <robert.mulik at ericsson.com>
> >>>> ---
> >>>> v5 -> v6:
> >>>> - DPDK install documentation updated.
> >>>> - Status of lsc_interrupt_mode of DPDK interfaces can be read by
> >> command
> >>>>    ovs-appctl dpif/show.
> >>>> - It was suggested to check if the HW supports interrupt mode, but
> >>>> it is not
> >>>>    possible to do without DPDK code change, so it is skipped from
> >>>> this patch.
> >>>> ---
> >>>>   Documentation/intro/install/dpdk.rst | 33
> >>>> +++++++++++++++++++++++++++++++++
> >>>>   lib/netdev-dpdk.c                    | 24 ++++++++++++++++++++++--
> >>>>   vswitchd/vswitch.xml                 | 17 +++++++++++++++++
> >>>>   3 files changed, 72 insertions(+), 2 deletions(-)
> >>>>
> >>>> diff --git a/Documentation/intro/install/dpdk.rst
> >>>> b/Documentation/intro/install/dpdk.rst
> >>>> index ed358d5..eb1bc7b 100644
> >>>> --- a/Documentation/intro/install/dpdk.rst
> >>>> +++ b/Documentation/intro/install/dpdk.rst
> >>>> @@ -628,6 +628,39 @@ The average number of packets per output batch
> >>>> can be checked in PMD stats::
> >>>>
> >>>>       $ ovs-appctl dpif-netdev/pmd-stats-show
> >>>>
> >>>> +Link State Change (LSC) detection configuration
> >>>> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >>>> +
> >>>> +There are two methods to get the information when Link State
> >>>> +Change
> >>>> +(LSC) happens on a network interface: by polling or interrupt.
> >>>> +
> >>>> +With the polling method, the main process checks the link state on
> >>>> +a fixed interval. This fixed interval determines how fast a link
> >>>> +change is detected.
> >>>> +
> >>>> +If interrupts are used to get LSC information, the hardware itself
> >>>> +triggers an interrupt when link state change happens, the
> >>>> +interrupt thread wakes up from sleep, updates the information, and
> >>>> +goes back to sleep mode. When no link state change happens (most
> >>>> +of the time), the thread remains in sleep mode and doesn`t use
> >>>> +processor time at all. The disadvantage of this method is that
> >>>> +when an interrupt happens, the processor has to handle it
> >>>> +immediately, so it puts the currently running process to
> >>>> +background, handles the interrupt, and takes the
> >>>> background process back.
> >>>> +
> >>>> +Note that not all PMD drivers support LSC interrupts.
> >>>> +
> >>>> +The default configuration is polling mode. To set interrupt mode,
> >>>> +option ``dpdk-lsc-interrupt`` has to be set to ``true``.
> >>>> +
> >>>> +Command to set interrupt mode for a specific interface::
> >>>> +    $ ovs-vsctl set interface <iface_name>
> >>>> +options:dpdk-lsc-interrupt=true
> >>>> +
> >>>> +Command to set polling mode for a specific interface::
> >>>> +    $ ovs-vsctl set interface <iface_name>
> >>>> +options:dpdk-lsc-interrupt=false
> >>>> +
> >>>> +Command to remove ``dpdk-lsc-interrupt`` option::
> >>>> +    $ ovs-vsctl remove interface <iface_name> options
> >>>> +dpdk-lsc-interrupt
> >>> Just a query, why do we need the above option to remove lsc, is
> >>> setting
> >> lsc true or false with the previous commands not enough?
> >>>> +
> >>>>   Limitations
> >>>>   ------------
> >>>>
> >>>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index
> >>>> 94fb163..e2794e8
> >>>> 100644
> >>>> --- a/lib/netdev-dpdk.c
> >>>> +++ b/lib/netdev-dpdk.c
> >>>> @@ -433,6 +433,12 @@ struct netdev_dpdk {
> >>>>           /* DPDK-ETH hardware offload features,
> >>>>            * from the enum set 'dpdk_hw_ol_features' */
> >>>>           uint32_t hw_ol_features;
> >>>> +
> >>>> +        /* Properties for link state change detection mode.
> >>>> +         * If lsc_interrupt_mode is set to false, poll mode is used,
> >>>> +         * otherwise interrupt mode is used. */
> >>>> +        bool requested_lsc_interrupt_mode;
> >>>> +        bool lsc_interrupt_mode;
> >>>>       );
> >>>>
> >>>>       PADDED_MEMBERS(CACHE_LINE_SIZE, @@ -686,12 +692,14 @@
> >>>> dpdk_watchdog(void *dummy OVS_UNUSED)  }
> >>>>
> >>>>   static int
> >>>> -dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int n_rxq, int
> >>>> n_txq)
> >>>> +dpdk_eth_dev_port_config(struct netdev_dpdk *dev, int n_rxq, int
> >>>> +n_txq)
> >>>>   {
> >>>>       int diag = 0;
> >>>>       int i;
> >>>>       struct rte_eth_conf conf = port_conf;
> >>>>
> >>>> +    conf.intr_conf.lsc = dev->lsc_interrupt_mode;
> >>> Should above be dev->requested_lsc_interrupt_mode? Similar to how we
> >> request MTUs, we first have to validate if it's supported.
> >>> Since we have no way of knowing if the interrupt mode is supported
> >>> at
> >> the moment we should set conf.intr_conf.lsc = dev-
> >>> requested_lsc_interrupt_mode and once configuration succeeds set
> >>> dev- lsc_interrupt_mode = conf.intr_conf.lsc.
> >>>
> >>> If we don't then I'm not sure of the purpose of having
> >> requested_lsc_interrupt_mode and a separate lsc_interrupt_mode for dev.
> >>> Thoughts?
> >>>
> >>>> +
> >>>>       /* For some NICs (e.g. Niantic), scatter_rx mode needs to be
> >>>> explicitly
> >>>>        * enabled. */
> >>>>       if (dev->mtu > ETHER_MTU) {
> >>>> @@ -801,7 +809,7 @@ dpdk_eth_dev_init(struct netdev_dpdk *dev)
> >>>>       n_rxq = MIN(info.max_rx_queues, dev->up.n_rxq);
> >>>>       n_txq = MIN(info.max_tx_queues, dev->up.n_txq);
> >>>>
> >>>> -    diag = dpdk_eth_dev_queue_setup(dev, n_rxq, n_txq);
> >>>> +    diag = dpdk_eth_dev_port_config(dev, n_rxq, n_txq);
> >>>>       if (diag) {
> >>>>           VLOG_ERR("Interface %s(rxq:%d txq:%d) configure error: %s",
> >>>>                    dev->up.name, n_rxq, n_txq,
> >>>> rte_strerror(-diag));
> >>> With the change from dpdk_eth_dev_queue_setup to
> >> dpdk_eth_dev_port_config I think we need to improve the error message
> >> also logged above.
> >>> Instead of only reporting the rxq and txq we should include the
> >> requested lsc mode also as that could cause a failure if not supported.
> >>>   @@ -
> >>>> 897,6 +905,7 @@ common_construct(struct netdev *netdev, dpdk_port_t
> >>>> port_no,
> >>>>       dev->flags = 0;
> >>>>       dev->requested_mtu = ETHER_MTU;
> >>>>       dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
> >>>> +    dev->requested_lsc_interrupt_mode = 0;
> >>>>       ovsrcu_index_init(&dev->vid, -1);
> >>>>       dev->vhost_reconfigured = false;
> >>>>       dev->attached = false;
> >>>> @@ -1320,6 +1329,8 @@ netdev_dpdk_get_config(const struct netdev
> >>>> *netdev, struct smap *args)
> >>>>           } else {
> >>>>               smap_add(args, "rx_csum_offload", "false");
> >>>>           }
> >>>> +        smap_add(args, "lsc_interrupt_mode",
> >>>> +                 dev->lsc_interrupt_mode?"true":"false");
> >>> Need space before/after operators '?' and ':' above.
> >>>
> >>>>       }
> >>>>       ovs_mutex_unlock(&dev->mutex);
> >>>>
> >>>> @@ -1520,6 +1531,12 @@ netdev_dpdk_set_config(struct netdev
> >>>> *netdev, const struct smap *args,
> >>>>           goto out;
> >>>>       }
> >>>>
> >>>> +    bool lsc_interrupt_mode = smap_get_bool(args,
> >>>> + "dpdk-lsc-interrupt",
> >>> This comes down to style preference but can you declare this bool at
> >>> the
> >> beginning of the function? There are other bool variable in the
> >> function declared there already so it keeps with the existing style
> >> of the function.
> >>
> >> +1
> >>
> >>>> false);
> >>>> +    if (dev->requested_lsc_interrupt_mode != lsc_interrupt_mode) {
> >>>> +        dev->requested_lsc_interrupt_mode = lsc_interrupt_mode;
> >>>> +        netdev_request_reconfigure(netdev);
> >>>> +    }
> >>>> +
> >>>>       rx_fc_en = smap_get_bool(args, "rx-flow-ctrl", false);
> >>>>       tx_fc_en = smap_get_bool(args, "tx-flow-ctrl", false);
> >>>>       autoneg = smap_get_bool(args, "flow-ctrl-autoneg", false); @@
> >>>> -3546,6
> >>>> +3563,7 @@ netdev_dpdk_reconfigure(struct netdev *netdev)
> >>>>       if (netdev->n_txq == dev->requested_n_txq
> >>>>           && netdev->n_rxq == dev->requested_n_rxq
> >>>>           && dev->mtu == dev->requested_mtu
> >>>> +        && dev->lsc_interrupt_mode ==
> >>>> + dev->requested_lsc_interrupt_mode
> >>>>           && dev->rxq_size == dev->requested_rxq_size
> >>>>           && dev->txq_size == dev->requested_txq_size
> >>>>           && dev->socket_id == dev->requested_socket_id) { @@
> >>>> -3561,6
> >>>> +3579,8 @@ netdev_dpdk_reconfigure(struct netdev *netdev)
> >>>>           goto out;
> >>>>       }
> >>>>
> >>>> +    dev->lsc_interrupt_mode = dev->requested_lsc_interrupt_mode;
> >>> I think this would be removed from here and only set once
> >> rte_eth_dev_configure succeeds in dpdk_eth_dev_port_config as
> >> mentioned earlier?
> >>>> +
> >>>>       netdev->n_txq = dev->requested_n_txq;
> >>>>       netdev->n_rxq = dev->requested_n_rxq;
> >>>>
> >>> It could be useful to report the LSC mode configured for the port
> >>> also as part of the port status, something like
> >>>
> >>> @@ -2781,6 +2783,8 @@ netdev_dpdk_get_status(const struct netdev
> >> *netdev, struct smap *args)
> >>>                              dev_info.max_hash_mac_addrs);
> >>>       smap_add_format(args, "max_vfs", "%u", dev_info.max_vfs);
> >>>       smap_add_format(args, "max_vmdq_pools", "%u",
> >>> dev_info.max_vmdq_pools);
> >>> +    smap_add_format(args, "lsc_mode", "%s",
> >>> +                           dev->lsc_interrupt_mode ? "Interrupt" :
> >>> + "PMD");
> >>>
> >>> Thoughts?
> >> We have this in 'netdev_dpdk_get_config()'. Do you think we need this
> >> in two places? Also, 'netdev_dpdk_get_status()' is more like HW
> >> specific invariants. (Not sure why we're placing these values in
> >> 'status', it's a bit strange.)
> > Ya, I wasn't sure here so thought I'd suggest it.
> >
> > You have values in get_status like max_rx_packet_length that change
> based on user configuration, so for completeness really I suggested it be
> added here.
> >
> > I don’t feel particularly strongly about it though, wanted to raise it
> for discussion as it may be something the user expects.
> >
> > Ian
> >>>> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml index
> >>>> 0c6a43d..3c9e637 100644
> >>>> --- a/vswitchd/vswitch.xml
> >>>> +++ b/vswitchd/vswitch.xml
> >>>> @@ -3631,6 +3631,23 @@ ovs-vsctl add-port br0 p0 -- set Interface
> >>>> p0 type=patch options:peer=p1 \
> >>>>         </column>
> >>>>       </group>
> >>>>
> >>>> +    <group title="Link State Change detection mode">
> >>>> +      <column name="options" key="dpdk-lsc-interrupt"
> >>>> +              type='{"type": "boolean"}'>
> >>>> +        <p>
> >>>> +          Set this value to <code>true</code> to configure
> >>>> + interrupt mode
> >>>> for
> >>>> +          Link State Change (LSC) detection instead of poll mode
> >>>> + for the
> >>>> DPDK
> >>>> +          interface.
> >>>> +        </p>
> >>>> +        <p>
> >>>> +          If this value is not set, poll mode is configured.
> >>>> +        </p>
> >>>> +        <p>
> >>>> +          This parameter has an effect only on netdev dpdk
> interfaces.
> >>>> +        </p>
> >>>> +      </column>
> >>>> +    </group>
> >>>> +
> >>>>       <group title="Common Columns">
> >>>>         The overall purpose of these columns is described under
> >>>> <code>Common
> >>>>         Columns</code> at the beginning of this document.
> >>>> --
> >>>> 1.9.1
> >>>>
> >>>> _______________________________________________
> >>>> dev mailing list
> >>>> dev at openvswitch.org
> >>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> >>>
> >>>



More information about the dev mailing list