[ovs-dev] [PATCH 3/4] dpif-netdev: Avoid port's reconfiguration on pmd-cpu-mask changes.
Ilya Maximets
i.maximets at samsung.com
Tue May 30 14:14:22 UTC 2017
On 30.05.2017 16:53, Kevin Traynor wrote:
> On 05/29/2017 04:37 PM, Ilya Maximets wrote:
>> On 29.05.2017 18:00, Kevin Traynor wrote:
>>> On 05/29/2017 02:31 PM, Ilya Maximets wrote:
>>>> On 29.05.2017 16:26, Kevin Traynor wrote:
>>>>> On 05/29/2017 01:22 PM, Ilya Maximets wrote:
>>>>>> On 26.05.2017 20:14, Kevin Traynor wrote:
>>>>>>> On 05/26/2017 03:55 PM, Ilya Maximets wrote:
>>>>>>>> On 10.03.2017 07:27, Daniele Di Proietto wrote:
>>>>>>>>> 2017-02-21 6:49 GMT-08:00 Ilya Maximets <i.maximets at samsung.com>:
>>>>>>>>>> Reconfiguration of HW NICs may lead to packet drops.
>>>>>>>>>> In the current model, all physical ports are reconfigured each
>>>>>>>>>> time the number of PMD threads changes. Since threads are not
>>>>>>>>>> stopped on pmd-cpu-mask changes, this patch helps to further
>>>>>>>>>> decrease a port's downtime by setting the maximum possible number
>>>>>>>>>> of wanted tx queues to avoid unnecessary reconfigurations.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Ilya Maximets <i.maximets at samsung.com>
>>>>>>>>>
>>>>>>>>> I haven't thought this through a lot, but the last big series we pushed
>>>>>>>>> on master went exactly in the opposite direction, i.e. created one txq
>>>>>>>>> for each thread in the datapath.
>>>>>>>>>
>>>>>>>>> I thought this was a good idea because:
>>>>>>>>>
>>>>>>>>> * On some systems with hyperthreading we can have a lot of cpus (we received
>>>>>>>>>   reports of systems with 72 cores). If you want to use only a couple of
>>>>>>>>>   cores, you're still forced to have a lot of unused txqs, which prevents
>>>>>>>>>   lockless tx.
>>>>>>>>> * We thought that reconfiguring the number of pmds would not be a frequent
>>>>>>>>>   operation.
>>>>>>>>>
>>>>>>>>> Why do you want to reconfigure the threads that often? Is it to be
>>>>>>>>> able to support more throughput quickly?
>>>>>>>>
>>>>>>>> Yes.
>>>>>>>>
>>>>>>>>> In this case I think we shouldn't use the number of cpus,
>>>>>>>>> but something else coming from the user, so that the administrator can
>>>>>>>>> balance how
>>>>>>>>> quickly pmd threads can be reconfigured vs how many txqs should be on
>>>>>>>>> the system.
>>>>>>>>> I'm not sure how to make this user friendly though.
>>>>>>>>>
>>>>>>>>> What do you think?
>>>>>>>>
>>>>>>>> Right now I'm thinking about a combined solution which addresses
>>>>>>>> both issues (too many TXQs and frequent device reconfiguration).
>>>>>>>> I think we can implement an additional function to get a port's limitations.
>>>>>>>> For now we can request the maximum number of TX queues from the netdev and
>>>>>>>> use it while the number of cores is less than the number of queues.
>>>>>>>> Something like this:
>>>>>>>>
>>>>>>>> lib/netdev-dpdk.c:
>>>>>>>> uint32_t netdev_dpdk_get_max_txqs(struct netdev *netdev)
>>>>>>>> {
>>>>>>>>     struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
>>>>>>>>     struct rte_eth_dev_info dev_info;
>>>>>>>>
>>>>>>>>     ovs_mutex_lock(&dev->mutex);
>>>>>>>>     rte_eth_dev_info_get(dev->port_id, &dev_info);
>>>>>>>>     ovs_mutex_unlock(&dev->mutex);
>>>>>>>>
>>>>>>>>     return dev_info.max_tx_queues;
>>>>>>>> }
>>>>>>>>
>>>>>>>> lib/dpif-netdev.c:reconfigure_datapath():
>>>>>>>>
>>>>>>>> <----->
>>>>>>>>     max_tx_queues = netdev_get_max_txqs(port->netdev);
>>>>>>>>     number_of_threads = cmap_count(&dp->poll_threads);
>>>>>>>>     wanted_txqs = MAX(max_tx_queues, number_of_threads);
>>>>>>>> <----->
>>>>>>>>
>>>>>>>> In this implementation there will be no additional locking if the number of
>>>>>>>> threads is less than or equal to the maximum possible number of tx queues in HW.
>>>>>>>>
>>>>>>>> What do you think? Ian? Darrell?
>>>>>>>>
>>>>>>>
>>>>>>> I'm not sure about using rte_eth_dev_info_get() as IIRC there were
>>>>>>> problems previously with it reporting a number, but then when you went
>>>>>>> to configure them they were not all available depending on the mode. That
>>>>>>> was why the trial-and-error queue configure was put in.
>>>>>>>
>>>>>>> What about replacing 'max_tx_queues' above with a number like 16 that is
>>>>>>> likely to be supported by the ports and unlikely be exceeded by
>>>>>>> number_of_threads?
>>>>>>>
>>>>>>> Kevin.
>>>>>>
>>>>>> Hi Kevin. Thank you for reminding me of this issue.
>>>>>>
>>>>>> But I think that magic numbers are not a good solution.
>>>>>>
>>>>>> One issue in my first implementation is that the desired number of queues
>>>>>> is not actually the same as the required number.
>>>>>>
>>>>>> How about something like this:
>>>>>> <----------------------------------------------------------------->
>>>>>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>>>>>> index 97f72d3..1a18e4f 100644
>>>>>> --- a/lib/dpif-netdev.c
>>>>>> +++ b/lib/dpif-netdev.c
>>>>>> @@ -3448,7 +3448,7 @@ reconfigure_datapath(struct dp_netdev *dp)
>>>>>>  {
>>>>>>      struct dp_netdev_pmd_thread *pmd;
>>>>>>      struct dp_netdev_port *port;
>>>>>> -    int wanted_txqs;
>>>>>> +    int needed_txqs, wanted_txqs;
>>>>>>
>>>>>>      dp->last_reconfigure_seq = seq_read(dp->reconfigure_seq);
>>>>>>
>>>>>> @@ -3456,7 +3456,15 @@ reconfigure_datapath(struct dp_netdev *dp)
>>>>>>       * on the system and the user configuration. */
>>>>>>      reconfigure_pmd_threads(dp);
>>>>>>
>>>>>> -    wanted_txqs = cmap_count(&dp->poll_threads);
>>>>>> +    /* We need 1 Tx queue for each thread to avoid locking, but we will try
>>>>>> +     * to allocate the maximum possible value to minimize the number of port
>>>>>> +     * reconfigurations. */
>>>>>> +    needed_txqs = cmap_count(&dp->poll_threads);
>>>>>> +    /* (n_cores + 1) is the maximum that we might need to have.
>>>>>> +     * Additional queue is for non-PMD threads. */
>>>>>> +    wanted_txqs = ovs_numa_get_n_cores();
>>>>>> +    ovs_assert(wanted_txqs != OVS_CORE_UNSPEC);
>>>>>> +    wanted_txqs++;
>>>>>>
>>>>>>      /* The number of pmd threads might have changed, or a port can be new:
>>>>>>       * adjust the txqs. */
>>>>>> @@ -3469,9 +3477,13 @@ reconfigure_datapath(struct dp_netdev *dp)
>>>>>>
>>>>>>      /* Check for all the ports that need reconfiguration. We cache this in
>>>>>>       * 'port->reconfigure', because netdev_is_reconf_required() can change at
>>>>>> -     * any time. */
>>>>>> +     * any time. Also mark for reconfiguration all ports which will likely
>>>>>> +     * change their 'dynamic_txqs' parameter. It's required to stop using
>>>>>> +     * them before changing this setting. */
>>>>>>      HMAP_FOR_EACH (port, node, &dp->ports) {
>>>>>> -        if (netdev_is_reconf_required(port->netdev)) {
>>>>>> +        if (netdev_is_reconf_required(port->netdev)
>>>>>> +            || (port->dynamic_txqs
>>>>>> +                != netdev_n_txq(port->netdev) < needed_txqs)) {
>>>>>
>>>>> I'm not quite sure why there needs to be a reconfigure here. It will
>>>>> again try with the same wanted_txqs. Is it not enough to just update the
>>>>> dynamic_txqs flag as is done later?
>>>>
>>>> It's not enough because reconfiguration will not be requested if we're
>>>> trying to set up the same number of queues a second time.
>>>>
>>>
>>> My question is - why request a reconfigure at all based on needed_txqs?
>>>
>>> We already configured requesting wanted_txqs (=ovs_numa_get_n_cores())
>>> and now we would configure again requesting the same num wanted_txqs,
>>> regardless of the new needed_txqs value.
>>>
>>> Could we just update the dynamic_txqs flag (as you have below) to
>>> reflect any new difference between configured txqs and needed_txqs.
>>> Maybe I missed some part?
>>>
>>> Kevin.
>>
>> Ok. I got it. The reason is in fact that we need to remove these ports
>> from the PMD threads. It's simpler to mark the ports here and allow
>> 'pmd_remove_stale_ports' to remove them from the threads.
>>
>> After that there will be no actual reconfiguration because 'port_reconfigure'
>> will just check that nothing changed (the /* Reconfiguration is unnecessary */
>> branch).
>>
>> If we want to avoid calling 'port_reconfigure' we'll have to remove the
>> ports from the threads using an additional loop and handle them separately
>> in the reconfiguration loop. I think my solution is just simpler and not
>> really slower because there will be no actual reconfiguration.
>>
>> What do you think?
>>
>
> Thanks, I understand now. Seems ok in that case, maybe you can add an
> even more detailed comment?
Ok. Please check v2 for the updated comment.
>
> I'm thinking that rxqs don't need to be removed from the pmd, but maybe
> it's not a good idea to be asynchronous, and anyway that could be an
> optimization for another time.
>
>>>
>>>> See lib/netdev-dpdk.c:netdev_dpdk_set_tx_multiq() for details:
>>>>
>>>>     if (dev->requested_n_txq == n_txq) {
>>>>         goto out;
>>>>     }
>>>>
>>>>
>>>>
>>>>>>              port->need_reconfigure = true;
>>>>>>          }
>>>>>>      }
>>>>>> @@ -3505,7 +3517,7 @@ reconfigure_datapath(struct dp_netdev *dp)
>>>>>>              seq_change(dp->port_seq);
>>>>>>              port_destroy(port);
>>>>>>          } else {
>>>>>> -            port->dynamic_txqs = netdev_n_txq(port->netdev) < wanted_txqs;
>>>>>> +            port->dynamic_txqs = netdev_n_txq(port->netdev) < needed_txqs;
>>>>>>          }
>>>>>>      }
>>>>>>
>>>>>> <----------------------------------------------------------------->
>>>>>>
>>>>>> Best regards, Ilya Maximets.
>>>>>>
>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Daniele
>>>>>>>>>
>>>>>>>>>> ---
>>>>>>>>>> lib/dpif-netdev.c | 6 +++++-
>>>>>>>>>> 1 file changed, 5 insertions(+), 1 deletion(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>>>>>>>>>> index 6e575ab..e2b4f39 100644
>>>>>>>>>> --- a/lib/dpif-netdev.c
>>>>>>>>>> +++ b/lib/dpif-netdev.c
>>>>>>>>>> @@ -3324,7 +3324,11 @@ reconfigure_datapath(struct dp_netdev *dp)
>>>>>>>>>> * on the system and the user configuration. */
>>>>>>>>>> reconfigure_pmd_threads(dp);
>>>>>>>>>>
>>>>>>>>>> - wanted_txqs = cmap_count(&dp->poll_threads);
>>>>>>>>>> + /* We need 1 Tx queue for each possible cpu core. */
>>>>>>>>>> + wanted_txqs = ovs_numa_get_n_cores();
>>>>>>>>>> + ovs_assert(wanted_txqs != OVS_CORE_UNSPEC);
>>>>>>>>>> + /* And 1 Tx queue for non-PMD threads. */
>>>>>>>>>> + wanted_txqs++;
>>>>>>>>>>
>>>>>>>>>> /* The number of pmd threads might have changed, or a port can be new:
>>>>>>>>>> * adjust the txqs. */
>>>>>>>>>> --
>>>>>>>>>> 2.7.4
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> dev mailing list
>>>>>>>> dev at openvswitch.org
>>>>>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>>>>>>>