[ovs-dev] [PATCH] netdev-dpdk: Fix PMD threads hang in __netdev_dpdk_vhost_send().

Ilya Maximets i.maximets at samsung.com
Thu Jun 2 05:23:57 UTC 2016


On 02.06.2016 04:32, Daniele Di Proietto wrote:
> 
> On 25/05/2016 04:03, "Ilya Maximets" <i.maximets at samsung.com> wrote:
> 
>> On 23.05.2016 17:55, Traynor, Kevin wrote:
>>>> -----Original Message-----
>>>> From: Ilya Maximets [mailto:i.maximets at samsung.com]
>>>> Sent: Tuesday, May 17, 2016 4:09 PM
>>>> To: dev at openvswitch.org; Daniele Di Proietto <diproiettod at vmware.com>
>>>> Cc: Dyasly Sergey <s.dyasly at samsung.com>; Heetae Ahn
>>>> <heetae82.ahn at samsung.com>; Flavio Leitner <fbl at sysclose.org>;
>>>> Traynor, Kevin <kevin.traynor at intel.com>; Pravin B Shelar
>>>> <pshelar at ovn.org>; Ilya Maximets <i.maximets at samsung.com>
>>>> Subject: [PATCH] netdev-dpdk: Fix PMD threads hang in
>>>> __netdev_dpdk_vhost_send().
>>>>
>>>> There are situations when a PMD thread can hang forever inside
>>>> __netdev_dpdk_vhost_send() because of a broken virtqueue ring.
>>>>
>>>> This happens if rte_vring_available_entries() always returns a
>>>> positive value while rte_vhost_enqueue_burst() can't send anything
>>>> (possible with a broken ring).
>>>>
>>>> In this case the time expiration will never be checked and the
>>>> 'do {} while (cnt)' loop will be infinite.
>>>>
>>>> This scenario is sometimes reproducible with dpdk-16.04-rc2 inside
>>>> the guest VM. It may also be reproduced by manually breaking the
>>>> ring structure inside the guest VM.
>>>>
>>>> Fix that by checking the time expiration even if there are
>>>> available entries.
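
[For context, here is a simplified sketch of the retry loop in question.
This is not the exact OVS code: locking, stats and packet freeing are
omitted, variable names are approximate, and the DPDK calls are shown as
they were used around DPDK 16.04. It only illustrates where the timeout
check gets skipped:]

static void
vhost_send_sketch(struct virtio_net *virtio_dev, struct rte_mbuf **cur_pkts,
                  unsigned int cnt)
{
    uint64_t start = 0;

    do {
        unsigned int tx_pkts;

        tx_pkts = rte_vhost_enqueue_burst(virtio_dev, VIRTIO_RXQ,
                                          cur_pkts, cnt);
        if (tx_pkts) {
            cnt -= tx_pkts;
            cur_pkts = &cur_pkts[tx_pkts];
        } else {
            /* Nothing was sent: wait for free descriptors, but give up
             * after VHOST_ENQ_RETRY_USECS (an OVS-defined constant). */
            uint64_t timeout = VHOST_ENQ_RETRY_USECS
                               * rte_get_timer_hz() / 1E6;
            int expired = 0;

            if (!start) {
                start = rte_get_timer_cycles();
            }
            while (!rte_vring_available_entries(virtio_dev, VIRTIO_RXQ)) {
                if (rte_get_timer_cycles() - start > timeout) {
                    expired = 1;
                    break;
                }
            }
            if (expired) {
                break;      /* Give up on the remaining packets. */
            }
            /* Problem: with a broken ring, available_entries() can stay
             * positive while enqueue_burst() keeps returning 0.  The
             * inner while exits immediately, the timeout is never
             * evaluated, and the outer do {} while (cnt) spins forever. */
        }
    } while (cnt);
}
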
>>>
>>> Hi Ilya,
>>
>> Hi, Kevin.
>>
>> Christian and Thiago CC-ed because, I think, they've faced a similar issue.
>>
>>>
>>> Thanks for catching this. This intersects with something else I've seen
>>> wrt the retry code, and there are a few options...
>>>
>>> 1. Remove retries when nothing is sent. For the VM that needs retries it is a
>>> good thing to have, but Bhanu and I saw in a test with multiple VMs recently
>>> that if one VM causes a lot of retries there is a large performance degradation
>>> for the other VMs. So I changed the retry to only occur when at least one packet
>>> has been sent on the previous call (roughly sketched below, after option 3).
>>> I put a patch up here:
>>> http://openvswitch.org/pipermail/dev/2016-May/071517.html
>>>
>>> If we keep retries we can either
>>>
>>> 2. Make more robust coordination between rte_vring_available_entries() and
>>> rte_vhost_enqueue_burst(), as per your patch.
>>>
>>> 3. As you've shown, we can't rely on rte_vring_available_entries() to know whether
>>> we can enqueue, so how about just removing it and using rte_vhost_enqueue_burst()
>>> directly in the retry loop.
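
[As I read option 1, the resulting behaviour would be roughly the
following (illustrative only, reusing the variables from the sketch
above; Kevin's actual patch is at the URL in option 1):]

do {
    unsigned int tx_pkts;

    tx_pkts = rte_vhost_enqueue_burst(virtio_dev, VIRTIO_RXQ,
                                      cur_pkts, cnt);
    cnt -= tx_pkts;
    cur_pkts = &cur_pkts[tx_pkts];

    if (!tx_pkts) {
        /* No progress on this call: drop the remaining 'cnt' packets
         * instead of busy-waiting on the ring. */
        break;
    }
} while (cnt);
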
>>>
>>> My preference would be for 1, because on balance I'd rather one VM did not degrade
>>> the performance of others than have it keep its retries. Of course there could be
>>> some compromise between them as well, i.e. reducing the amount of retries, but any
>>> retries could affect performance for another path if they are using the same core.
>>>
>>> What do you think?
>>
>> I'm worried about scenarios with "pulsing" traffic, i.e. a burst of packets that is
>> not very big but still enough to overload the vring in a short time, followed by a
>> long period of silence. HW can keep many more packets in its RX queues than can be
>> pushed to the virtio ring. In this scenario, without retrying, most of the packets
>> will be dropped.
>>
>> How about just decreasing VHOST_ENQ_RETRY_USECS to, maybe, 1 usec (with my fix
>> applied, of course)? Such an interval should be enough to handle 20G traffic with
>> 64B packets on one PMD thread. Also, this timeout could be applied in both cases
>> (something sent or not) to decrease the cost of retrying.
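
(Rough math behind the 1 usec figure: a 64B frame occupies 84 bytes on
the wire once the preamble and inter-frame gap are counted, so 20 Gb/s
is about 20e9 / (84 * 8) ~= 29.8 Mpps, and a full 32-packet burst then
arrives roughly every 32 / 29.8e6 ~= 1.07 usec.)
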
>>
>> Best regards, Ilya Maximets.
> 
> I haven't done any performance testing with many VMs, but ...
> 
> I think the retry logic was introduced because at the time NETDEV_MAX_BURST
> was 192 and we felt that batches of 192 packets could easily overload the
> guest ring.
> 
> Now that NETDEV_MAX_BURST is only 32, I agree with Kevin on applying solution 1.
> Retries can degrade performance for other VMs, which IMHO is worse than
> dropping packets destined for a "slow" receiver VM.
> 
> Thanks,
> 
> Daniele

OK, I agree that removing the timeout is reasonable.
Another thing: the implementation provided by Kevin allows a situation (in some
corner case) where rte_vhost_enqueue_burst() will be called 32 times in a row.
How much time will that take? Maybe we should limit the number of retries or the
time spent inside __netdev_dpdk_vhost_send()?
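
For example, something like this (purely illustrative; the retry-cap
macro name and value are made up):

#define VHOST_ENQ_MAX_RETRIES 8   /* Hypothetical cap; value invented. */

int retries = 0;

do {
    unsigned int tx_pkts;

    tx_pkts = rte_vhost_enqueue_burst(virtio_dev, VIRTIO_RXQ,
                                      cur_pkts, cnt);
    cnt -= tx_pkts;
    cur_pkts = &cur_pkts[tx_pkts];

    /* Stop if no progress was made, or if we have already retried too
     * often (e.g. the guest accepts only one packet per call). */
    if (!tx_pkts || ++retries >= VHOST_ENQ_MAX_RETRIES) {
        break;
    }
} while (cnt);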

What do you think?

Best regards, Ilya Maximets.


