[ovs-dev] [PATCH] netdev-dpdk: Fix PMD threads hang in __netdev_dpdk_vhost_send().

Traynor, Kevin kevin.traynor at intel.com
Fri Jun 10 16:52:51 UTC 2016


> -----Original Message-----
> From: dev [mailto:dev-bounces at openvswitch.org] On Behalf Of Ilya
> Maximets
> Sent: Thursday, June 2, 2016 6:24 AM
> To: Daniele Di Proietto <diproiettod at vmware.com>; Traynor, Kevin
> <kevin.traynor at intel.com>; dev at openvswitch.org
> Cc: Dyasly Sergey <s.dyasly at samsung.com>; Flavio Leitner
> <fbl at sysclose.org>
> Subject: Re: [ovs-dev] [PATCH] netdev-dpdk: Fix PMD threads hang in
> __netdev_dpdk_vhost_send().
> 
> On 02.06.2016 04:32, Daniele Di Proietto wrote:
> >
> > On 25/05/2016 04:03, "Ilya Maximets" <i.maximets at samsung.com> wrote:
> >
> >> On 23.05.2016 17:55, Traynor, Kevin wrote:
> >>>> -----Original Message-----
> >>>> From: Ilya Maximets [mailto:i.maximets at samsung.com]
> >>>> Sent: Tuesday, May 17, 2016 4:09 PM
> >>>> To: dev at openvswitch.org; Daniele Di Proietto <diproiettod at vmware.com>
> >>>> Cc: Dyasly Sergey <s.dyasly at samsung.com>; Heetae Ahn
> >>>> <heetae82.ahn at samsung.com>; Flavio Leitner <fbl at sysclose.org>;
> >>>> Traynor, Kevin <kevin.traynor at intel.com>; Pravin B Shelar
> >>>> <pshelar at ovn.org>; Ilya Maximets <i.maximets at samsung.com>
> >>>> Subject: [PATCH] netdev-dpdk: Fix PMD threads hang in
> >>>> __netdev_dpdk_vhost_send().
> >>>>
> >>>> There are situations when a PMD thread can hang forever inside
> >>>> __netdev_dpdk_vhost_send() because of a broken virtqueue ring.
> >>>>
> >>>> This happens if rte_vring_available_entries() is always positive and
> >>>> rte_vhost_enqueue_burst() can't send anything (possible with a broken
> >>>> ring).
> >>>>
> >>>> In this case the time expiration will never be checked and the
> >>>> 'do {} while (cnt)' loop will be infinite.
> >>>>
> >>>> This scenario is sometimes reproducible with dpdk-16.04-rc2 inside a
> >>>> guest VM. It may also be reproduced by manually breaking the ring
> >>>> structure inside the guest VM.
> >>>>
> >>>> Fix that by checking the time expiration even if we have available
> >>>> entries.
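
(For illustration, roughly the loop shape that fix implies; this is a sketch,
not the patch itself. 'dev', 'vhost_qid', 'pkts' and 'cnt' stand in for the
locals of __netdev_dpdk_vhost_send(), and the rte_vring_available_entries()
wait that the real code keeps is omitted here for brevity:)

    /* Check the deadline on every pass, so a broken ring that always
     * reports available entries cannot keep the PMD thread here forever. */
    const uint64_t deadline = rte_get_timer_cycles()
        + VHOST_ENQ_RETRY_USECS * (rte_get_timer_hz() / 1000000);

    do {
        unsigned int tx = rte_vhost_enqueue_burst(dev, vhost_qid, pkts, cnt);

        cnt -= tx;
        pkts += tx;
    } while (cnt && rte_get_timer_cycles() < deadline);

    /* Whatever is left in 'cnt' is dropped by the caller. */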
> >>>
> >>> Hi Ilya,
> >>
> >> Hi, Kevin.
> >>
> >> Christian and Thiago CC-ed because, I think, they've run into a similar
> >> issue.
> >>
> >>>
> >>> Thanks for catching this. This intersects with something else I've seen
> >>> wrt the retry code, and there are a few options...
> >>>
> >>> 1. Remove retries when nothing was sent. For a VM that needs retries it
> >>> is a good thing to have, but Bhanu and I recently saw in a test with
> >>> multiple VMs that if one VM causes a lot of retries, there is a large
> >>> performance degradation for the other VMs. So I changed the retry to
> >>> only occur when at least one packet has been sent on the previous call
> >>> (see the sketch below). I put a patch up here:
> >>> http://openvswitch.org/pipermail/dev/2016-May/071517.html
> >>>
> >>> If we keep retries, we can either:
> >>>
> >>> 2. Make the coordination between rte_vring_available_entries() and
> >>> rte_vhost_enqueue_burst() more robust, as per your patch.
> >>>
> >>> 3. As you've shown that we can't rely on rte_vring_available_entries()
> >>> to know whether we can enqueue, just remove it and use
> >>> rte_vhost_enqueue_burst() directly in the retry loop.
> >>>
> >>> My preference would be for 1., because on balance I'd rather one VM not
> >>> degrade the performance of the others than have retries for it. Of
> >>> course there could be some compromise between them as well, i.e. reduce
> >>> the number of retries, but any retries could affect performance for
> >>> another path if it shares the same core.
> >>>
> >>> What do you think?
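
(In code terms, option 1 boils down to a loop of roughly this shape; sketch
only, with the same stand-in names as in the sketch above:)

    unsigned int sent;

    do {
        /* Retry only while the previous call made progress; if a call
         * sends nothing, stop rather than spin on a slow or stuck guest. */
        sent = rte_vhost_enqueue_burst(dev, vhost_qid, pkts, cnt);
        cnt -= sent;
        pkts += sent;
    } while (cnt && sent);

    /* The 'cnt' packets that were not accepted are dropped. */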
> >>
> >> I'm worried about scenarios with "pulsing" traffic, i.e. a burst of
> >> packets that is not very big but is enough to overload the vring in a
> >> short time, followed by a long period of silence. The HW can hold far
> >> more packets in its RX queues than can be pushed to the virtio ring. In
> >> this scenario, without retrying, most of the packets will be dropped.
> >>
> >> How about just decreasing VHOST_ENQ_RETRY_USECS to, maybe, 1 usec, with
> >> my fix applied of course? Such an interval should be enough for one PMD
> >> thread to handle 20G traffic with 64B packets. Also, this timeout could
> >> be applied in both cases (something sent or not) to decrease the cost of
> >> retrying.
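
(A rough sanity check of that figure: a 64B frame occupies 84 B on the wire
once preamble and inter-frame gap are counted, i.e. 672 bits.)

    20e9 bit/s / 672 bit          ~= 29.8 Mpps, one packet every ~33.6 ns
    one 32-packet NETDEV_MAX_BURST ~= 1.08 usec at that rate

So a ~1 usec retry interval would indeed roughly keep pace with 20G of 64B
packets on a single PMD thread.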
> >>
> >> Best regards, Ilya Maximets.
> >
> > I haven't done any performance testing with many vms, but ...
> >
> > I think the retry logic was introduced because at the time
> > NETDEV_MAX_BURST was 192 and we felt that batches of 192 packets could
> > easily overload the guest ring.
> >
> > Now that NETDEV_MAX_BURST is only 32, I agree with Kevin on applying
> > solution 1. Retries can degrade performance for other VMs, which IMHO is
> > worse than dropping packets destined for a "slow" receiver VM.
> >
> > Thanks,
> >
> > Daniele
> 
> OK. I agree that removing the timeout is reasonable.
> Another thing: the implementation provided by Kevin allows a situation (in
> some corner case) where rte_vhost_enqueue_burst() will be called 32 times
> in a row. How much time will that take? Maybe we should limit the number
> of retries, or the time spent inside __netdev_dpdk_vhost_send()?
> 
> What do you think?

Apologies for the delay in replying.

Hmm, that's true. I chatted with Yuanhan, and while it's not typical, there's
no guarantee it can't happen. How about a maximum of 8 retries in that case,
and then we drop?

I've put a revised patch up here
http://openvswitch.org/pipermail/dev/2016-June/072682.html
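
In rough terms that caps the loop like this (sketch only, not the revised
patch verbatim; same stand-in names as in the sketches above, with the
constant 8 simply mirroring the number proposed here):

    unsigned int sent, retries = 0;

    do {
        sent = rte_vhost_enqueue_burst(dev, vhost_qid, pkts, cnt);
        cnt -= sent;
        pkts += sent;
        /* Keep going only while progress is made and for at most 8
         * retries, bounding the worst case at 9 enqueue calls per send. */
    } while (cnt && sent && retries++ < 8);

    /* Packets left in 'cnt' are dropped and counted as tx drops. */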

thanks,
Kevin.

> 
> Best regards, Ilya Maximets.
> _______________________________________________
> dev mailing list
> dev at openvswitch.org
> http://openvswitch.org/mailman/listinfo/dev
