[ovs-dev] [dpdk-latest PATCH] netdev-dpdk: add custom vhost statistics to count IRQs

Mon Oct 28 10:14:43 UTC 2019

On 25 Oct 2019, at 17:06, Ilya Maximets wrote:

> On 25.10.2019 15:51, Eelco Chaudron wrote:

<SNIP>

>>>>  +static
>>>> +void vhost_guest_notified(int vid)
>>>> +{
>>>> +    struct netdev_dpdk *dev;
>>>> +
>>>> +    ovs_mutex_lock(&dpdk_mutex);
>>>> +    LIST_FOR_EACH (dev, list_node, &dpdk_list) {
>>>> +        if (netdev_dpdk_get_vid(dev) == vid) {
>>>> +            ovs_mutex_lock(&dev->mutex);
>>>> +            rte_spinlock_lock(&dev->stats_lock);
>>>> +            dev->vhost_irqs++;
>>>> +            rte_spinlock_unlock(&dev->stats_lock);
>>>> +            ovs_mutex_unlock(&dev->mutex);
>>>> +            break;
>>>> +        }
>>>> +    }
>>>> +    ovs_mutex_unlock(&dpdk_mutex);
>>>
>>> So, for every single eventfd_write() we're taking the global mutex,
>>> traversing list of all the devices, taking one more mutex and 
>>> finally
>>> taking the spinlock just to increment a single counter.
>>> I think, it's too much.
>>
>> Yes you are right this might be a bit of overkill…
>>
>>> Wouldn't it significantly affect interrupt based virtio drivers in
>>> terms of performance?  How frequently 2 vhost-user ports will lock
>>> each other on the global device mutex?  How frequently 2 PMD 
>>> threads
>>> will lock each other on rx/tx to different vrings of the same vhost
>>> port?
>>
>> I used iperf3 and did not show any negative effects (maybe I should 
>> add more than 4 physical queues, 1 had one VM queue), but also the 
>> results show large deviations.
>
> From my experience, I'd say that iperf via kernel virtio driver is not 
> able
> to saturate a PMD thread.  They are likely relaxed and have time to 
> wait
> on the mutex.  Also, You, probably, have only couple of ports, so the 
> critical
> section is not large enough to produce frequent interlocking.
>
> I'm just pointing that to count number of eventfd syscalls we're 
> probably
> making 2 other syscalls and probably sleeping in them.

Yes I agree and send out a v2 using a simple coverage counter

> For testing purposes, I'd suggest to add 10-20(100?) dummy pmd ports 
> (e.g.
> vhost without VMs) and assign them to different thread. Probably, more 
> threads
> in iperf for traffic generation, switching to small udp packets.
> You could also replace lock() with trylock() for dpdk_mutex and count
> contentions in a same way as it done for tx_lock in David's patch.

The try lock would have been a good thing to see if I hit this lock 
contention.
Was already testing with small UDP packets, but used iPerf as I needed a 
kernel like tool not a poll mode one based on DPDK.

>>
>>> I see that 'guest_notified' patch is already in dpdk master, but
>>> wouldn't it be better to accumulate this statistics inside DPDK
>>> and return it with some new API like rte_vhost_get_vring_xstats()
>>> if required?  I don't see any usecase for this callback beside
>>> counting some stats.
>>
>> I agree, however, the vhost library has no internal counters (only 
>> the PMD implementation which we do not use), so hence they liked the 
>> callback.
>
> One more possible issue is that this not-so-useful callback hijacked
> the reserved space in the structure and we could suffer in the
> future while adding new callbacks due to ABI breakage.
> (In context of DPDK API/ABI stability discussions)
>
>>
>>> For the current patch I could only suggest to convert it to simple
>>> coverage counter like it done in 'vhost_tx_contention' patch here:
>>> https://patchwork.ozlabs.org/patch/1175849/
>>
>> This is what I’ll do (and will send a patch soon). Although from a 
>> user perspective a per vhost user would make more sense as it could 
>> easily indicate which VM is configured wrongly…
>
> Sure.  I think that we need to re-imagine concept and implementation 
> of
> pmd-perf-stats in a "coverage"-like way, as I suggested previously 
> while
> discussing initial patches for pmd-perf, to count stats like vhost 
> qlen,
> tx contentions and this one.  At least, this should be not so hard to 
> do
> on a per-PMD basis.  per-netdev might be more tricky, but I believe 
> that
> it is possible.

This might be something that should be added. For per-netdev we could 
use atomics assuming the counters are for exception use cases (or per 
CPU) and get ride of any locking.

> For now we have architectural limitations that we have to live with.
>
> Best regards, Ilya Maximets.