[ovs-dev] [PATCH v2 1/2] netdev-dpdk: Free mempool only when no in-use mbufs.

Kevin Traynor ktraynor at redhat.com
Fri Apr 13 16:43:43 UTC 2018


On 04/12/2018 04:40 PM, Ilya Maximets wrote:
> On 10.04.2018 21:12, Kevin Traynor wrote:
>> DPDK mempools are freed when they are no longer needed.
>> This can happen when a port is removed or a port's mtu
>> is reconfigured so that a new mempool is used.
>>
>> It is possible that an mbuf is attempted to be returned
>> to a freed mempool from NIC Tx queues and this can lead
>> to a segfault.
>>
>> In order to prevent this, only free mempools when they
>> are not needed and have no in-use mbufs. As this might
>> not be possible immediately, sweep the mempools anytime
>> a port tries to get a mempool.
>>
>> Fixes: 8d38823bdf8b ("netdev-dpdk: fix memory leak")
>> Cc: mark.b.kavanagh81 at gmail.com
>> Cc: Ilya Maximets <i.maximets at samsung.com>
>> Reported-by: Venkatesan Pradeep <venkatesan.pradeep at ericsson.com>
>> Signed-off-by: Kevin Traynor <ktraynor at redhat.com>
>> ---
>>
>> v2: Add second call to rte_mempool_full() as it is not atomic
>>     so we can't trust one call to it.
>>
>>  lib/netdev-dpdk.c | 48 +++++++++++++++++++++++++++++++++++-------------
>>  1 file changed, 35 insertions(+), 13 deletions(-)
>>
>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>> index c19cedc..00306c4 100644
>> --- a/lib/netdev-dpdk.c
>> +++ b/lib/netdev-dpdk.c
>> @@ -590,8 +590,32 @@ dpdk_mp_create(int socket_id, int mtu)
>>  }
>>  
>> +/* Free unused mempools. */
>> +static void
>> +dpdk_mp_sweep(void) OVS_REQUIRES(dpdk_mp_mutex)
>> +{
>> +    struct dpdk_mp *dmp, *next;
>> +
>> +    LIST_FOR_EACH_SAFE (dmp, next, list_node, &dpdk_mp_list) {
>> +        if (!dmp->refcount && rte_mempool_full(dmp->mp)) {
>> +            /*
>> +             * rte_mempool_full() is not atomic, but at this point
>> +             * we will not be getting any more mbufs from the mempool
>> +             * so if it reports full again on a subsequent call, then
>> +             * we can be sure that it really is full.
>> +             */
> 
> 
> Checking rte_mempool_full() twice will not guarantee that the pool is
> really full, because it sums the cache sizes with the ring count and
> truncates the result from above. For example:
> 
> 0. Let the mempool size == 250. Current mempool ring contains 150 packets.
>    Some of the packets in caches and in use.
> 1. Sum of the caches calculated. total_cache_len = 100;
> 2. In a separate thread some of the packets (let it be 50) flushed from the cache.
> 3. Getting the mempool ring count (200) and summing with previously calculated
>    cache sizes. total_size = 100 + 200 == 300.
>    --> Here we have twice accounted flushed packets. Truncated to 250 before return.

It does not happen in this order - maybe you are just trying to
illustrate what would happen if the order was reversed in the
implementation. Currently it would be like this:

1. The ring is counted (150)
2. Packets flushed from cache to ring (50)
3. Caches counted (50)
4. Total counted = 150 + 50 = 200
5. rte_mempool_full() returns false.

Which is a false negative and is handled correctly in OVS.

That is why, with knowledge of the count order, it is safe to call
rte_mempool_full() only once: we know mbufs can now only move from
cache-->ring within the mempool.

> 
> 4. rte_mempool_full() returns true.
> 5. Second call:
> 6. Sum of the caches calculated. total_cache_len = 50;
> 7. In a separate thread some of the packets (let it be 30) flushed from the cache.
> 8. Getting the mempool ring count (230) and summing with previously calculated
>    cache sizes. total_size = 50 + 230 == 280.
>    --> We still have twice accounted mbufs. Truncated to 250 before return.
> 9. rte_mempool_full() returns true which could be false positive.
> 
> So, it's possible to have 2 false positives in a row. The only way to be sure
> is to check rte_mempool_ops_get_count() twice, which is thread safe.
> 

ok, let's do it. I'll send a v3. All of the solutions discussed here are
a big improvement on the current situation anyway. If we get a more
elegant solution from DPDK in future we can use it.

> 
>> +            if (OVS_LIKELY(rte_mempool_full(dmp->mp))) {
>> +                ovs_list_remove(&dmp->list_node);
>> +                rte_mempool_free(dmp->mp);
>> +                rte_free(dmp);
>> +            }
>> +        }
>> +    }
>> +}
>> +
>>  static struct dpdk_mp *
>>  dpdk_mp_get(int socket_id, int mtu)
>>  {
>>      struct dpdk_mp *dmp;
>> +    bool reuse = false;
>>  
>>      ovs_mutex_lock(&dpdk_mp_mutex);
>> @@ -599,15 +623,18 @@ dpdk_mp_get(int socket_id, int mtu)
>>          if (dmp->socket_id == socket_id && dmp->mtu == mtu) {
>>              dmp->refcount++;
>> -            goto out;
>> +            reuse = true;
>> +            break;
>>          }
>>      }
>> +    /* Sweep mempools after reuse or before create. */
>> +    dpdk_mp_sweep();
>>  
>> -    dmp = dpdk_mp_create(socket_id, mtu);
>> -    if (dmp) {
>> -        ovs_list_push_back(&dpdk_mp_list, &dmp->list_node);
>> +    if (!reuse) {
>> +        dmp = dpdk_mp_create(socket_id, mtu);
>> +        if (dmp) {
>> +            ovs_list_push_back(&dpdk_mp_list, &dmp->list_node);
>> +        }
>>      }
>>  
>> -out:
>> -
>>      ovs_mutex_unlock(&dpdk_mp_mutex);
>>  
>> @@ -615,5 +642,5 @@ out:
>>  }
>>  
>> -/* Release an existing mempool. */
>> +/* Decrement reference to a mempool. */
>>  static void
>>  dpdk_mp_put(struct dpdk_mp *dmp)
>> @@ -625,10 +652,5 @@ dpdk_mp_put(struct dpdk_mp *dmp)
>>      ovs_mutex_lock(&dpdk_mp_mutex);
>>      ovs_assert(dmp->refcount);
>> -
>> -    if (!--dmp->refcount) {
>> -        ovs_list_remove(&dmp->list_node);
>> -        rte_mempool_free(dmp->mp);
>> -        rte_free(dmp);
>> -     }
>> +    dmp->refcount--;
>>      ovs_mutex_unlock(&dpdk_mp_mutex);
>>  }
>>
