[ovs-dev] [branch-2.9 PATCH 1/2] netdev-dpdk: Free mempool only when no in-use mbufs.

Kevin Traynor ktraynor at redhat.com
Mon Apr 9 14:36:27 UTC 2018


On 04/06/2018 04:51 PM, Ilya Maximets wrote:
>>> DPDK mempools are freed when they are no longer needed.
>>> This can happen when a port is removed or a port's mtu is reconfigured so
>>> that a new mempool is used.
>>>
>>> It is possible that an attempt is made to return an mbuf to a freed
>>> mempool from the NIC Tx queues, and this can lead to a segfault.
>>>
>>> In order to prevent this, only free mempools when they are not needed and
>>> have no in-use mbufs. As this might not be possible immediately, sweep the
>>> mempools anytime a port tries to get a mempool.
>>>
>>
>> Thanks for working on this, Kevin. Just a query below. From a testing POV I didn't come across any issues.
>>
>>> Fixes: 8d38823bdf8b ("netdev-dpdk: fix memory leak")
>>> Cc: mark.b.kavanagh81 at gmail.com
>>> Signed-off-by: Kevin Traynor <ktraynor at redhat.com>
>>> ---
>>>
>>> Found on OVS 2.6, but the code is operationally similar on the recent
>>> branch-2.*'s. If the patch is acceptable I can backport it to older
>>> branches.
> 
> This issue was actually raised back in January while discussing the mempool issue.
> 

Hmmm, thanks for pointing it out. Seems the last mails in the thread
coincided with when I was traveling and I didn't read them back then.

>>
>> Sounds good to me.
>>>
>>>  lib/netdev-dpdk.c | 40 +++++++++++++++++++++++++++-------------
>>>  1 file changed, 27 insertions(+), 13 deletions(-)
>>>
>>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>>> index c19cedc..9b8e732 100644
>>> --- a/lib/netdev-dpdk.c
>>> +++ b/lib/netdev-dpdk.c
>>> @@ -590,8 +590,24 @@ dpdk_mp_create(int socket_id, int mtu)
>>>  }
>>>
>>> +/* Free unused mempools. */
>>> +static void
>>> +dpdk_mp_sweep(void) OVS_REQUIRES(dpdk_mp_mutex)
>>> +{
>>> +    struct dpdk_mp *dmp, *next;
>>> +
>>> +    LIST_FOR_EACH_SAFE (dmp, next, list_node, &dpdk_mp_list) {
>>> +        if (!dmp->refcount && rte_mempool_full(dmp->mp)) {
>>
>> I hadn't looked at rte_mempool_full() before. I noticed the documentation warns not to use it as part of the datapath but for debug purposes only.
> 
> Yes, rte_mempool_full() is racy and could return false positives, but it's
> the best and, in fact, the only useful function in the DPDK API for this.
> In practice we can never be sure if someone still uses the memory pool.
> I also used this function in my solution for branch 2.8, posted here:
> 
> https://mail.openvswitch.org/pipermail/ovs-discuss/2018-January/046100.html
> 

The similarities and differences are interesting!

> Memory handling in DPDK is awful. In fact, no one is able to use rte_mempool_free(),
> because there is no correct way to check if a memory pool is still in use.
> 
> There is one dirty hack to check if a mempool is really full. It is based on the
> internal mempool implementation and the fact that we're not allocating any
> mbufs from the pool:
> 1.  Check rte_mempool_ops_get_count(). <-- this is exposed externally for some reason
> 2.  Check rte_mempool_full().
> 3.  Check rte_mempool_ops_get_count() again.
> 4.  If rte_mempool_full() was 'true' and both calls to
>     rte_mempool_ops_get_count() returned the same value ----> the mempool is really full.
> 5.  Here we could safely call rte_mempool_free().
> 

Yes, that would work.

The ring count is taken before the cache count in rte_mempool_full(), and
as we are not getting mbufs, I think the only possibility is mbufs moving
from a cache to the ring. In that case we may get a false negative, but we
wouldn't get a false positive from rte_mempool_full(). Obviously the
downside is that it is implementation specific - but maybe it could be
argued that the implementation will not change on already released DPDK
versions. What do you think?
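
To make that concrete, here is a rough, untested sketch of the check you
describe (the name and shape are mine; it deliberately peeks behind the
API and assumes nothing allocates from the pool while we check):

#include <stdbool.h>
#include <rte_mempool.h>

/* Untested sketch: check that a mempool is "really" full. Relies on
 * the internal implementation: rte_mempool_full() counts the common
 * ring first, then the per-lcore caches. */
static bool
dpdk_mp_full(const struct rte_mempool *mp)
{
    unsigned int ring_count = rte_mempool_ops_get_count(mp);

    if (!rte_mempool_full(mp)) {
        return false;
    }

    /* If the ring count did not change across the check, no mbuf moved
     * between a cache and the ring meanwhile, so the pool is really
     * full and can safely be freed. */
    return ring_count == rte_mempool_ops_get_count(mp);
}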

>>
>> Does its use add to the downtime in a noticeable way while we reconfigure? I'm thinking of a deployment where the number of lcores is high and we reconfigure often. When the cache is enabled, it has to walk the per-lcore caches of all lcores. There may not be an issue here, but I was curious.
> 

I think it should be insignificant compared to actually
allocating/freeing a mempool. I will try to check it.
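E.g. something quick and dirty like this around the sweep would give a
rough number (hypothetical snippet; timing via the generic DPDK timer
source from rte_cycles.h):

/* Crude timing around the sweep, just to get a rough number. */
uint64_t start = rte_get_timer_cycles();
dpdk_mp_sweep();
uint64_t cycles = rte_get_timer_cycles() - start;
VLOG_DBG("mempool sweep took %"PRIu64" us",
         cycles * 1000000 / rte_get_timer_hz());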

> That is the interesting question. In my version (referenced above) I used a
> watchdog thread to periodically free mempools with zero refcount.
> Not sure which solution is better.
> 

The advantage I see of being on the watchdog timer thread is that
probably no one cares how long it takes :-) The advantage of being in
the mempool config path is that it runs (and is guaranteed to run) just
before a new mempool is going to be requested, so there is a smaller risk
of additional memory requirements.
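
For reference, I imagine the watchdog variant would be as simple as
something like this (hypothetical sketch; the interval is an arbitrary
value, and the thread/sleep helpers are the usual OVS ones):

#define DPDK_MP_SWEEP_INTERVAL 10   /* Seconds; hypothetical value. */

/* Periodically free mempools with zero refcount, instead of (or in
 * addition to) sweeping in dpdk_mp_get(). */
static void *
dpdk_mp_watchdog(void *arg OVS_UNUSED)
{
    for (;;) {
        ovs_mutex_lock(&dpdk_mp_mutex);
        dpdk_mp_sweep();
        ovs_mutex_unlock(&dpdk_mp_mutex);
        xsleep(DPDK_MP_SWEEP_INTERVAL);
    }
    return NULL;
}

/* Started once during initialization, e.g.:
 *     ovs_thread_create("dpdk_mp_watchdog", dpdk_mp_watchdog, NULL);
 */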

In the short term I think we should use a fix like these patches, which
don't cover every possible corner case, or the get_count() approach,
which isn't the most elegant. Longer term we can think about what we
would want to add to DPDK to make it easier to do this.

thanks,
Kevin.

>>
>> Did you do any testing around this?
>>
>> Thanks
>> Ian
>>
>>> +            ovs_list_remove(&dmp->list_node);
>>> +            rte_mempool_free(dmp->mp);
>>> +            rte_free(dmp);
>>> +        }
>>> +    }
>>> +}
>>> +
>>>  static struct dpdk_mp *
>>>  dpdk_mp_get(int socket_id, int mtu)
>>>  {
>>>      struct dpdk_mp *dmp;
>>> +    bool reuse = false;
>>>
>>>      ovs_mutex_lock(&dpdk_mp_mutex);
>>> @@ -599,15 +615,18 @@ dpdk_mp_get(int socket_id, int mtu)
>>>          if (dmp->socket_id == socket_id && dmp->mtu == mtu) {
>>>              dmp->refcount++;
>>> -            goto out;
>>> +            reuse = true;
>>> +            break;
>>>          }
>>>      }
>>> +    /* Sweep mempools after reuse or before create. */
>>> +    dpdk_mp_sweep();
>>>
>>> -    dmp = dpdk_mp_create(socket_id, mtu);
>>> -    if (dmp) {
>>> -        ovs_list_push_back(&dpdk_mp_list, &dmp->list_node);
>>> +    if (!reuse) {
>>> +        dmp = dpdk_mp_create(socket_id, mtu);
>>> +        if (dmp) {
>>> +            ovs_list_push_back(&dpdk_mp_list, &dmp->list_node);
>>> +        }
>>>      }
>>>
>>> -out:
>>> -
>>>      ovs_mutex_unlock(&dpdk_mp_mutex);
>>>
>>> @@ -615,5 +634,5 @@ out:
>>>  }
>>>
>>> -/* Release an existing mempool. */
>>> +/* Decrement reference to a mempool. */
>>>  static void
>>>  dpdk_mp_put(struct dpdk_mp *dmp)
>>> @@ -625,10 +644,5 @@ dpdk_mp_put(struct dpdk_mp *dmp)
>>>      ovs_mutex_lock(&dpdk_mp_mutex);
>>>      ovs_assert(dmp->refcount);
>>> -
>>> -    if (!--dmp->refcount) {
>>> -        ovs_list_remove(&dmp->list_node);
>>> -        rte_mempool_free(dmp->mp);
>>> -        rte_free(dmp);
>>> -     }
>>> +    dmp->refcount--;
>>>      ovs_mutex_unlock(&dpdk_mp_mutex);
>>>  }
>>> --
>>> 1.8.3.1
>>>


