[ovs-dev] [branch-2.9 PATCH 1/2] netdev-dpdk: Free mempool only when no in-use mbufs.

Kevin Traynor ktraynor at redhat.com
Tue Apr 10 14:21:46 UTC 2018


On 04/09/2018 03:36 PM, Kevin Traynor wrote:
> On 04/06/2018 04:51 PM, Ilya Maximets wrote:
>>>> DPDK mempools are freed when they are no longer needed.
>>>> This can happen when a port is removed or a port's mtu is reconfigured so
>>>> that a new mempool is used.
>>>>
>>>> It is possible for an mbuf to be returned to a freed mempool from the
>>>> NIC Tx queues, and this can lead to a segfault.
>>>>
>>>> In order to prevent this, only free mempools when they are not needed and
>>>> have no in-use mbufs. As this might not be possible immediately, sweep the
>>>> mempools anytime a port tries to get a mempool.
>>>>
>>>
>>> Thanks for working on this Kevin. Just a query below. From a testing POV I didn't come across any issues.
>>>
>>>> Fixes: 8d38823bdf8b ("netdev-dpdk: fix memory leak")
>>>> Cc: mark.b.kavanagh81 at gmail.com
>>>> Signed-off-by: Kevin Traynor <ktraynor at redhat.com>
>>>> ---
>>>>
>>>> Found on OVS 2.6, but the code is operationally similar on the recent
>>>> branch-2.*'s. If the patch is acceptable I can backport to older branches.
>>
>> This issue was actually raised back in January while discussing the mempool issue.
>>
> 
> Hmmm, thanks for pointing it out. Seems the last mails in the thread
> coincided with when I was traveling and I didn't read them back then.
> 
>>>
>>> Sounds good to me.
>>>>
>>>>  lib/netdev-dpdk.c | 40 +++++++++++++++++++++++++++-------------
>>>>  1 file changed, 27 insertions(+), 13 deletions(-)
>>>>
>>>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>>>> index c19cedc..9b8e732 100644
>>>> --- a/lib/netdev-dpdk.c
>>>> +++ b/lib/netdev-dpdk.c
>>>> @@ -590,8 +590,24 @@ dpdk_mp_create(int socket_id, int mtu)
>>>>  }
>>>>
>>>> +/* Free unused mempools. */
>>>> +static void
>>>> +dpdk_mp_sweep(void) OVS_REQUIRES(dpdk_mp_mutex) {
>>>> +    struct dpdk_mp *dmp, *next;
>>>> +
>>>> +    LIST_FOR_EACH_SAFE (dmp, next, list_node, &dpdk_mp_list) {
>>>> +        if (!dmp->refcount && rte_mempool_full(dmp->mp)) {
>>>
>>> I hadn't looked at rte_mempool_full() before. I noticed the documentation warns not to use it as part of the datapath but for debug purposes only.
>>
>> Yes, rte_mempool_full() is racy and could return false positives, but it's
>> the best and only useful function in the DPDK API. In practice we can never
>> be sure if someone still uses the memory pool. I also used this function in
>> my solution for branch 2.8 posted here:
>> my solution for branch 2.8 posted here:
>>
>> https://mail.openvswitch.org/pipermail/ovs-discuss/2018-January/046100.html
>>
> 
> The similarities and differences are interesting!
> 
>> Memory handling in DPDK is awful. In fact, no one is able to use
>> rte_mempool_free(), because there is no correct way to check if a memory
>> pool is still in use.
>>
>> There is one dirty hack to check if a mempool is really full. It is based
>> on the internal mempool implementation and the fact that we're not
>> allocating any mbufs from the pool:
>> 1.  Check rte_mempool_ops_get_count(). <-- this is exposed externally for some reason
>> 2.  Check rte_mempool_full().
>> 3.  Check rte_mempool_ops_get_count() again.
>> 4.  If rte_mempool_full() was 'true' and both calls to
>>     rte_mempool_ops_get_count() returned the same value ----> the mempool is really full.
>> 5.  Here we could safely call rte_mempool_free().
>>
> 
> Yes, that would work.
> 
> In rte_mempool_full() the ring count is taken before the cache counts, and
> as we are not getting mbufs I think the only possibility is mbufs moving
> from a cache to the ring. In that case we may get a false negative but
> wouldn't get a false positive from rte_mempool_full().
> Obviously the downside is that it is implementation specific - but maybe
> it could be argued that the implementation will not change on already
> released DPDK versions. What do you think?
> 
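For the archives, my reading of those steps as code would be roughly the
below. This is an untested sketch only - the function name is made up - and
as Ilya says it leans on implementation details of current DPDK (the ring is
counted before the per-lcore caches, and we never allocate from a pool whose
refcount is zero):

/* Sketch: stricter "really full" test built from the steps above.
 * Only valid while no mbufs are being allocated from 'mp'. */
static bool
dpdk_mp_really_full(const struct rte_mempool *mp)
    OVS_REQUIRES(dpdk_mp_mutex)
{
    unsigned int ring_count = rte_mempool_ops_get_count(mp);

    if (!rte_mempool_full(mp)) {
        return false;
    }

    /* If the count in the backing ring did not change while
     * rte_mempool_full() ran, mbufs were not moving between the lcore
     * caches and the ring, so the 'full' result was not a false positive. */
    return rte_mempool_ops_get_count(mp) == ring_count;
}
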
>>>
>>> Does its use add to the downtime in a noticeable way while we reconfigure? I'm thinking of a deployment where the number of lcores is high and we reconfigure often. When cache is enabled, it has to browse the caches of all lcores. There may not be an issue here, but I was curious.
>>
> 
> I think it should be insignificant compared to actually
> allocating/freeing a mempool. I will try to check it.
> 

I measured 4K-5K cycles, so the extra time for rte_mempool_full() is not an
issue in the control path.

>> That is the interesting question. In my version (referenced above) I used a
>> watchdog thread to periodically free mempools with zero refcount.
>> Not sure which solution is better.
>>
> 
> The advantage I see of being on the watchdog timer thread is that
> probably no one cares how long it takes :-) The advantage of being in
> the mempool config is that it runs (and is guaranteed to run) just before a
> new mempool is going to be requested, so there is a smaller risk of
> additional memory requirements.
> 
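For comparison, the watchdog variant would look something like the below -
only a rough, untested sketch, not Ilya's actual patch - assuming a dedicated
thread started with ovs_thread_create() and reusing the dpdk_mp_sweep()
added by this patch:

/* Sketch: periodic sweeper as an alternative to sweeping in dpdk_mp_get(). */
static void *
dpdk_mp_sweep_thread(void *arg OVS_UNUSED)
{
    for (;;) {
        ovs_mutex_lock(&dpdk_mp_mutex);
        dpdk_mp_sweep();    /* Free pools with no refs and no in-use mbufs. */
        ovs_mutex_unlock(&dpdk_mp_mutex);
        xsleep(10);         /* Arbitrary period in seconds. */
    }
    return NULL;
}

/* Started once during DPDK init, e.g.:
 *     ovs_thread_create("dpdk_mp_sweep", dpdk_mp_sweep_thread, NULL); */
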
> In the short term I think we should use a fix like the patches that
> don't cover every possible corner case, or the get_count() approach that
> isn't the most elegant. Longer term we can think about what we would want
> to add to DPDK to make it easier to do this.
> 
> thanks,
> Kevin.
> 
>>>
>>> Did you do any testing around this?
>>>
>>> Thanks
>>> Ian
>>>
>>>> +            ovs_list_remove(&dmp->list_node);
>>>> +            rte_mempool_free(dmp->mp);
>>>> +            rte_free(dmp);
>>>> +        }
>>>> +    }
>>>> +}
>>>> +
>>>>  static struct dpdk_mp *
>>>>  dpdk_mp_get(int socket_id, int mtu)
>>>>  {
>>>>      struct dpdk_mp *dmp;
>>>> +    bool reuse = false;
>>>>
>>>>      ovs_mutex_lock(&dpdk_mp_mutex);
>>>> @@ -599,15 +615,18 @@ dpdk_mp_get(int socket_id, int mtu)
>>>>          if (dmp->socket_id == socket_id && dmp->mtu == mtu) {
>>>>              dmp->refcount++;
>>>> -            goto out;
>>>> +            reuse = true;
>>>> +            break;
>>>>          }
>>>>      }
>>>> +    /* Sweep mempools after reuse or before create. */
>>>> +    dpdk_mp_sweep();
>>>>
>>>> -    dmp = dpdk_mp_create(socket_id, mtu);
>>>> -    if (dmp) {
>>>> -        ovs_list_push_back(&dpdk_mp_list, &dmp->list_node);
>>>> +    if (!reuse) {
>>>> +        dmp = dpdk_mp_create(socket_id, mtu);
>>>> +        if (dmp) {
>>>> +            ovs_list_push_back(&dpdk_mp_list, &dmp->list_node);
>>>> +        }
>>>>      }
>>>>
>>>> -out:
>>>> -
>>>>      ovs_mutex_unlock(&dpdk_mp_mutex);
>>>>
>>>> @@ -615,5 +634,5 @@ out:
>>>>  }
>>>>
>>>> -/* Release an existing mempool. */
>>>> +/* Decrement reference to a mempool. */
>>>>  static void
>>>>  dpdk_mp_put(struct dpdk_mp *dmp)
>>>> @@ -625,10 +644,5 @@ dpdk_mp_put(struct dpdk_mp *dmp)
>>>>      ovs_mutex_lock(&dpdk_mp_mutex);
>>>>      ovs_assert(dmp->refcount);
>>>> -
>>>> -    if (!--dmp->refcount) {
>>>> -        ovs_list_remove(&dmp->list_node);
>>>> -        rte_mempool_free(dmp->mp);
>>>> -        rte_free(dmp);
>>>> -     }
>>>> +    dmp->refcount--;
>>>>      ovs_mutex_unlock(&dpdk_mp_mutex);
>>>>  }
>>>> --
>>>> 1.8.3.1
>>>>