[ovs-discuss] Performance drop with conntrack flows

Ilya Maximets i.maximets at ovn.org
Tue Aug 18 14:03:18 UTC 2020


On 8/18/20 4:00 PM, Ilya Maximets wrote:
> On 8/18/20 12:42 PM, K Venkata Kiran wrote:
>> Hi,
>>
>> We did further tests and found that it is indeed the conntrack global lock, introduced by the commit below, that is causing the performance degradation.
>>
>> We ran a perf analysis with and without the commit and saw a huge increase in pthread_mutex_lock samples. In our testbed, 4 PMD threads handle traffic from two DPDK ports and various VHU ports.
>>
>> At the data-structure level, we could see a major change in how connections are stored in the conntrack structure.
>>
>> *Before :*
>>
>> struct conntrack_bucket {
>>     struct ct_lock lock;
>>     struct hmap connections OVS_GUARDED;
>>     struct ovs_list exp_lists[N_CT_TM] OVS_GUARDED;
>>     struct ovs_mutex cleanup_mutex;
>>     long long next_cleanup OVS_GUARDED;
>> };
>>
>> *After :*
>>
>> struct conntrack {
>> -    /* Independent buckets containing the connections */
>> -    struct conntrack_bucket buckets[CONNTRACK_BUCKETS];
>> ..
>> +    struct ovs_mutex ct_lock; /* Protects 2 following fields. */
>> +    struct cmap conns OVS_GUARDED;
>> +    struct ovs_list exp_lists[N_CT_TM] OVS_GUARDED;
>> }
>>
>> Earlier, each ‘conntrack_bucket’ structure held the list of connections for a given hash bucket. This was removed: all connections are now added to the main ‘conntrack’ structure, and traversal of that list is protected by the global conntrack ‘ct_lock’.
>>
>> We see that taking the global 'ct->ct_lock' in 'conn_update_expiration' (which happens for every packet) contributes heavily to the performance drop.
>>
>> Earlier, connections were mapped via conn_key_hash to the matching hash bucket. Any update of state (mostly the expiration time) involves moving the connection back into the list of connections belonging to that hash bucket. This was done under a bucket-level lock, and with 256 buckets there was less contention.
>>
>> Now this ‘ct->ct_lock’ adds more contention and is causing the performance degradation.
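
To make the contrast between the two locking schemes concrete, here is a minimal sketch, not the OVS code. All names (sketch_ct, sketch_bucket, sketch_update, run_updates) are hypothetical, the "connection" is reduced to one expiration field per bucket, and an LCG stands in for the real key hash. It only illustrates which mutex a per-packet expiration update has to take under each scheme.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define N_BUCKETS 256      /* Bucket count mentioned in the text above. */
#define MAX_THREADS 16     /* Arbitrary cap for this sketch. */

/* Hypothetical stand-in for one hash bucket; not the OVS structure. */
struct sketch_bucket {
    pthread_mutex_t lock;       /* Used only by the per-bucket scheme. */
    long long expiration;       /* Stand-in for a connection's expiry time. */
    unsigned long n_updates;
};

struct sketch_ct {
    pthread_mutex_t global_lock; /* Used only by the global scheme. */
    struct sketch_bucket buckets[N_BUCKETS];
};

struct worker_arg {
    struct sketch_ct *ct;
    int use_global;
    uint32_t seed;
    unsigned long n;
};

/* Refreshes one bucket's expiration under either locking scheme. */
static void
sketch_update(struct sketch_ct *ct, uint32_t hash, long long now,
              int use_global)
{
    struct sketch_bucket *b = &ct->buckets[hash % N_BUCKETS];
    /* Global scheme: every thread serializes on one mutex.
     * Bucket scheme: threads collide only when they hit the same bucket. */
    pthread_mutex_t *m = use_global ? &ct->global_lock : &b->lock;

    pthread_mutex_lock(m);
    b->expiration = now + 30000;
    b->n_updates++;
    pthread_mutex_unlock(m);
}

static void *
worker(void *arg_)
{
    struct worker_arg *a = arg_;
    uint32_t h = a->seed;

    for (unsigned long i = 0; i < a->n; i++) {
        h = h * 1664525u + 1013904223u;   /* Cheap LCG as a fake key hash. */
        sketch_update(a->ct, h >> 8, (long long) i, a->use_global);
    }
    return NULL;
}

/* Runs 'n_threads' workers and returns the total number of updates made. */
static unsigned long
run_updates(int n_threads, unsigned long per_thread, int use_global)
{
    struct sketch_ct ct;
    pthread_t tids[MAX_THREADS];
    struct worker_arg args[MAX_THREADS];
    unsigned long total = 0;

    assert(n_threads > 0 && n_threads <= MAX_THREADS);
    pthread_mutex_init(&ct.global_lock, NULL);
    for (int i = 0; i < N_BUCKETS; i++) {
        pthread_mutex_init(&ct.buckets[i].lock, NULL);
        ct.buckets[i].expiration = 0;
        ct.buckets[i].n_updates = 0;
    }
    for (int i = 0; i < n_threads; i++) {
        args[i] = (struct worker_arg) { &ct, use_global,
                                        (uint32_t) i + 1, per_thread };
        int err = pthread_create(&tids[i], NULL, worker, &args[i]);
        assert(!err);
    }
    for (int i = 0; i < n_threads; i++) {
        pthread_join(tids[i], NULL);
    }
    /* All workers have exited, so the counters can be read without locks. */
    for (int i = 0; i < N_BUCKETS; i++) {
        total += ct.buckets[i].n_updates;
        pthread_mutex_destroy(&ct.buckets[i].lock);
    }
    pthread_mutex_destroy(&ct.global_lock);
    return total;
}
```

Timing run_updates(4, N, 0) against run_updates(4, N, 1) for a large N may expose the contention difference, but this toy does no real conntrack work, so its numbers should not be read as a prediction of the figures reported in this thread.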
>>
>> We also ran the test-conntrack benchmark:
>>
>> *1. The standard 1 thread test :*
>>
>> After commit
>> $ ./ovstest test-conntrack benchmark 1 14880000 32
>> conntrack:   2230 ms
>>
>> Before commit
>> $ ./ovstest test-conntrack benchmark 1 14880000 32
>> conntrack:   1673 ms
>>
>> *2. The multi-thread test (4 threads):*
>>
>> $ ./ovstest test-conntrack benchmark 4 33554432 32 1    (32 Million packets)
>> Before : conntrack:  15043 ms / conntrack:  14644 ms
>> After  : conntrack:  71373 ms / conntrack:  65816 ms
>>
>> So with more connections and multiple threads calling conntrack_execute, the impact is much more pronounced.
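
For reference, the slowdown factors implied by the timings above can be worked out with a few lines of C. This is only arithmetic on the quoted numbers; the helper name slowdown is made up, and pairing the first "Before" run with the first "After" run (and second with second) is an assumption.

```c
#include <stdio.h>

/* Ratio of "after" to "before" run time; > 1.0 means the commit is slower. */
static double
slowdown(double before_ms, double after_ms)
{
    return after_ms / before_ms;
}

/* Prints the ratios for the single-thread and 4-thread runs quoted above. */
static void
print_slowdowns(void)
{
    printf("1 thread:  %.2fx\n", slowdown(1673, 2230));
    printf("4 threads: %.2fx and %.2fx\n",
           slowdown(15043, 71373), slowdown(14644, 65816));
}
```

That works out to roughly 1.3x for the single-thread case and about 4.5x to 4.7x with 4 threads.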
> 
> Thanks for the testing and investigation.  I fully agree that userspace conntrack
> is not in good shape, especially in terms of multi-threading and its locking
> scheme.  And, unfortunately, it's not actively developed right now.
> 
>> Are there any changes that are expected to fix this performance issue in the near future?
> 
> I'm not aware of any ongoing development in this area.
> 
>> Do we have conntrack-related performance tests that are run with every release?
> 
> I'm not aware of any specific conntrack-related performance tests.
> We are lucking performance tests in many areas, actually.  We do not

s/lucking/lacking/

> have any public infrastructure to run these tests by ourselves.
> 
> Volunteers are always welcome.
> 
> Best regards, Ilya Maximets.
> 
>>
>> Thanks
>> Kiran
>>
>> *From:* K Venkata Kiran
>> *Sent:* Thursday, August 6, 2020 4:20 PM
>> *To:* ovs-dev at openvswitch.org; ovs-discuss at openvswitch.org; Darrell Ball <dlu998 at gmail.com>; blp at ovn.org
>> *Cc:* Anju Thomas <anju.thomas at ericsson.com>; K Venkata Kiran <k.venkata.kiran at ericsson.com>
>> *Subject:* Performance drop with conntrack flows
>>
>> Hi,
>>
>> We see a 40% traffic drop with UDP traffic over VxLAN and a 20% traffic drop with UDP traffic over MPLSoGRE between OVS 2.8.2 and OVS 2.12.1.
>>
>> We narrowed the performance drop in our tests down to the commit below; backing out the commit fixed the problem.
>>
>> The commit of concern is:
>> https://github.com/openvswitch/ovs/commit/967bb5c5cd9070112138d74a2f4394c50ae48420
>> commit 967bb5c5cd9070112138d74a2f4394c50ae48420
>> Author: Darrell Ball <dlu998 at gmail.com>
>> Date:   Thu May 9 08:15:07 2019 -0700
>>  conntrack: Add rcu support.
>>
>> We suspect that taking the ‘ct->ct_lock’ lock for ‘conn_update_state’ and for ‘conn_key_lookup’ could be causing the issue.
>>
>> Has anyone else noticed this issue, and are there any pointers to a fix? We could not find any obvious commit that solves it. Any guidance would help.
>>
>> Thanks
>>
>> Kiran


