[ovs-discuss] [ovs-dev] OVS DPDK NUMA pmd assignment question for physical port

Kevin Traynor ktraynor at redhat.com
Wed Sep 6 09:23:53 UTC 2017


On 09/06/2017 08:03 AM, 王志克 wrote:
> Hi Darrell,
> 
> pmd-rxq-affinity has the limitation below: (so an isolated pmd cannot be used for other queues, which is not what I want. Lots of VMs come and go on the fly, and manual assignment is not feasible.)
>           >>After that PMD threads on cores where RX queues was pinned will become isolated. This means that this thread will poll only pinned RX queues
> 
> My problem is that I have several CPUs spread across different NUMA nodes, and I would like all of these CPUs to have a chance to serve the rxqs. However, because the phy NIC is located on one particular socket, the pmds/CPUs on the other NUMA node are excluded. So I am wondering whether we can have a different behavior for phy port rxqs: 
>       round-robin to all PMDs, even pmds on a different NUMA socket.
> 
> I guess this is a common case, and I believe it would improve rx performance.
> 

The issue is that cross-numa datapaths incur a large performance penalty
(~2x cycles). This is the reason rxq assignment uses pmds from the same
numa node as the port. Also, any rxqs from other ports that are
scheduled on the same pmd could suffer cpu starvation as a result of
that cross-numa assignment.
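
As an aside, you can check which pmd each rxq ended up on with
'ovs-appctl dpif-netdev/pmd-rxq-show'. The port names and core ids
below are only an example, and the exact output format varies a little
between OVS versions:

  $ ovs-appctl dpif-netdev/pmd-rxq-show
  pmd thread numa_id 0 core_id 2:
        isolated : false
        port: dpdk0             queue-id: 0
  pmd thread numa_id 1 core_id 22:
        isolated : false
        port: vhostuser1        queue-id: 0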

A further issue was that if no pmds were available on the correct NUMA
node for a port, the rxqs from that port were not polled at all.
Billy's commit addressed that by allowing cross-numa assignment *only*
when there are no pmds on the same numa node as the port.

If you look through the threads on Billy's patch, you'll see more
discussion on it.

Kevin.


> Br,
> Wang Zhike
> -----Original Message-----
> From: Darrell Ball [mailto:dball at vmware.com] 
> Sent: Wednesday, September 06, 2017 1:39 PM
> To: 王志克; ovs-discuss at openvswitch.org; ovs-dev at openvswitch.org
> Subject: Re: [ovs-dev] OVS DPDK NUMA pmd assignment question for physical port
> 
> You could use  pmd-rxq-affinity for the queues you want serviced locally and let the others go remote
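> 
> For example, something along these lines (the interface name and the
> queue:core pairs are only illustrative, pick cores that exist on your
> host):
> 
> $ ovs-vsctl set Interface dpdk-p0 other_config:pmd-rxq-affinity="0:3,1:7"
> 
> That pins rxq 0 of dpdk-p0 to the pmd on core 3 and rxq 1 to the pmd on
> core 7; those pmds then become isolated and poll only the pinned queues.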
> 
> On 9/5/17, 8:14 PM, "王志克" <wangzhike at jd.com> wrote:
> 
>     It is a bit different from what I expected.
>     
>     
>     
>     I have separate CPUs and pmds for each NUMA node. However, the physical NIC is located only on NUMA socket 0, so only some of the CPUs and pmds (the ones on the same NUMA node) can poll the physical NIC. Since I have multiple rx queues, I would like some of the queues to be polled by pmds on the same node and the others to be polled by pmds on the non-local NUMA node. That way more pmds contribute to polling the physical NIC, so a performance improvement is expected for the total rx traffic.
>     
>     
>     
>     Br,
>     
>     Wang Zhike
>     
>     
>     
>     -----Original Message-----
>     
>     From: Darrell Ball [mailto:dball at vmware.com] 
>     
>     Sent: Wednesday, September 06, 2017 10:47 AM
>     
>     To: 王志克; ovs-discuss at openvswitch.org; ovs-dev at openvswitch.org
>     
>     Subject: Re: [ovs-dev] OVS DPDK NUMA pmd assignment question for physical port
>     
>     
>     
>     This same-NUMA-node limitation was already removed, although the same NUMA node is still preferred for performance reasons.
>     
>     
>     
>     commit c37813fdb030b4270d05ad61943754f67021a50d
>     Author: Billy O'Mahony <billy.o.mahony at intel.com>
>     Date:   Tue Aug 1 14:38:43 2017 -0700
>     
>         dpif-netdev: Assign ports to pmds on non-local numa node.
>     
>         Previously if there is no available (non-isolated) pmd on the numa node
>         for a port then the port is not polled at all. This can result in a
>         non-operational system until such time as nics are physically
>         repositioned. It is preferable to operate with a pmd on the 'wrong' numa
>         node albeit with lower performance. Local pmds are still chosen when
>         available.
>     
>         Signed-off-by: Billy O'Mahony <billy.o.mahony at intel.com>
>         Signed-off-by: Ilya Maximets <i.maximets at samsung.com>
>         Co-authored-by: Ilya Maximets <i.maximets at samsung.com>
>     
>     
>     
>     
>     
>     The sentence “The rx queues are assigned to pmd threads on the same NUMA node in a round-robin fashion.”
>     under
>     DPDK Physical Port Rx Queues
>     should be removed since it is outdated in a couple of ways and there is other correct documentation on the same page
>     and also here: http://docs.openvswitch.org/en/latest/howto/dpdk/
>     
>     
>     
>     Maybe you could submit a patch?
>     
>     
>     
>     Thanks Darrell
>     
>     
>     
>     
>     
>     On 9/5/17, 7:18 PM, "ovs-dev-bounces at openvswitch.org on behalf of 王志克" <ovs-dev-bounces at openvswitch.org on behalf of wangzhike at jd.com> wrote:
>     
>     
>     
>         Hi All,
>     
>         
>     
>         
>     
>         
>     
>         I read the doc below about pmd assignment for physical ports. I think the limitation “on the same NUMA node” may not be efficient.
>     
>         
>     
>         
>     
>         
>     
>         http://docs.openvswitch.org/en/latest/intro/install/dpdk/
>     
>         
>     
>         DPDK Physical Port Rx Queues <http://docs.openvswitch.org/en/latest/intro/install/dpdk/#dpdk-physical-port-rx-queues>
>     
>         
>     
>         
>     
>         
>     
>         $ ovs-vsctl set Interface <DPDK interface> options:n_rxq=<integer>
>     
>         
>     
>         
>     
>         
>     
>         The above command sets the number of rx queues for DPDK physical interface. The rx queues are assigned to pmd threads on the same NUMA node in a round-robin fashion.
>     
>         
>     
>         Consider the case below:
>     
>         
>     
>         
>     
>         
>     
>         One host has one PCI NIC on NUMA node 0 and 4 VMs spread across NUMA nodes 0 and 1. There are multiple rx queues configured on the physical NIC. We configured 4 pmds (two cpus from NUMA node 0 and two cpus from node 1). Since the physical NIC is located on NUMA node 0, only pmds on the same NUMA node can poll its rxqs. As a result, only two cpus can be used for polling the physical NIC.
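>     
>         (For illustration only: such a 4-pmd setup could be created with a pmd-cpu-mask covering two cores on each socket, e.g. cores 2 and 4 on node 0 and cores 22 and 24 on node 1; the actual core numbering depends on the host topology.)
>     
>         $ ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x1400014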
>     
>         
>     
>         
>     
>         
>     
>         If we compare with the OVS kernel datapath, there is no such limitation.
>     
>         
>     
>         
>     
>         
>     
>         So the question is:
>     
>         
>     
>         should we remove the “same NUMA node” limitation for physical port rx queues? Or do we have other options to improve the performance for this case?
>     
>         
>     
>         
>     
>         
>     
>         Br,
>     
>         
>     
>         Wang Zhike
>     
>         
>     
>         
>     
>         
>     
>     
>         
>     
>     
>     
>     
> 
> 


