[ovs-discuss] [ovs-dev] OVS DPDK NUMA pmd assignment question for physical port

王志克 wangzhike at jd.com
Thu Sep 7 04:04:31 UTC 2017


Hi Billy,

Please see my reply in line.

Br,
Wang Zhike

-----Original Message-----
From: O Mahony, Billy [mailto:billy.o.mahony at intel.com] 
Sent: Wednesday, September 06, 2017 9:01 PM
To: 王志克; Darrell Ball; ovs-discuss at openvswitch.org; ovs-dev at openvswitch.org; Kevin Traynor
Subject: RE: [ovs-dev] OVS DPDK NUMA pmd assignment question for physical port

Hi Wang,

I think the mention of pinning was confusing me a little. Let me see if I fully understand your use case: you don't 'want' to pin anything, but you are using pinning as a way to force the distribution of rxqs from a single NIC across PMDs on different NUMA nodes, because without pinning all rxqs are assigned to the NUMA-local PMD, leaving the other PMD totally unused.

But then when you used pinning the PMDs became isolated, so the vhostuser ports' rxqs would not be assigned to those PMDs unless they too were pinned. That worked but was not manageable as VMs (and vhost ports) came and went.

Yes? 
[Wang Zhike] Yes, exactly.

In that case what we probably want is the ability to pin an rxq to a pmd but without also isolating the pmd. So the PMD could be assigned some rxqs manually and still have others automatically assigned. 
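
For reference, the pinning discussed above is configured roughly like this today (the port name and core numbers are only illustrative, and note the isolation side effect):

    # Give the physical port two rx queues and pin them to cores on
    # different NUMA nodes (queue:core pairs).
    ovs-vsctl set Interface dpdk0 options:n_rxq=2
    ovs-vsctl set Interface dpdk0 other_config:pmd-rxq-affinity="0:3,1:23"
    # Side effect: the PMDs on cores 3 and 23 become isolated and will no
    # longer be assigned any non-pinned (e.g. vhostuser) rxqs.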

But what I still don't understand is why you don't put both PMDs on the same NUMA node. Given that you cannot program the NIC to know which VM a frame is for, you would have to RSS the frames across the rxqs (i.e. across NUMA nodes). Of the frames going to the NIC's local NUMA node, 50% would have to cross the NUMA boundary once their destination VM was decided - which is okay - they have to cross the boundary at some point. But of the frames going to the non-local NUMA node, 50% will actually be destined for what was originally the local NUMA node. Those packets (25% of all traffic) will cross NUMA *twice*, whereas if all PMDs were on the NIC's NUMA node those frames would never have had to pass between NUMA nodes.

In short I think it's more efficient to have both PMDs on the same NUMA node as the NIC.
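
As a rough sketch (assuming the NIC sits on socket 0 and that cores 2 and 4 are socket-0 cores; adjust the mask for your topology), that just means choosing only socket-0 cores in the PMD mask:

    # Run both PMDs on the NIC's NUMA node: bits 2 and 4 set -> 0x14.
    ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x14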

[Wang Zhike] Considering the Tx direction, i.e. from a VM on the other NUMA node to the phy NIC, I am not sure whether your proposal would degrade the Tx performance...
I will try to test the different cross-NUMA scenarios to get performance penalty data.
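
For the comparison I expect something like the following to show the per-PMD cycle cost and the rxq placement (the exact counters vary a little between OVS versions):

    # Reset and dump per-PMD statistics, including processing cycles per packet.
    ovs-appctl dpif-netdev/pmd-stats-clear
    ovs-appctl dpif-netdev/pmd-stats-show
    # Show which rxq is polled by which PMD (and on which NUMA node).
    ovs-appctl dpif-netdev/pmd-rxq-show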

There is one more comment below.

> -----Original Message-----
> From: 王志克 [mailto:wangzhike at jd.com]
> Sent: Wednesday, September 6, 2017 12:50 PM
> To: O Mahony, Billy <billy.o.mahony at intel.com>; Darrell Ball
> <dball at vmware.com>; ovs-discuss at openvswitch.org; ovs-
> dev at openvswitch.org; Kevin Traynor <ktraynor at redhat.com>
> Subject: RE: [ovs-dev] OVS DPDK NUMA pmd assignment question for
> physical port
> 
> Hi Billy,
> 
> See my reply in line.
> 
> Br,
> Wang Zhike
> 
> -----Original Message-----
> From: O Mahony, Billy [mailto:billy.o.mahony at intel.com]
> Sent: Wednesday, September 06, 2017 7:26 PM
> To: 王志克; Darrell Ball; ovs-discuss at openvswitch.org; ovs-
> dev at openvswitch.org; Kevin Traynor
> Subject: RE: [ovs-dev] OVS DPDK NUMA pmd assignment question for
> physical port
> 
> Hi Wang,
> 
> You are going to have to take the hit crossing the NUMA boundary at some
> point if your NIC and VM are on different NUMAs.
> 
> So are you saying that it is more expensive to cross the NUMA boundary
> from the pmd to the VM than to cross it from the NIC to the PMD?
> 
> [Wang Zhike] I do not have such data. I hope we can try the new behavior
> and get the test result, and then know whether and how much performance
> can be improved.

[[BO'M]] You don't need a code change to compare the performance of these two scenarios. You can simulate it with queue pinning. I'd imagine crossing the NUMA boundary during the PCI DMA would be cheaper than crossing it over vhost, but I don't know what the result would be, and it would be a pretty interesting figure to have, by the way.
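
For example (port names and core numbers below are only illustrative), with a VM fixed on socket 1, the NIC rxq could be pinned to a socket-0 PMD in one run (crossing NUMA over vhost) and to a socket-1 PMD in the other (crossing NUMA during the PCI DMA):

    # Run 1: NIC rxq polled on the NIC-local socket 0 (e.g. core 2).
    ovs-vsctl set Interface dpdk0 other_config:pmd-rxq-affinity="0:2"

    # Run 2: NIC rxq polled on the VM's socket 1 (e.g. core 22).
    ovs-vsctl set Interface dpdk0 other_config:pmd-rxq-affinity="0:22"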


> 
> If so, then you'd like to have (for example) two PMDs polling two
> queues on the same NIC, with the PMDs on each of the NUMA nodes
> forwarding to the VMs local to that NUMA?
> 
> Of course your NIC would then also need to be able to know which VM (or at
> least which NUMA node the VM is on) in order to send the frame to the
> correct rxq.
> 
> [Wang Zhike] Currently I do not know how to achieve that. From my view, the
> NIC does not know which NUMA node is the destination of a packet. Only
> after OVS handling (e.g. looking up the forwarding rule in OVS) is the
> destination known. If the NIC does not know the destination NUMA socket, it
> does not matter which PMD polls it.
> 
> 
> /Billy.
> 
> > -----Original Message-----
> > From: 王志克 [mailto:wangzhike at jd.com]
> > Sent: Wednesday, September 6, 2017 11:41 AM
> > To: O Mahony, Billy <billy.o.mahony at intel.com>; Darrell Ball
> > <dball at vmware.com>; ovs-discuss at openvswitch.org; ovs-
> > dev at openvswitch.org; Kevin Traynor <ktraynor at redhat.com>
> > Subject: RE: [ovs-dev] OVS DPDK NUMA pmd assignment question for
> > physical port
> >
> > Hi Billy,
> >
> > It depends on the destination of the traffic.
> >
> > I observed that when the traffic destination is across the NUMA socket,
> > the "avg processing cycles per packet" increases by about 60% compared
> > with traffic destined for the same NUMA socket.
> >
> > Br,
> > Wang Zhike
> >
> > -----Original Message-----
> > From: O Mahony, Billy [mailto:billy.o.mahony at intel.com]
> > Sent: Wednesday, September 06, 2017 6:35 PM
> > To: 王志克; Darrell Ball; ovs-discuss at openvswitch.org; ovs-
> > dev at openvswitch.org; Kevin Traynor
> > Subject: RE: [ovs-dev] OVS DPDK NUMA pmd assignment question for
> > physical port
> >
> > Hi Wang,
> >
> > If you create several PMDs on the NUMA node of the physical port, does
> > that have the same performance characteristic?
> >
> > /Billy
> >
> >
> >
> > > -----Original Message-----
> > > From: 王志克 [mailto:wangzhike at jd.com]
> > > Sent: Wednesday, September 6, 2017 10:20 AM
> > > To: O Mahony, Billy <billy.o.mahony at intel.com>; Darrell Ball
> > > <dball at vmware.com>; ovs-discuss at openvswitch.org; ovs-
> > > dev at openvswitch.org; Kevin Traynor <ktraynor at redhat.com>
> > > Subject: RE: [ovs-dev] OVS DPDK NUMA pmd assignment question for
> > > physical port
> > >
> > > Hi Billy,
> > >
> > > Yes, I want to achieve better performance.
> > >
> > > The commit "dpif-netdev: Assign ports to pmds on non-local numa node"
> > > can NOT meet my needs.
> > >
> > > I do have a pmd on socket 0 to poll the physical NIC, which is also on
> > > socket 0. However, this is not enough, since I also have other pmds on
> > > socket 1. I hope the pmds on socket 1 can poll the physical NIC as well.
> > > In this way, we have more CPUs (in my case, double the CPUs) to poll the
> > > NIC, which results in a performance improvement.
> > >
> > > BR,
> > > Wang Zhike
> > >
> > > -----Original Message-----
> > > From: O Mahony, Billy [mailto:billy.o.mahony at intel.com]
> > > Sent: Wednesday, September 06, 2017 5:14 PM
> > > To: Darrell Ball; 王志克; ovs-discuss at openvswitch.org; ovs-
> > > dev at openvswitch.org; Kevin Traynor
> > > Subject: RE: [ovs-dev] OVS DPDK NUMA pmd assignment question for
> > > physical port
> > >
> > > Hi Wang,
> > >
> > > A change, "dpif-netdev: Assign ports to pmds on non-local numa node",
> > > was committed to the head of master on 2017-08-02 which, if I
> > > understand your request correctly, will do what you require.
> > >
> > > However, it is not clear to me why you are pinning rxqs to PMDs in
> > > the first instance. Currently, if you configure at least one pmd on
> > > each NUMA node there should always be a PMD available. Is the pinning
> > > for performance reasons?
> > >
> > > Regards,
> > > Billy
> > >
> > >
> > >
> > > > -----Original Message-----
> > > > From: Darrell Ball [mailto:dball at vmware.com]
> > > > Sent: Wednesday, September 6, 2017 8:25 AM
> > > > To: 王志克 <wangzhike at jd.com>; ovs-discuss at openvswitch.org; ovs-
> > > > dev at openvswitch.org; O Mahony, Billy <billy.o.mahony at intel.com>;
> > > Kevin
> > > > Traynor <ktraynor at redhat.com>
> > > > Subject: Re: [ovs-dev] OVS DPDK NUMA pmd assignment question for
> > > > physical port
> > > >
> > > > Adding Billy and Kevin
> > > >
> > > >
> > > > On 9/6/17, 12:22 AM, "Darrell Ball" <dball at vmware.com> wrote:
> > > >
> > > >
> > > >
> > > >     On 9/6/17, 12:03 AM, "王志克" <wangzhike at jd.com> wrote:
> > > >
> > > >         Hi Darrell,
> > > >
> > > >         pmd-rxq-affinity has the limitation below (so an isolated pmd
> > > > cannot be used for other rxqs, which is not what I want; lots of
> > > > VMs come and go on the fly, and manual assignment is not feasible):
> > > >                   >>After that PMD threads on cores where RX
> > > > queues was pinned will become isolated. This means that this
> > > > thread will poll only pinned RX queues
> > > >
> > > >         My problem is that I have several CPUs spread across
> > > > different NUMA nodes. I hope all of these CPUs can have a chance to
> > > > serve the rxqs. However, because the phy NIC is located on only one
> > > > particular socket, the PMDs/CPUs on the other NUMA node are excluded.
> > > > So I am wondering whether we can have a different behavior for the
> > > > phy port rxqs:
> > > >               round-robin to all PMDs, even the PMDs on a different
> > > > NUMA socket.
> > > >
> > > >         I guess this is a common case, and I believe it would
> > > > improve rx performance.
> > > >
> > > >
> > > >     [Darrell] I agree it would be a common problem, and some
> > > > distribution would seem to make sense, maybe factoring in some
> > > > favoring of local-NUMA PMDs?
> > > >                     Maybe an optional config to enable it?
> > > >
> > > >
> > > >         Br,
> > > >         Wang Zhike
> > > >
> > > >


