[ovs-dev] OVS DPDK NUMA pmd assignment question for physical port

王志克 wangzhike at jd.com
Thu Sep 7 04:15:18 UTC 2017



-----Original Message-----
From: O Mahony, Billy [mailto:billy.o.mahony at intel.com] 
Sent: Wednesday, September 06, 2017 10:49 PM
To: Kevin Traynor; Jan Scheurich; 王志克; Darrell Ball; ovs-discuss at openvswitch.org; ovs-dev at openvswitch.org
Subject: RE: [ovs-dev] OVS DPDK NUMA pmd assignment question for physical port



> -----Original Message-----
> From: Kevin Traynor [mailto:ktraynor at redhat.com]
> Sent: Wednesday, September 6, 2017 2:50 PM
> To: Jan Scheurich <jan.scheurich at ericsson.com>; O Mahony, Billy
> <billy.o.mahony at intel.com>; wangzhike at jd.com; Darrell Ball
> <dball at vmware.com>; ovs-discuss at openvswitch.org; ovs-
> dev at openvswitch.org
> Subject: Re: [ovs-dev] OVS DPDK NUMA pmd assignment question for
> physical port
> 
> On 09/06/2017 02:33 PM, Jan Scheurich wrote:
> > Hi Billy,
> >
> >> You are going to have to take the hit crossing the NUMA boundary at
> >> some point if your NIC and VM are on different NUMAs.
> >>
> >> So are you saying that it is more expensive to cross the NUMA
> >> boundary from the PMD to the VM than to cross it from the NIC to the
> >> PMD?
> >
> > Indeed, that is the case: If the NIC crosses the QPI bus when storing
> > packets in the remote NUMA, there is no cost involved for the PMD. (The QPI
> > bandwidth is typically not a bottleneck.) The PMD only performs local
> > memory access.
> >
> > On the other hand, if the PMD crosses the QPI when copying packets into a
> > remote VM, there is a huge latency penalty involved, consuming lots of PMD
> > cycles that cannot be spent on processing packets. We at Ericsson have
> > observed exactly this behavior.
> >
> > This latency penalty becomes even worse when the LLC cache hit rate is
> > degraded due to LLC contention with real VNFs and/or unfavorable
> > packet buffer re-use patterns as exhibited by real VNFs compared to typical
> > synthetic benchmark apps like DPDK testpmd.
> >
> >>
> >> If so, then in that case you'd like to have two (for example) PMDs
> >> polling 2 queues on the same NIC, with the PMDs on each of the NUMA
> >> nodes forwarding to the VMs local to that NUMA?
> >>
> >> Of course your NIC would then also need to be able to know which VM (or
> >> at least which NUMA the VM is on) in order to send the frame to the
> >> correct rxq.
> >
> > That would indeed be optimal but hard to realize in the general case (e.g.
> > with VXLAN encapsulation), as the actual destination is only known after
> > tunnel pop. Here perhaps some probabilistic steering of RSS hash values
> > based on the measured distribution of final destinations might help in the future.
> >
> > But even without that in place, we need PMDs on both NUMAs anyhow
> > (for NUMA-aware polling of vhostuser ports), so why not use them to also
> > poll remote eth ports? We can achieve better average performance with
> > fewer PMDs than with the current limitation to NUMA-local polling.
> >
> 
> If the user has some knowledge of the NUMA locality of ports and can place
> VMs accordingly, default cross-NUMA assignment can harm performance.
> Also, it would make for very unpredictable performance from test to test and
> even from flow to flow on a datapath.
[[BO'M]] Wang's original request would constitute default cross-NUMA assignment, but I don't think this modified proposal would, as it still requires explicit config to assign to the remote NUMA.

[Wangzhike] I think a configuration option or compile-time option is OK with me, since only the physical NIC rxqs need to be configured. It is only a one-shot job.
Regarding the test concern, I think it is worth clarifying the performance difference if the new behavior improves the rx throughput significantly.
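
For illustration only, a sketch of what such an explicit assignment could look like today using pmd-cpu-mask and per-interface pmd-rxq-affinity. The port name dpdk0, the core numbers and the NUMA layout below are example assumptions, and this presumes the installed OVS honors cross-NUMA rxq pinning:

    # Assumed topology: core 2 sits on NUMA 0 (NIC-local), core 22 on NUMA 1.
    # Create one PMD on each NUMA node.
    ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x400004

    # Give the physical port two rx queues and pin them explicitly:
    # queue 0 to the NUMA-local PMD (core 2), queue 1 to the remote-NUMA
    # PMD (core 22) that also polls the vhostuser ports on that node.
    ovs-vsctl set Interface dpdk0 options:n_rxq=2
    ovs-vsctl set Interface dpdk0 other_config:pmd-rxq-affinity="0:2,1:22"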
> 
> Kevin.
> 
> > BR, Jan
> >


