[ovs-dev] Setting pmd-cpu-mask in cluster-on-die mode

Mauricio Vasquez mauricio.vasquez at polito.it
Tue Sep 20 20:57:17 UTC 2016


On 09/19/2016 10:28 AM, Jeremias Blendin wrote:
> Hello Mauricio,
>
> thank you for the pointer to the bin-to-hex conversion; I knew it
> looked strange but I could not see why.
> In any case, it was just an example, the actual configuration for
> testing is correct.
>
>
> 2016-09-18 18:25 GMT+02:00 Mauricio Vasquez <mauricio.vasquez at polito.it>:
>> Hello Jeremias,
>>
>>
>> On 09/18/2016 05:46 PM, Jeremias Blendin wrote:
>>> Hi,
>>>
>>> I set pmd-cpu-mask on a server running its processors in
>>> cluster-on-die mode, which means the OS sees the CPU topology
>>> shown below.
>>>
>>> The problem I have is that although OVS is allowed to use all available
>>> CPUs:
>>>
>>> $ ovs-vsctl --no-wait get Open_vSwitch . other_config:pmd-cpu-mask
>>> "0x1787586c4fa8a01c71bc"
>>> (=python hex(int('111111111111111111111111')))
>> The cpu mask is a hexadecimal bitmask with one bit per core, so if you
>> want OVS to have access to all cores it should be 0xFFFFF... ->
>> python hex(int('11111...', 2))
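
For reference, a minimal sketch of the conversion and of applying the
mask; it assumes all 24 logical CPUs of this box should be handed to
OVS, so adjust the bit string to the cores you actually want:

# one '1' bit per logical CPU; int(..., 2) parses the string as binary
$ python -c "print(hex(int('1' * 24, 2)))"
0xffffff
$ ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0xffffff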
>>
>>
>>> OVS only uses the CPUs that are located in the first NUMA Node where
>>> the OS is running:
>>> $ ovs-appctl dpif-netdev/pmd-stats-show | grep ^pmd
>>> pmd thread numa_id 0 core_id 2:
>>> pmd thread numa_id 0 core_id 12:
>>> pmd thread numa_id 0 core_id 13:
>>> pmd thread numa_id 0 core_id 14:
>>>
>>> I restarted OVS multiple times and tried to pin queues to specific cores:
>>>
>>> ovs-vsctl set interface dpdk0 other_config:pmd-rxq-affinity="0:2,1:12"
>>> ovs-vsctl set interface dpdk1 other_config:pmd-rxq-affinity="0:13,1:14"
>>> ovs-vsctl set interface vif3 other_config:pmd-rxq-affinity="0:3,1:3"
>>> ovs-vsctl set interface vif4 other_config:pmd-rxq-affinity="0:4,1:4"
>>>
>>> but with the same result: cores in other numa nodes are not used:
>>>
>>> /usr/local/var/log/openvswitch/ovs-vswitchd.log
>>> 2016-09-18T15:25:04.327Z|00080|dpif_netdev|INFO|Created 4 pmd threads
>>> on numa node 0
>>> 2016-09-18T15:25:04.327Z|00081|dpif_netdev|WARN|There is no PMD thread
>>> on core 3. Queue 0 on port 'vif3' will not be polled.
>>> 2016-09-18T15:25:04.327Z|00082|dpif_netdev|WARN|There is no PMD thread
>>> on core 4. Queue 0 on port 'vif4' will not be polled.
>>> 2016-09-18T15:25:04.327Z|00083|dpif_netdev|WARN|There's no available
>>> pmd thread on numa node 0
>>> 2016-09-18T15:25:04.327Z|00084|dpif_netdev|WARN|There's no available
>>> pmd thread on numa node 0
>>>
>>> The log output seems to indicate that, for some reason, only numa
>>> node 0 is used. Can anyone confirm this?
>> OVS only creates pmd threads on NUMA nodes where there are ports. In the
>> case of physical ports, the NUMA node is determined by where the port is
>> attached to the server; in the case of dpdkvhostuser ports, it is
>> determined by where the memory of the virtio device is allocated.
> That is an interesting point. I create the dpdkvhostuser ports with
> Open vSwitch:
> ovs-vsctl add-port br0 vif0 -- set Interface vif0 type=dpdkvhostuser
> How can I define which memory it should use?
I don't know how it can be defined, but I found in the documentation [1]
that CONFIG_RTE_LIBRTE_VHOST_NUMA=y should be set in order to
automatically detect the NUMA node of vhostuser ports.
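
In case it helps, a rough sketch of enabling that option before
building DPDK (untested on my side; it assumes a stock DPDK 16.07 tree
where the option lives in config/common_base, and OVS then has to be
rebuilt against the resulting DPDK build):

# run in the DPDK source tree before building
$ sed -i 's/CONFIG_RTE_LIBRTE_VHOST_NUMA=n/CONFIG_RTE_LIBRTE_VHOST_NUMA=y/' config/common_base
$ make install T=x86_64-native-linuxapp-gcc DESTDIR=install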

>
>> Probably in your case physical ports and the memory of the virtio devices
>> are on socket0.
> As COD is active, I have two numa nodes per socket. So yes, the VM and
> OVS are located on socket 0 but in different numa nodes.
> OVS has memory on all nodes (4G); the VM has memory only on numa node 1.
> However, this numa node (1) is never used by OVS, although the VM is
> located there. I guess I could fix this issue by deactivating COD, but
> this has other drawbacks. Is there any way to directly tell OVS to run
> pmds on a specific numa node?
No, PMDs on a NUMA node are only created if there are interfaces on
that NUMA node.
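
For completeness: even once a port ends up on NUMA node 1, the
pmd-cpu-mask also has to include cores of that node for PMDs to be
created there. A sketch for this box, with the core selection only as
an example (PU numbers taken from the lstopo output quoted below: node
0 has PUs 0-2 and 12-14, node 1 has PUs 3-5 and 15-17):

# cores 2,12-14 sit on node 0; cores 3-5,15-17 on node 1
$ python -c "print(hex(sum(1 << c for c in (2, 12, 13, 14, 3, 4, 5, 15, 16, 17))))"
0x3f03c
$ ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x3f03c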


> I understand that running pmds on a different socket might be an
> issue, but it seems weird to me that pmds cannot run on a different
> numa node on the same socket.
>
> Thanks!
>
> Jeremias
>
>> Regards,
>>
>> Mauricio Vasquez
>>> Best regards,
>>>
>>> Jeremias
>>>
>>>
>>> ovs-vsctl show
>>>       Bridge "br0"
>>>           Controller "tcp:<ctrl>:6633"
>>>               is_connected: true
>>>           Port "vif0"
>>>               Interface "vif0"
>>>                   type: dpdkvhostuser
>>>                   options: {n_rxq="2"}
>>>           Port "dpdk1"
>>>               Interface "dpdk1"
>>>                   type: dpdk
>>>                   options: {n_rxq="2"}
>>>           Port "vif3"
>>>               Interface "vif3"
>>>                   type: dpdkvhostuser
>>>                   options: {n_rxq="2"}
>>>           Port "dpdk0"
>>>               Interface "dpdk0"
>>>                   type: dpdk
>>>                   options: {n_rxq="2"}
>>>           Port "vif1"
>>>               Interface "vif1"
>>>                   type: dpdkvhostuser
>>>                   options: {n_rxq="2"}
>>>           Port "br0"
>>>               Interface "br0"
>>>                   type: internal
>>>           Port "vif4"
>>>               Interface "vif4"
>>>                   type: dpdkvhostuser
>>>                   options: {n_rxq="2"}
>>>       ovs_version: "2.6.90"
>>>
>>>
>>> OVS (last commit):
>>> commit 75e2077e0c43224bcca92746b28b01a4936fc101
>>> Author: Thadeu Lima de Souza Cascardo <cascardo at redhat.com>
>>> Date:   Fri Sep 16 15:52:48 2016 -0300
>>>
>>>
>>> CPU topology:
>>> lstopo -p
>>>
>>> Machine (252GB total)
>>>     Package P#0
>>>       NUMANode P#0 (63GB) +  L3 (15MB)
>>>           L2 (256KB) + L1d (32KB) + L1i (32KB) + Core P#0 + PU P#0
>>>           L2 (256KB) + L1d (32KB) + L1i (32KB) + Core P#2 + PU P#1
>>>           L2 (256KB) + L1d (32KB) + L1i (32KB) + Core P#4 + PU P#2
>>>           L2 (256KB) + L1d (32KB) + L1i (32KB) + Core P#1 + PU P#12
>>>           L2 (256KB) + L1d (32KB) + L1i (32KB) + Core P#3 + PU P#13
>>>           L2 (256KB) + L1d (32KB) + L1i (32KB) + Core P#5 + PU P#14
>>>       NUMANode P#1 (63GB) + L3 (15MB)
>>>         L2 (256KB) + L1d (32KB) + L1i (32KB) + Core P#8 + PU P#3
>>>         L2 (256KB) + L1d (32KB) + L1i (32KB) + Core P#10 + PU P#4
>>>         L2 (256KB) + L1d (32KB) + L1i (32KB) + Core P#12 + PU P#5
>>>         L2 (256KB) + L1d (32KB) + L1i (32KB) + Core P#9 + PU P#15
>>>         L2 (256KB) + L1d (32KB) + L1i (32KB) + Core P#11 + PU P#16
>>>         L2 (256KB) + L1d (32KB) + L1i (32KB) + Core P#13 + PU P#17
>>>     Package P#1
>>>       NUMANode P#2 (63GB) + L3 (15MB)
>>>         L2 (256KB) + L1d (32KB) + L1i (32KB) + Core P#0 + PU P#6
>>>         L2 (256KB) + L1d (32KB) + L1i (32KB) + Core P#2 + PU P#7
>>>         L2 (256KB) + L1d (32KB) + L1i (32KB) + Core P#4 + PU P#8
>>>         L2 (256KB) + L1d (32KB) + L1i (32KB) + Core P#1 + PU P#18
>>>         L2 (256KB) + L1d (32KB) + L1i (32KB) + Core P#3 + PU P#19
>>>         L2 (256KB) + L1d (32KB) + L1i (32KB) + Core P#5 + PU P#20
>>>       NUMANode P#3 (63GB) + L3 (15MB)
>>>         L2 (256KB) + L1d (32KB) + L1i (32KB) + Core P#8 + PU P#9
>>>         L2 (256KB) + L1d (32KB) + L1i (32KB) + Core P#10 + PU P#10
>>>         L2 (256KB) + L1d (32KB) + L1i (32KB) + Core P#12 + PU P#11
>>>         L2 (256KB) + L1d (32KB) + L1i (32KB) + Core P#9 + PU P#21
>>>         L2 (256KB) + L1d (32KB) + L1i (32KB) + Core P#11 + PU P#22
>>>         L2 (256KB) + L1d (32KB) + L1i (32KB) + Core P#13 + PU P#23
>>>
>>> Other info:
>>>
>>> $ uname -a
>>> Linux nfvi1 4.4.0-34-lowlatency #53-Ubuntu SMP PREEMPT Wed Jul 27
>>> 19:23:26 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
>>>
>>> $ cat /proc/cmdline
>>> BOOT_IMAGE=/boot/vmlinuz-4.4.0-34-lowlatency root=UUID=<whatever> ro
>>> default_hugepagesz=1GB hugepagesz=1G hugepages=100 isolcpus=2-23
>>> nohz_full=2-23 rcu_nocbs=2-23 apparmor=0
>>
[1] https://github.com/openvswitch/ovs/blob/master/INSTALL.DPDK-ADVANCED.md#36-numacluster-on-die



