[ovs-dev] Setting pmd-cpu-mask in cluster-on-die mode

Jeremias Blendin jblendin at kom.e-technik.tu-darmstadt.de
Thu Sep 22 14:59:36 UTC 2016


Hi,

thank you for the pointers. The problem is solved; for details, see below.

2016-09-20 22:57 GMT+02:00 Mauricio Vasquez <mauricio.vasquez at polito.it>:
>
> On 09/19/2016 10:28 AM, Jeremias Blendin wrote:
>>
>> Hello Mauricio,
>>
>> thank you for the pointer to the binary-to-hex conversion; I knew it
>> looked strange but I could not see why.
>> In any case, it was just an example, the actual configuration for
>> testing is correct.
>>
>>
>> 2016-09-18 18:25 GMT+02:00 Mauricio Vasquez <mauricio.vasquez at polito.it>:
>>>
>>> Hello Jeremias,
>>>
>>>
>>> On 09/18/2016 05:46 PM, Jeremias Blendin wrote:
>>>>
>>>> Hi,
>>>>
>>>> I set pmd-cpu-mask on a server running its processors in
>>>> cluster-on-die mode. This means that the actual CPU topology is presented
>>>> to the OS as shown below.
>>>>
>>>> The problem I have is that although OVS is allowed to use all available
>>>> CPUs:
>>>>
>>>> $ ovs-vsctl --no-wait get Open_vSwitch . other_config:pmd-cpu-mask
>>>> "0x1787586c4fa8a01c71bc"
>>>> (=python hex(int('111111111111111111111111')))
>>>
>>> The CPU mask is calculated from a binary bitmask, so in your case, if you
>>> want OVS to have access to all cores, it should be 0xFFFFF... ->
>>> python: hex(int('11111...', 2))
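For reference, a shell one-liner for building the mask from a list of core IDs
(a minimal sketch; the core list below is only illustrative):

$ python -c "cores=[2,12,13,14]; print(hex(sum(1 << c for c in cores)))"
0x7004
$ ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x7004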
>>>
>>>
>>>> OVS only uses the CPUs that are located in the first NUMA Node where
>>>> the OS is running:
>>>> $ ovs-appctl dpif-netdev/pmd-stats-show | grep ^pmd
>>>> pmd thread numa_id 0 core_id 2:
>>>> pmd thread numa_id 0 core_id 12:
>>>> pmd thread numa_id 0 core_id 13:
>>>> pmd thread numa_id 0 core_id 14:
>>>>
>>>> I restarted OVS multiple times and tried to pin queues to specific cores:
>>>>
>>>> ovs-vsctl set interface dpdk0 other_config:pmd-rxq-affinity="0:2,1:12"
>>>> ovs-vsctl set interface dpdk1 other_config:pmd-rxq-affinity="0:13,1:14"
>>>> ovs-vsctl set interface vif3 other_config:pmd-rxq-affinity="0:3,1:3"
>>>> ovs-vsctl set interface vif4 other_config:pmd-rxq-affinity="0:4,1:4"
>>>>
>>>> but with the same result, cores in other numa nodes are not used:
>>>>
>>>> /usr/local/var/log/openvswitch/ovs-vswitchd.log
>>>> 2016-09-18T15:25:04.327Z|00080|dpif_netdev|INFO|Created 4 pmd threads
>>>> on numa node 0
>>>> 2016-09-18T15:25:04.327Z|00081|dpif_netdev|WARN|There is no PMD thread
>>>> on core 3. Queue 0 on port 'vif3' will not be polled.
>>>> 2016-09-18T15:25:04.327Z|00082|dpif_netdev|WARN|There is no PMD thread
>>>> on core 4. Queue 0 on port 'vif4' will not be polled.
>>>> 2016-09-18T15:25:04.327Z|00083|dpif_netdev|WARN|There's no available
>>>> pmd thread on numa node 0
>>>> 2016-09-18T15:25:04.327Z|00084|dpif_netdev|WARN|There's no available
>>>> pmd thread on numa node 0
>>>>
>>>> The log output seems to indicate that, for some reason, only NUMA node 0
>>>> is used. Can anyone confirm this?
>>>
>>> OVS only creates PMD threads on sockets where there are ports. In the case
>>> of physical ports, the NUMA node is determined by where the ports are
>>> connected on the server; in the case of dpdkvhostuser ports, it is
>>> determined by where the memory of the virtio device is allocated.
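A quick way to check where the guest memory actually ended up is to look at the
per-node usage of the QEMU process (assuming the usual qemu-system-x86_64
process name; adapt to your setup):

$ numastat -p $(pidof qemu-system-x86_64)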
>>
>> That is an interesting point, I create the dpdkvhostuser ports with
>> Open vSwitch:
>> ovs-vsctl add-port br0 vif0 -- set Interface vif0 type=dpdkvhostuser
>> How can I define which memory it should use?
>
> I don't know how it can be defined, but I found in the documentation [1]
> that CONFIG_RTE_LIBRTE_VHOST_NUMA=y should be set in order to automatically
> detect the numa node of vhostuser ports.
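For anyone else hitting this: that option is a DPDK build-time setting, so it
has to be enabled before DPDK is compiled (a sketch only; the config file name
can differ between DPDK releases, and libnuma development headers are required):

$ sed -i 's/CONFIG_RTE_LIBRTE_VHOST_NUMA=n/CONFIG_RTE_LIBRTE_VHOST_NUMA=y/' config/common_linuxapp
$ make install T=x86_64-native-linuxapp-gcc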
>
>>
>>> Probably in your case physical ports and the memory of the virtio devices
>>> are on socket0.
This was the source of the trouble. It seems that if one creates vhostuser
interfaces with OVS acting as the vhost-user server, the memory of those
interfaces is always allocated on NUMA node 0. The trick is to use the new
vhostuserclient feature and to let QEMU create the vhost-user server socket
instead. QEMU then uses memory from the same NUMA node as the VM, so OVS
moves its PMDs to that node.
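Roughly, the working setup looks like this (interface name, socket path, and
memory size are only examples):

$ ovs-vsctl add-port br0 vif0 -- set Interface vif0 type=dpdkvhostuserclient \
      options:vhost-server-path=/var/run/openvswitch/vif0.sock

and on the QEMU side the socket is created in server mode, with the guest
memory backed by shared hugepages:

-chardev socket,id=char0,path=/var/run/openvswitch/vif0.sock,server
-netdev type=vhost-user,id=net0,chardev=char0,vhostforce
-device virtio-net-pci,netdev=net0
-object memory-backend-file,id=mem0,size=4096M,mem-path=/dev/hugepages,share=on
-numa node,memdev=mem0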

Problem solved,

thanks!

Jeremias
>>
>> As COD is active, I have two NUMA nodes per socket. So yes, the VM and
>> OVS are located on socket 0, but in different NUMA nodes.
>> OVS has memory on all nodes (4G), the VM has memory only on NUMA node 1.
>> However, this NUMA node (1) is never used by OVS, although the VM is
>> located there. I guess I could fix this issue by deactivating COD, but this
>> has other drawbacks. Is there any way to directly tell OVS to run PMDs
>> on a specific NUMA node?
>
> No, PMDs on a NUMA node are only created if there are interfaces on that
> NUMA node.
>
>
>
>> I understand that running PMDs on a different socket
>> might be an issue, but it seems weird to me that PMDs cannot run on a
>> different NUMA node on the same socket.
>>
>> Thanks!
>>
>> Jeremias
>>
>>> Regards,
>>>
>>> Mauricio Vasquez
>>>>
>>>> Best regards,
>>>>
>>>> Jeremias
>>>>
>>>>
>>>> ovs-vsctl show
>>>>       Bridge "br0"
>>>>           Controller "tcp:<ctrl>:6633"
>>>>               is_connected: true
>>>>           Port "vif0"
>>>>               Interface "vif0"
>>>>                   type: dpdkvhostuser
>>>>                   options: {n_rxq="2"}
>>>>           Port "dpdk1"
>>>>               Interface "dpdk1"
>>>>                   type: dpdk
>>>>                   options: {n_rxq="2"}
>>>>           Port "vif3"
>>>>               Interface "vif3"
>>>>                   type: dpdkvhostuser
>>>>                   options: {n_rxq="2"}
>>>>           Port "dpdk0"
>>>>               Interface "dpdk0"
>>>>                   type: dpdk
>>>>                   options: {n_rxq="2"}
>>>>           Port "vif1"
>>>>               Interface "vif1"
>>>>                   type: dpdkvhostuser
>>>>                   options: {n_rxq="2"}
>>>>           Port "br0"
>>>>               Interface "br0"
>>>>                   type: internal
>>>>           Port "vif4"
>>>>               Interface "vif4"
>>>>                   type: dpdkvhostuser
>>>>                   options: {n_rxq="2"}
>>>>       ovs_version: "2.6.90"
>>>>
>>>>
>>>> OVS (last commit):
>>>> commit 75e2077e0c43224bcca92746b28b01a4936fc101
>>>> Author: Thadeu Lima de Souza Cascardo <cascardo at redhat.com>
>>>> Date:   Fri Sep 16 15:52:48 2016 -0300
>>>>
>>>>
>>>> CPU topology:
>>>> lstopo -p
>>>>
>>>> Machine (252GB total)
>>>>     Package P#0
>>>>       NUMANode P#0 (63GB) +  L3 (15MB)
>>>>           L2 (256KB) + L1d (32KB) + L1i (32KB) + Core P#0 + PU P#0
>>>>           L2 (256KB) + L1d (32KB) + L1i (32KB) + Core P#2 + PU P#1
>>>>           L2 (256KB) + L1d (32KB) + L1i (32KB) + Core P#4 + PU P#2
>>>>           L2 (256KB) + L1d (32KB) + L1i (32KB) + Core P#1 + PU P#12
>>>>           L2 (256KB) + L1d (32KB) + L1i (32KB) + Core P#3 + PU P#13
>>>>           L2 (256KB) + L1d (32KB) + L1i (32KB) + Core P#5 + PU P#14
>>>>       NUMANode P#1 (63GB) + L3 (15MB)
>>>>         L2 (256KB) + L1d (32KB) + L1i (32KB) + Core P#8 + PU P#3
>>>>         L2 (256KB) + L1d (32KB) + L1i (32KB) + Core P#10 + PU P#4
>>>>         L2 (256KB) + L1d (32KB) + L1i (32KB) + Core P#12 + PU P#5
>>>>         L2 (256KB) + L1d (32KB) + L1i (32KB) + Core P#9 + PU P#15
>>>>         L2 (256KB) + L1d (32KB) + L1i (32KB) + Core P#11 + PU P#16
>>>>         L2 (256KB) + L1d (32KB) + L1i (32KB) + Core P#13 + PU P#17
>>>>     Package P#1
>>>>       NUMANode P#2 (63GB) + L3 (15MB)
>>>>         L2 (256KB) + L1d (32KB) + L1i (32KB) + Core P#0 + PU P#6
>>>>         L2 (256KB) + L1d (32KB) + L1i (32KB) + Core P#2 + PU P#7
>>>>         L2 (256KB) + L1d (32KB) + L1i (32KB) + Core P#4 + PU P#8
>>>>         L2 (256KB) + L1d (32KB) + L1i (32KB) + Core P#1 + PU P#18
>>>>         L2 (256KB) + L1d (32KB) + L1i (32KB) + Core P#3 + PU P#19
>>>>         L2 (256KB) + L1d (32KB) + L1i (32KB) + Core P#5 + PU P#20
>>>>       NUMANode P#3 (63GB) + L3 (15MB)
>>>>         L2 (256KB) + L1d (32KB) + L1i (32KB) + Core P#8 + PU P#9
>>>>         L2 (256KB) + L1d (32KB) + L1i (32KB) + Core P#10 + PU P#10
>>>>         L2 (256KB) + L1d (32KB) + L1i (32KB) + Core P#12 + PU P#11
>>>>         L2 (256KB) + L1d (32KB) + L1i (32KB) + Core P#9 + PU P#21
>>>>         L2 (256KB) + L1d (32KB) + L1i (32KB) + Core P#11 + PU P#22
>>>>         L2 (256KB) + L1d (32KB) + L1i (32KB) + Core P#13 + PU P#23
>>>>
>>>> Other info:
>>>>
>>>> $ uname -a
>>>> Linux nfvi1 4.4.0-34-lowlatency #53-Ubuntu SMP PREEMPT Wed Jul 27
>>>> 19:23:26 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
>>>>
>>>> $ cat /proc/cmdline
>>>> BOOT_IMAGE=/boot/vmlinuz-4.4.0-34-lowlatency root=UUID=<whatever> ro
>>>> default_hugepagesz=1GB hugepagesz=1G hugepages=100 isolcpus=2-23
>>>> nohz_full=2-23 rcu_nocbs=2-23 apparmor=0
>>>
>>>
> [1]
> https://github.com/openvswitch/ovs/blob/master/INSTALL.DPDK-ADVANCED.md#36-numacluster-on-die


