[ovs-dev] [PATCH v3] netdev-dpdk: Allow configurable queue sizes for 'dpdk' ports
Ilya Maximets
i.maximets at samsung.com
Thu Sep 22 06:38:07 UTC 2016
On 21.09.2016 20:37, Loftus, Ciara wrote:
>>
>> A few comments inline.
>
> Thanks for the feedback Ilya.
>
>>
>>> The 'options:n_rxq_desc' and 'n_txq_desc' fields allow the number of rx
>>> and tx descriptors for dpdk ports to be modified. By default the values
>>> are set to 2048, but can be modified to an integer between 1 and 4096
>>> that is a power of two. The values can be modified at runtime, however
>>> require the NIC to restart when changed.
>>>
>>> Signed-off-by: Ciara Loftus <ciara.loftus at intel.com>
>>>
>>> ---
>>> v3:
>>> * Make queue sizes per-port rather than global
>>> * Check if queue size is power of 2 - fail if not.
>>>
>>> v2:
>>> * Rebase
>>>
>>> INSTALL.DPDK-ADVANCED.md | 16 ++++++++++++++--
>>> NEWS | 2 ++
>>> lib/netdev-dpdk.c | 48 +++++++++++++++++++++++++++++++++++++++++++-----
>>> vswitchd/vswitch.xml | 22 ++++++++++++++++++++++
>>> 4 files changed, 81 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/INSTALL.DPDK-ADVANCED.md b/INSTALL.DPDK-ADVANCED.md
>>> index d7b9873..488e84f 100644
>>> --- a/INSTALL.DPDK-ADVANCED.md
>>> +++ b/INSTALL.DPDK-ADVANCED.md
>>> @@ -257,7 +257,19 @@ needs to be affinitized accordingly.
>>> The rx queues are assigned to pmd threads on the same NUMA node in a
>>> round-robin fashion.
>>>
>>> -### 4.4 Exact Match Cache
>>> +### 4.4 DPDK Physical Port Queue Sizes
>>> + `ovs-vsctl set Interface dpdk0 options:n_rxq_desc=<integer>`
>>> + `ovs-vsctl set Interface dpdk0 options:n_txq_desc=<integer>`
>>> +
>>> + The commands above set the number of rx/tx descriptors that the NIC
>>> + associated with dpdk0 will be initialised with.
>>> +
>>> + Different 'n_rxq_desc' and 'n_txq_desc' configurations yield different
>>> + benefits in terms of throughput and latency for different scenarios.
>>> + Generally, smaller queue sizes can have a positive impact for latency at the
>>> + expense of throughput. The opposite is often true for larger queue sizes.
>>
>> Here we can mention that increasing the number of rx descriptors may lead
>> to performance degradation because of using non-vectorized rx functions.
>> At least this is true for i40e and maybe ixgbe dpdk drivers. Setting
>> 'n_rxq_desc=4096' for them will lead to disabling of vectorized rx.
>
> It seems the same applies for ixgbe (IXGBE_MAX_RING_DESC=4096) http://dpdk.org/doc/guides/nics/ixgbe.html#rx-constraints
> I will include this info in the next version.
>
>>
>>> +
>>> +### 4.5 Exact Match Cache
>>>
>>> Each pmd thread contains one EMC. After initial flow setup in the
>>> datapath, the EMC contains a single table and provides the lowest level
>>> @@ -274,7 +286,7 @@ needs to be affinitized accordingly.
>>> avoiding datapath classifier lookups is to have multiple pmd threads
>>> running. This can be done as described in section 4.2.
>>>
>>> -### 4.5 Rx Mergeable buffers
>>> +### 4.6 Rx Mergeable buffers
>>>
>>> Rx Mergeable buffers is a virtio feature that allows chaining of multiple
>>> virtio descriptors to handle large packet sizes. As such, large packets
>>> diff --git a/NEWS b/NEWS
>>> index 21ab538..901886d 100644
>>> --- a/NEWS
>>> +++ b/NEWS
>>> @@ -125,6 +125,8 @@ v2.6.0 - xx xxx xxxx
>>> * Remove dpdkvhostcuse port type.
>>> * OVS client mode for vHost and vHost reconnect (Requires QEMU 2.7)
>>> * 'dpdkvhostuserclient' port type.
>>> + * New 'n_rxq_desc' and 'n_txq_desc' options for DPDK interfaces
>>> + which set the number of rx and tx descriptors to use for the given port.
>>> - Increase number of registers to 16.
>>> - ovs-benchmark: This utility has been removed due to lack of use and
>>> bitrot.
>>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>>> index 89bdc4d..228993f 100644
>>> --- a/lib/netdev-dpdk.c
>>> +++ b/lib/netdev-dpdk.c
>>> @@ -132,8 +132,9 @@ BUILD_ASSERT_DECL((MAX_NB_MBUF / ROUND_DOWN_POW2(MAX_NB_MBUF/MIN_NB_MBUF))
>>>
>>> #define SOCKET0 0
>>>
>>> -#define NIC_PORT_RX_Q_SIZE 2048 /* Size of Physical NIC RX Queue, Max (n+32<=4096)*/
>>> -#define NIC_PORT_TX_Q_SIZE 2048 /* Size of Physical NIC TX Queue, Max (n+32<=4096)*/
>>> +#define NIC_PORT_DEFAULT_RXQ_SIZE 2048 /* Default size of Physical NIC RXQ */
>>> +#define NIC_PORT_DEFAULT_TXQ_SIZE 2048 /* Default size of Physical NIC TXQ */
>>> +#define NIC_PORT_MAX_Q_SIZE 4096 /* Maximum size of Physical NIC Queue */
>>>
>>> #define OVS_VHOST_MAX_QUEUE_NUM 1024 /* Maximum number of vHost TX queues. */
>>> #define OVS_VHOST_QUEUE_MAP_UNKNOWN (-1) /* Mapping not initialized. */
>>> @@ -372,6 +373,12 @@ struct netdev_dpdk {
>>> int requested_mtu;
>>> int requested_n_txq;
>>> int requested_n_rxq;
>>> + int requested_rxq_size;
>>> + int requested_txq_size;
>>> +
>>> + /* Number of rx/tx descriptors for physical devices */
>>> + int rxq_size;
>>> + int txq_size;
>>>
>>> /* Socket ID detected when vHost device is brought up */
>>> int requested_socket_id;
>>> @@ -646,7 +653,7 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int n_rxq, int n_txq)
>>> }
>>>
>>> for (i = 0; i < n_txq; i++) {
>>> - diag = rte_eth_tx_queue_setup(dev->port_id, i, NIC_PORT_TX_Q_SIZE,
>>> + diag = rte_eth_tx_queue_setup(dev->port_id, i, dev->txq_size,
>>> dev->socket_id, NULL);
>>> if (diag) {
>>> VLOG_INFO("Interface %s txq(%d) setup error: %s",
>>> @@ -662,7 +669,7 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int n_rxq, int n_txq)
>>> }
>>>
>>> for (i = 0; i < n_rxq; i++) {
>>> - diag = rte_eth_rx_queue_setup(dev->port_id, i, NIC_PORT_RX_Q_SIZE,
>>> + diag = rte_eth_rx_queue_setup(dev->port_id, i, dev->rxq_size,
>>> dev->socket_id, NULL,
>>> dev->dpdk_mp->mp);
>>> if (diag) {
>>> @@ -837,6 +844,10 @@ netdev_dpdk_init(struct netdev *netdev, unsigned int port_no,
>>> netdev->n_txq = NR_QUEUE;
>>> dev->requested_n_rxq = netdev->n_rxq;
>>> dev->requested_n_txq = netdev->n_txq;
>>> + dev->rxq_size = NIC_PORT_DEFAULT_RXQ_SIZE;
>>> + dev->txq_size = NIC_PORT_DEFAULT_TXQ_SIZE;
>>> + dev->requested_rxq_size = dev->rxq_size;
>>> + dev->requested_txq_size = dev->txq_size;
>>>
>>> /* Initialize the flow control to NULL */
>>> memset(&dev->fc_conf, 0, sizeof dev->fc_conf);
>>> @@ -1051,6 +1062,8 @@ netdev_dpdk_get_config(const struct netdev *netdev, struct smap *args)
>>> smap_add_format(args, "configured_rx_queues", "%d", netdev->n_rxq);
>>> smap_add_format(args, "requested_tx_queues", "%d", dev->requested_n_txq);
>>> smap_add_format(args, "configured_tx_queues", "%d", netdev->n_txq);
>>> + smap_add_format(args, "rxq_descriptors", "%d", dev->rxq_size);
>>> + smap_add_format(args, "txq_descriptors", "%d", dev->txq_size);
>>> smap_add_format(args, "mtu", "%d", dev->mtu);
>>> ovs_mutex_unlock(&dev->mutex);
>>>
>>> @@ -1069,6 +1082,21 @@ dpdk_set_rxq_config(struct netdev_dpdk *dev, const struct smap *args)
>>> }
>>> }
>>>
>>> +static void
>>> +dpdk_process_queue_size(struct netdev *netdev, const struct smap *args,
>>> + char *flag, int *new_size)
>>> +{
>>> + int queue_size;
>>> +
>>> + queue_size = smap_get_int(args, flag, 0);
>>> + if (queue_size > 0 && queue_size <= NIC_PORT_MAX_Q_SIZE
>>> + && rte_is_power_of_2(queue_size)
>>
>> I'm suggesting to use OVS internal functions to do the check: 'IS_POW2()'
>> macro or function 'is_pow2()'.
>>
>>> + && queue_size != *new_size) {
>>> + *new_size = queue_size;
>>> + netdev_request_reconfigure(netdev);
>>> + }
>>
>> Also, some sane error message would be nice in case of wrong value.
>
> This is generally avoided for set_config since it is called so often. An error isn't printed when setting n_rxq, for example.
> If the user sets a bogus value you can end up with many prints of the same log, and the same logs appear again later when you set another value, e.g. n_rxq.
> For this reason I don't plan to make this change.
>
> Thanks,
> Ciara
Yes, you're right. My bad.
>>
>>> +}
>>> +
>>> static int
>>> netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args)
>>> {
>>> @@ -1078,6 +1106,11 @@ netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args)
>>>
>>> dpdk_set_rxq_config(dev, args);
>>>
>>> + dpdk_process_queue_size(netdev, args, "n_rxq_desc",
>>> + &dev->requested_rxq_size);
>>> + dpdk_process_queue_size(netdev, args, "n_txq_desc",
>>> + &dev->requested_txq_size);
>>> +
>>> /* Flow control support is only available for DPDK Ethernet ports. */
>>> bool rx_fc_en = false;
>>> bool tx_fc_en = false;
>>> @@ -2923,7 +2956,9 @@ netdev_dpdk_reconfigure(struct netdev *netdev)
>>>
>>> if (netdev->n_txq == dev->requested_n_txq
>>> && netdev->n_rxq == dev->requested_n_rxq
>>> - && dev->mtu == dev->requested_mtu) {
>>> + && dev->mtu == dev->requested_mtu
>>> + && dev->rxq_size == dev->requested_rxq_size
>>> + && dev->txq_size == dev->requested_txq_size) {
>>> /* Reconfiguration is unnecessary */
>>>
>>> goto out;
>>> @@ -2938,6 +2973,9 @@ netdev_dpdk_reconfigure(struct netdev *netdev)
>>> netdev->n_txq = dev->requested_n_txq;
>>> netdev->n_rxq = dev->requested_n_rxq;
>>>
>>> + dev->rxq_size = dev->requested_rxq_size;
>>> + dev->txq_size = dev->requested_txq_size;
>>> +
>>> rte_free(dev->tx_q);
>>> err = dpdk_eth_dev_init(dev);
>>> netdev_dpdk_alloc_txq(dev, netdev->n_txq);
>>> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
>>> index e73023d..1e96ada 100644
>>> --- a/vswitchd/vswitch.xml
>>> +++ b/vswitchd/vswitch.xml
>>> @@ -2375,6 +2375,28 @@
>>> Only supported by dpdkvhostuserclient interfaces.
>>> </p>
>>> </column>
>>> +
>>> + <column name="options" key="n_rxq_desc"
>>> + type='{"type": "integer", "minInteger": 1, "maxInteger": 4096}'>
>>> + <p>
>>> + Specifies the rx queue size (number of rx descriptors) for dpdk ports.
>>> + The value must be a multiple of 2, less than 4096 and supported
>>
>> s/multiple/power/
>>
>>> + by the hardware of the device being configured.
>>> + If not specified or an incorrect value is specified, 2048 rx
>>> + descriptors will be used by default.
>>> + </p>
>>> + </column>
>>> +
>>> + <column name="options" key="n_txq_desc"
>>> + type='{"type": "integer", "minInteger": 1, "maxInteger": 4096}'>
>>> + <p>
>>> + Specifies the tx queue size (number of tx descriptors) for dpdk ports.
>>> + The value must be a multiple of 2, less than 4096 and supported
>>
>> s/multiple/power/
>>
>>> + by the hardware of the device being configured.
>>> + If not specified or an incorrect value is specified, 2048 tx
>>> + descriptors will be used by default.
>>> + </p>
>>> + </column>
>>> </group>
>>>
>>> <group title="MTU">
>>> --
>>> 2.4.3
>
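
For reference, the acceptance rule discussed in this thread can be sketched
in plain C as below. This is only an illustration under stated assumptions,
not the code from the patch: is_power_of_two() and process_queue_size() are
hypothetical stand-ins (the patch calls rte_is_power_of_2(), and the review
suggests the OVS helpers instead), and MAX_Q_SIZE mirrors
NIC_PORT_MAX_Q_SIZE. A value is accepted only if it lies in 1..4096, is a
power of two, and differs from the currently requested size. Anything else
is ignored without logging, so the previous value stays in effect.

```c
#include <stdbool.h>

#define MAX_Q_SIZE 4096   /* Assumption: mirrors NIC_PORT_MAX_Q_SIZE. */

/* Hypothetical helper: true if 'x' has exactly one bit set. */
static bool
is_power_of_two(unsigned int x)
{
    return x != 0 && (x & (x - 1)) == 0;
}

/* Hypothetical stand-in for dpdk_process_queue_size().
 *
 * Updates '*size' and returns true only when 'requested' is valid and
 * differs from the current value.  A true return is where the patch
 * would call netdev_request_reconfigure().  Invalid values are dropped
 * silently, matching the decision in this thread not to log them. */
static bool
process_queue_size(int requested, int *size)
{
    if (requested > 0 && requested <= MAX_Q_SIZE
        && is_power_of_two((unsigned int) requested)
        && requested != *size) {
        *size = requested;
        return true;
    }
    return false;
}
```

Starting from the default of 2048, a request for 1024 is accepted and would
trigger a reconfigure, a repeated request for 1024 is a no-op, and values
such as 0, 100 or 8192 are ignored and leave the size unchanged.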