[ovs-dev] [PATCH v3] netdev-dpdk: Allow configurable queue sizes for 'dpdk' ports

Ilya Maximets i.maximets at samsung.com
Thu Sep 22 06:38:07 UTC 2016


On 21.09.2016 20:37, Loftus, Ciara wrote:
>>
>> Few comments inline.
> 
> Thanks for the feedback Ilya. 
> 
>>
>>> The 'options:n_rxq_desc' and 'n_txq_desc' fields allow the number of rx
>>> and tx descriptors for dpdk ports to be modified. By default the values
>>> are set to 2048, but can be modified to an integer between 1 and 4096
>>> that is a power of two. The values can be modified at runtime, however
>>> require the NIC to restart when changed.
>>>
>>> Signed-off-by: Ciara Loftus <ciara.loftus at intel.com>
>>>
>>> ---
>>> v3:
>>> * Make queue sizes per-port rather than global
>>> * Check if queue size is power of 2 - fail if so.
>>>
>>> v2:
>>> * Rebase
>>>
>>>  INSTALL.DPDK-ADVANCED.md | 16 ++++++++++++++--
>>>  NEWS                     |  2 ++
>>>  lib/netdev-dpdk.c        | 48
>> +++++++++++++++++++++++++++++++++++++++++++-----
>>>  vswitchd/vswitch.xml     | 22 ++++++++++++++++++++++
>>>  4 files changed, 81 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/INSTALL.DPDK-ADVANCED.md b/INSTALL.DPDK-ADVANCED.md
>>> index d7b9873..488e84f 100644
>>> --- a/INSTALL.DPDK-ADVANCED.md
>>> +++ b/INSTALL.DPDK-ADVANCED.md
>>> @@ -257,7 +257,19 @@ needs to be affinitized accordingly.
>>>    The rx queues are assigned to pmd threads on the same NUMA node in a
>>>    round-robin fashion.
>>>
>>> -### 4.4 Exact Match Cache
>>> +### 4.4 DPDK Physical Port Queue Sizes
>>> +  `ovs-vsctl set Interface dpdk0 options:n_rxq_desc=<integer>`
>>> +  `ovs-vsctl set Interface dpdk0 options:n_txq_desc=<integer>`
>>> +
>>> +  The command above sets the number of rx/tx descriptors that the NIC
>>> +  associated with dpdk0 will be initialised with.
>>> +
>>> +  Different 'n_rxq_desc' and 'n_txq_desc' configurations yield different
>>> +  benefits in terms of throughput and latency for different scenarios.
>>> +  Generally, smaller queue sizes can have a positive impact for latency at
>> the
>>> +  expense of throughput. The opposite is often true for larger queue sizes.
>>
>> Here we can mention that increasing the number of rx descriptors may lead
>> to performance degradation because of using non-vectorized rx functions.
>> At least this is true for i40e and maybe ixgbe dpdk drivers. Setting
>> 'n_rxq_desc=4096' for them will lead to disabling of vectorized rx.
> 
> It seems the same applies for ixgbe (IXGBE_MAX_RING_DESC=4096) http://dpdk.org/doc/guides/nics/ixgbe.html#rx-constraints
> I will include this info in the next version.
> 
>>
>>> +
>>> +### 4.5 Exact Match Cache
>>>
>>>    Each pmd thread contains one EMC. After initial flow setup in the
>>>    datapath, the EMC contains a single table and provides the lowest level
>>> @@ -274,7 +286,7 @@ needs to be affinitized accordingly.
>>>    avoiding datapath classifier lookups is to have multiple pmd threads
>>>    running. This can be done as described in section 4.2.
>>>
>>> -### 4.5 Rx Mergeable buffers
>>> +### 4.6 Rx Mergeable buffers
>>>
>>>    Rx Mergeable buffers is a virtio feature that allows chaining of multiple
>>>    virtio descriptors to handle large packet sizes. As such, large packets
>>> diff --git a/NEWS b/NEWS
>>> index 21ab538..901886d 100644
>>> --- a/NEWS
>>> +++ b/NEWS
>>> @@ -125,6 +125,8 @@ v2.6.0 - xx xxx xxxx
>>>       * Remove dpdkvhostcuse port type.
>>>       * OVS client mode for vHost and vHost reconnect (Requires QEMU 2.7)
>>>       * 'dpdkvhostuserclient' port type.
>>> +     * New option 'n_rxq_desc' and 'n_txq_desc' fields for DPDK interfaces
>>> +       which set the number of rx and tx descriptors to use for the given
>> port.
>>>     - Increase number of registers to 16.
>>>     - ovs-benchmark: This utility has been removed due to lack of use and
>>>       bitrot.
>>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>>> index 89bdc4d..228993f 100644
>>> --- a/lib/netdev-dpdk.c
>>> +++ b/lib/netdev-dpdk.c
>>> @@ -132,8 +132,9 @@ BUILD_ASSERT_DECL((MAX_NB_MBUF /
>> ROUND_DOWN_POW2(MAX_NB_MBUF/MIN_NB_MBUF))
>>>
>>>  #define SOCKET0              0
>>>
>>> -#define NIC_PORT_RX_Q_SIZE 2048  /* Size of Physical NIC RX Queue,
>> Max (n+32<=4096)*/
>>> -#define NIC_PORT_TX_Q_SIZE 2048  /* Size of Physical NIC TX Queue,
>> Max (n+32<=4096)*/
>>> +#define NIC_PORT_DEFAULT_RXQ_SIZE 2048 /* Default size of Physical
>> NIC RXQ */
>>> +#define NIC_PORT_DEFAULT_TXQ_SIZE 2048 /* Default size of Physical
>> NIC TXQ */
>>> +#define NIC_PORT_MAX_Q_SIZE 4096       /* Maximum size of Physical
>> NIC Queue */
>>>
>>>  #define OVS_VHOST_MAX_QUEUE_NUM 1024  /* Maximum number of
>> vHost TX queues. */
>>>  #define OVS_VHOST_QUEUE_MAP_UNKNOWN (-1) /* Mapping not
>> initialized. */
>>> @@ -372,6 +373,12 @@ struct netdev_dpdk {
>>>      int requested_mtu;
>>>      int requested_n_txq;
>>>      int requested_n_rxq;
>>> +    int requested_rxq_size;
>>> +    int requested_txq_size;
>>> +
>>> +    /* Number of rx/tx descriptors for physical devices */
>>> +    int rxq_size;
>>> +    int txq_size;
>>>
>>>      /* Socket ID detected when vHost device is brought up */
>>>      int requested_socket_id;
>>> @@ -646,7 +653,7 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk
>> *dev, int n_rxq, int n_txq)
>>>          }
>>>
>>>          for (i = 0; i < n_txq; i++) {
>>> -            diag = rte_eth_tx_queue_setup(dev->port_id, i,
>> NIC_PORT_TX_Q_SIZE,
>>> +            diag = rte_eth_tx_queue_setup(dev->port_id, i, dev->txq_size,
>>>                                            dev->socket_id, NULL);
>>>              if (diag) {
>>>                  VLOG_INFO("Interface %s txq(%d) setup error: %s",
>>> @@ -662,7 +669,7 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk
>> *dev, int n_rxq, int n_txq)
>>>          }
>>>
>>>          for (i = 0; i < n_rxq; i++) {
>>> -            diag = rte_eth_rx_queue_setup(dev->port_id, i,
>> NIC_PORT_RX_Q_SIZE,
>>> +            diag = rte_eth_rx_queue_setup(dev->port_id, i, dev->rxq_size,
>>>                                            dev->socket_id, NULL,
>>>                                            dev->dpdk_mp->mp);
>>>              if (diag) {
>>> @@ -837,6 +844,10 @@ netdev_dpdk_init(struct netdev *netdev,
>> unsigned int port_no,
>>>      netdev->n_txq = NR_QUEUE;
>>>      dev->requested_n_rxq = netdev->n_rxq;
>>>      dev->requested_n_txq = netdev->n_txq;
>>> +    dev->rxq_size = NIC_PORT_DEFAULT_RXQ_SIZE;
>>> +    dev->txq_size = NIC_PORT_DEFAULT_TXQ_SIZE;
>>> +    dev->requested_rxq_size = dev->rxq_size;
>>> +    dev->requested_txq_size = dev->txq_size;
>>>
>>>      /* Initialize the flow control to NULL */
>>>      memset(&dev->fc_conf, 0, sizeof dev->fc_conf);
>>> @@ -1051,6 +1062,8 @@ netdev_dpdk_get_config(const struct netdev
>> *netdev, struct smap *args)
>>>      smap_add_format(args, "configured_rx_queues", "%d", netdev-
>>> n_rxq);
>>>      smap_add_format(args, "requested_tx_queues", "%d", dev-
>>> requested_n_txq);
>>>      smap_add_format(args, "configured_tx_queues", "%d", netdev-
>>> n_txq);
>>> +    smap_add_format(args, "rxq_descriptors", "%d", dev->rxq_size);
>>> +    smap_add_format(args, "txq_descriptors", "%d", dev->txq_size);
>>>      smap_add_format(args, "mtu", "%d", dev->mtu);
>>>      ovs_mutex_unlock(&dev->mutex);
>>>
>>> @@ -1069,6 +1082,21 @@ dpdk_set_rxq_config(struct netdev_dpdk *dev,
>> const struct smap *args)
>>>      }
>>>  }
>>>
>>> +static void
>>> +dpdk_process_queue_size(struct netdev *netdev, const struct smap
>> *args,
>>> +                        char *flag, int *new_size)
>>> +{
>>> +    int queue_size;
>>> +
>>> +    queue_size = smap_get_int(args, flag, 0);
>>> +    if (queue_size > 0 && queue_size <= NIC_PORT_MAX_Q_SIZE
>>> +            && rte_is_power_of_2(queue_size)
>>
>> I'm suggesting to use OVS internal functions to do the check: 'IS_POW2()'
>> macro or function 'is_pow2()'.
>>
>>> +            && queue_size != *new_size) {
>>> +        *new_size = queue_size;
>>> +        netdev_request_reconfigure(netdev);
>>> +    }
>>
>> Also, some sane error message would be nice in case of wrong value.
> 
> This is generally avoided for set_config since it is called so often. An error isn't printed in when setting n_rxq for example.
> If the user sets a bogus value you can end up with many prints of the same log. And these logs again later when you set another value eg. n_rxq.
> For this reason I don't plan to make this change.
> 
> Thanks,
> Ciara

Yes, you're right. My bad.

>>
>>> +}
>>> +
>>>  static int
>>>  netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args)
>>>  {
>>> @@ -1078,6 +1106,11 @@ netdev_dpdk_set_config(struct netdev
>> *netdev, const struct smap *args)
>>>
>>>      dpdk_set_rxq_config(dev, args);
>>>
>>> +    dpdk_process_queue_size(netdev, args, "n_rxq_desc",
>>> +                            &dev->requested_rxq_size);
>>> +    dpdk_process_queue_size(netdev, args, "n_txq_desc",
>>> +                            &dev->requested_txq_size);
>>> +
>>>      /* Flow control support is only available for DPDK Ethernet ports. */
>>>      bool rx_fc_en = false;
>>>      bool tx_fc_en = false;
>>> @@ -2923,7 +2956,9 @@ netdev_dpdk_reconfigure(struct netdev
>> *netdev)
>>>
>>>      if (netdev->n_txq == dev->requested_n_txq
>>>          && netdev->n_rxq == dev->requested_n_rxq
>>> -        && dev->mtu == dev->requested_mtu) {
>>> +        && dev->mtu == dev->requested_mtu
>>> +        && dev->rxq_size == dev->requested_rxq_size
>>> +        && dev->txq_size == dev->requested_txq_size) {
>>>          /* Reconfiguration is unnecessary */
>>>
>>>          goto out;
>>> @@ -2938,6 +2973,9 @@ netdev_dpdk_reconfigure(struct netdev
>> *netdev)
>>>      netdev->n_txq = dev->requested_n_txq;
>>>      netdev->n_rxq = dev->requested_n_rxq;
>>>
>>> +    dev->rxq_size = dev->requested_rxq_size;
>>> +    dev->txq_size = dev->requested_txq_size;
>>> +
>>>      rte_free(dev->tx_q);
>>>      err = dpdk_eth_dev_init(dev);
>>>      netdev_dpdk_alloc_txq(dev, netdev->n_txq);
>>> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
>>> index e73023d..1e96ada 100644
>>> --- a/vswitchd/vswitch.xml
>>> +++ b/vswitchd/vswitch.xml
>>> @@ -2375,6 +2375,28 @@
>>>            Only supported by dpdkvhostuserclient interfaces.
>>>          </p>
>>>        </column>
>>> +
>>> +      <column name="options" key="n_rxq_desc"
>>> +              type='{"type": "integer", "minInteger": 1, "maxInteger": 4096}'>
>>> +        <p>
>>> +          Specifies the rx queue size (number rx descriptors) for dpdk ports.
>>> +          The value must be a multiple of 2, less than 4096 and supported
>>
>> s/multiple/power/
>>
>>> +          by the hardware of the device being configured.
>>> +          If not specified or an incorrect value is specified, 2048 rx
>>> +          descriptors will be used by default.
>>> +        </p>
>>> +      </column>
>>> +
>>> +      <column name="options" key="n_txq_desc"
>>> +              type='{"type": "integer", "minInteger": 1, "maxInteger": 4096}'>
>>> +        <p>
>>> +          Specifies the tx queue size (number tx descriptors) for dpdk ports.
>>> +          The value must be a multiple of 2, less than 4096 and supported
>>
>> s/multiple/power/
>>
>>> +          by the hardware of the device being configured.
>>> +          If not specified or an incorrect value is specified, 2048 tx
>>> +          descriptors will be used by default.
>>> +        </p>
>>> +      </column>
>>>      </group>
>>>
>>>      <group title="MTU">
>>> --
>>> 2.4.3
> 



More information about the dev mailing list