[ovs-dev] [PATCH v3] netdev_dpdk.c: Add QoS functionality.

Stokes, Ian ian.stokes at intel.com
Thu Feb 4 09:30:59 UTC 2016


Hi all,

Just wondering is there any feedback for the V3 patch below for QoS functionality?

This patch has been in and out of review since October, I think I've reworked it to meet any specific issues raised.

If there is anything of concern holding it back from being committed I'd really appreciate it if someone could review and provide feedback.

I also have an ingress policing patch in the works. Any feedback regarding the usage of the rte_meter in QoS here would also be useful for that work as the datapath implementations are quite similar.

Thanks
Ian

> -----Original Message-----
> From: Stokes, Ian
> Sent: Monday, February 01, 2016 8:47 PM
> To: dev at openvswitch.org
> Cc: Stokes, Ian
> Subject: [PATCH v3] netdev_dpdk.c: Add QoS functionality.
> 
> This patch provides the modifications required in netdev-dpdk.c and
> vswitch.xml to allow for a DPDK user space QoS algorithm.
> 
> This patch adds a QoS configuration structure for netdev-dpdk and
> expected QoS operations 'dpdk_qos_ops'. Various helper functions are
> also supplied.
> 
> Also included are the modifications required for vswitch.xml to allow a
> new QoS implementation for netdev-dpdk devices. This includes a new QoS
> type `egress-policer` as well as its expected QoS table entries.
> 
> The QoS functionality implemented for DPDK devices is `egress-policer`.
> This can be used to drop egress packets at a configurable rate.
> 
> The INSTALL.DPDK.md guide has also been modified to provide an example
> configuration of `egress-policer` QoS.
> 
> Signed-off-by: Ian Stokes <ian.stokes at intel.com>
> ---
>  INSTALL.DPDK.md      |   20 +++
>  lib/netdev-dpdk.c    |  439
> ++++++++++++++++++++++++++++++++++++++++++++++++--
>  vswitchd/vswitch.xml |   52 ++++++
>  3 files changed, 498 insertions(+), 13 deletions(-)
> 
> diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md index e8ef4b5..dac70cd
> 100644
> --- a/INSTALL.DPDK.md
> +++ b/INSTALL.DPDK.md
> @@ -207,6 +207,26 @@ Using the DPDK with ovs-vswitchd:
>     ./ovs-ofctl add-flow br0 in_port=2,action=output:1
>     ```
> 
> +8. QoS usage example
> +
> +   Assuming you have a vhost-user port transmitting traffic consisting
> of
> +   packets of size 64 bytes, the following command would limit the
> egress
> +   transmission rate of the port to ~1,000,000 packets per second:
> +
> +   `ovs-vsctl set port vhost-user0 qos=@newqos -- --id=@newqos create
> qos
> +   type=egress-policer other-config:cir=46000000 other-config:cbs=2048`
> +
> +   To examine the QoS configuration of the port:
> +
> +   `ovs-appctl -t ovs-vswitchd qos/show vhost-user0`
> +
> +   To clear the QoS configuration from the port and ovsdb use the
> following:
> +
> +   `ovs-vsctl -- destroy QoS vhost-user0 -- clear Port vhost-user0 qos`
> +
> +   For more details regarding egress-policer parameters please refer to
> the
> +   vswitch.xml.
> +
>  Performance Tuning:
>  -------------------
> 
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index
> 09ccc2c..716da79 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -44,6 +44,7 @@
>  #include "ovs-rcu.h"
>  #include "packets.h"
>  #include "shash.h"
> +#include "smap.h"
>  #include "sset.h"
>  #include "unaligned.h"
>  #include "timeval.h"
> @@ -52,12 +53,14 @@
> 
>  #include "rte_config.h"
>  #include "rte_mbuf.h"
> +#include "rte_meter.h"
>  #include "rte_virtio_net.h"
> 
>  VLOG_DEFINE_THIS_MODULE(dpdk);
>  static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
> 
>  #define DPDK_PORT_WATCHDOG_INTERVAL 5
> +#define DPDK_MAX_QOS_NAME_SIZE 10
> 
>  #define OVS_CACHE_LINE_SIZE CACHE_LINE_SIZE  #define OVS_VPORT_DPDK
> "ovs_dpdk"
> @@ -142,6 +145,107 @@ static int rte_eal_init_ret = ENODEV;
> 
>  static struct ovs_mutex dpdk_mutex = OVS_MUTEX_INITIALIZER;
> 
> +/* Quality of Service */
> +
> +/* An instance of a QoS configuration.  Always associated with a
> +particular
> + * network device.
> + *
> + * Each QoS implementation subclasses this with whatever additional
> +data it
> + * needs.
> + */
> +struct qos_conf {
> +    const struct dpdk_qos_ops *ops;
> +};
> +
> +/* A particular implementation of dpdk QoS operations.
> + *
> + * The functions below return 0 if successful or a positive errno value
> +on
> + * failure, except where otherwise noted. All of them must be provided,
> +except
> + * where otherwise noted.
> + */
> +struct dpdk_qos_ops {
> +
> +    /* Name of the QoS type */
> +    const char *qos_name;
> +
> +    /* Called to construct the QoS implementation on 'netdev'. The
> +     * implementation should make the appropriate calls to configure
> QoS
> +     * according to 'details'. The implementation may assume that any
> current
> +     * QoS configuration already installed should be destroyed before
> +     * constructing the new configuration.
> +     *
> +     * The contents of 'details' should be documented as valid for
> 'ovs_name'
> +     * in the "other_config" column in the "QoS" table in
> vswitchd/vswitch.xml
> +     * (which is built as ovs-vswitchd.conf.db(8)).
> +     *
> +     * This function must return 0 if and only if it sets 'netdev-
> >qos_conf'
> +     * to an initialized 'struct qos_conf'.
> +     *
> +     * For all QoS implementations it should always be non-null.
> +     */
> +    int (*qos_construct)(struct netdev *netdev, const struct smap
> + *details);
> +
> +    /* Destroys the data structures allocated by the implementation as
> part of
> +     * 'qos_conf.
> +     *
> +     * For all QoS implementations it should always be non-null.
> +     */
> +    void (*qos_destruct)(struct netdev *netdev, struct qos_conf *conf);
> +
> +    /* Retrieves details of 'netdev->qos_conf' configuration into
> 'details'.
> +     *
> +     * The contents of 'details' should be documented as valid for
> 'ovs_name'
> +     * in the "other_config" column in the "QoS" table in
> vswitchd/vswitch.xml
> +     * (which is built as ovs-vswitchd.conf.db(8)).
> +     */
> +    int (*qos_get)(const struct netdev *netdev, struct smap *details);
> +
> +    /* Reconfigures 'netdev->qos_conf' according to 'details',
> performing any
> +     * required calls to complete the reconfiguration.
> +     *
> +     * The contents of 'details' should be documented as valid for
> 'ovs_name'
> +     * in the "other_config" column in the "QoS" table in
> vswitchd/vswitch.xml
> +     * (which is built as ovs-vswitchd.conf.db(8)).
> +     *
> +     * This function may be null if 'qos_conf' is not configurable.
> +     */
> +    int (*qos_set)(struct netdev *netdev, const struct smap *details);
> +
> +    /* Modify an array of rte_mbufs. The modification is specific to
> +     * each qos implementation.
> +     *
> +     * The function should take and array of mbufs and an int
> representing
> +     * the current number of mbufs present in the array.
> +     *
> +     * After the function has performed a qos modification to the array
> of
> +     * mbufs it returns an int representing the number of mbufs now
> present in
> +     * the array. This value is can then be passed to the port send
> function
> +     * along with the modified array for transmission.
> +     *
> +     * For all QoS implementations it should always be non-null.
> +     */
> +    int (*qos_alg_process)(struct netdev *netdev, struct rte_mbuf
> **pkts,
> +                           int pkt_cnt); };
> +
> +/* dpdk_qos_ops for each type of user space QoS implementation */
> +static const struct dpdk_qos_ops egress_policer_ops;
> +
> +/*
> + * Array of dpdk_qos_ops, contains pointer to all supported QoS
> + * operations.
> + */
> +static const struct dpdk_qos_ops *const qos_confs[] = {
> +    &egress_policer_ops,
> +    NULL
> +};
> +
> +/* Action that can be set for a packet for rte_meter */ enum
> +egress_policer_action {
> +        GREEN = e_RTE_METER_GREEN,
> +        DROP = 1,
> +};
> +
>  /* Contains all 'struct dpdk_dev's. */
>  static struct ovs_list dpdk_list OVS_GUARDED_BY(dpdk_mutex)
>      = OVS_LIST_INITIALIZER(&dpdk_list); @@ -232,6 +336,11 @@ struct
> netdev_dpdk {
> 
>      /* In dpdk_list. */
>      struct ovs_list list_node OVS_GUARDED_BY(dpdk_mutex);
> +
> +    /* QoS configuration and lock for the device */
> +    struct qos_conf *qos_conf;
> +    rte_spinlock_t qos_lock;
> +
>  };
> 
>  struct netdev_rxq_dpdk {
> @@ -611,6 +720,10 @@ netdev_dpdk_init(struct netdev *netdev_, unsigned
> int port_no,
>          goto unlock;
>      }
> 
> +    /* Initialise QoS configuration to NULL and qos lock to unlocked */
> +    netdev->qos_conf = NULL;
> +    rte_spinlock_init(&netdev->qos_lock);
> +
>      netdev_->n_txq = NR_QUEUE;
>      netdev_->n_rxq = NR_QUEUE;
>      netdev->real_n_txq = NR_QUEUE;
> @@ -1066,6 +1179,23 @@ netdev_dpdk_rxq_recv(struct netdev_rxq *rxq_,
> struct dp_packet **packets,
>      return 0;
>  }
> 
> +static inline int
> +netdev_dpdk_qos_process__(struct netdev_dpdk *dev, struct rte_mbuf
> **pkts,
> +                        int cnt)
> +{
> +    struct netdev *netdev = &dev->up;
> +
> +    if (dev->qos_conf != NULL) {
> +        rte_spinlock_lock(&dev->qos_lock);
> +        if (dev->qos_conf != NULL) {
> +            cnt = dev->qos_conf->ops->qos_alg_process(netdev, pkts,
> cnt);
> +        }
> +        rte_spinlock_unlock(&dev->qos_lock);
> +    }
> +
> +    return cnt;
> +}
> +
>  static inline void
>  netdev_dpdk_vhost_update_tx_counters(struct netdev_stats *stats,
>                                       struct dp_packet **packets, @@ -
> 1092,6 +1222,7 @@ __netdev_dpdk_vhost_send(struct netdev *netdev, int
> qid,
>      struct virtio_net *virtio_dev = netdev_dpdk_get_virtio(vhost_dev);
>      struct rte_mbuf **cur_pkts = (struct rte_mbuf **) pkts;
>      unsigned int total_pkts = cnt;
> +    unsigned int qos_pkts = cnt;
>      uint64_t start = 0;
> 
>      if (OVS_UNLIKELY(!is_vhost_running(virtio_dev))) { @@ -1106,6
> +1237,10 @@ __netdev_dpdk_vhost_send(struct netdev *netdev, int qid,
>          rte_spinlock_lock(&vhost_dev->tx_q[qid].tx_lock);
>      }
> 
> +    /* Check has QoS has been configured for the netdev */
> +    cnt = netdev_dpdk_qos_process__(vhost_dev, cur_pkts, cnt);
> +    qos_pkts -= cnt;
> +
>      do {
>          int vhost_qid = qid * VIRTIO_QNUM + VIRTIO_RXQ;
>          unsigned int tx_pkts;
> @@ -1147,6 +1282,7 @@ __netdev_dpdk_vhost_send(struct netdev *netdev,
> int qid,
>      }
> 
>      rte_spinlock_lock(&vhost_dev->stats_lock);
> +    cnt += qos_pkts;
>      netdev_dpdk_vhost_update_tx_counters(&vhost_dev->stats, pkts,
> total_pkts,
>                                           cnt);
>      rte_spinlock_unlock(&vhost_dev->stats_lock);
> @@ -1242,19 +1378,25 @@ dpdk_do_tx_copy(struct netdev *netdev, int qid,
> struct dp_packet **pkts,
>          newcnt++;
>      }
> 
> -    if (OVS_UNLIKELY(dropped)) {
> -        rte_spinlock_lock(&dev->stats_lock);
> -        dev->stats.tx_dropped += dropped;
> -        rte_spinlock_unlock(&dev->stats_lock);
> -    }
> -
>      if (dev->type == DPDK_DEV_VHOST) {
>          __netdev_dpdk_vhost_send(netdev, qid, (struct dp_packet **)
> mbufs, newcnt, true);
>      } else {
> +        unsigned int qos_pkts = newcnt;
> +
> +        /* Check if QoS has been configured for this netdev. */
> +        newcnt = netdev_dpdk_qos_process__(dev, mbufs, newcnt);
> +
> +        dropped += qos_pkts - newcnt;
>          dpdk_queue_pkts(dev, qid, mbufs, newcnt);
>          dpdk_queue_flush(dev, qid);
>      }
> 
> +    if (OVS_UNLIKELY(dropped)) {
> +        rte_spinlock_lock(&dev->stats_lock);
> +        dev->stats.tx_dropped += dropped;
> +        rte_spinlock_unlock(&dev->stats_lock);
> +    }
> +
>      if (!dpdk_thread_is_pmd()) {
>          ovs_mutex_unlock(&nonpmd_mempool_mutex);
>      }
> @@ -1304,15 +1446,24 @@ netdev_dpdk_send__(struct netdev_dpdk *dev, int
> qid,
>      } else {
>          int next_tx_idx = 0;
>          int dropped = 0;
> +        unsigned int qos_pkts = 0;
> 
>          for (i = 0; i < cnt; i++) {
>              int size = dp_packet_size(pkts[i]);
> 
>              if (OVS_UNLIKELY(size > dev->max_packet_len)) {
>                  if (next_tx_idx != i) {
> +                    cnt = i - next_tx_idx;
> +                    qos_pkts = cnt;
> +
> +                    cnt = netdev_dpdk_qos_process__(dev,
> +                                                    (struct
> rte_mbuf**)pkts,
> +                                                    cnt);
> +                    dropped += qos_pkts - cnt;
>                      dpdk_queue_pkts(dev, qid,
>                                      (struct rte_mbuf
> **)&pkts[next_tx_idx],
> -                                    i-next_tx_idx);
> +                                    cnt);
> +
>                  }
> 
>                  VLOG_WARN_RL(&rl, "Too big size %d max_packet_len %d",
> @@ -1324,9 +1475,13 @@ netdev_dpdk_send__(struct netdev_dpdk *dev, int
> qid,
>              }
>          }
>          if (next_tx_idx != cnt) {
> -           dpdk_queue_pkts(dev, qid,
> -                            (struct rte_mbuf **)&pkts[next_tx_idx],
> -                            cnt-next_tx_idx);
> +            cnt -= next_tx_idx;
> +            qos_pkts = cnt;
> +
> +            cnt = netdev_dpdk_qos_process__(dev, (struct
> rte_mbuf**)pkts, cnt);
> +            dropped += qos_pkts - cnt;
> +            dpdk_queue_pkts(dev, qid, (struct rte_mbuf
> **)&pkts[next_tx_idx],
> +                            cnt);
>          }
> 
>          if (OVS_UNLIKELY(dropped)) {
> @@ -2107,6 +2262,264 @@ unlock_dpdk:
>      return err;
>  }
> 
> +/* QoS Functions */
> +
> +/*
> + * Initialize QoS configuration operations.
> + */
> +static void
> +qos_conf_init(struct qos_conf *conf, const struct dpdk_qos_ops *ops) {
> +    conf->ops = ops;
> +}
> +
> +/*
> + * Search existing QoS operations in qos_ops and compare each set of
> + * operations qos_name to name. Return a dpdk_qos_ops pointer to a
> +match,
> + * else return NULL
> + */
> +static const struct dpdk_qos_ops *
> +qos_lookup_name(const char *name)
> +{
> +    const struct dpdk_qos_ops *const *opsp;
> +
> +    for (opsp = qos_confs; *opsp != NULL; opsp++) {
> +        const struct dpdk_qos_ops *ops = *opsp;
> +        if (!strncmp(name, ops->qos_name,DPDK_MAX_QOS_NAME_SIZE)) {
> +            return ops;
> +        }
> +    }
> +    return NULL;
> +}
> +
> +/*
> + * Call qos_destruct to clean up items associated with the netdevs
> + * qos_conf. Set netdevs qos_conf to NULL.
> + */
> +static void
> +qos_delete_conf(struct netdev *netdev_) {
> +    struct netdev_dpdk *netdev = netdev_dpdk_cast(netdev_);
> +
> +    rte_spinlock_lock(&netdev->qos_lock);
> +    if (netdev->qos_conf) {
> +        if (netdev->qos_conf->ops->qos_destruct) {
> +            netdev->qos_conf->ops->qos_destruct(netdev_, netdev-
> >qos_conf);
> +        }
> +        netdev->qos_conf = NULL;
> +    }
> +    rte_spinlock_unlock(&netdev->qos_lock);
> +}
> +
> +static int
> +netdev_dpdk_get_qos_types(const struct netdev *netdev OVS_UNUSED,
> +                           struct sset *types) {
> +    const struct dpdk_qos_ops *const *opsp;
> +
> +    for (opsp = qos_confs; *opsp != NULL; opsp++) {
> +        const struct dpdk_qos_ops *ops = *opsp;
> +        if (ops->qos_construct && ops->qos_name[0] != '\0') {
> +            sset_add(types, ops->qos_name);
> +        }
> +    }
> +    return 0;
> +}
> +
> +static int
> +netdev_dpdk_get_qos(const struct netdev *netdev_,
> +                    const char **typep, struct smap *details) {
> +    struct netdev_dpdk *netdev = netdev_dpdk_cast(netdev_);
> +    int error = 0;
> +
> +    ovs_mutex_lock(&netdev->mutex);
> +    if(netdev->qos_conf) {
> +        *typep = netdev->qos_conf->ops->qos_name;
> +        error = (netdev->qos_conf->ops->qos_get
> +                 ? netdev->qos_conf->ops->qos_get(netdev_, details):
> 0);
> +    }
> +    ovs_mutex_unlock(&netdev->mutex);
> +
> +    return error;
> +}
> +
> +static int
> +netdev_dpdk_set_qos(struct netdev *netdev_,
> +                    const char *type, const struct smap *details) {
> +    struct netdev_dpdk *netdev = netdev_dpdk_cast(netdev_);
> +    const struct dpdk_qos_ops *new_ops = NULL;
> +    int error = 0;
> +
> +    /* If type is empty the current QoS configuration for the dpdk-
> netdev can
> +     * be destroyed */
> +    if(type[0] == '\0') {
> +        qos_delete_conf(netdev_);
> +    }
> +
> +    new_ops = qos_lookup_name(type);
> +    if (!new_ops || !new_ops->qos_construct) {
> +        return EOPNOTSUPP;
> +    }
> +
> +    ovs_mutex_lock(&netdev->mutex);
> +    if (netdev->qos_conf) {
> +        if (new_ops == netdev->qos_conf->ops) {
> +            error = new_ops->qos_set ? new_ops->qos_set(netdev_,
> details) : 0;
> +        }
> +        else {
> +            /* Delete existing QoS configuration. */
> +            qos_delete_conf(netdev_);
> +            ovs_assert(netdev->qos_conf == NULL);
> +
> +            /* Install new QoS configuration. */
> +            error = new_ops->qos_construct(netdev_, details);
> +            ovs_assert((error == 0) == (netdev->qos_conf != NULL));
> +        }
> +    } else {
> +        error = new_ops->qos_construct(netdev_, details);
> +        ovs_assert((error == 0) == (netdev->qos_conf != NULL));
> +    }
> +
> +    ovs_mutex_unlock(&netdev->mutex);
> +    return error;
> +}
> +
> +/* egress-policer details */
> +
> +struct egress_policer {
> +    struct qos_conf qos_conf;
> +    struct rte_meter_srtcm_params app_srtcm_params;
> +    struct rte_meter_srtcm egress_meter; };
> +
> +static struct egress_policer *
> +egress_policer_get__(const struct netdev *netdev_) {
> +    struct netdev_dpdk *netdev = netdev_dpdk_cast(netdev_);
> +    return CONTAINER_OF(netdev->qos_conf, struct egress_policer,
> +qos_conf); }
> +
> +static int
> +egress_policer_qos_construct(struct netdev *netdev_,
> +                            const struct smap *details) {
> +    struct netdev_dpdk *netdev = netdev_dpdk_cast(netdev_);
> +    struct egress_policer *policer;
> +    const char *cir_s;
> +    const char *cbs_s;
> +    int err = 0;
> +
> +    rte_spinlock_lock(&netdev->qos_lock);
> +    policer = xmalloc(sizeof *policer);
> +    qos_conf_init(&policer->qos_conf, &egress_policer_ops);
> +    netdev->qos_conf = &policer->qos_conf;
> +    cir_s = smap_get(details, "cir");
> +    cbs_s = smap_get(details, "cbs");
> +    policer->app_srtcm_params.cir = cir_s ? strtoull(cir_s, NULL, 10) :
> 0;
> +    policer->app_srtcm_params.cbs = cbs_s ? strtoull(cbs_s, NULL, 10) :
> 0;
> +    policer->app_srtcm_params.ebs = 0;
> +    err = rte_meter_srtcm_config(&policer->egress_meter,
> +                                    &policer->app_srtcm_params);
> +    rte_spinlock_unlock(&netdev->qos_lock);
> +
> +    return err;
> +}
> +
> +static void
> +egress_policer_qos_destruct(struct netdev *netdev_ OVS_UNUSED,
> +                        struct qos_conf *conf) {
> +    struct egress_policer *policer = CONTAINER_OF(conf, struct
> egress_policer,
> +                                                qos_conf);
> +    free(policer);
> +}
> +
> +static int
> +egress_policer_qos_get(const struct netdev *netdev, struct smap
> +*details) {
> +    struct egress_policer *policer = egress_policer_get__(netdev);
> +    smap_add_format(details, "cir", "%llu",
> +                    1ULL * policer->app_srtcm_params.cir);
> +    smap_add_format(details, "cbs", "%llu",
> +                    1ULL * policer->app_srtcm_params.cbs);
> +    return 0;
> +}
> +
> +static int
> +egress_policer_qos_set(struct netdev *netdev_, const struct smap
> +*details) {
> +    struct egress_policer *policer;
> +    const char *cir_s;
> +    const char *cbs_s;
> +    int err = 0;
> +
> +    policer = egress_policer_get__(netdev_);
> +    cir_s = smap_get(details, "cir");
> +    cbs_s = smap_get(details, "cbs");
> +    policer->app_srtcm_params.cir = cir_s ? strtoull(cir_s, NULL, 10) :
> 0;
> +    policer->app_srtcm_params.cbs = cbs_s ? strtoull(cbs_s, NULL, 10) :
> 0;
> +    policer->app_srtcm_params.ebs = 0;
> +    err = rte_meter_srtcm_config(&policer->egress_meter,
> +                                    &policer->app_srtcm_params);
> +
> +    return err;
> +}
> +
> +static inline int
> +egress_policer_pkt_handle__(struct rte_meter_srtcm *meter,
> +                            struct rte_mbuf *pkt, uint64_t time) {
> +    uint8_t output_color;
> +    uint32_t pkt_len = rte_pktmbuf_pkt_len(pkt) - sizeof(struct
> ether_hdr);
> +    enum egress_policer_action action = GREEN;
> +
> +    /* color input is not used for blind modes */
> +    output_color = (uint8_t) rte_meter_srtcm_color_blind_check(meter,
> time,
> +
> + pkt_len);
> +
> +    /* Check output color, 0 corresponds to GREEN i.e. packet can be
> +     * transmitted. If greater than 0 then action is set to DROP. */
> +    if (output_color) {
> +        action = DROP;
> +    }
> +
> +    return action;
> +}
> +
> +static int
> +egress_policer_alg_process(struct netdev *netdev_, struct rte_mbuf
> **pkts,
> +                        int pkt_cnt)
> +{
> +    int i = 0;
> +    int cnt = pkt_cnt;
> +    struct egress_policer *policer = egress_policer_get__(netdev_);
> +    struct rte_mbuf *pkt = NULL;
> +    uint64_t current_time = rte_rdtsc();
> +
> +    for(i = 0; i < pkt_cnt; i++) {
> +        pkt = pkts[i];
> +        /* Handle current packet */
> +        if (egress_policer_pkt_handle__(&policer->egress_meter, pkt,
> +                                        current_time) == DROP) {
> +            rte_pktmbuf_free(pkt);
> +            cnt = cnt - 1;
> +        }
> +    }
> +
> +    return cnt;
> +}
> +
> +static const struct dpdk_qos_ops egress_policer_ops = {
> +    "egress-policer",    /* qos_name */
> +    egress_policer_qos_construct,
> +    egress_policer_qos_destruct,
> +    egress_policer_qos_get,
> +    egress_policer_qos_set,
> +    egress_policer_alg_process
> +};
> +
>  #define NETDEV_DPDK_CLASS(NAME, INIT, CONSTRUCT, DESTRUCT, MULTIQ,
> SEND, \
>      GET_CARRIER, GET_STATS, GET_FEATURES, GET_STATUS, RXQ_RECV)
> \
>  {                                                             \
> @@ -2144,10 +2557,10 @@ unlock_dpdk:
>      NULL,                       /* set_advertisements */      \
>                                                                \
>      NULL,                       /* set_policing */            \
> -    NULL,                       /* get_qos_types */           \
> +    netdev_dpdk_get_qos_types,                                \
>      NULL,                       /* get_qos_capabilities */    \
> -    NULL,                       /* get_qos */                 \
> -    NULL,                       /* set_qos */                 \
> +    netdev_dpdk_get_qos,                                      \
> +    netdev_dpdk_set_qos,                                      \
>      NULL,                       /* get_queue */               \
>      NULL,                       /* set_queue */               \
>      NULL,                       /* delete_queue */            \
> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml index
> ce0dbc1..545d47f 100644
> --- a/vswitchd/vswitch.xml
> +++ b/vswitchd/vswitch.xml
> @@ -3298,6 +3298,26 @@
>            for information on how this classifier works.
>          </dd>
>        </dl>
> +      <dl>
> +        <dt><code>egress-policer</code></dt>
> +        <dd>
> +          An egress policer algorithm. This implementation uses the
> DPDK
> +          rte_meter library. The rte_meter library provides an
> implementation
> +          of srTCM (RFC2697) which allows the metering and policing of
> traffic.
> +          The implementation in OVS creates a color blind srTCM meter
> with a
> +          single token bucket used to police traffic. It should be
> noted that
> +          when the rte_meter is configured as part of QoS there will be
> a
> +          performance overhead as the rte_mter itself will consume CPU
> cycles
> +          in order to police traffic. These CPU cycles ordinarily are
> used for
> +          packet proccessing. As such the drop in performance will be
> noticed
> +          in terms of overall aggregate traffic throughput.
> +
> +          For more information regarding the usage of the DPDK
> rte_meter see
> +
> <code>http://dpdk.org/doc/guides/sample_app_ug/qos_metering.html</code>
> +          For more information regarding srTCM see
> +          <code>https://tools.ietf.org/html/rfc2697</code>
> +        </dd>
> +      </dl>
>      </column>
> 
>      <column name="queues">
> @@ -3334,6 +3354,38 @@
>        </column>
>      </group>
> 
> +    <group title="Configuration for egress-policer QoS">
> +      <p>
> +        <ref table="QoS"/> <ref table="QoS" column="type"/>
> +        <code>egress-policer</code> provides egress policing for
> userspace
> +        port types with DPDK.
> +
> +        It has the following key-value pairs defined.
> +      </p>
> +
> +      <column name="other_config" key="cir" type='{"type": "integer"}'>
> +        The Committed Information Rate (CIR) is measured in bytes of IP
> +        packets per second, i.e. it includes the IP header, but not
> link
> +        specific (e.g. Ethernet) headers. This represents the bytes per
> second
> +        rate at which the token bucket will be updated. The cir value
> is
> +        calculated by (pps x packet data size).  For example assuming a
> user
> +        wishes to limit a stream consisting of 64 byte packets to 1
> million
> +        packets per second the CIR would be set to to to 46000000. This
> value
> +        can be broken into '1,000,000 x 46'. Where 1,000,000 is the
> policing
> +        rate for the number of packets per second and 46 represents the
> size
> +        of the packet data for a 64 byte ip packet.
> +      </column>
> +      <column name="other_config" key="cbs" type='{"type": "integer"}'>
> +        The Committed Burst Size (CBS) is measured in bytes and
> represents a
> +        token bucket. At a minimum this value should be be set to the
> expected
> +        largest size packet in the traffic stream. In practice larger
> values
> +        may be used to increase the size of the token bucket. If a
> packet can
> +        be transmitted then the cbs will be decremented by the number
> of
> +        bytes/tokens of the packet. If there are not enough tokens in
> the cbs
> +        bucket the packet will be dropped.
> +      </column>
> +    </group>
> +
>      <group title="Common Columns">
>        The overall purpose of these columns is described under
> <code>Common
>        Columns</code> at the beginning of this document.
> --
> 1.7.4.1




More information about the dev mailing list