[ovs-dev] [PATCH v2 2/2] netdev-dpdk.c: Add ingress-policing functionality.

Daniele Di Proietto diproiettod at ovn.org
Tue May 24 00:41:32 UTC 2016


Thanks for the patch, it's simpler than I expected (all the infrastructure
was already there).

Minor comments inline, otherwise this looks good to me.

If you agree with the comments, would you mind sending another version?

Daniele

2016-05-10 2:19 GMT-07:00 Ian Stokes <ian.stokes at intel.com>:

> This patch provides the modifications required in netdev-dpdk.c and
> vswitch.xml to enable ingress policing for DPDK interfaces.
>
> This patch implements the necessary netdev functions to netdev-dpdk.c as
> well as various helper functions required for ingress policing.
>
> The vswitch.xml has been modified to explain the expected parameters and
> behaviour when using ingress policing.
>
> The INSTALL.DPDK.md guide has been modified to provide an example
> configuration of ingress policing.
>
> Signed-off-by: Ian Stokes <ian.stokes at intel.com>
> ---
>
> v2:
> - Rebase to latest master.
>
> *netdev-dpdk.c
> - Modied 'netdev_dpdk_set_policing()' to set default value of of
>   policer_burst to 8000 kbits.
> - Modified 'netdev_dpdk_policer_construct__()' to use '1000/8' for clarity
>   instead of '125' when converting kbits to bytes for rate and burst.
>
> v1:
>
> *INSTALL.DPDK.md
> - Add ingress_policing usage example.
>
> *NEWS
> - Add support for ingress_policing to DPDK section.
>
> *netdev-dpdk.c
> - Add ingress_policer struct.
> - Modify struct netdev-dpdk to include a ovsrcu pointer to an ingress
>   policer.
> - Modify netdev_dpdk_init() to initialise ingress policer to null.
> - Add function 'ingress_policer_run()'to process packets with a given
>   netdev, rte_mbuf and packet count.
> - Modified 'netdev_dpdk_vhost_update_rx_counters()' to accept new
>   parameter int dropped representing number of packets dropped.
> - Modified 'netdev_dpdk_vhost_update_rx_counters()' to update number of
>   dropped packets in stats->rx_dropped.
> - Modified 'netdev_dpdk_vhost_rxq_recv()' to check and call
>   ingress_policing functionality with 'ingress_policer_run()'.
> - Modified 'netdev_dpdk_rxq_recv()' to check and call
>   ingress_policing functionality with 'ingress_policer_run()'.
> - Modified 'netdev_dpdk_rxq_recv()' to update rx dropped stats.
> - Modified 'netdev_dpdk_vhost_get_stats()' to add support for rx_dropped.
> - Add function 'netdev_dpdk_policer_construct__()' to create an
>   ingress_policer.
> - Add function 'netdev_dpdk_policer_destruct__()' to destroy an ingress
>   policer.
> - Add function 'netdev_dpdk_set_policing()' responsible for calling
>   helper functions to create and destroy an ingress policer for the
>   device.
> - Add function 'netdev_dpdk_get_ingress_policer()' to return a pointer
>   to the ingress_policer using the ovsrcu_get function.
>
> *vswitch.xml
> - Modify ingress policing section to include support in OVS with DPDK.
> ---
>  INSTALL.DPDK.md      |   19 ++++++
>  NEWS                 |    1 +
>  lib/netdev-dpdk.c    |  173
> ++++++++++++++++++++++++++++++++++++++++++++++++-
>  vswitchd/vswitch.xml |    4 +-
>  4 files changed, 191 insertions(+), 6 deletions(-)
>
> diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
> index 93f92e4..68735cc 100644
> --- a/INSTALL.DPDK.md
> +++ b/INSTALL.DPDK.md
> @@ -267,6 +267,25 @@ Using the DPDK with ovs-vswitchd:
>     For more details regarding egress-policer parameters please refer to
> the
>     vswitch.xml.
>
> +9. Ingress Policing Example
> +
> +   Assuming you have a vhost-user port receiving traffic consisting of
> +   packets of size 64 bytes, the following command would limit the
> reception
> +   rate of the port to ~1,000,000 packets per second:
> +
> +   `ovs-vsctl set interface vhost-user0 ingress_policing_rate=368000
> +    ingress_policing_burst=1000`
> +
> +   To examine the ingress policer configuration of the port:
> +
> +   `ovs-vsctl list interface vhost-user0`
> +
> +   To clear the ingress policer configuration from the port use the
> following:
> +
> +   `ovs-vsctl set interface vhost-user0 ingress_policing_rate=0`
> +
> +   For more details regarding ingress-policer see the vswitch.xml.
> +
>  Performance Tuning:
>  -------------------
>
> diff --git a/NEWS b/NEWS
> index 4e81cad..ba201cf 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -32,6 +32,7 @@ Post-v2.5.0
>       * DB entries have been added for many of the DPDK EAL command line
>         arguments. Additional arguments can be passed via the dpdk-extra
>         entry.
> +     * Add ingress policing functionality.
>     - ovs-benchmark: This utility has been removed due to lack of use and
>       bitrot.
>     - ovs-appctl:
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index 0ed909c..dd4fa0a 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -328,6 +328,12 @@ struct dpdk_ring {
>      struct ovs_list list_node OVS_GUARDED_BY(dpdk_mutex);
>  };
>
> +struct ingress_policer {
> +    struct rte_meter_srtcm_params app_srtcm_params;
> +    struct rte_meter_srtcm in_policer;
> +    rte_spinlock_t policer_lock;
> +};
> +
>  struct netdev_dpdk {
>      struct netdev up;
>      int port_id;
> @@ -373,6 +379,10 @@ struct netdev_dpdk {
>      struct qos_conf *qos_conf;
>      rte_spinlock_t qos_lock;
>
> +    /* Ingress Policer */
> +    OVSRCU_TYPE(struct ingress_policer *) ingress_policer;
> +    uint32_t policer_rate;
> +    uint32_t policer_burst;
>  };
>
>  struct netdev_rxq_dpdk {
> @@ -386,6 +396,9 @@ static int netdev_dpdk_construct(struct netdev *);
>
>  struct virtio_net * netdev_dpdk_get_virtio(const struct netdev_dpdk *dev);
>
> +struct ingress_policer *
> +netdev_dpdk_get_ingress_policer(const struct netdev_dpdk *dev);
> +
>  static bool
>  is_dpdk_class(const struct netdev_class *class)
>  {
> @@ -759,6 +772,11 @@ netdev_dpdk_init(struct netdev *netdev, unsigned int
> port_no,
>      dev->qos_conf = NULL;
>      rte_spinlock_init(&dev->qos_lock);
>
> +    /* Initialise rcu pointer for ingress policer to NULL */
> +    ovsrcu_init(&dev->ingress_policer, NULL);
> +    dev->policer_rate = 0;
> +    dev->policer_burst = 0;
> +
>      netdev->n_txq = NR_QUEUE;
>      netdev->n_rxq = NR_QUEUE;
>      netdev->requested_n_rxq = NR_QUEUE;
> @@ -1198,6 +1216,22 @@ netdev_dpdk_policer_run__(struct rte_meter_srtcm
> *meter,
>      return cnt;
>  }
>
>
+static int
> +ingress_policer_run(struct netdev_dpdk *dev, struct rte_mbuf **pkts,
> +                        int pkt_cnt)
> +{
> +    int cnt = 0;
> +    struct ingress_policer *policer =
> netdev_dpdk_get_ingress_policer(dev);
> +
> +    if (policer) {
> +        rte_spinlock_lock(&policer->policer_lock);
> +        cnt = netdev_dpdk_policer_run__(&policer->in_policer, pkts,
> pkt_cnt);
> +        rte_spinlock_unlock(&policer->policer_lock);
> +    }
> +
> +    return cnt;
> +}
> +
>

The callers to the above function already have to check for 'policer'.
Maybe it'd be clearer to
pass 'policer' instead of 'dev' and drop the if?


>  static bool
>  is_vhost_running(struct virtio_net *virtio_dev)
>  {
> @@ -1232,13 +1266,15 @@ netdev_dpdk_vhost_update_rx_size_counters(struct
> netdev_stats *stats,
>
>  static inline void
>  netdev_dpdk_vhost_update_rx_counters(struct netdev_stats *stats,
> -                                     struct dp_packet **packets, int
> count)
> +                                     struct dp_packet **packets, int
> count,
> +                                     int dropped)
>  {
>      int i;
>      unsigned int packet_size;
>      struct dp_packet *packet;
>
>      stats->rx_packets += count;
> +    stats->rx_dropped += dropped;
>      for (i = 0; i < count; i++) {
>          packet = packets[i];
>          packet_size = dp_packet_size(packet);
> @@ -1273,7 +1309,9 @@ netdev_dpdk_vhost_rxq_recv(struct netdev_rxq *rxq,
>      struct netdev_dpdk *dev = netdev_dpdk_cast(rxq->netdev);
>      struct virtio_net *virtio_dev = netdev_dpdk_get_virtio(dev);
>      int qid = rxq->queue_id;
> +    struct ingress_policer *policer =
> netdev_dpdk_get_ingress_policer(dev);
>      uint16_t nb_rx = 0;
> +    uint16_t dropped = 0;
>
>      if (OVS_UNLIKELY(!is_vhost_running(virtio_dev))) {
>          return EAGAIN;
> @@ -1291,8 +1329,14 @@ netdev_dpdk_vhost_rxq_recv(struct netdev_rxq *rxq,
>          return EAGAIN;
>      }
>
> +    if (policer != NULL) {
>

Minor: I would drop the "!= NULL"


> +        dropped = nb_rx;
> +        nb_rx = ingress_policer_run(dev, (struct rte_mbuf **)packets,
> nb_rx);
> +        dropped -= nb_rx;
> +    }
> +
>      rte_spinlock_lock(&dev->stats_lock);
> -    netdev_dpdk_vhost_update_rx_counters(&dev->stats, packets, nb_rx);
> +    netdev_dpdk_vhost_update_rx_counters(&dev->stats, packets, nb_rx,
> dropped);
>      rte_spinlock_unlock(&dev->stats_lock);
>
>      *c = (int) nb_rx;
> @@ -1305,7 +1349,9 @@ netdev_dpdk_rxq_recv(struct netdev_rxq *rxq, struct
> dp_packet **packets,
>  {
>      struct netdev_rxq_dpdk *rx = netdev_rxq_dpdk_cast(rxq);
>      struct netdev_dpdk *dev = netdev_dpdk_cast(rxq->netdev);
> +    struct ingress_policer *policer =
> netdev_dpdk_get_ingress_policer(dev);
>      int nb_rx;
> +    int dropped = 0;
>
>      /* There is only one tx queue for this core.  Do not flush other
>       * queues.
> @@ -1323,6 +1369,19 @@ netdev_dpdk_rxq_recv(struct netdev_rxq *rxq, struct
> dp_packet **packets,
>          return EAGAIN;
>      }
>
> +    if (policer != NULL) {
>

same here


> +        dropped = nb_rx;
> +        nb_rx = ingress_policer_run(dev, (struct rte_mbuf **) packets,
> nb_rx);
> +        dropped -= nb_rx;
> +    }
> +
> +    /* Update stats to reflect dropped packets */
> +    if (OVS_UNLIKELY(dropped)) {
> +        rte_spinlock_lock(&dev->stats_lock);
> +        dev->stats.rx_dropped += dropped;
> +        rte_spinlock_unlock(&dev->stats_lock);
> +    }
> +
>      *c = nb_rx;
>
>      return 0;
> @@ -1756,6 +1815,7 @@ netdev_dpdk_vhost_get_stats(const struct netdev
> *netdev,
>      /* Supported Stats */
>      stats->rx_packets += dev->stats.rx_packets;
>      stats->tx_packets += dev->stats.tx_packets;
> +    stats->rx_dropped = dev->stats.rx_dropped;
>      stats->tx_dropped += dev->stats.tx_dropped;
>      stats->multicast = dev->stats.multicast;
>      stats->rx_bytes = dev->stats.rx_bytes;
> @@ -1882,11 +1942,12 @@ netdev_dpdk_get_stats(const struct netdev *netdev,
> struct netdev_stats *stats)
>
>      rte_spinlock_lock(&dev->stats_lock);
>      stats->tx_dropped = dev->stats.tx_dropped;
> +    stats->rx_dropped = dev->stats.rx_dropped;
>      rte_spinlock_unlock(&dev->stats_lock);
>
>      /* These are the available DPDK counters for packets not received due
> to
>       * local resource constraints in DPDK and NIC respectively. */
> -    stats->rx_dropped = rte_stats.rx_nombuf + rte_stats.imissed;
> +    stats->rx_dropped += rte_stats.rx_nombuf + rte_stats.imissed;
>      stats->rx_missed_errors = rte_stats.imissed;
>
>      ovs_mutex_unlock(&dev->mutex);
> @@ -1941,6 +2002,104 @@ netdev_dpdk_get_features(const struct netdev
> *netdev,
>  }
>
>  static int
> +netdev_dpdk_policer_construct__(struct netdev *netdev_, uint32_t rate,
> +                                uint32_t burst)
>

I would drop the "__" part from the name.


> +{
> +    struct netdev_dpdk *netdev = netdev_dpdk_cast(netdev_);
> +    struct ingress_policer *policer;
> +    uint64_t rate_bytes;
> +    uint64_t burst_bytes;
> +    int err = 0;
> +
> +    policer = xmalloc(sizeof *policer);
> +    rte_spinlock_init(&policer->policer_lock);
> +
> +    /* rte_meter requires bytes so convert kbits rate and burst to bytes.
> */
> +    rate_bytes = rate * 1000/8;
> +    burst_bytes = burst * 1000/8;
> +
> +    policer->app_srtcm_params.cir = rate_bytes;
> +    policer->app_srtcm_params.cbs = burst_bytes;
> +    policer->app_srtcm_params.ebs = 0;
> +    err = rte_meter_srtcm_config(&policer->in_policer,
> +                                    &policer->app_srtcm_params);
> +    ovsrcu_set(&netdev->ingress_policer, policer);
> +
> +    return err;
> +}
>

IMHO it will be clearer if this function returns 'policer', instead of
setting it in
'netdev_'


> +
> +static int netdev_dpdk_policer_destruct__(struct ingress_policer *
> policer)
> +{
> +    if (policer == NULL) {
> +        return 0;
> +    }
> +    else {
> +        free(policer);
> +    }
> +    return 0;
> +}
>

There's no need for this function. free(NULL) is perfectly valid (plus, the
caller already checks).


> +
> +static int
> +netdev_dpdk_set_policing(struct netdev* netdev_, uint32_t policer_rate,
> +                            uint32_t policer_burst)
> +{
> +    struct netdev_dpdk *netdev = netdev_dpdk_cast(netdev_);
>

Not a big deal, but in the rest of the functions we use 'netdev' for
'struct netdev *',
and 'dev' for 'struct netdev_dpdk *' (since commit d46285a2206f)


> +    struct ingress_policer *policer;
> +    int err = 0;
>

There's no way for this function to fail, so this variable is unnecessary.


> +
> +    /* Force to 0 if no rate specified,
> +     * default to 8000 kbits if burst is 0,
> +     * else stick with user-specified value.
> +     */
> +    policer_burst = (!policer_rate ? 0
> +                    : !policer_burst ? 8000
> +                    : policer_burst);
> +
> +    ovs_mutex_lock(&netdev->mutex);
> +
> +    policer = ovsrcu_get_protected(struct ingress_policer *,
> +                                    &netdev->ingress_policer);
> +
> +    if (netdev->policer_rate == policer_rate &&
> +    netdev->policer_burst == policer_burst) {
> +        /* Assume that settings haven't changed since we last set them. */
> +        ovs_mutex_unlock(&netdev->mutex);
> +        return err;
> +    }
> +
>




> +    /* Destroy any existing ingress policer for the device if one exists
> +     * and the policer_rate and policer_burst are being set to 0
> +     */
> +    if ((policer != NULL) && (policer_rate == 0) && (policer_burst == 0))
> {
> +        /* User has set rate and burst to 0, destroy the policer */
> +        ovsrcu_postpone(netdev_dpdk_policer_destruct__, policer);
> +        ovsrcu_set(&netdev->ingress_policer, NULL);
> +        netdev->policer_rate = 0;
> +        netdev->policer_burst = 0;
> +        ovs_mutex_unlock(&netdev->mutex);
> +        return err;
> +    }
> +
> +    /* Destroy existing policer before adding new policer */
> +    if (policer != NULL) {
> +        ovsrcu_postpone(netdev_dpdk_policer_destruct__, policer);
> +        ovsrcu_set(&netdev->ingress_policer, NULL);
> +        netdev->policer_rate = 0;
> +        netdev->policer_burst = 0;
> +    }
> +
> +    /* Add an ingress policer */
> +    err = netdev_dpdk_policer_construct__(netdev_, policer_rate,
> +                                            policer_burst);
> +
> +    /* Update policer_rate and policer_burst for the netdev */
> +    netdev->policer_rate = policer_rate;
> +    netdev->policer_burst = policer_burst;
>
+    ovs_mutex_unlock(&netdev->mutex);
>

The above block seems more complicated than necessary.

How about something like this?

    if (policer) {
        ovsrcu_postpone(free, policer);
    }

    if (policer_rate != 0) {
        new_policer = netdev_dpdk_policer_construct__(rate, burst);
    } else {
        new_policer = NULL;
    }
    ovsrcu_set(&netdev->ingress_policer, new_policer);
    netdev->policer_rate = policer_rate;
    netdev->policer_burst = policer_burst;
    ovs_mutex_unlock(&netdev->mutex);


> +    return err;
> +}
> +
> +static int
>  netdev_dpdk_get_ifindex(const struct netdev *netdev)
>  {
>      struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
> @@ -2387,6 +2546,12 @@ netdev_dpdk_get_virtio(const struct netdev_dpdk
> *dev)
>      return ovsrcu_get(struct virtio_net *, &dev->virtio_dev);
>  }
>
> +struct ingress_policer *
> +netdev_dpdk_get_ingress_policer(const struct netdev_dpdk *dev)
> +{
> +    return ovsrcu_get(struct ingress_policer *, &dev->ingress_policer);
> +}
> +
>  /*
>   * These callbacks allow virtio-net devices to be added to vhost ports
> when
>   * configuration has been fully complete.
> @@ -2827,7 +2992,7 @@ static const struct dpdk_qos_ops egress_policer_ops
> = {
>      GET_FEATURES,                                             \
>      NULL,                       /* set_advertisements */      \
>                                                                \
> -    NULL,                       /* set_policing */            \
> +    netdev_dpdk_set_policing,   /* set_policing */            \
>

Minor: since the member is not NULL anymore we can remove the comment


>      netdev_dpdk_get_qos_types,                                \
>      NULL,                       /* get_qos_capabilities */    \
>      netdev_dpdk_get_qos,                                      \
> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
> index 944d8ec..0c9e60c 100644
> --- a/vswitchd/vswitch.xml
> +++ b/vswitchd/vswitch.xml
> @@ -2502,8 +2502,8 @@
>          table="Queue"/> tables).
>        </p>
>        <p>
> -        Policing is currently implemented only on Linux.  The Linux
> -        implementation uses a simple ``token bucket'' approach:
> +        Policing is currently implemented on Linux and OVS with DPDK. Both
> +        implementations use a simple ``token bucket'' approach:
>        </p>
>        <ul>
>          <li>
> --
> 1.7.4.1
>
> _______________________________________________
> dev mailing list
> dev at openvswitch.org
> http://openvswitch.org/mailman/listinfo/dev
>



More information about the dev mailing list