[ovs-dev] [PATCH v3] netdev_dpdk.c: Add QoS functionality.

Stokes, Ian ian.stokes at intel.com
Thu Feb 11 14:41:49 UTC 2016


> On Thu, 11 Feb 2016 12:32:56 +0000
> "Stokes, Ian" <ian.stokes at intel.com> wrote:
> 
> > Thanks For the review Flavio, much appreciated, comments inline. I'll
> re-spin a new version also.
> 
> Thank you for the patch!
> comments inline.
> 
> 
> >
> > > -----Original Message-----
> > > From: Flavio Leitner [mailto:fbl at sysclose.org]
> > > Sent: Wednesday, February 10, 2016 7:55 PM
> > > To: Stokes, Ian
> > > Cc: dev at openvswitch.org
> > > Subject: Re: [ovs-dev] [PATCH v3] netdev_dpdk.c: Add QoS
> functionality.
> > >
> > > On Mon,  1 Feb 2016 20:47:25 +0000
> > > Ian Stokes <ian.stokes at intel.com> wrote:
> > >
> > > > This patch provides the modifications required in netdev-dpdk.c
> > > > and vswitch.xml to allow for a DPDK user space QoS algorithm.
> > > >
> > > > This patch adds a QoS configuration structure for netdev-dpdk and
> > > > expected QoS operations 'dpdk_qos_ops'. Various helper functions
> > > > are also supplied.
> > > >
> > > > Also included are the modifications required for vswitch.xml to
> > > > allow a new QoS implementation for netdev-dpdk devices. This
> > > > includes a new QoS type `egress-policer` as well as its expected
> QoS table entries.
> > > >
> > > > The QoS functionality implemented for DPDK devices is `egress-
> > > policer`.
> > > > This can be used to drop egress packets at a configurable rate.
> > > >
> > > > The INSTALL.DPDK.md guide has also been modified to provide an
> > > > example configuration of `egress-policer` QoS.
> > > >
> > > > Signed-off-by: Ian Stokes <ian.stokes at intel.com>
> > > > ---
> > > >  INSTALL.DPDK.md      |   20 +++
> > > >  lib/netdev-dpdk.c    |  439
> > > ++++++++++++++++++++++++++++++++++++++++++++++++--
> > > >  vswitchd/vswitch.xml |   52 ++++++
> > > >  3 files changed, 498 insertions(+), 13 deletions(-)
> > > >
> > > > diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md index
> > > > e8ef4b5..dac70cd
> > > > 100644
> > > > --- a/INSTALL.DPDK.md
> > > > +++ b/INSTALL.DPDK.md
> > > > @@ -207,6 +207,26 @@ Using the DPDK with ovs-vswitchd:
> > > >     ./ovs-ofctl add-flow br0 in_port=2,action=output:1
> > > >     ```
> > > >
> > > > +8. QoS usage example
> > > > +
> > > > +   Assuming you have a vhost-user port transmitting traffic
> > > consisting of
> > > > +   packets of size 64 bytes, the following command would limit
> > > > + the
> > > egress
> > > > +   transmission rate of the port to ~1,000,000 packets per
> second:
> > > > +
> > > > +   `ovs-vsctl set port vhost-user0 qos=@newqos -- --id=@newqos
> > > > + create
> > > qos
> > > > +   type=egress-policer other-config:cir=46000000
> > > > + other-config:cbs=2048`
> > > > +
> > > > +   To examine the QoS configuration of the port:
> > > > +
> > > > +   `ovs-appctl -t ovs-vswitchd qos/show vhost-user0`
> > > > +
> > > > +   To clear the QoS configuration from the port and ovsdb use the
> > > following:
> > > > +
> > > > +   `ovs-vsctl -- destroy QoS vhost-user0 -- clear Port
> > > > + vhost-user0 qos`
> > > > +
> > > > +   For more details regarding egress-policer parameters please
> > > > + refer
> > > to the
> > > > +   vswitch.xml.
> > > > +
> > > >  Performance Tuning:
> > > >  -------------------
> > > >
> > > > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index
> > > > 09ccc2c..716da79 100644
> > > > --- a/lib/netdev-dpdk.c
> > > > +++ b/lib/netdev-dpdk.c
> > > > @@ -44,6 +44,7 @@
> > > >  #include "ovs-rcu.h"
> > > >  #include "packets.h"
> > > >  #include "shash.h"
> > > > +#include "smap.h"
> > > >  #include "sset.h"
> > > >  #include "unaligned.h"
> > > >  #include "timeval.h"
> > > > @@ -52,12 +53,14 @@
> > > >
> > > >  #include "rte_config.h"
> > > >  #include "rte_mbuf.h"
> > > > +#include "rte_meter.h"
> > > >  #include "rte_virtio_net.h"
> > > >
> > > >  VLOG_DEFINE_THIS_MODULE(dpdk);
> > > >  static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
> > > >
> > > >  #define DPDK_PORT_WATCHDOG_INTERVAL 5
> > > > +#define DPDK_MAX_QOS_NAME_SIZE 10
> > Noted, will expand to 20 for the moment unless you think it should be
> higher?
> 
> I'd rather take that out completely.
> 

Done, Daniele had a similar comment and it's not needed.

> > > > +
> > > > +/* Action that can be set for a packet for rte_meter */ enum
> > > > +egress_policer_action {
> > > > +        GREEN = e_RTE_METER_GREEN,
> > > > +        DROP = 1,
> > > > +};
> > >
> > > I don't think it's a good idea to mix external and internal
> > > declarations because if the external one changes, it will break
> > > here. Since GREEN and DROP have meanings only inside of OVS, we
> should define our own values.
> > > See my comment where this is used.
> > >
> > Just to clarify, is this in relation to the use of e_RTE_METER_GREEN?
> > Is it prefereable for GREEN to be equal to our own declaration?
> 
> We can use e_RTE_METER_GREEN and we should do it to be consistent with
> the rte function.  My concern is having two declarations: one coming
> from outside and another that you defined to a fixed number.  So, what
> happens if e_RTE_METER_GREEN gets shifted and now its value is 1?
> 
> But with the simplification I proposed for egress_policer_pkt_handle__()
> that enum isn't needed and we can take it out completely.
> 

Agreed, I've tested your suggestion and it works fine + is cleaner.

> 
> > > >  static inline void
> > > >  netdev_dpdk_vhost_update_tx_counters(struct netdev_stats *stats,
> > > >                                       struct dp_packet **packets,
> > > > @@
> > > > -1092,6 +1222,7 @@ __netdev_dpdk_vhost_send(struct netdev *netdev,
> > > > int
> > > qid,
> > > >      struct virtio_net *virtio_dev =
> > > netdev_dpdk_get_virtio(vhost_dev);
> > > >      struct rte_mbuf **cur_pkts = (struct rte_mbuf **) pkts;
> > > >      unsigned int total_pkts = cnt;
> > > > +    unsigned int qos_pkts = cnt;
> > > >      uint64_t start = 0;
> > > >
> > > >      if (OVS_UNLIKELY(!is_vhost_running(virtio_dev))) { @@ -1106,6
> > > > +1237,10 @@ __netdev_dpdk_vhost_send(struct netdev *netdev, int
> > > > +qid,
> > > >          rte_spinlock_lock(&vhost_dev->tx_q[qid].tx_lock);
> > > >      }
> > > >
> > > > +    /* Check has QoS has been configured for the netdev */
> > > > +    cnt = netdev_dpdk_qos_process__(vhost_dev, cur_pkts, cnt);
> > > > +    qos_pkts -= cnt;
> > >
> > > I don't see how this is working because egress_policer_alg_process()
> > > will loop in the array one by one freeing or not the packet.
> > > If it does nothing, then the array is good but otherwise it creates
> > > a problem because the array still references freed entries.  So in
> > > loop below we are sending a mix of good and freed buffers up to
> 'cnt'.
> > >
> > So we should never have freed packets included in cnt as it will cause
> > a segfault. Consider when an array of size cnt is passed to the
> > netdev_dpdk_qos_process(). The timestamp used to update the token
> > bucket is taken before the for loop is entered. This way the token
> > bucket is only updated once for each batch of packets of size cnt.
> > The loop cycles through the array, for any packet that is green the
> > token bucket decremented. If a packet is dropped then the bucket has
> > been exhausted. The token bucket will not be updated again within the
> > run of the loop. Any packets remaining in the array will be dropped
> > (i.e. freed) as the bucket is exhausted. The value of cnt will only be
> > the number of pkts that were not dropped (i.e. not freed). This value
> > will then be passed to the send function ensuring no freed packets are
> > present in the array that is to be processed. If the token bucket
> > updated between each individual packet being processed then you could
> > have the situation you describe, by processing them as a batch with a
> > single update beforehand you avoid it.
> 
> I haven't looked into the rte implementation yet so I can't tell if the
> decision to drop or not the packet is based solely on the timestamp
> which doesn't change during the batch processing.
> 
> Assuming that is the case I agree with you that the cnt would have only
> the first valid entries to be sent out.
> 
> My concern is when we are processing a batch with two packets.
> One big packet and another small one.  While processing the big packet,
> there is no enough tokens and the packet is marked as RED/DROPPED.  Then
> the loop gets the second small packet, but now some enough tokens were
> added/still available in the bucket which passes the small packet.
> 

Apologies, I was looking at this the wrong way. I understand your point now. 
Let me think about this. And I'll post a rough solution here before submitting.

> 
> > > > +
> > > > +static inline int
> > > > +egress_policer_pkt_handle__(struct rte_meter_srtcm *meter,
> > > > +                            struct rte_mbuf *pkt, uint64_t time)
> > > > +{
> > > > +    uint8_t output_color;
> > > > +    uint32_t pkt_len = rte_pktmbuf_pkt_len(pkt) - sizeof(struct
> > > ether_hdr);
> > > > +    enum egress_policer_action action = GREEN;
> > > > +
> > > > +    /* color input is not used for blind modes */
> > > > +    output_color = (uint8_t)
> > > > rte_meter_srtcm_color_blind_check(meter,
> > > time,
> > > > +
> > > pkt_len);
> > > > +
> > > > +    /* Check output color, 0 corresponds to GREEN i.e. packet
> > > > can be
> > > > +     * transmitted. If greater than 0 then action is set to
> > > > DROP. */
> > > > +    if (output_color) {
> > > > +        action = DROP;
> > > > +    }
> > >
> > >
> > > This could be simpler:
> > >
> > > static inline int
> > > egress_policer_pkt_handle__(struct rte_meter_srtcm *meter,
> > >                             struct rte_mbuf *pkt, uint64_t time) {
> > >      uint32_t pkt_len = rte_pktmbuf_pkt_len(pkt) - sizeof(struct
> > > ether_hdr);
> > >
> > >      /* If GREEN, accept it otherwise drop it */
> > >      if (rte_meter_srtcm_color_blind_check(meter, time, pkt_len) !=
> > > e_RTE_METER_GREEN) {
> > >          return 1;
> > >      }
> > >
> > >      return 0;
> > > }
> > >
> > >
> > Agreed this is simpler/better. In relation to the external declaration
> > comment above, Is it preferable not to use e_RTE_METER_GREEN here and
> > instead have a value defined in OVS?
> 
> We should use it to be consistent with the rte implementation.
Done.
> 
> Thanks,
> --
> fbl




More information about the dev mailing list