[ovs-dev] [PATCH v3] netdev_dpdk.c: Add QoS functionality.

Flavio Leitner fbl at sysclose.org
Thu Feb 11 14:05:11 UTC 2016


On Thu, 11 Feb 2016 12:32:56 +0000
"Stokes, Ian" <ian.stokes at intel.com> wrote:

> Thanks For the review Flavio, much appreciated, comments inline. I'll re-spin a new version also.

Thank you for the patch!
comments inline.


> 
> > -----Original Message-----
> > From: Flavio Leitner [mailto:fbl at sysclose.org]
> > Sent: Wednesday, February 10, 2016 7:55 PM
> > To: Stokes, Ian
> > Cc: dev at openvswitch.org
> > Subject: Re: [ovs-dev] [PATCH v3] netdev_dpdk.c: Add QoS functionality.
> > 
> > On Mon,  1 Feb 2016 20:47:25 +0000
> > Ian Stokes <ian.stokes at intel.com> wrote:
> >   
> > > This patch provides the modifications required in netdev-dpdk.c and
> > > vswitch.xml to allow for a DPDK user space QoS algorithm.
> > >
> > > This patch adds a QoS configuration structure for netdev-dpdk and
> > > expected QoS operations 'dpdk_qos_ops'. Various helper functions are
> > > also supplied.
> > >
> > > Also included are the modifications required for vswitch.xml to allow
> > > a new QoS implementation for netdev-dpdk devices. This includes a new
> > > QoS type `egress-policer` as well as its expected QoS table entries.
> > >
> > > The QoS functionality implemented for DPDK devices is `egress-  
> > policer`.  
> > > This can be used to drop egress packets at a configurable rate.
> > >
> > > The INSTALL.DPDK.md guide has also been modified to provide an example
> > > configuration of `egress-policer` QoS.
> > >
> > > Signed-off-by: Ian Stokes <ian.stokes at intel.com>
> > > ---
> > >  INSTALL.DPDK.md      |   20 +++
> > >  lib/netdev-dpdk.c    |  439  
> > ++++++++++++++++++++++++++++++++++++++++++++++++--  
> > >  vswitchd/vswitch.xml |   52 ++++++
> > >  3 files changed, 498 insertions(+), 13 deletions(-)
> > >
> > > diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md index e8ef4b5..dac70cd
> > > 100644
> > > --- a/INSTALL.DPDK.md
> > > +++ b/INSTALL.DPDK.md
> > > @@ -207,6 +207,26 @@ Using the DPDK with ovs-vswitchd:
> > >     ./ovs-ofctl add-flow br0 in_port=2,action=output:1
> > >     ```
> > >
> > > +8. QoS usage example
> > > +
> > > +   Assuming you have a vhost-user port transmitting traffic  
> > consisting of  
> > > +   packets of size 64 bytes, the following command would limit the  
> > egress  
> > > +   transmission rate of the port to ~1,000,000 packets per second:
> > > +
> > > +   `ovs-vsctl set port vhost-user0 qos=@newqos -- --id=@newqos create  
> > qos  
> > > +   type=egress-policer other-config:cir=46000000
> > > + other-config:cbs=2048`
> > > +
> > > +   To examine the QoS configuration of the port:
> > > +
> > > +   `ovs-appctl -t ovs-vswitchd qos/show vhost-user0`
> > > +
> > > +   To clear the QoS configuration from the port and ovsdb use the  
> > following:  
> > > +
> > > +   `ovs-vsctl -- destroy QoS vhost-user0 -- clear Port vhost-user0
> > > + qos`
> > > +
> > > +   For more details regarding egress-policer parameters please refer  
> > to the  
> > > +   vswitch.xml.
> > > +
> > >  Performance Tuning:
> > >  -------------------
> > >
> > > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index
> > > 09ccc2c..716da79 100644
> > > --- a/lib/netdev-dpdk.c
> > > +++ b/lib/netdev-dpdk.c
> > > @@ -44,6 +44,7 @@
> > >  #include "ovs-rcu.h"
> > >  #include "packets.h"
> > >  #include "shash.h"
> > > +#include "smap.h"
> > >  #include "sset.h"
> > >  #include "unaligned.h"
> > >  #include "timeval.h"
> > > @@ -52,12 +53,14 @@
> > >
> > >  #include "rte_config.h"
> > >  #include "rte_mbuf.h"
> > > +#include "rte_meter.h"
> > >  #include "rte_virtio_net.h"
> > >
> > >  VLOG_DEFINE_THIS_MODULE(dpdk);
> > >  static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
> > >
> > >  #define DPDK_PORT_WATCHDOG_INTERVAL 5
> > > +#define DPDK_MAX_QOS_NAME_SIZE 10  
> Noted, will expand to 20 for the moment unless you think it should be higher?

I'd rather take that out completely.

> > > +
> > > +/* Action that can be set for a packet for rte_meter */ enum
> > > +egress_policer_action {
> > > +        GREEN = e_RTE_METER_GREEN,
> > > +        DROP = 1,
> > > +};  
> > 
> > I don't think it's a good idea to mix external and internal declarations
> > because if the external one changes, it will break here. Since GREEN and
> > DROP have meanings only inside of OVS, we should define our own values.
> > See my comment where this is used.
> >   
> Just to clarify, is this in relation to the use of e_RTE_METER_GREEN?
> Is it prefereable for GREEN to be equal to our own declaration?

We can use e_RTE_METER_GREEN and we should do it to be consistent with
the rte function.  My concern is having two declarations: one coming
from outside and another that you defined to a fixed number.  So, what
happens if e_RTE_METER_GREEN gets shifted and now its value is 1?

But with the simplification I proposed for egress_policer_pkt_handle__()
that enum isn't needed and we can take it out completely. 


> > >  static inline void
> > >  netdev_dpdk_vhost_update_tx_counters(struct netdev_stats *stats,
> > >                                       struct dp_packet **packets, @@
> > > -1092,6 +1222,7 @@ __netdev_dpdk_vhost_send(struct netdev *netdev, int  
> > qid,  
> > >      struct virtio_net *virtio_dev =  
> > netdev_dpdk_get_virtio(vhost_dev);  
> > >      struct rte_mbuf **cur_pkts = (struct rte_mbuf **) pkts;
> > >      unsigned int total_pkts = cnt;
> > > +    unsigned int qos_pkts = cnt;
> > >      uint64_t start = 0;
> > >
> > >      if (OVS_UNLIKELY(!is_vhost_running(virtio_dev))) { @@ -1106,6
> > > +1237,10 @@ __netdev_dpdk_vhost_send(struct netdev *netdev, int qid,
> > >          rte_spinlock_lock(&vhost_dev->tx_q[qid].tx_lock);
> > >      }
> > >
> > > +    /* Check has QoS has been configured for the netdev */
> > > +    cnt = netdev_dpdk_qos_process__(vhost_dev, cur_pkts, cnt);
> > > +    qos_pkts -= cnt;  
> > 
> > I don't see how this is working because egress_policer_alg_process()
> > will loop in the array one by one freeing or not the packet.
> > If it does nothing, then the array is good but otherwise it creates a
> > problem because the array still references freed entries.  So in loop
> > below we are sending a mix of good and freed buffers up to 'cnt'.
> >   
> So we should never have freed packets included in cnt as it will
> cause a segfault. Consider when an array of size cnt is passed to the
> netdev_dpdk_qos_process(). The timestamp used to update the token
> bucket is taken before the for loop is entered. This way the token
> bucket is only updated once for each batch of packets of size cnt.
> The loop cycles through the array, for any packet that is green the
> token bucket decremented. If a packet is dropped then the bucket has
> been exhausted. The token bucket will not be updated again within the
> run of the loop. Any packets remaining in the array will be dropped
> (i.e. freed) as the bucket is exhausted. The value of cnt will only
> be the number of pkts that were not dropped (i.e. not freed). This
> value will then be passed to the send function ensuring no freed
> packets are present in the array that is to be processed. If the
> token bucket updated between each individual packet being processed
> then you could have the situation you describe, by processing them as
> a batch with a single update beforehand you avoid it.

I haven't looked into the rte implementation yet so I can't tell
if the decision to drop or not the packet is based solely on the
timestamp which doesn't change during the batch processing.

Assuming that is the case I agree with you that the cnt would have
only the first valid entries to be sent out.

My concern is when we are processing a batch with two packets.
One big packet and another small one.  While processing the big
packet, there is no enough tokens and the packet is marked as
RED/DROPPED.  Then the loop gets the second small packet, but
now some enough tokens were added/still available in the bucket
which passes the small packet.


> > > +
> > > +static inline int
> > > +egress_policer_pkt_handle__(struct rte_meter_srtcm *meter,
> > > +                            struct rte_mbuf *pkt, uint64_t time)
> > > +{
> > > +    uint8_t output_color;
> > > +    uint32_t pkt_len = rte_pktmbuf_pkt_len(pkt) - sizeof(struct  
> > ether_hdr);  
> > > +    enum egress_policer_action action = GREEN;
> > > +
> > > +    /* color input is not used for blind modes */
> > > +    output_color = (uint8_t)
> > > rte_meter_srtcm_color_blind_check(meter,  
> > time,  
> > > +  
> > pkt_len);  
> > > +
> > > +    /* Check output color, 0 corresponds to GREEN i.e. packet
> > > can be
> > > +     * transmitted. If greater than 0 then action is set to
> > > DROP. */
> > > +    if (output_color) {
> > > +        action = DROP;
> > > +    }  
> > 
> > 
> > This could be simpler:
> > 
> > static inline int
> > egress_policer_pkt_handle__(struct rte_meter_srtcm *meter,
> >                             struct rte_mbuf *pkt, uint64_t time)
> > {
> >      uint32_t pkt_len = rte_pktmbuf_pkt_len(pkt) - sizeof(struct
> > ether_hdr);
> > 
> >      /* If GREEN, accept it otherwise drop it */
> >      if (rte_meter_srtcm_color_blind_check(meter, time, pkt_len) !=
> > e_RTE_METER_GREEN) {
> >          return 1;
> >      }
> > 
> >      return 0;
> > }
> > 
> >   
> Agreed this is simpler/better. In relation to the external
> declaration comment above, Is it preferable not to use
> e_RTE_METER_GREEN here and instead have a value defined in OVS?

We should use it to be consistent with the rte implementation.

Thanks,
-- 
fbl




More information about the dev mailing list