[ovs-dev] [PATCH v3] netdev_dpdk.c: Add QoS functionality.
Flavio Leitner
fbl at sysclose.org
Thu Feb 11 14:05:11 UTC 2016
On Thu, 11 Feb 2016 12:32:56 +0000
"Stokes, Ian" <ian.stokes at intel.com> wrote:
> Thanks For the review Flavio, much appreciated, comments inline. I'll re-spin a new version also.
Thank you for the patch!
comments inline.
>
> > -----Original Message-----
> > From: Flavio Leitner [mailto:fbl at sysclose.org]
> > Sent: Wednesday, February 10, 2016 7:55 PM
> > To: Stokes, Ian
> > Cc: dev at openvswitch.org
> > Subject: Re: [ovs-dev] [PATCH v3] netdev_dpdk.c: Add QoS functionality.
> >
> > On Mon, 1 Feb 2016 20:47:25 +0000
> > Ian Stokes <ian.stokes at intel.com> wrote:
> >
> > > This patch provides the modifications required in netdev-dpdk.c and
> > > vswitch.xml to allow for a DPDK user space QoS algorithm.
> > >
> > > This patch adds a QoS configuration structure for netdev-dpdk and
> > > expected QoS operations 'dpdk_qos_ops'. Various helper functions are
> > > also supplied.
> > >
> > > Also included are the modifications required for vswitch.xml to allow
> > > a new QoS implementation for netdev-dpdk devices. This includes a new
> > > QoS type `egress-policer` as well as its expected QoS table entries.
> > >
> > > The QoS functionality implemented for DPDK devices is `egress-
> > policer`.
> > > This can be used to drop egress packets at a configurable rate.
> > >
> > > The INSTALL.DPDK.md guide has also been modified to provide an example
> > > configuration of `egress-policer` QoS.
> > >
> > > Signed-off-by: Ian Stokes <ian.stokes at intel.com>
> > > ---
> > > INSTALL.DPDK.md | 20 +++
> > > lib/netdev-dpdk.c | 439
> > ++++++++++++++++++++++++++++++++++++++++++++++++--
> > > vswitchd/vswitch.xml | 52 ++++++
> > > 3 files changed, 498 insertions(+), 13 deletions(-)
> > >
> > > diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md index e8ef4b5..dac70cd
> > > 100644
> > > --- a/INSTALL.DPDK.md
> > > +++ b/INSTALL.DPDK.md
> > > @@ -207,6 +207,26 @@ Using the DPDK with ovs-vswitchd:
> > > ./ovs-ofctl add-flow br0 in_port=2,action=output:1
> > > ```
> > >
> > > +8. QoS usage example
> > > +
> > > + Assuming you have a vhost-user port transmitting traffic
> > consisting of
> > > + packets of size 64 bytes, the following command would limit the
> > egress
> > > + transmission rate of the port to ~1,000,000 packets per second:
> > > +
> > > + `ovs-vsctl set port vhost-user0 qos=@newqos -- --id=@newqos create
> > qos
> > > + type=egress-policer other-config:cir=46000000
> > > + other-config:cbs=2048`
> > > +
> > > + To examine the QoS configuration of the port:
> > > +
> > > + `ovs-appctl -t ovs-vswitchd qos/show vhost-user0`
> > > +
> > > + To clear the QoS configuration from the port and ovsdb use the
> > following:
> > > +
> > > + `ovs-vsctl -- destroy QoS vhost-user0 -- clear Port vhost-user0
> > > + qos`
> > > +
> > > + For more details regarding egress-policer parameters please refer
> > to the
> > > + vswitch.xml.
> > > +
> > > Performance Tuning:
> > > -------------------
> > >
> > > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index
> > > 09ccc2c..716da79 100644
> > > --- a/lib/netdev-dpdk.c
> > > +++ b/lib/netdev-dpdk.c
> > > @@ -44,6 +44,7 @@
> > > #include "ovs-rcu.h"
> > > #include "packets.h"
> > > #include "shash.h"
> > > +#include "smap.h"
> > > #include "sset.h"
> > > #include "unaligned.h"
> > > #include "timeval.h"
> > > @@ -52,12 +53,14 @@
> > >
> > > #include "rte_config.h"
> > > #include "rte_mbuf.h"
> > > +#include "rte_meter.h"
> > > #include "rte_virtio_net.h"
> > >
> > > VLOG_DEFINE_THIS_MODULE(dpdk);
> > > static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
> > >
> > > #define DPDK_PORT_WATCHDOG_INTERVAL 5
> > > +#define DPDK_MAX_QOS_NAME_SIZE 10
> Noted, will expand to 20 for the moment unless you think it should be higher?
I'd rather take that out completely.
> > > +
> > > +/* Action that can be set for a packet for rte_meter */ enum
> > > +egress_policer_action {
> > > + GREEN = e_RTE_METER_GREEN,
> > > + DROP = 1,
> > > +};
> >
> > I don't think it's a good idea to mix external and internal declarations
> > because if the external one changes, it will break here. Since GREEN and
> > DROP have meanings only inside of OVS, we should define our own values.
> > See my comment where this is used.
> >
> Just to clarify, is this in relation to the use of e_RTE_METER_GREEN?
> Is it prefereable for GREEN to be equal to our own declaration?
We can use e_RTE_METER_GREEN and we should do it to be consistent with
the rte function. My concern is having two declarations: one coming
from outside and another that you defined to a fixed number. So, what
happens if e_RTE_METER_GREEN gets shifted and now its value is 1?
But with the simplification I proposed for egress_policer_pkt_handle__()
that enum isn't needed and we can take it out completely.
> > > static inline void
> > > netdev_dpdk_vhost_update_tx_counters(struct netdev_stats *stats,
> > > struct dp_packet **packets, @@
> > > -1092,6 +1222,7 @@ __netdev_dpdk_vhost_send(struct netdev *netdev, int
> > qid,
> > > struct virtio_net *virtio_dev =
> > netdev_dpdk_get_virtio(vhost_dev);
> > > struct rte_mbuf **cur_pkts = (struct rte_mbuf **) pkts;
> > > unsigned int total_pkts = cnt;
> > > + unsigned int qos_pkts = cnt;
> > > uint64_t start = 0;
> > >
> > > if (OVS_UNLIKELY(!is_vhost_running(virtio_dev))) { @@ -1106,6
> > > +1237,10 @@ __netdev_dpdk_vhost_send(struct netdev *netdev, int qid,
> > > rte_spinlock_lock(&vhost_dev->tx_q[qid].tx_lock);
> > > }
> > >
> > > + /* Check has QoS has been configured for the netdev */
> > > + cnt = netdev_dpdk_qos_process__(vhost_dev, cur_pkts, cnt);
> > > + qos_pkts -= cnt;
> >
> > I don't see how this is working because egress_policer_alg_process()
> > will loop in the array one by one freeing or not the packet.
> > If it does nothing, then the array is good but otherwise it creates a
> > problem because the array still references freed entries. So in loop
> > below we are sending a mix of good and freed buffers up to 'cnt'.
> >
> So we should never have freed packets included in cnt as it will
> cause a segfault. Consider when an array of size cnt is passed to the
> netdev_dpdk_qos_process(). The timestamp used to update the token
> bucket is taken before the for loop is entered. This way the token
> bucket is only updated once for each batch of packets of size cnt.
> The loop cycles through the array, for any packet that is green the
> token bucket decremented. If a packet is dropped then the bucket has
> been exhausted. The token bucket will not be updated again within the
> run of the loop. Any packets remaining in the array will be dropped
> (i.e. freed) as the bucket is exhausted. The value of cnt will only
> be the number of pkts that were not dropped (i.e. not freed). This
> value will then be passed to the send function ensuring no freed
> packets are present in the array that is to be processed. If the
> token bucket updated between each individual packet being processed
> then you could have the situation you describe, by processing them as
> a batch with a single update beforehand you avoid it.
I haven't looked into the rte implementation yet so I can't tell
if the decision to drop or not the packet is based solely on the
timestamp which doesn't change during the batch processing.
Assuming that is the case I agree with you that the cnt would have
only the first valid entries to be sent out.
My concern is when we are processing a batch with two packets.
One big packet and another small one. While processing the big
packet, there is no enough tokens and the packet is marked as
RED/DROPPED. Then the loop gets the second small packet, but
now some enough tokens were added/still available in the bucket
which passes the small packet.
> > > +
> > > +static inline int
> > > +egress_policer_pkt_handle__(struct rte_meter_srtcm *meter,
> > > + struct rte_mbuf *pkt, uint64_t time)
> > > +{
> > > + uint8_t output_color;
> > > + uint32_t pkt_len = rte_pktmbuf_pkt_len(pkt) - sizeof(struct
> > ether_hdr);
> > > + enum egress_policer_action action = GREEN;
> > > +
> > > + /* color input is not used for blind modes */
> > > + output_color = (uint8_t)
> > > rte_meter_srtcm_color_blind_check(meter,
> > time,
> > > +
> > pkt_len);
> > > +
> > > + /* Check output color, 0 corresponds to GREEN i.e. packet
> > > can be
> > > + * transmitted. If greater than 0 then action is set to
> > > DROP. */
> > > + if (output_color) {
> > > + action = DROP;
> > > + }
> >
> >
> > This could be simpler:
> >
> > static inline int
> > egress_policer_pkt_handle__(struct rte_meter_srtcm *meter,
> > struct rte_mbuf *pkt, uint64_t time)
> > {
> > uint32_t pkt_len = rte_pktmbuf_pkt_len(pkt) - sizeof(struct
> > ether_hdr);
> >
> > /* If GREEN, accept it otherwise drop it */
> > if (rte_meter_srtcm_color_blind_check(meter, time, pkt_len) !=
> > e_RTE_METER_GREEN) {
> > return 1;
> > }
> >
> > return 0;
> > }
> >
> >
> Agreed this is simpler/better. In relation to the external
> declaration comment above, Is it preferable not to use
> e_RTE_METER_GREEN here and instead have a value defined in OVS?
We should use it to be consistent with the rte implementation.
Thanks,
--
fbl
More information about the dev
mailing list