[ovs-dev] [PATCH v5 2/2] netdev-dpdk: Add new DPDK RFC 4115 egress policer

Kevin Traynor ktraynor at redhat.com
Wed Jan 15 16:49:49 UTC 2020

On 15/01/2020 13:19, Stokes, Ian wrote:
> On 1/15/2020 12:19 PM, Ilya Maximets wrote:
>> On 15.01.2020 12:28, Stokes, Ian wrote:
>>> On 1/14/2020 4:12 PM, Eelco Chaudron wrote:
>>>> This patch adds a new policer to the DPDK datapath based on RFC 4115's
>>>> Two-Rate, Three-Color marker. It's a two-level hierarchical policer
>>>> which first does a color-blind marking of the traffic at the queue
>>>> level, followed by a color-aware marking at the port level. At the end
>>>> traffic marked as Green or Yellow is forwarded, Red is dropped. For
>>>> details on how traffic is marked, see RFC 4115.
>>>> This egress policer can be used to limit traffic at different rated
>>>> based on the queues the traffic is in. In addition, it can also be used
>>>> to prioritize certain traffic over others at a port level.
>>>> For example, the following configuration will limit the traffic rate at a
>>>> port level to a maximum of 2000 packets a second (64 bytes IPv4 packets).
>>>> 100pps as CIR (Committed Information Rate) and 1000pps as EIR (Excess
>>>> Information Rate). High priority traffic is routed to queue 10, which marks
>>>> all traffic as CIR, i.e. Green. All low priority traffic, queue 20, is
>>>> marked as EIR, i.e. Yellow.
>>>> ovs-vsctl --timeout=5 set port dpdk1 qos=@myqos -- \
>>>>     --id=@myqos create qos type=trtcm-policer \
>>>>     other-config:cir=52000 other-config:cbs=2048 \
>>>>     other-config:eir=52000 other-config:ebs=2048  \
>>>>     queues:10=@dpdk1Q10 queues:20=@dpdk1Q20 -- \
>>>>     --id=@dpdk1Q10 create queue \
>>>>       other-config:cir=41600000 other-config:cbs=2048 \
>>>>       other-config:eir=0 other-config:ebs=0 -- \
>>>>     --id=@dpdk1Q20 create queue \
>>>>       other-config:cir=0 other-config:cbs=0 \
>>>>       other-config:eir=41600000 other-config:ebs=2048 \
>>>> This configuration accomplishes that the high priority traffic has a
>>>> guaranteed bandwidth egressing the ports at CIR (1000pps), but it can also
>>>> use the EIR, so a total of 2000pps at max. These additional 1000pps is
>>>> shared with the low priority traffic. The low priority traffic can use at
>>>> maximum 1000pps.
>>> Thanks for the patch Eelco, minor comment below.
>>> <snip>
>>>>      Rate Limiting (Ingress Policing)
>>>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>>>> index 128963f..1ed4a47 100644
>>>> --- a/lib/netdev-dpdk.c
>>>> +++ b/lib/netdev-dpdk.c
>>>> @@ -26,6 +26,12 @@
>>>>    #include <sys/socket.h>
>>>>    #include <linux/if.h>
>>>>    +/* Include rte_compat.h first to allow experimental API's needed for the
>>>> + * rte_meter.h rfc4115 functions. Once they are no longer marked as
>>>> + * experimental the #define and rte_compat.h include can be removed.
>>>> + */
>>>> +#include <rte_compat.h>
>>> I guess the risk here, from what I understand, this approach is all or nothing in terms of experimental APIs from the included headers, other experimental APIs could be used from DPDK going forward without causing warning?
>>> If so, there would have to be extra dilligence taken when reviewing future patches and a discussion if an API is expereimental, should it wait until it is marked as non experimental. (In this case The TRTCM looks stable and is highly unlikey to be removed so it's not an issue IMO).
>>> @Ilya/Kevin: Would you agree with above? Thoughts?
>> I think it make sense to wait for API to become non-experimental
>> if we have no easy way to enable it only on functions we need.
>> I agree that having widly enabled experimental api might produce
>> additional issues and will require more careful review of all the
>> DPDK related patches.
> So I'm ok with making an exception in this case for the feature, I think 
> it was an oversight but the API is stable. But what I'm hearing is in 
> general the rule should be for new features the API should not be 
> experimental and if it is then it would have to have it's experimental 
> tag removed by the next DPDK upgrade (in this case 20.11 for arguments 
> sake) and then can be upstreamed to OVS master.

In this case, it sounds ok to me. We know it will be non-experimental in
the next hours/days and we will not be left with some zombie feature on
upgrade. Plus, there might be an acceptable way of adding a
non-experimental symbol for this to the v19.11 stable branch too (not
guaranteed though), and if so we could turn this off on upgrade to
v19.11.X. OTOH, I don't know how important it is to have this feature in
OVS 2.13.

>> BTW, another thought I have in mind about all the release management is:
>> Shouldn't we hold OVS updates to new DPDK LTS until the first correction
>> release is out?  I mean, for example, Ubuntu triggers updates from one
>> LTS release to another only after .1 correcting relese is out (users
>> of Ubuntu 18.04 will receive upgrade notifications only after 20.04.1
>> is released).  Shouldn't we do the same thing?  Shouldn't we upgrade
>> to the next DPDK LTS only after XX.11.1 is ready?  This might make sense
>> in order to not have obviously broken functionality in OVS releases but
>> at the same time might just defer actual revealing of DPDK issues, so
>> I'm not fully sure about this.  Since OVS is not the only user of DPDK,
>> this still might make sense anyway.  Would like to hear some thoughts.
> Yes, I've thought about this as well. There certainly is advantage to 
> moving to a .1 release in terms of stability instead of .0. However when 
> I thought about the release two things came to mind.

Hmm, I naturally think this about stability too, but then I look at the
commits for the one I am most familiar with (excluding CVE releases),
18.11.1: 232
18.11.2: 361
18.11.3: 329
18.11.6-rc2: 280

It's hard to know how to interpret. Maybe it can be that "big" bugs are
found quickly? Maybe it takes a while for integrations and bugfixes to
feed back into DPDK? But at least we cannot say that everything gets
fixed in .1 and then it's stable.

> (i) By moving to the .0 release, is OVS in a position to better 
> contribute feedback for the .1 release and ensure relevant patches for 
> OVS with DPDK fixes are upstreamed in the .1 (rather than a .2). 
> Feedback coming not from just the developers but also from end users.

Well, the sooner it is integrated and used, the sooner we will find bugs
that is sure. For example, iirc the non-experimental API issue was only
fully realised because we upgraded and wanted to apply this patch.

> (ii) With the timeing of the OVS to .1 releases, does it make a massive 
> differemce? For example with OVS 2.13 being released late February and 
> 19.11.1 being released early March, I would think OVS would move to .1 
> pretty quickly to benefit from the latest fixes on OVS master. This 
> still allows our OVS release to use the latest stable DPDK without 
> waiting for OVS 2.14 in August.

It is a long time for some having to wait up to 12 months for the
feature or driver they put in DPDK to be usable in OVS as it is, so I
wouldn't be in favour of extending that.

Also, it seems to have settled into a cadence of Feb release of OVS
getting Nov LTS release of DPDK. I know that at least some people plan
around this and then integration into Openstack etc. so any change would
have an impact.

> There's definitely advantages to both approaches and worth further 
> discussion. I'd be interested in hearing what others think?

I don't think there is a real issue with the current model. And nothing
that more testing of intermediate DPDK releases (.02/.05/.08) and DPDK
.11-RC could not iron out.

Just my 0.02c,

> BR
> Ian
>> Best regards, Ilya Maximets.

More information about the dev mailing list