[ovs-dev] [PATCH v8 0/3] Support dynamic rebalancing of offloaded flows

Eelco Chaudron echaudro at redhat.com
Thu Oct 18 19:57:48 UTC 2018



On 18 Oct 2018, at 18:13, Sriharsha Basavapatna via dev wrote:

> With the current OVS offload design, when an offload-device fails to
> add a flow rule and returns an error, OVS adds the rule to the kernel
> datapath. The flow then gets processed by the kernel datapath for the
> entire life of that flow. This is fine when the device returns an
> error due to lack of support for certain keys or actions.
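>
> To illustrate, here is a minimal sketch of the current behavior; the
> type and function names are hypothetical, not the actual dpif offload
> API:
>
>     #include <stdbool.h>
>
>     struct flow_entry {
>         bool offloaded;
>         /* match and action fields omitted */
>     };
>
>     /* Hypothetical stand-ins for the device and kernel datapath
>      * installation paths. */
>     extern int hw_offload_flow(struct flow_entry *);
>     extern int kernel_datapath_install(struct flow_entry *);
>
>     /* Sketch of the current behavior: a flow that fails HW offload
>      * is installed in the kernel datapath and never reconsidered. */
>     static int
>     install_flow(struct flow_entry *flow)
>     {
>         if (!hw_offload_flow(flow)) {
>             flow->offloaded = true;
>             return 0;
>         }
>         /* Any error, including a temporary out-of-resources
>          * condition, permanently demotes the flow to the kernel
>          * datapath. */
>         flow->offloaded = false;
>         return kernel_datapath_install(flow);
>     }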
>
> But when an error is returned due to temporary conditions, such as
> lack of resources to add a flow rule, the flow continues to be
> processed by the kernel even after resources become available. That
> is, those flows never get offloaded again. This problem becomes more
> pronounced when a flow that was initially offloaded has a smaller
> packet rate than a later flow that could not be offloaded due to lack
> of resources. This leads to inefficient use of HW resources and
> wastage of host CPU cycles.
>
> This patch-set addresses this issue by providing a way to detect
> temporary offload resource constraints (an Out-Of-Resources, or OOR,
> condition) and to selectively and dynamically offload flows with a
> higher packets-per-second (pps) rate. This dynamic rebalancing is
> done periodically on netdevs that are in the OOR state, until
> resources become available to offload all pending flows.
>
> The patch-set involves the following changes at a high level:
>
> 1. Detection of an Out-Of-Resources (OOR) condition on an
>    offload-capable netdev.
> 2. Gathering the flow offload selection criterion for all flows on an
>    OOR netdev; i.e., the packets-per-second (pps) rate of offloaded
>    and non-offloaded (pending) flows.
> 3. Dynamically replacing offloaded flows that have a lower pps-rate
>    with non-offloaded flows that have a higher pps-rate, on an OOR
>    netdev (see the sketch after this list). A new Open vSwitch
>    configuration option, "offload-rebalance", enables this policy.
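>
> As a rough sketch of the selection step in (3), assuming offloaded
> flows are sorted by ascending pps rate and pending flows by
> descending pps rate (the names here are hypothetical, not the actual
> patch-set functions):
>
>     #include <stdbool.h>
>     #include <stddef.h>
>     #include <stdint.h>
>
>     struct flow_entry {
>         uint64_t pps;       /* measured packets-per-second rate */
>         bool offloaded;
>     };
>
>     /* Hypothetical stand-ins for the actual offload operations. */
>     extern int offload_flow(struct flow_entry *);
>     extern int unoffload_flow(struct flow_entry *);
>
>     /* Swap pairs while a pending flow has a higher pps rate than the
>      * lowest-rate offloaded flow; stop at the first non-beneficial
>      * pair, since both arrays are sorted. */
>     static void
>     rebalance_oor_netdev(struct flow_entry **offloaded, size_t n_off,
>                          struct flow_entry **pending, size_t n_pend)
>     {
>         for (size_t i = 0; i < n_off && i < n_pend; i++) {
>             if (pending[i]->pps <= offloaded[i]->pps) {
>                 break;
>             }
>             if (!unoffload_flow(offloaded[i])
>                 && !offload_flow(pending[i])) {
>                 offloaded[i]->offloaded = false;
>                 pending[i]->offloaded = true;
>             }
>         }
>     }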
>
> Cost/benefits data points:
>
> 1. Rough cost of the new rebalancing, in terms of CPU time:
>
>    Ran a test that replaced 256 low pps-rate flows (pings) with 256
>    high pps-rate flows (iperf), on a system with 4 CPUs (Intel Xeon
>    E5 @ 2.40GHz; 2 cores with HW threads enabled, rest disabled). The
>    data showed that CPU utilization increased by ~20%. This increase
>    occurs during the specific second in which rebalancing is done,
>    and subsequently (from the next second) CPU utilization decreases
>    significantly due to the offloading of higher pps-rate flows. So
>    effectively there is a bump in CPU utilization at the time of
>    rebalancing that is more than compensated for by reduced CPU
>    utilization once the right flows get offloaded.
>
> 2. Rough benefits to the user in terms of offload performance:
>
>    The benefit to the user is reduced CPU utilization on the host,
>    since higher pps-rate flows get offloaded, replacing lower
>    pps-rate flows. Replacing a single offloaded flood-ping flow with
>    an iperf flow (multiple connections) shows that the CPU usage that
>    was originally 100% of a single CPU (rebalancing disabled) goes
>    down to 35% (rebalancing enabled). That is, CPU utilization
>    decreased by 65% after rebalancing.
>
> 3. Circumstances under which the benefits would show up:
>
>    The rebalancing benefits would show up once offload resources are
>    exhausted and new flows with a higher pps-rate are initiated that
>    would otherwise be handled by the kernel datapath, costing host
>    CPU cycles.
>
>    This can be observed using the 'ovs-appctl dpctl/dump-flows'
>    command. Prior to rebalancing, any high pps-rate flows that could
>    not be offloaded due to the resource crunch would show up in the
>    output of 'dump-flows type=ovs', and after rebalancing such flows
>    would appear in the output of 'dump-flows type=offloaded'.
>

Before I review the individual patches (I hope to do this tomorrow), I 
have some general concerns/comments.

Once committed, will this feature be marked doubly experimental? I just 
want to make sure that if it goes in as part of HW offload, the 
experimental tag for this feature is not automatically removed once HW 
offload becomes mainstream.

Currently, in the OOR state both phases (insert and rebalance) are run. 
Rather than having offload-rebalance be just true or false, maybe we 
could have disable, retry, and retry-rebalance. I think only retrying, 
without changing existing flows, might be beneficial as well.

My main objection against offloading flows at a later stage is packet 
re-ordering. As soon as a flow moves from the kernel datapath to 
hardware offload, packets might be sent out of order. This is mainly a 
problem for TCP streams, which do not have this problem if the stream 
is offloaded directly at the start.

To make this even worse, the current implementation has no protection 
against flows fighting to be offloaded, i.e. ping-ponging between HW 
and SW, causing a lot of out-of-order packets.
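
For example, one simple guard (purely hypothetical, nothing like this 
is in the patch set) would be a per-flow cooldown, so a flow cannot be 
moved again right after a rebalance:

    #include <stdbool.h>
    #include <time.h>

    #define REBALANCE_COOLDOWN_S 30     /* hypothetical dwell time */

    struct flow_entry {
        time_t last_moved;  /* when the flow last changed location */
    };

    /* Only allow a flow to move between HW and SW again once it has
     * stayed put for a minimum dwell time, to limit ping-ponging. */
    static bool
    may_rebalance(const struct flow_entry *flow, time_t now)
    {
        return now - flow->last_moved >= REBALANCE_COOLDOWN_S;
    }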

Based on the above, I was wondering if any tests were done to measure 
the out-of-order packets/jitter on rebalancing?

The last implementation item I have is that packet throughput through 
the kernel datapath is rather low, <200Kpps. This low throughput might 
make it hard to determine which non-offloaded flow is best suited for 
HW offload. The kernel might drop packets for a flow with a far higher 
potential rate than the packets that do make it through the kernel. I 
do not have a solution for this, but I guess it is worth keeping in 
mind when this gets enabled.

As a general question, were other solutions considered to cope with 
inadequate resources? Maybe a solution that would give the operator 
more control over what gets offloaded? Like giving all configured 
flows a relative priority, either at individual flow creation or via 
some general overlay, i.e. all TCP flows for example.


Cheers,

Eelco

