[ovs-dev] [PATCH v4 2/7] dpif-netdev: Make PMD auto load balance use common rxq scheduling.

Jan Scheurich jan.scheurich at ericsson.com
Wed Jul 14 09:05:24 UTC 2021


> > In our patch series we decided to skip the check on cross-numa polling
> > during auto-load balancing. The rationale is as follows:
> >
> > If the estimated PMD-rxq distribution includes cross-NUMA rxq assignments,
> > the same must apply for the current distribution, as none of the scheduling
> > algorithms would voluntarily assign rxqs across NUMA nodes. So, current and
> > estimated rxq assignments are comparable and it makes sense to consider
> > rebalancing when the variance improves.
> >
> > Please consider removing this check.
> >
> 
> The first thing is that this patch is not changing any behaviour, just
> re-implementing to reuse the common code, so it would not be the place to
> change this functionality.

Fair enough. We should address this in a separate patch.
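
For reference, the comparison I have in mind ("rebalance when the variance improves") could be sketched roughly as below. This is only an illustration under assumptions: the function names, the per-PMD load arrays and the minimum-improvement parameter are hypothetical and are not taken from dpif-netdev.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: population variance of per-PMD load values
 * (e.g. busy percentage per PMD). */
static double
pmd_load_variance(const double load[], size_t n)
{
    double mean = 0.0;
    double var = 0.0;

    assert(n > 0);
    for (size_t i = 0; i < n; i++) {
        mean += load[i];
    }
    mean /= (double) n;
    for (size_t i = 0; i < n; i++) {
        double d = load[i] - mean;
        var += d * d;
    }
    return var / (double) n;
}

/* Hypothetical check: true if the estimated distribution reduces the
 * variance of the current one by at least min_improvement_pct percent.
 * The threshold parameter is an assumption made for this sketch. */
static bool
variance_improves(const double cur[], const double est[], size_t n,
                  double min_improvement_pct)
{
    double cur_var = pmd_load_variance(cur, n);
    double est_var = pmd_load_variance(est, n);

    if (cur_var <= 0.0) {
        return false;   /* Already perfectly balanced, nothing to gain. */
    }
    return (cur_var - est_var) * 100.0 / cur_var >= min_improvement_pct;
}
```

The point is only that the same function is applied to the current and the estimated assignment, so the two results are comparable whether or not both include cross-NUMA rxqs.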

> About the proposed change itself, just to be clear about what is allowed
> currently: a rebalance is allowed when there are local pmds, OR when there
> are no local pmds and there is one other NUMA node with pmds available for
> cross-numa polling.
> 
> The rationale for not doing a rebalance when there are no local pmds but
> multiple other NUMAs available for cross-NUMA polling is that the estimate
> may be incorrect due to a different cross-NUMA node being chosen for an Rxq
> than is currently used.
> 
> I thought about some things like making an Rxq sticky with a particular
> cross-NUMA etc. for this case but that brings a whole new set of problems,
> e.g. what happens if that NUMA gets overloaded, reduced cores, how can it
> ever be reset etc., so I decided not to pursue it as I think it is probably
> a corner case (at least for now).

We currently don't see any scenarios with more than two NUMA nodes, but other CPU/server architectures may have more NUMA nodes than CPU sockets.
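
To make the cases concrete, the rule as you describe it (rebalance if there are local PMDs, or if there are none and exactly one other NUMA node has PMDs) could be sketched as below. This is only an illustration: the function name and the per-NUMA PMD count array are hypothetical and are not the actual dpif-netdev code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper. n_pmds[i] is assumed to hold the number of
 * non-isolated PMDs on NUMA node i; local_numa is the NUMA node of
 * the port whose rxqs are being scheduled. */
static bool
rebalance_estimate_is_reliable(const unsigned n_pmds[], size_t n_numas,
                               size_t local_numa)
{
    size_t remote_numas_with_pmds = 0;

    assert(n_pmds != NULL && local_numa < n_numas);

    if (n_pmds[local_numa] > 0) {
        return true;    /* Local PMDs exist: no cross-NUMA choice needed. */
    }
    for (size_t i = 0; i < n_numas; i++) {
        if (i != local_numa && n_pmds[i] > 0) {
            remote_numas_with_pmds++;
        }
    }
    /* With exactly one candidate the estimate cannot pick a different
     * remote node than the one currently in use. With several, it may. */
    return remote_numas_with_pmds == 1;
}
```

With two NUMA nodes this never returns false as long as any non-isolated PMD exists, which is why the restriction only matters on systems with three or more NUMA nodes.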

> I know the case of no local pmd and one NUMA with pmds is not a corner case
> as I'm aware of users doing that.

Agreed, auto-lb must support such configurations.

> We can discuss further about the multiple non-local NUMA case and maybe
> there are some improvements we can think of, or maybe I've made some wrong
> assumptions, but it would be a follow-on from the current patchset.

Our main use case for cross-NUMA balancing comes with the additional freedom to allow cross-NUMA polling for selected ports, which we introduce with the fourth patch:

    dpif-netdev: Allow cross-NUMA polling on selected ports

    Today dpif-netdev considers PMD threads on a non-local NUMA node for
    automatic assignment of the rxqs of a port only if there are no local,
    non-isolated PMDs.

    On typical servers with both physical ports on one NUMA node, this often
    leaves the PMDs on the other NUMA node under-utilized, wasting CPU
    resources. The alternative, to manually pin the rxqs to PMDs on remote
    NUMA nodes, also has drawbacks as it limits OVS' ability to auto
    load-balance the rxqs.

    This patch introduces a new interface configuration option to allow
    ports to be automatically polled by PMDs on any NUMA node:

    ovs-vsctl set interface <Name> other_config:cross-numa-polling=true

    If this option is not present or set to false, legacy behaviour applies.

We use this so that our physical ports are polled by non-isolated PMDs on both NUMA nodes. The observed capacity improvement is very substantial, so we plan to port this feature on top of your patches once they are merged.

This only works if auto-load balancing is allowed to activate rxq assignments with cross-NUMA polling even when there are local non-isolated PMDs.

Anyway, we can take this up later in our upcoming patch that introduces this option.

BR, Jan

