[ovs-dev] [PATCH 3/5] dpif-netdev: Add group rxq scheduling assignment type.

Pai G, Sunil sunil.pai.g at intel.com
Wed Jun 23 15:29:31 UTC 2021


Hey Kevin,

Patch looks good to me.
Builds fine, and all the test cases at http://patchwork.ozlabs.org/project/openvswitch/patch/20210316154532.127858-1-ktraynor@redhat.com/ pass as well.
The group assignment works fine too.
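
For reference, this run used roughly the knobs below (the values are just
taken from the logs that follow; adjust as needed):

# values as reported in the vswitchd logs below
ovs-vsctl set Open_vSwitch . other_config:pmd-rxq-assign=group
ovs-vsctl set Open_vSwitch . other_config:pmd-auto-lb=true
ovs-vsctl set Open_vSwitch . other_config:pmd-auto-lb-load-threshold=50
ovs-vsctl set Open_vSwitch . other_config:pmd-auto-lb-improvement-threshold=5
ovs-vsctl set Open_vSwitch . other_config:pmd-auto-lb-rebal-interval=1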

From the vswitchd logs:

dpif_netdev|INFO|PMD auto load balance load threshold set to 50%
dpif_netdev|INFO|PMD auto load balance is disabled
dpif_netdev|INFO|PMD auto load balance improvement threshold set to 5%
dpif_netdev|INFO|PMD auto load balance is disabled
dpif_netdev|INFO|PMD auto load balance is enabled interval 1 mins, pmd load threshold 50%, improvement threshold 5%
dpif_netdev|INFO|Rxq to PMD assignment mode changed to: 'group'.
dpif_netdev|INFO|Performing pmd to rx queue assignment using group algorithm.
...
...
dpif_netdev|DBG|PMD auto load balance performing dry run.
dpif_netdev|DBG|There's no available (non-isolated) pmd thread on numa node 0. Port 'dpdk1' rx queue 0 will be assigned to a pmd on numa node 1. This may lead to reduced performance.
dpif_netdev|DBG|Core 26 on numa node 1 assigned port 'dpdk1' rx queue 0. (measured processing cycles 117514888500).
dpif_netdev|DBG|There's no available (non-isolated) pmd thread on numa node 0. Port 'dpdk1' rx queue 1 will be assigned to a pmd on numa node 1. This may lead to reduced performance.
dpif_netdev|DBG|Core 27 on numa node 1 assigned port 'dpdk1' rx queue 1. (measured processing cycles 115517336048).
dpif_netdev|DBG|There's no available (non-isolated) pmd thread on numa node 0. Port 'dpdk0' rx queue 1 will be assigned to a pmd on numa node 1. This may lead to reduced performance.
dpif_netdev|DBG|Core 27 on numa node 1 assigned port 'dpdk0' rx queue 1. (measured processing cycles 799822228).
dpif_netdev|DBG|There's no available (non-isolated) pmd thread on numa node 0. Port 'dpdk0' rx queue 0 will be assigned to a pmd on numa node 1. This may lead to reduced performance.
dpif_netdev|DBG|Core 27 on numa node 1 assigned port 'dpdk0' rx queue 0. (measured processing cycles 539222868).
dpif_netdev|DBG|There's no available (non-isolated) pmd thread on numa node 0. Port 'dpdk0' rx queue 2 will be assigned to a pmd on numa node 1. This may lead to reduced performance.
dpif_netdev|DBG|Core 27 on numa node 1 assigned port 'dpdk0' rx queue 2. (measured processing cycles 538244586).
dpif_netdev|DBG|Current variance 1 Estimated variance 0
dpif_netdev|DBG|Variance improvement 100%
dpif_netdev|DBG|PMD load variance improvement threshold 5% is met
dpif_netdev|INFO|PMD auto load balance dry run. Requesting datapath reconfigure.
dpif_netdev|INFO|Performing pmd to rx queue assignment using group algorithm.



Some minor nits inline:

<snipped>

> +enum sched_assignment_type {
> +    SCHED_ROUNDROBIN,
> +    SCHED_CYCLES, /* Default.*/
> +    SCHED_GROUP,
> +    SCHED_MAX
> +};
> +
>  /* Datapath based on the network device interface from netdev.h.
>   *
> @@ -367,5 +374,5 @@ struct dp_netdev {
>      struct ovs_mutex tx_qid_pool_mutex;
>      /* Use measured cycles for rxq to pmd assignment. */
> -    bool pmd_rxq_assign_cyc;
> +    enum sched_assignment_type pmd_rxq_assign_cyc;

Perhaps 'sched_type' would be a better name?
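i.e. something along these lines (the comment wording is just mine):

/* Type of rxq to pmd assignment in use. */
enum sched_assignment_type sched_type;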


<snipped>


> +static struct sched_pmd *
> +get_lowest_num_rxq_pmd(struct sched_numa *numa) {
> +    struct sched_pmd *lowest_rxqs_sched_pmd = NULL;
> +    unsigned lowest_rxqs = UINT_MAX;

Perhaps 'n_lowest_rxqs' would be a bit clearer?

> +
> +    /* find the pmd with lowest number of rxqs */
> +    for (unsigned i = 0; i < numa->n_pmds; i++) {
> +        struct sched_pmd *sched_pmd;
> +        unsigned num_rxqs;
> +
> +        sched_pmd = &numa->pmds[i];
> +        num_rxqs = sched_pmd->n_rxq;
> +        if (sched_pmd->isolated) {
> +            continue;
> +        }
> +
> +        /* If this current load is higher we can go to the next one */

The full stop at the end of the comment is missing. Maybe check this once for the entire patch?

> +        if (num_rxqs > lowest_rxqs) {
> +            continue;
> +        }
> +       if (num_rxqs < lowest_rxqs) {
> +           lowest_rxqs = num_rxqs;
> +           lowest_rxqs_sched_pmd = sched_pmd;
> +        }
> +    }
> +    return lowest_rxqs_sched_pmd;
> +}
> +
> +static struct sched_pmd *
> +get_lowest_proc_pmd(struct sched_numa *numa) {
> +    struct sched_pmd *lowest_loaded_sched_pmd = NULL;
> +    uint64_t lowest_load = UINT64_MAX;
> +
> +    /* find the pmd with the lowest load */
> +    for (unsigned i = 0; i < numa->n_pmds; i++) {
> +        struct sched_pmd *sched_pmd;
> +        uint64_t pmd_load;
> +
> +        sched_pmd = &numa->pmds[i];
> +        if (sched_pmd->isolated) {
> +            continue;
> +        }
> +        pmd_load = sched_pmd->pmd_proc_cycles;
> +        /* If this current load is higher we can go to the next one */
> +        if (pmd_load > lowest_load) {
> +            continue;
> +        }
> +       if (pmd_load < lowest_load) {
> +           lowest_load = pmd_load;
> +           lowest_loaded_sched_pmd = sched_pmd;
> +        }
> +    }
> +    return lowest_loaded_sched_pmd;
> +}

get_lowest_proc_pmd() and get_lowest_num_rxq_pmd() look nearly identical.
I wonder if we could combine them into a single function somehow?
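
Just a rough sketch of what I mean, reusing the fields from the patch
(the function name and the 'use_cycles' flag are only illustrative):

static struct sched_pmd *
get_lowest_pmd(struct sched_numa *numa, bool use_cycles)
{
    struct sched_pmd *lowest_sched_pmd = NULL;
    uint64_t lowest_load = UINT64_MAX;

    /* Find the non-isolated pmd with the lowest load, where the load
     * is either the measured processing cycles or the number of
     * assigned rxqs, depending on 'use_cycles'. */
    for (unsigned i = 0; i < numa->n_pmds; i++) {
        struct sched_pmd *sched_pmd = &numa->pmds[i];
        uint64_t load;

        if (sched_pmd->isolated) {
            continue;
        }
        load = use_cycles ? sched_pmd->pmd_proc_cycles
                          : sched_pmd->n_rxq;
        if (load < lowest_load) {
            lowest_load = load;
            lowest_sched_pmd = sched_pmd;
        }
    }
    return lowest_sched_pmd;
}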

<snipped>



