[ovs-dev] [PATCH V2 11/19] dpif-netdev: Read hw stats during flow_dump_next() call

Eli Britstein elibr at mellanox.com
Wed Dec 4 17:15:33 UTC 2019


On 12/4/2019 5:58 PM, Ilya Maximets wrote:
> On 04.12.2019 16:41, Eli Britstein wrote:
>> On 12/4/2019 5:16 PM, Ilya Maximets wrote:
>>> On 02.12.2019 09:41, Eli Britstein wrote:
>>>> From: Ophir Munk <ophirmu at mellanox.com>
>>>>
>>>> Use netdev dump flow next API in order to update the statistics of fully
>>>> offloaded flows.
>>>>
>>>> Co-authored-by: Eli Britstein <elibr at mellanox.com>
>>>> Signed-off-by: Ophir Munk <ophirmu at mellanox.com>
>>>> Reviewed-by: Oz Shlomo <ozsh at mellanox.com>
>>>> Signed-off-by: Eli Britstein <elibr at mellanox.com>
>>>> ---
>>>>    lib/dpif-netdev.c | 42 ++++++++++++++++++++++++++++++++++++++++--
>>>>    1 file changed, 40 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>>>> index 5142bad1d..bfeb1e7b0 100644
>>>> --- a/lib/dpif-netdev.c
>>>> +++ b/lib/dpif-netdev.c
>>>> @@ -527,6 +527,7 @@ struct dp_netdev_flow {
>>>>    
>>>>        bool dead;
>>>>        uint32_t mark;               /* Unique flow mark assigned to a flow */
>>>> +    bool actions_offloaded;      /* true if flow is fully offloaded */
>>>>    
>>>>        /* Statistics. */
>>>>        struct dp_netdev_flow_stats stats;
>>>> @@ -2410,6 +2411,7 @@ dp_netdev_flow_offload_put(struct dp_flow_offload_item *offload)
>>>>            }
>>>>        }
>>>>        info.flow_mark = mark;
>>>> +    info.actions_offloaded = &flow->actions_offloaded;
>>>>    
>>>>        ovs_mutex_lock(&pmd->dp->port_mutex);
>>>>        port = dp_netdev_lookup_port(pmd->dp, in_port);
>>>> @@ -3073,8 +3075,8 @@ dp_netdev_flow_to_dpif_flow(const struct dp_netdev_flow *netdev_flow,
>>>>        flow->pmd_id = netdev_flow->pmd_id;
>>>>        get_dpif_flow_stats(netdev_flow, &flow->stats);
>>>>    
>>>> -    flow->attrs.offloaded = false;
>>>> -    flow->attrs.dp_layer = "ovs";
>>>> +    flow->attrs.offloaded = netdev_flow->actions_offloaded;
>>>> +    flow->attrs.dp_layer = flow->attrs.offloaded ? "in_hw" : "ovs";
>>>>    }
>>>>    
>>>>    static int
>>>> @@ -3244,6 +3246,7 @@ dp_netdev_flow_add(struct dp_netdev_pmd_thread *pmd,
>>>>        flow->dead = false;
>>>>        flow->batch = NULL;
>>>>        flow->mark = INVALID_FLOW_MARK;
>>>> +    flow->actions_offloaded = false;
>>>>        *CONST_CAST(unsigned *, &flow->pmd_id) = pmd->core_id;
>>>>        *CONST_CAST(struct flow *, &flow->flow) = match->flow;
>>>>        *CONST_CAST(ovs_u128 *, &flow->ufid) = *ufid;
>>>> @@ -3598,6 +3601,37 @@ dpif_netdev_flow_dump_thread_destroy(struct dpif_flow_dump_thread *thread_)
>>>>        free(thread);
>>>>    }
>>>>    
>>>> +static int
>>>> +dpif_netdev_offload_used(struct dp_netdev_flow *netdev_flow,
>>>> +                         struct dp_netdev_pmd_thread *pmd)
>>>> +{
>>>> +    struct netdev_flow_dump netdev_dump;
>>>> +    struct dpif_flow_stats stats;
>>>> +    ovs_u128 ufid;
>>>> +    bool has_next;
>>>> +
>>>> +    netdev_dump.port = netdev_flow->flow.in_port.odp_port;
>>>> +    netdev_dump.netdev = netdev_ports_get(netdev_dump.port, pmd->dp->class);
>>>> +    if (!netdev_dump.netdev) {
>>>> +        return -1;
>>>> +    }
>>>> +    /* get offloaded stats */
>>>> +    ufid = netdev_flow->mega_ufid;
>>>> +    has_next = netdev_flow_dump_next(&netdev_dump, NULL, NULL, &stats, NULL,
>>>> +                                     &ufid, NULL, NULL);
>>> This will not work with the Linux TC flow API.
>>> You should not create a 'netdev_flow_dump' by yourself; it must be
>>> created by netdev_ports_flow_dump_create().
>> It works differently with Linux TC.
>>
>> In the kernel case, the dump request is issued in the dump_create API,
>> per device (using netdev_ports_flow_dump_create).
>>
>> In user space we don't query the driver at all; the flows are iterated
>> per PMD, so no netdev-specific action is needed.
>>
>> In the kernel case, dump_next handles a chunk of replies from the
>> kernel, with no guaranteed order or grouping - neither by device nor
>> by any other property - and dump_next also handles the multi-threaded
>> aspect of the dump.
>>
>> In user space the flows are known from the PMD iteration, and the
>> multi-threading is handled by the PMD thread, so this is also
>> different.
>>
>> In the kernel case, dump_next gets the flows from the kernel and
>> derives the UFID for each one. In user space the UFID is already
>> known and the rte_flow is derived from it, which is also different.
> I don't see a limitation that would prevent you from implementing the
> same thing for netdev-offload-dpdk.  It knows all the flows and ufids
> and you just need to iterate over them.  You may also dump flows
> directly from the HW because you can request them.
Correct me if I'm wrong, but DPDK doesn't provide an API to "request" HW 
flows.
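
What DPDK does provide is a per-flow counter query for a flow handle the
application already holds, via rte_flow_query() with a COUNT action -
there is no call that enumerates the rules installed in HW.  A minimal
sketch (the helper name query_hw_counters() is made up here, and it
assumes the rule was created with a COUNT action):

#include <stdint.h>
#include <rte_flow.h>

/* Read back the hit/byte counters of a single rte_flow we already hold.
 * Returns 0 on success, -1 if the driver rejects the query. */
static int
query_hw_counters(uint16_t port_id, struct rte_flow *flow,
                  uint64_t *hits, uint64_t *bytes)
{
    struct rte_flow_query_count count = { .reset = 0 };
    struct rte_flow_action_count count_conf = { .id = 0 };
    const struct rte_flow_action actions[] = {
        { .type = RTE_FLOW_ACTION_TYPE_COUNT, .conf = &count_conf },
        { .type = RTE_FLOW_ACTION_TYPE_END },
    };
    struct rte_flow_error error;

    if (rte_flow_query(port_id, flow, actions, &count, &error)) {
        return -1;
    }
    *hits = count.hits_set ? count.hits : 0;
    *bytes = count.bytes_set ? count.bytes : 0;
    return 0;
}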
>
> There is almost the same thing implemented for datapath flow dumps.
> Just look at the API:
>       dpif_netdev_flow_dump_create,
>       dpif_netdev_flow_dump_destroy,
>       dpif_netdev_flow_dump_thread_create,
>       dpif_netdev_flow_dump_thread_destroy,
>       dpif_netdev_flow_dump_next
>
> And dpif-netdev fully implements it regardless of the fact that this
> could be done much more easily in the userspace datapath.

Yes, those functions are implemented, but the scheme for querying the
flows is different, as explained in my comments above.

So, I will implement the flow_get API instead, filling only the stats
queried from the HW.
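
For reference, a rough sketch of the direction I have in mind, in the
context of dpif-netdev.c and assuming the netdev_flow_get() prototype
from lib/netdev-offload.h; the helper name and the actions buffer size
are only illustrative, and the netdev-offload-dpdk flow_get side (which
would query the rte_flow counters as above) is not shown:

/* Fill 'stats' with the HW statistics of a fully offloaded flow; zero
 * them if the flow has no HW counterpart or the query fails. */
static void
dp_netdev_offload_flow_stats(struct dp_netdev *dp,
                             struct dp_netdev_flow *netdev_flow,
                             struct dpif_flow_stats *stats)
{
    uint64_t act_buf[1024 / 8];
    struct dpif_flow_attrs attrs;
    struct nlattr *actions;
    struct ofpbuf wbuffer;
    struct netdev *netdev;
    struct match match;
    ovs_u128 ufid;

    memset(stats, 0, sizeof *stats);

    netdev = netdev_ports_get(netdev_flow->flow.in_port.odp_port,
                              dp->class);
    if (!netdev) {
        return;
    }
    ofpbuf_use_stack(&wbuffer, act_buf, sizeof act_buf);
    ufid = netdev_flow->mega_ufid;
    /* Only the stats are used; match/actions/attrs are ignored here. */
    if (netdev_flow_get(netdev, &match, &actions, &ufid, stats,
                        &attrs, &wbuffer)) {
        memset(stats, 0, sizeof *stats);
    }
    netdev_close(netdev);
}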

>
>> The code in this commit is not relevant to Linux TC and does not break
>> it, so I don't see a "this will not work with Linux TC" issue.
> This code should work if you add a netdev-linux port to the userspace
> datapath and the Linux TC offloading API is initialized for it.
> The current implementation will not work.

If so, it is already broken as is: the current dpif-netdev.c code
doesn't read any HW statistics before my patch.


