[ovs-dev] [PATCH V2 11/19] dpif-netdev: Read hw stats during flow_dump_next() call

Ilya Maximets i.maximets at ovn.org
Wed Dec 4 15:58:04 UTC 2019


On 04.12.2019 16:41, Eli Britstein wrote:
> 
> On 12/4/2019 5:16 PM, Ilya Maximets wrote:
>> On 02.12.2019 09:41, Eli Britstein wrote:
>>> From: Ophir Munk <ophirmu at mellanox.com>
>>>
>>> Use netdev dump flow next API in order to update the statistics of fully
>>> offloaded flows.
>>>
>>> Co-authored-by: Eli Britstein <elibr at mellanox.com>
>>> Signed-off-by: Ophir Munk <ophirmu at mellanox.com>
>>> Reviewed-by: Oz Shlomo <ozsh at mellanox.com>
>>> Signed-off-by: Eli Britstein <elibr at mellanox.com>
>>> ---
>>>   lib/dpif-netdev.c | 42 ++++++++++++++++++++++++++++++++++++++++--
>>>   1 file changed, 40 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>>> index 5142bad1d..bfeb1e7b0 100644
>>> --- a/lib/dpif-netdev.c
>>> +++ b/lib/dpif-netdev.c
>>> @@ -527,6 +527,7 @@ struct dp_netdev_flow {
>>>   
>>>       bool dead;
>>>       uint32_t mark;               /* Unique flow mark assigned to a flow */
>>> +    bool actions_offloaded;      /* true if flow is fully offloaded */
>>>   
>>>       /* Statistics. */
>>>       struct dp_netdev_flow_stats stats;
>>> @@ -2410,6 +2411,7 @@ dp_netdev_flow_offload_put(struct dp_flow_offload_item *offload)
>>>           }
>>>       }
>>>       info.flow_mark = mark;
>>> +    info.actions_offloaded = &flow->actions_offloaded;
>>>   
>>>       ovs_mutex_lock(&pmd->dp->port_mutex);
>>>       port = dp_netdev_lookup_port(pmd->dp, in_port);
>>> @@ -3073,8 +3075,8 @@ dp_netdev_flow_to_dpif_flow(const struct dp_netdev_flow *netdev_flow,
>>>       flow->pmd_id = netdev_flow->pmd_id;
>>>       get_dpif_flow_stats(netdev_flow, &flow->stats);
>>>   
>>> -    flow->attrs.offloaded = false;
>>> -    flow->attrs.dp_layer = "ovs";
>>> +    flow->attrs.offloaded = netdev_flow->actions_offloaded;
>>> +    flow->attrs.dp_layer = flow->attrs.offloaded ? "in_hw" : "ovs";
>>>   }
>>>   
>>>   static int
>>> @@ -3244,6 +3246,7 @@ dp_netdev_flow_add(struct dp_netdev_pmd_thread *pmd,
>>>       flow->dead = false;
>>>       flow->batch = NULL;
>>>       flow->mark = INVALID_FLOW_MARK;
>>> +    flow->actions_offloaded = false;
>>>       *CONST_CAST(unsigned *, &flow->pmd_id) = pmd->core_id;
>>>       *CONST_CAST(struct flow *, &flow->flow) = match->flow;
>>>       *CONST_CAST(ovs_u128 *, &flow->ufid) = *ufid;
>>> @@ -3598,6 +3601,37 @@ dpif_netdev_flow_dump_thread_destroy(struct dpif_flow_dump_thread *thread_)
>>>       free(thread);
>>>   }
>>>   
>>> +static int
>>> +dpif_netdev_offload_used(struct dp_netdev_flow *netdev_flow,
>>> +                         struct dp_netdev_pmd_thread *pmd)
>>> +{
>>> +    struct netdev_flow_dump netdev_dump;
>>> +    struct dpif_flow_stats stats;
>>> +    ovs_u128 ufid;
>>> +    bool has_next;
>>> +
>>> +    netdev_dump.port = netdev_flow->flow.in_port.odp_port;
>>> +    netdev_dump.netdev = netdev_ports_get(netdev_dump.port, pmd->dp->class);
>>> +    if (!netdev_dump.netdev) {
>>> +        return -1;
>>> +    }
>>> +    /* get offloaded stats */
>>> +    ufid = netdev_flow->mega_ufid;
>>> +    has_next = netdev_flow_dump_next(&netdev_dump, NULL, NULL, &stats, NULL,
>>> +                                     &ufid, NULL, NULL);
>>
>> This will not work with the Linux TC flow API.
>> You should not create 'netdev_flow_dump' yourself; it must be created
>> by netdev_ports_flow_dump_create().
> 
> It is different in Linux TC.
> 
> In the kernel, the dump request is issued in the dump_create API, for
> each device (using netdev_ports_flow_dump_create).
> 
> In user space we don't query the driver at all; the flows are iterated
> by their PMD, so we don't need any netdev-specific action.
> 
> In the kernel, dump_next handles a chunk of replies from the kernel,
> without any known order or grouping - neither by device nor by any
> other property - and dump_next also handles the multi-threaded aspect
> of the dump.
> 
> In user space the flows are known from the PMD iteration, and the
> multi-threaded scheme is handled by the PMD thread, so this is also
> different.
> 
> In the kernel, dump_next gets the flows from the kernel and derives the
> UFID for each one.  In user space the UFID is already known and we
> derive the rte_flow from it, which is also different.

I don't see a limitation that would prevent you from implementing the
same thing for netdev-offload-dpdk.  It knows all the flows and ufids,
and you just need to iterate over them.  You may also dump flows
directly from the HW, since you can request them.
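
For example, something like this inside netdev-offload-dpdk already
gives you every offloaded flow and its ufid (just a sketch:
'ufid_to_rte_flow' and 'ufid_to_rte_flow_data' are the module's internal
map as I remember it, and query_rte_flow_count() stands in for a
not-yet-existing wrapper around rte_flow_query() with
RTE_FLOW_ACTION_TYPE_COUNT):

    /* Sketch only: iterate all flows netdev-offload-dpdk has offloaded
     * and read their HW counters. */
    static void
    netdev_offload_dpdk_dump_stats(struct netdev *netdev)
    {
        struct ufid_to_rte_flow_data *data;

        CMAP_FOR_EACH (data, node, &ufid_to_rte_flow) {
            struct dpif_flow_stats stats;

            /* Hypothetical helper querying the HW counter of 'rte_flow'. */
            if (!query_rte_flow_count(netdev, data->rte_flow, &stats)) {
                VLOG_DBG("ufid "UUID_FMT": %"PRIu64" packets, %"PRIu64" bytes",
                         UUID_ARGS((struct uuid *) &data->ufid),
                         stats.n_packets, stats.n_bytes);
            }
        }
    }

That is essentially all the information a flow_dump_next() implementation
would need to report.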

There is almost the same thing implemented for datapath flow dumps.  Just
look at the API:
     dpif_netdev_flow_dump_create,
     dpif_netdev_flow_dump_destroy,
     dpif_netdev_flow_dump_thread_create,
     dpif_netdev_flow_dump_thread_destroy,
     dpif_netdev_flow_dump_next

And dpif-netdev fully implements it, regardless of the fact that this
could be done much more easily in the userspace datapath.
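
On the netdev-offload side the matching hooks already exist in
netdev-offload-provider.h, so netdev-offload-dpdk would only need to
fill them in.  Roughly (a sketch: the existing fields are the current
ops struct as I remember it, and the three
netdev_offload_dpdk_flow_dump_* callbacks do not exist yet):

    /* Sketch: netdev-offload-dpdk flow API ops with dump support added,
     * mirroring the dpif-level dump callbacks listed above. */
    const struct netdev_flow_api netdev_offload_dpdk = {
        .type = "dpdk_flow_api",
        .flow_put = netdev_offload_dpdk_flow_put,
        .flow_del = netdev_offload_dpdk_flow_del,
        .init_flow_api = netdev_offload_dpdk_init_flow_api,
        /* Proposed additions: */
        .flow_dump_create = netdev_offload_dpdk_flow_dump_create,
        .flow_dump_destroy = netdev_offload_dpdk_flow_dump_destroy,
        .flow_dump_next = netdev_offload_dpdk_flow_dump_next,
    };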

> 
> The code in this commit is not relevant for Linux TC and does not break
> it, so I don't see the "this will not work with Linux TC" issue.

This code should work if you add a netdev-linux port to the userspace
datapath and the Linux TC offloading API is initialized for it.  The
current implementation will not work.
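
To illustrate the direction, here is a sketch of dpif_netdev_offload_used()
that lets the offload provider create and destroy the dump state (using the
per-netdev netdev_flow_dump_create()/netdev_flow_dump_destroy(); signatures
from memory, and the tail of the function is left as in your patch):

    static int
    dpif_netdev_offload_used(struct dp_netdev_flow *netdev_flow,
                             struct dp_netdev_pmd_thread *pmd)
    {
        struct netdev_flow_dump *netdev_dump;
        struct dpif_flow_stats stats;
        struct netdev *netdev;
        ovs_u128 ufid;
        bool has_next;

        netdev = netdev_ports_get(netdev_flow->flow.in_port.odp_port,
                                  pmd->dp->class);
        if (!netdev) {
            return -1;
        }
        /* Let the provider (dpdk, tc, ...) allocate its own dump state. */
        if (netdev_flow_dump_create(netdev, &netdev_dump)) {
            netdev_close(netdev);
            return -1;
        }
        /* Get offloaded stats.  Note: for TC, dump_next() returns flows
         * in dump order and fills 'ufid' as an output, so a complete
         * implementation would loop here instead of looking up a single
         * ufid. */
        ufid = netdev_flow->mega_ufid;
        has_next = netdev_flow_dump_next(netdev_dump, NULL, NULL, &stats,
                                         NULL, &ufid, NULL, NULL);
        netdev_flow_dump_destroy(netdev_dump);
        netdev_close(netdev);
        if (!has_next) {
            return -1;
        }
        /* ... update the flow's stats from 'stats' as in the patch ... */
        return 0;
    }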

