[ovs-dev] [PATCH V2 11/19] dpif-netdev: Read hw stats during flow_dump_next() call

Ilya Maximets i.maximets at ovn.org
Mon Dec 9 12:48:25 UTC 2019


On 06.12.2019 17:29, Eli Britstein wrote:
> 
> On 12/6/2019 6:04 PM, Ilya Maximets wrote:
>> On 04.12.2019 21:23, Eli Britstein wrote:
>>> On 12/4/2019 8:58 PM, Ilya Maximets wrote:
>>>> On 04.12.2019 18:15, Eli Britstein wrote:
>>>>> On 12/4/2019 5:58 PM, Ilya Maximets wrote:
>>>>>> On 04.12.2019 16:41, Eli Britstein wrote:
>>>>>>> On 12/4/2019 5:16 PM, Ilya Maximets wrote:
>>>>>>>> On 02.12.2019 09:41, Eli Britstein wrote:
>>>>>>>>> From: Ophir Munk<ophirmu at mellanox.com>
>>>>>>>>>
>>>>>>>>> Use netdev dump flow next API in order to update the statistics of fully
>>>>>>>>> offloaded flows.
>>>>>>>>>
>>>>>>>>> Co-authored-by: Eli Britstein<elibr at mellanox.com>
>>>>>>>>> Signed-off-by: Ophir Munk<ophirmu at mellanox.com>
>>>>>>>>> Reviewed-by: Oz Shlomo<ozsh at mellanox.com>
>>>>>>>>> Signed-off-by: Eli Britstein<elibr at mellanox.com>
>>>>>>>>> ---
>>>>>>>>>      lib/dpif-netdev.c | 42 ++++++++++++++++++++++++++++++++++++++++--
>>>>>>>>>      1 file changed, 40 insertions(+), 2 deletions(-)
>>>>>>>>>
>>>>>>>>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>>>>>>>>> index 5142bad1d..bfeb1e7b0 100644
>>>>>>>>> --- a/lib/dpif-netdev.c
>>>>>>>>> +++ b/lib/dpif-netdev.c
>>>>>>>>> @@ -527,6 +527,7 @@ struct dp_netdev_flow {
>>>>>>>>>      
>>>>>>>>>          bool dead;
>>>>>>>>>          uint32_t mark;               /* Unique flow mark assigned to a flow */
>>>>>>>>> +    bool actions_offloaded;      /* true if flow is fully offloaded */
>>>>>>>>>      
>>>>>>>>>          /* Statistics. */
>>>>>>>>>          struct dp_netdev_flow_stats stats;
>>>>>>>>> @@ -2410,6 +2411,7 @@ dp_netdev_flow_offload_put(struct dp_flow_offload_item *offload)
>>>>>>>>>              }
>>>>>>>>>          }
>>>>>>>>>          info.flow_mark = mark;
>>>>>>>>> +    info.actions_offloaded = &flow->actions_offloaded;
>>>>>>>>>      
>>>>>>>>>          ovs_mutex_lock(&pmd->dp->port_mutex);
>>>>>>>>>          port = dp_netdev_lookup_port(pmd->dp, in_port);
>>>>>>>>> @@ -3073,8 +3075,8 @@ dp_netdev_flow_to_dpif_flow(const struct dp_netdev_flow *netdev_flow,
>>>>>>>>>          flow->pmd_id = netdev_flow->pmd_id;
>>>>>>>>>          get_dpif_flow_stats(netdev_flow, &flow->stats);
>>>>>>>>>      
>>>>>>>>> -    flow->attrs.offloaded = false;
>>>>>>>>> -    flow->attrs.dp_layer = "ovs";
>>>>>>>>> +    flow->attrs.offloaded = netdev_flow->actions_offloaded;
>>>>>>>>> +    flow->attrs.dp_layer = flow->attrs.offloaded ? "in_hw" : "ovs";
>>>>>>>>>      }
>>>>>>>>>      
>>>>>>>>>      static int
>>>>>>>>> @@ -3244,6 +3246,7 @@ dp_netdev_flow_add(struct dp_netdev_pmd_thread *pmd,
>>>>>>>>>          flow->dead = false;
>>>>>>>>>          flow->batch = NULL;
>>>>>>>>>          flow->mark = INVALID_FLOW_MARK;
>>>>>>>>> +    flow->actions_offloaded = false;
>>>>>>>>>          *CONST_CAST(unsigned *, &flow->pmd_id) = pmd->core_id;
>>>>>>>>>          *CONST_CAST(struct flow *, &flow->flow) = match->flow;
>>>>>>>>>          *CONST_CAST(ovs_u128 *, &flow->ufid) = *ufid;
>>>>>>>>> @@ -3598,6 +3601,37 @@ dpif_netdev_flow_dump_thread_destroy(struct dpif_flow_dump_thread *thread_)
>>>>>>>>>          free(thread);
>>>>>>>>>      }
>>>>>>>>>      
>>>>>>>>> +static int
>>>>>>>>> +dpif_netdev_offload_used(struct dp_netdev_flow *netdev_flow,
>>>>>>>>> +                         struct dp_netdev_pmd_thread *pmd)
>>>>>>>>> +{
>>>>>>>>> +    struct netdev_flow_dump netdev_dump;
>>>>>>>>> +    struct dpif_flow_stats stats;
>>>>>>>>> +    ovs_u128 ufid;
>>>>>>>>> +    bool has_next;
>>>>>>>>> +
>>>>>>>>> +    netdev_dump.port = netdev_flow->flow.in_port.odp_port;
>>>>>>>>> +    netdev_dump.netdev = netdev_ports_get(netdev_dump.port, pmd->dp->class);
>>>>>>>>> +    if (!netdev_dump.netdev) {
>>>>>>>>> +        return -1;
>>>>>>>>> +    }
>>>>>>>>> +    /* get offloaded stats */
>>>>>>>>> +    ufid = netdev_flow->mega_ufid;
>>>>>>>>> +    has_next = netdev_flow_dump_next(&netdev_dump, NULL, NULL, &stats, NULL,
>>>>>>>>> +                                     &ufid, NULL, NULL);
>>>>>>>> This will not work with linux TC flow API.
>>>>>>>> You should not create 'netdev_flow_dump' by yourself, it must be created
>>>>>>>> by the netdev_ports_flow_dump_create().
>>>>>>> It is different in linux TC.
>>>>>>>
>>>>>>> In kernel, dump request is done in dump_create API, for each device
>>>>>>> (using netdev_ports_flow_dump_create).
>>>>>>>
>>>>>>> In user space we don't query the driver at all but the flows are looped
>>>>>>> by their PMD. We don't need any specific netdev action.
>>>>>>>
>>>>>>> In kernel, dump next handle a chunk of replies from the kernel, without
>>>>>>> any known order or properties - neither by device nor by any other
>>>>>>> property, and the next handles the multi-threaded aspect of the dump.
>>>>>>>
>>>>>>> In user space the flows are known by the PMD looping, and the
>>>>>>> multi-threaded scheme is handled by the PMD thread, so this is also
>>>>>>> different.
>>>>>>>
>>>>>>> In kernel, the dump_next gets the flows from the kernel, and deduct the
>>>>>>> UFID for each one. In user space we get the UFID as known, and deduct
>>>>>>> the rte_flow, also different.
>>>>>> I don't see the limitation that will not allow you to implement the
>>>>>> same thing for netdev-offload-dpdk.  It knows all the flows and ufids
>>>>>> and you just need to iterate over them.  You may also dump flows
>>>>>> directly from the HW because you can request them.
>>>>> Correct me if I'm wrong, but DPDK doesn't provide an API to "request" HW
>>>>> flows.
>>>> At least, you could check if your flow is still in HW with rte_flow_query().
>>> This is exactly what is done.
>>>> Hopefully, functionality of rte_flow_query() will be expanded to dump flows
>>>> themselves or some new API will appear.
>>>>
>>>>>> There is almost same thing implemented for datapath flow dumps.  Just
>>>>>> look at the API:
>>>>>>         dpif_netdev_flow_dump_create,
>>>>>>         dpif_netdev_flow_dump_destroy,
>>>>>>         dpif_netdev_flow_dump_thread_create,
>>>>>>         dpif_netdev_flow_dump_thread_destroy,
>>>>>>         dpif_netdev_flow_dump_next
>>>>>>
>>>>>> And dpif-netdev fully implements it regardless of fact that this could
>>>>>> be done much easier in userspace datapath.
>>>>> Yes, those functions are implemented, but the scheme of querying the
>>>>> flows is different, as in my comments above.
>>>> dpif-netdev follows the dpif-provider API. It doesn't matter how exactly
>>>> it implements this API.  The key point is that caller doesn't need to
>>>> know which dpif it uses.
>>>>
>>>> The same with netdev-offload API.  Linux TC offload-provider implements
>>>> the API and DPDK offload-provider must implement the same API.
>>>> In this case datapath will not need to know from which offload-provider
>>>> it dumps these flows.
>>>>
>>>> netdev-offload-dpdk could has all the information it needs to implement
>>>> the API properly.  If needed you may additionally store all required data
>>>> about flows.
>>>>
>>>>> So, I will implement flow_get API instead, and fill only the stats,
>>>>> queried from the HW.
>>>> That doesn't make any sense.
>>> In dpif_netdev_flow_dump_next the flow and its properties are known from
>>> pmd->flow_table. The only missing info is the statistics, in case the
>>> flow is offloaded. I want to call netdev_flow_get and implement it in
>>> netdev-offload-dpdk to query the statistics, to complete the information
>>> about the flow.
>>>
>>> Could you please elaborate why doesn't it make sense?
>> Will you provide implementation of netdev_flow_get() for netdev-offload-tc?
> It's already implemented for TC.

Oh, sorry, I misread your last suggestion.  This one might make sense, but I
still have a few concerns.  Will reply to the implementation in v3.

>>
>>
>>>>>>> The code in this commit is not relevant for linux TC, not breaking it,
>>>>>>> so I don't see "this will will not work with linux TC" issue.
>>>>>> This code should work if you'll add netdev-linux port to userspace
>>>>>> datapath and linux TC offloading API will be initialized for it.
>>>>>> Current implementation will not work.
>>>>> If so, it is currently broken as is. The current dpif-netdev.c code
>>>>> doesn't read any HW statistics before my patch.
>>>> Right now it's not implemented.  With your implementation OVS will
>>>> crash on attempt to dump linux TC flows from the userspace datapath.
>>> Ack
>>>


More information about the dev mailing list