[ovs-dev] [PATCH v12 00/11] Add offload support for sFlow

Ilya Maximets i.maximets at ovn.org
Tue Mar 23 14:24:34 UTC 2021


On 3/5/21 4:27 AM, Chris Mi wrote:
> Hi Ilya,
> 
> I think about your suggestion recently. But I'm still not very clear about the design.
> Please see my reply below:
> 
> On 3/1/2021 8:48 PM, Ilya Maximets wrote:
>> On 3/1/21 9:30 AM, Chris Mi wrote:
>>> Hi Simon, Ilya,
>>>
>>> Could I know what should we do to make progress for this patch set?
>>> It has been posted in the community for a long time 😁
>> In general, the way to get your patches reviewed is to review other
>> patches.  It's simply because we still have a huge review backlog
>> (214 patches right now in patchwork and most of them needs review)
>> and bugfixes usually has a bit higher priority.  By reviewing other
>> patches you're reducing amount of work for maintainers so they can
>> get to your patches faster.
> OK, I see.
>>
>> For the series and comments from Eelco:
>> I didn't read the patches carefully, only a quick glance, but I still
>> do not understand why we need a separate thread to poll psample events.
>> Why can't we just allow usual handler threads to do that?
> I'm not sure if you are aware of that the psample netlink is different from the ovs
> netlink. Without offload, kernel sends missed packet and sFlow packet to userspace
> using the same netlink 'ovs_packet_family'. So they can use the same handler thread.
> But in order to offload sFlow action, we use psample kernel module to send sampled
> packets from kernel to userspace. The format for ovs netlink message and psample
> netlink messages are different.

Hi.  Sorry for late reply.

Yes, I understand that message format is different, but it doesn't really
matter.  All the message-specific handling and parsing should happen
inside the netdev_offload_tc_recv().  This function should have a prototype
similar to dpif_netlink_recv(), i.e. it should receive a pointer to the
struct dpif_upcall from the caller and fill it with data.  Maybe other
type of a generic data structure if it's really not possible to construct
struct dpif_upcall.


>>    From the
>> architecture perspective it's not a good thing to call ofproto code
>> from the offload-provider.  This introduces lots of complications
>> and might cause lots of issues in the future.
>>
>> I'd say that design should look like this:
>>
>>    handler thread ->
>>      dpif_recv() ->
>>        dpif_netlink_recv() ->
>>          netdev_offload_recv() ->
>>            netdev_offload_tc_recv() ->
>>              nl_sock_recv()
> In order to use the handler thread, I'm not sure if we should add psample socket
> fd to every handler's epoll_fd. If we should do that, we should consider how to
> differentiate if the event comes from ovs or psample netlink. Maybe we should
> allocate a special port number for psample and assign it to event.data.u32.
> Anyway, that's the details. If this is the right direction, I'll think about it.
>>
>> Last three calls should be implemented.
>>
>> The better version of this will be to throw away dpif part from
>> above call chain and make it:
>>
>>    handler thread ->
>>      netdev_offload_recv() ->
>>        netdev_offload_tc_recv() ->
>>          nl_sock_recv()
> If we throw away dpif part, maybe we have to write some duplicate epoll code
> for psample only. And we can't block in nl_sock_recv(). Maybe we have to add
> psample socket fd to every handler's epoll_fd. So we have to change the dpif
> somehow.

There is no need to implement any epoll_fd logic for psample.
You may only use handler with handler_id == 0.  Only this handler will receive
psample upcalls.  netdev_offload_recv_wait() should be implemented similar
to how dpif_netlink_recv_wait() implemented for windows case, i.e. it will
call nl_sock_wait(nl_sock, POLLIN) for the psample netlink socket.
There is no need to block and you're never blocking anywhere because your're
calling nl_sock_recv(..., wait = false) which will use MSG_DONTWAIT while
actually receiving a message.

So, there should be several call chains:

1. Init.

   open_dpif_backer/type_run() ->
     netdev_offload_recv_set() ->
       netdev_offload_tc_recv_set(enabled) ->
         if (enable)
           psample_sock = nl_sock_create(...);
         else
           close(psample_sock);

2. Wait.

   udpif_upcall_handler() ->
     netdev_offload_recv_wait() ->
       netdev_offload_tc_recv_wait() ->
         if (handler_id == 0)
           nl_sock_wait(psample_sock, POLLIN);

3. Receive.

   udpif_upcall_handler() ->
     recv_upcalls() ->
       netdev_offload_recv() ->
       | netdev_offload_tc_recv(..., struct dpif_upcall *upcall) ->
       |   if (handler_id == 0)
       |     nl_sock_recv(psample_socket, ..., wait = false);
       |     *upcall = <received data>;
       upcall_receive()
       process_upcall() ->
         dpif_sflow_received()

4. Deinit.

   close_dpif_backer() ->
     netdev_offload_recv_set(enable = false);


Does that look more clear?

> 
> I don't know if I understand your suggestion correctly. Or I missed anything
> So if you have time, could you please elaborate?
> 
> Thanks,
> Chris
>>
>> This way we could avoid touching dpif-netdev and still have psample
>> offloading for the case where netdev-offload-tc is used from the
>> userspace datapath.
>>
>> Above architecture also implies implementation of:
>>   - netdev_offload_recv_wait()
>>   - netdev_offload_recv_purge()
>>   - and the netdev_offload_tc_* counterparts.
>>
>> Best regards, Ilya Maximets.
>>
>>> Thanks,
>>> Chris
>>>
>>> On 2/23/2021 5:08 PM, Roi Dayan wrote:
>>>>
>>>> On 2021-01-27 8:23 AM, Chris Mi wrote:
>>>>> This patch set adds offload support for sFlow.
>>>>>
>>>>> Psample is a genetlink channel for packet sampling. TC action act_sample
>>>>> uses psample to send sampled packets to userspace.
>>>>>
>>>>> When offloading sample action to TC, userspace creates a unique ID to
>>>>> map sFlow action and tunnel info and passes this ID to kernel instead
>>>>> of the sFlow info. psample will send this ID and sampled packet to
>>>>> userspace. Using the ID, userspace can recover the sFlow info and send
>>>>> sampled packet to the right sFlow monitoring host.
>>>>>
>>>>> v2-v1:
>>>>> - Fix robot errors.
>>>>> v3-v2:
>>>>> - Remove Gerrit Change-Id.
>>>>> - Add patch #9 to fix older kernels build issue.
>>>>> - Add travis test result.
>>>>> v4-v3:
>>>>> - Fix offload issue when sampling rate is 1.
>>>>> v5-v4:
>>>>> - Move polling thread from ofproto to netdev-offload-tc.
>>>>> v6-v5:
>>>>> - Rebase.
>>>>> - Add GitHub Actions test result.
>>>>> v7-v6:
>>>>> - Remove Gerrit Change-Id.
>>>>> - Fix "ERROR: Inappropriate spacing around cast"
>>>>> v8-v7
>>>>> - Address Eelco Chaudron's comment for patch #11.
>>>>> v9-v8
>>>>> - Remove sflow_len from struct dpif_sflow_attr.
>>>>> - Log a debug message for other userspace actions.
>>>>> v10-v9
>>>>> - Address Eelco Chaudron's comments on v9.
>>>>> v11-v10
>>>>> - Fix a bracing error.
>>>>> v12-v11
>>>>> - Add duplicate sample group id check.
>>>>>
>>>>> Chris Mi (11):
>>>>>     compat: Add psample and tc sample action defines for older kernels
>>>>>     ovs-kmod-ctl: Load kernel module psample
>>>>>     dpif: Introduce register sFlow upcall callback API
>>>>>     ofproto: Add upcall callback to process sFlow packet
>>>>>     netdev-offload: Introduce register sFlow upcall callback API
>>>>>     netdev-offload-tc: Implement register sFlow upcall callback API
>>>>>     dpif-netlink: Implement register sFlow upcall callback API
>>>>>     netdev-offload-tc: Introduce group ID management API
>>>>>     netdev-offload-tc: Create psample netlink socket
>>>>>     netdev-offload-tc: Add psample receive handler
>>>>>     netdev-offload-tc: Add offload support for sFlow
>>>>>
>>>>>    include/linux/automake.mk        |   4 +-
>>>>>    include/linux/psample.h          |  58 +++
>>>>>    include/linux/tc_act/tc_sample.h |  25 ++
>>>>>    lib/dpif-netdev.c                |   1 +
>>>>>    lib/dpif-netlink.c               |  27 ++
>>>>>    lib/dpif-netlink.h               |   4 +
>>>>>    lib/dpif-provider.h              |  10 +
>>>>>    lib/dpif.c                       |   8 +
>>>>>    lib/dpif.h                       |  23 ++
>>>>>    lib/netdev-offload-provider.h    |   3 +
>>>>>    lib/netdev-offload-tc.c          | 659 ++++++++++++++++++++++++++++++-
>>>>>    lib/netdev-offload.c             |  30 ++
>>>>>    lib/netdev-offload.h             |   4 +
>>>>>    lib/tc.c                         |  61 ++-
>>>>>    lib/tc.h                         |  16 +-
>>>>>    ofproto/ofproto-dpif-upcall.c    |  42 ++
>>>>>    utilities/ovs-kmod-ctl.in        |  14 +
>>>>>    17 files changed, 973 insertions(+), 16 deletions(-)
>>>>>    create mode 100644 include/linux/psample.h
>>>>>    create mode 100644 include/linux/tc_act/tc_sample.h
>>>>>
>>>>
>>>> Hi Simon, Ilya,
>>>>
>>>> Can you help review for this series?
>>>> do you have any comments you want us to handle?
>>>>
>>>> Thanks,
>>>> Roi
> 



More information about the dev mailing list