[ovs-dev] [PATCH] RFC: netdev-afxdp: Support for XDP metadata HW hints.
Björn Töpel
bjorn.topel at intel.com
Fri Mar 5 15:34:51 UTC 2021
On 2021-03-05 16:07, Jesper Dangaard Brouer wrote:
> On Fri, 5 Mar 2021 14:56:02 +0100
> Björn Töpel <bjorn.topel at intel.com> wrote:
>
>> On 2021-03-05 11:13, Jesper Dangaard Brouer wrote:
[...]
>>> We also want this sizeof-offset to be dynamic.
>>> And I hope that AF_XDP could provide this info, on the size of
>>> data_meta area. I need some help from Magnus and Bjørn here?
>>>
>>> Look at the kernel-side code of AF_XDP/XSK I don't see this being
>>> transferred? Help is this true? Can we add the info?
>>>
>>
>> XDP sockets make sure that the meta data is propagated to userland.
>> However, the meta data *size* is not passed via the AF_XDP descriptors.
>>
>> This was per-design; The metadata is a contract between the XDP program
>> and the XDP socket, and in the future maybe the driver as well.
>>
>> So, we're simply stating that "your AF_XDP application should know if
>> there's meta data or not, and how it's structured".
>>
>> I guess we could explore starting to use the options bits of struct
>> xdp_desc for this, but only if we're *really* seeing a need for it.
>>
>> If the XDP program would like to pass it's length, it can do that via
>> the metadata area as well.
>>
>> struct meta {
>> int len;
>
> I'm not interested in having the len part of metadata. The btf_id
> will basically/indirectly give us the length.
>
Ok!
>> ...
>> ...
>> };
>>
>>
>>>
>>>
>>>> index = addr >> FRAME_SHIFT;
>>>> xpacket = &umem->xpool.array[index];
>>>> packet = &xpacket->packet;
>>>> @@ -868,6 +881,12 @@ netdev_afxdp_rxq_recv(struct netdev_rxq *rxq_, struct dp_packet_batch *batch,
>>>> OVS_XDP_HEADROOM);
>>>> dp_packet_set_size(packet, len);
>>>>
>>>> + /* FIXME: This should be done by detecting whether
>>>> + * XDP MD is enabled or not. Ex:
>>>> + * $ bpftool net xdp set dev enp2s0f0np0 md_btf on
>>>> + */
>>>> + dp_packet_set_rss_hash(packet, md->hash32);
>>>
>>> The general idea to make this code more dynamic is to let AF_XDP
>>> provide the meta-data-len and then use the (proposed) btf_id to know
>>> what "callback" C-code to call.
>>>
>>
>> Hmm, again, here I might have a bit of a different view. Personally I
>> think it's better that all the "NIC/XDP/socket" does the negotiation in
>> the control/setup phase, and keep the fast-path lean. Instead of:
>>
>> recv(void *data)
>> {
>> len = get_meta_data_len(data);
>> if (len) {
>> // check this, check that, if-else-if-else
>> }
>> ...
>> }
>>
>> You'd like to use the length of metadata to express if it's available or
>> not. I'm stating that we don't need that for AF_XDP. In XDP if we're
>> accessing the field, we need to validate access due to the verifier. But
>> it's not to know if it's actually there. When we attach a certain XDP to
>> a device, we can query what offload it supports, and the the application
>> knows that meta data is structured in a certain way.
>>
>> Some application has simple metadata:
>> struct tstamp { u64 rx; };
>>
>> That application will unconditionally look at (struct tstamp
>> *)(pkt-sizeof(struct tstamp)->rx;.
>
> I imagine that NIC hardware will/can provide packets with different
> metadata. Your example with timestamps are a good example, as some
> drivers/HW only support timestamping PTP packets. Thus, I don't want
> to provide metadata info on tstamp if HW didn't provide it (and I also
> don't want to spend CPU-cycles on clearing the u64 field with zero).
> Another example is vlan IDs is sometimes provided by hardware.
>
> Thus, on a per packet basis the metadata can change.
>
Yes, I agree. For a certain set of packets, there can be different
metadata per packet that, as you pointed out, can be dispatched do
different handlers.
>> Some might be more flexible and have:
>> struct metadata { char storage[64]; int btf_id; };
>
> I really like the idea of placing the btf_id as the last entry, that
> way we know its location. I can get the btf_id via packet minus-offset
> sizeof(int).
>
>>
>> If you need an "availability check", then you can solve that.
>>
>> int xdp_prog(struct xdp_md *ctx)
>> {
>> struct metadata *m = (void *)(long)ctx->data_meta;
>> void *data = (void *)(unsigned long)ctx->data;
>>
>> if ((m + 1) > data) {
>> ret = bpf_xdp_adjust_meta(ctx, -(int)sizeof(*));
>> if (ret < 0)
>> return XDP_ABORTED;
>> data = (void *)(unsigned long)ctx->data;
>> m = (void *)(long)ctx->data_meta;
>> if ((m + 1) > data)
>> return XDP_ABORTED;
>> m->btf_id = -1; // not valid
>> }
>> // ... bpf_redirect_map() to socket
>> }
>>
>> Would you agree? Or maybe expand a bit why you need the length.
>
> With the btf_id as the last entry, I basically don't need the length.
>
Ok! Good! :-)
> The dispatcher function, will based on the btf_id know the size of the
> struct. The C-code will basically type-cast the struct at the pointer
> offset and get access to the metadata at the known offsets.
>
Make sense, and inline with my view!
>>
>>> Using a btf_id make this independent of the NIC driver.
>
> E.g. we don't need to dereference the NIC driver struct to find what
> metadata this driver have. Instead this is given directly by the
> btf_id, and two drivers can use the same btf_id.
>
Yeah!
>
>>> And we can
>>> tell the difference between two BPF structs that contain the same info
>>> but at different offsets. (I'll leave out the details, but I can
>>> explain more if there is interest).
>>>
>
> BTF sort-of make the struct names identifiers and API. And gives *BPF*
> programs the ability to adjust offsets, when loading the BPF-prog into
> the kernel.
>
> Driver-1 choose:
>
> struct xdp_md_desc {
> uint32_t flow_mark;
> uint32_t hash32;
> } __attribute__((preserve_access_index));
>
> Driver-2 choose:
>
> struct xdp_md_desc {
> uint32_t hash32;
> uint32_t flow_mark;
> } __attribute__((preserve_access_index));
>
> The BPF code can access the members, and kernel will remap-struct
> access, and internally in the kernel create two different BPF-programs
> with adjusted offsets in the byte-code.
>
> How do we handle this in the OVS userspace C-code?
>
Longer-term, I think extending the linkers to support struct member
relocations would be a good thing! I.e. simply support
__attribute__((preserve_access_index)) in userland. That would enable
very powerful userland/kernel interface, like what we're discussing now.
But I wonder if it's too big of an initial undertaking to do that as a
first step? Maybe non-relocatable struct could be a start.
I definitely agree that the goal should be relocatable structs!
> Maybe/hopefully you have a better idea than mine, which is simply to
> compile two C-programs (or program for-each know BTF-id) and dispatch
> based on the BTF-id.
>
Not really. :-( A solution w/o __attribute__((preserve_access_index))
would mean that the application would need per-nic structures, or a
run-time/recompile component.
I'm excited to see that William and you are experimenting in the space.
We need more "pasta on the wall" and see what sticks^Wwork.
>
>> Details, please! :-D Maybe I'm just not following your ideas!
>
> I hope the details above helped.
>
Yes. Thank you for the write-up!
Björn
>
>>> The btf_id also helps when NIC reconfigure the meta-data contents, and
>>> there are packets in-flight.
>
> This is another reason why C-code need to handle meta-data can change
> runtime, as I don't like to freeze config changes to NIC.
>
More information about the dev
mailing list