[ovs-dev] [PATCH] RFC: netdev-afxdp: Support for XDP metadata HW hints.

Jesper Dangaard Brouer brouer at redhat.com
Mon Mar 8 14:00:40 UTC 2021


On Fri, 5 Mar 2021 12:07:28 -0800
William Tu <u9012063 at gmail.com> wrote:

> Thanks Björn and Jesper for the feedback.
> 
> On Fri, Mar 5, 2021 at 7:34 AM Björn Töpel <bjorn.topel at intel.com> wrote:
> >
> > On 2021-03-05 16:07, Jesper Dangaard Brouer wrote:  
> > > On Fri, 5 Mar 2021 14:56:02 +0100
> > > Björn Töpel <bjorn.topel at intel.com> wrote:
> > >  
> > >> On 2021-03-05 11:13, Jesper Dangaard Brouer wrote:  
> >
> >
> > [...]  
> > >>
> > >> You'd like to use the length of metadata to express if it's available or
> > >> not. I'm stating that we don't need that for AF_XDP. In XDP if we're
> > >> accessing the field, we need to validate the access due to the verifier,
> > >> but that is not about knowing whether the field is actually there. When
> > >> we attach a certain XDP program to a device, we can query what offloads
> > >> it supports, and the application knows that the metadata is structured
> > >> in a certain way.
> > >>
> > >> Some application has simple metadata:
> > >> struct tstamp { u64 rx; };
> > >>
> > >> That application will unconditionally look at
> > >> ((struct tstamp *)(pkt - sizeof(struct tstamp)))->rx;
> > >
> > > I imagine that NIC hardware will/can provide packets with different
> > > metadata.  Your example with timestamps is a good one, as some
> > > drivers/HW only support timestamping PTP packets.  Thus, I don't want
> > > to provide metadata info on tstamp if HW didn't provide it (and I also
> > > don't want to spend CPU-cycles on clearing the u64 field with zero).
> > > Another example is VLAN IDs, which are sometimes provided by hardware.
> > >
> > > Thus, on a per packet basis the metadata can change.
> > >  
> Is this future work?

Yes. I will propose this, and I will also be working on the code myself.
(P.S. I'll be on vacation for the rest of this week.)

It is great that we have AF_XDP+OVS as one use-case/consumer.

My use-case is CPUMAP and veth, which get an xdp_frame redirected to
them and create SKBs based on this info.  I will code this on top of
Saeed's patchset.  With xdp_frame I could also store the btf_id at the
top of the frame, but that would not make it available to AF_XDP frames.
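
Roughly what I have in mind on the xdp_frame side (a hypothetical
sketch only; the md_btf_id member is my invention and not part of any
posted patch):

  /* kernel include/net/xdp.h -- roughly today's layout */
  struct xdp_frame {
          void *data;
          u16 len;
          u16 headroom;
          u32 metasize:8;
          u32 frame_sz:24;
          struct xdp_mem_info mem;
          struct net_device *dev_rx;  /* used by cpumap */
          u32 md_btf_id;  /* NEW: identifies the metadata layout */
  };

Then CPUMAP/veth could look at md_btf_id when building the SKB and
populate skb->hash, VLAN, timestamp etc. from the metadata area.  But,
as said, AF_XDP never sees struct xdp_frame, hence the alternative of
placing the btf_id in the metadata itself.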


> Looking at Saeed's patch, I thought we query the netdev through the
> BTF netlink API, and we get a BTF ID and C-struct per-netdev.
> And this will be done in the OVS control plane: when a user attaches
> a device to its bridge, we query its metadata structure.

Correct.  Remember that Saeed's patches[1] have not been accepted
upstream.  Thus, those patches[1] are also just a proposal (but a
really good starting point).

Even if a netdev only has one current BTF-metadata-hint, we want the
ability to change the BTF-metadata-hint to something else at runtime.
That is the whole point of making these metadata-HW-offload-hints more
flexible.

Thus, in OVS you will need some way to detect when the NIC changes the
BTF-metadata-hint to contain new info.  My *proposal* is to include the
btf_id as part of the struct.  (Feel free to slap me around; I'm sure
Alexei will on this idea.)

[1] https://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git/log/?h=topic/xdp_metadata4
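
To make this concrete, a minimal sketch in userspace C (pure
illustration; the placement convention and the helper names are mine,
not from the patchset[1]).  The metadata sits immediately in front of
the packet data, with the btf_id as the *last* member, so a consumer
can always find the id at a fixed minus-offset, whatever the rest of
the layout looks like:

  #include <stdint.h>
  #include <string.h>

  void recompute_md_offsets(uint32_t btf_id);  /* hypothetical OVS helper */

  /* Read the btf_id stored just in front of the packet data. */
  static inline uint32_t md_btf_id(const uint8_t *pkt_data)
  {
          uint32_t id;

          memcpy(&id, pkt_data - sizeof(id), sizeof(id));
          return id;
  }

  /* Per-packet check: if the id changed, the NIC switched layouts and
   * the cached member offsets must be re-resolved. */
  static void md_check_layout(const uint8_t *pkt, uint32_t *cached_id)
  {
          uint32_t id = md_btf_id(pkt);

          if (id != *cached_id) {
                  recompute_md_offsets(id);
                  *cached_id = id;
          }
  }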


> > Yes, I agree. For a certain set of packets, there can be different
> > metadata per packet that, as you pointed out, can be dispatched to
> > different handlers.
> >  
> > >> Some might be more flexible and have:
> > >> struct metadata { char storage[64]; int btf_id; };  
> > >
> > > I really like the idea of placing the btf_id as the last entry; that
> > > way we know its location.  I can read the btf_id at a minus-offset of
> > > sizeof(int) from the packet start.
> > >  
> [...]
> 
> > >
> > > BTF sort-of makes the struct names identifiers and API, and gives *BPF*
> > > programs the ability to adjust offsets when loading the BPF-prog into
> > > the kernel.
> > >
> > > Driver-1 chooses:
> > >
> > >   struct xdp_md_desc {
> > >       uint32_t flow_mark;
> > >       uint32_t hash32;
> > >   } __attribute__((preserve_access_index));
> > >
> > > Driver-2 chooses:
> > >
> > >   struct xdp_md_desc {
> > >       uint32_t hash32;
> > >       uint32_t flow_mark;
> > >   } __attribute__((preserve_access_index));
> > >
> > > The BPF code can access the members, and the kernel will remap the
> > > struct accesses, internally creating two different BPF-programs with
> > > adjusted offsets in the byte-code.
> > >
> > > How do we handle this in the OVS userspace C-code?
> > >  
> 
> BTW, do we assume all vendors use the same field name for the same
> purpose? For example: when OVS queries Intel or mlx, if rxhash is
> available, they should both use the same name, "hash32".
> Otherwise, I don't know how to tie it into the OVS code logic.

Yes, AFAIK the struct member field names, defined as BTF, will become
API-like and will have meaning.  The offset will be dynamic, and the
size/type could be validated via BTF-info.


> If that's the case, I can do (in userspace C code, not BPF program)
> 1) When attaching a device, query its MD struct and MD's size
>     using BTF info.
> 2) From the BTF info returned, check if the string "hash32" exists.
>     If yes, then calculate its offset from the beginning of the MD.
> 3) OVS can set it by calling "dp_packet_set_rss(packet, md[rxhash_offset]);"

That sounds good.

The 'rxhash_offset' can be dynamically updated if the btf_id changes.
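
For your step (2), something along these lines should work with
libbpf's BTF API (a sketch, assuming the netdev's BTF blob has already
been fetched into a 'struct btf *' -- how to fetch it depends on the
netlink API from [1], which is not upstream yet):

  #include <string.h>
  #include <bpf/btf.h>

  /* Return the byte offset of @member_name within the metadata struct
   * identified by @type_id, or -1 if this netdev doesn't provide it. */
  static int md_member_offset(const struct btf *btf, __u32 type_id,
                              const char *member_name)
  {
          const struct btf_type *t = btf__type_by_id(btf, type_id);
          const struct btf_member *m;
          int i;

          if (!t || !btf_is_struct(t))
                  return -1;

          m = btf_members(t);
          for (i = 0; i < btf_vlen(t); i++, m++) {
                  const char *name = btf__name_by_offset(btf, m->name_off);

                  if (name && !strcmp(name, member_name))
                          return btf_member_bit_offset(t, i) / 8;
          }
          return -1;
  }

Then your step (3) is just:

  rxhash_offset = md_member_offset(btf, btf_id, "hash32");

and when the btf_id changes, you re-run the lookup against the new id.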

> >
> > Longer-term, I think extending the linkers to support struct member
> > relocations would be a good thing! I.e. simply support
> > __attribute__((preserve_access_index)) in userland. That would enable a
> > very powerful userland/kernel interface, like what we're discussing now.
> >
> > But I wonder if it's too big of an initial undertaking to do that as a
> > first step? Maybe non-relocatable structs could be a start.
> >  
> Having relocatable structs would be great, so I can just use
> 'md->hash32' instead of an offset like above.
> 
> > I definitely agree that the goal should be relocatable structs!
> >  
> > > Maybe/hopefully you have a better idea than mine, which is simply to
> > > compile two C-programs (or one program for each known BTF-id) and
> > > dispatch based on the BTF-id.
> > >  
> 
> Thanks
> William

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


