[ovs-dev] [RFC v2 PATCH 4/4] bpf: Add reference XDP program implementation for netdev-offload-xdp

William Tu u9012063 at gmail.com
Tue May 5 15:18:01 UTC 2020


On Tue, May 05, 2020 at 11:43:58AM +0900, Toshiaki Makita wrote:
> On 2020/05/04 1:22, William Tu wrote:
> >On Tue, Apr 21, 2020 at 11:47:04PM +0900, Toshiaki Makita wrote:
> >>This adds a reference program, flowtable_afxdp.o, which can be used to
> >>offload flows to XDP through netdev-offload-xdp.
> >>The program will be compiled when using --enable-bpf switch.
> >>
> >>Signed-off-by: Toshiaki Makita <toshiaki.makita1 at gmail.com>
> >
> >Hi Toshiaki,
> >
> >Thanks for your patch, I haven't tried to run it but I have
> >questions about miniflow.
> >
> >>---
> ...
> >>+SEC("xdp") int
> >>+flowtable_afxdp(struct xdp_md *ctx)
> >>+{
> >>+    struct xdp_miniflow *pkt_mf;
> >>+    struct xdp_subtable_mask *subtable_mask;
> >>+    int *head;
> >>+    struct xdp_flow_actions *xdp_actions = NULL;
> >>+    struct nlattr *a;
> >>+    unsigned int left;
> >>+    int cnt, idx, zero = 0;
> >>+
> >>+    account_debug(0);
> >>+
> >>+    head = bpf_map_lookup_elem(&subtbl_masks_hd, &zero);
> >>+    if (!head) {
> >>+        return XDP_ABORTED;
> >>+    }
> >>+    if (*head == XDP_SUBTABLES_TAIL) {
> >>+        /* Offload not enabled */
> >>+        goto upcall;
> >>+    }
> >>+
> >>+    /* Get temporary storage for storing packet miniflow */
> >>+    pkt_mf = bpf_map_lookup_elem(&pkt_mf_tbl, &zero);
> >>+    if (!pkt_mf) {
> >>+        return XDP_ABORTED;
> >>+    }
> >>+
> >
> >I start to wonder what's the benefit of using miniflow in XDP?
> >miniflow tries to compress the large flow key into smaller memory,
> >and with a flowmap indicating which offset in bits are valid.
> >And when adding a new subtable at flow_put, the subtable's key
> >size can be smaller with only the needed fields.
> >
> >But in the case of XDP/eBPF, we have to statically allocated fixed
> >key size (the subtbl_template has key as struct xdp_flow_key), so
> >each subtable is always having key with full key fields. (not saving
> >memory space) Moreover with miniflow, it makes the 'mask_key' function
> >below pretty complicated.
> 
> Fixed sized subtable is restriction of map-in-map.
> The benefit to use miniflow I envisioned initially was
> - compress the key
> - use existing function to convert flows for XDP
> 
> The first one is actually not doable due to map-in-map. I hope someday the
> restriction gets loosen...

On my second thought this might be a good idea.

One problem I hit in my previous prototype is that the flow key is too big.
ex: struct sw_flow_key in ovs kernel is more than 500B, and moving it to bpf
some BPF limitation. If we continue adding more fields to struct xdp_flow,
ex: ipv6, vxlan tunnel, then with miniflow extract, the actually memory usage
in key is smaller. The key size in subtbl_template can be a value smaller the
the size of struct xdp_flow.

But when adding more fields, I'm not sure whether we will hit some limitation
of BPF at xdp_miniflow_extract.
So miniflow can save key size but complicate the key extraction.
Without miniflow, key size is large but key extract and match are simpler.
What do you think?

> 
> The second one is doable and it's good for userspace offload driver.
> But I ended up with rewriting most helper functions of miniflow for BPF due
> to some restriction.
> 
> >What if we don't compress the flow key?
> >unlike to OVS userspace, more like the kernel's implementation of OVS
> >megaflow (ex: masked_flow_lookup in net/openvswitch/flow_table.c)
> 
> Without miniflow the BPF program will be simple like the original xdp_flow
> patch for Linux kernel.
> I can try it.
> 
> >Where it has a list of masks (like your 'subtbl_masks' map) but
> >only use 1 hash table (so we don't need map-in-map).
> >We lookup the hash table multiple times, each
> >time applied different masks in the mask list until finding a match.
> 
> For HW offload one big hash table may help. Fow software it will be slower
> as key size and entries are larger.
> Maybe we can support both types, one big hash table and multiple subtables,
> in the future?
> 

For one big hash table, do you mean adding another cache before megaflow
like EMC (exact match cache)?

William
> As I don't have BPF offloadable NICs I prefer to keep current multiple
> subtables for now.
> 
> Toshiaki Makita
> 
> >
> >However, I'm not sure whether this will hit some limitations of verifier.
> >
> >Please correct me if I understand wrong here.
> >Thanks
> >William


More information about the dev mailing list