[ovs-dev] [RFC v2 PATCH 4/4] bpf: Add reference XDP program implementation for netdev-offload-xdp

William Tu u9012063 at gmail.com
Wed May 13 15:31:09 UTC 2020


On Mon, May 11, 2020 at 7:53 AM Toshiaki Makita
<toshiaki.makita1 at gmail.com> wrote:
>
> On 2020/05/09 0:05, William Tu wrote:
> > On Thu, May 7, 2020 at 7:56 AM Toshiaki Makita
> > <toshiaki.makita1 at gmail.com> wrote:
> >>
> >> On 2020/05/06 0:18, William Tu wrote:
> >>> On Tue, May 05, 2020 at 11:43:58AM +0900, Toshiaki Makita wrote:
> >>>> On 2020/05/04 1:22, William Tu wrote:
> >>>>> On Tue, Apr 21, 2020 at 11:47:04PM +0900, Toshiaki Makita wrote:
> >>>>>> This adds a reference program, flowtable_afxdp.o, which can be used to
> >>>>>> offload flows to XDP through netdev-offload-xdp.
> >>>>>> The program will be compiled when using --enable-bpf switch.
> >>>>>>
> >>>>>> Signed-off-by: Toshiaki Makita <toshiaki.makita1 at gmail.com>
> >>>>>
> >>>>> Hi Toshiaki,
> >>>>>
> >>>>> Thanks for your patch, I haven't tried to run it but I have
> >>>>> questions about miniflow.
> >>>>>
> >>>>>> ---
> >>>> ...
> >>>>>> +SEC("xdp") int
> >>>>>> +flowtable_afxdp(struct xdp_md *ctx)
> >>>>>> +{
> >>>>>> +    struct xdp_miniflow *pkt_mf;
> >>>>>> +    struct xdp_subtable_mask *subtable_mask;
> >>>>>> +    int *head;
> >>>>>> +    struct xdp_flow_actions *xdp_actions = NULL;
> >>>>>> +    struct nlattr *a;
> >>>>>> +    unsigned int left;
> >>>>>> +    int cnt, idx, zero = 0;
> >>>>>> +
> >>>>>> +    account_debug(0);
> >>>>>> +
> >>>>>> +    head = bpf_map_lookup_elem(&subtbl_masks_hd, &zero);
> >>>>>> +    if (!head) {
> >>>>>> +        return XDP_ABORTED;
> >>>>>> +    }
> >>>>>> +    if (*head == XDP_SUBTABLES_TAIL) {
> >>>>>> +        /* Offload not enabled */
> >>>>>> +        goto upcall;
> >>>>>> +    }
> >>>>>> +
> >>>>>> +    /* Get temporary storage for storing packet miniflow */
> >>>>>> +    pkt_mf = bpf_map_lookup_elem(&pkt_mf_tbl, &zero);
> >>>>>> +    if (!pkt_mf) {
> >>>>>> +        return XDP_ABORTED;
> >>>>>> +    }
> >>>>>> +
> >>>>>
> >>>>> I start to wonder what's the benefit of using miniflow in XDP?
> >>>>> miniflow tries to compress the large flow key into smaller memory,
> >>>>> and with a flowmap indicating which offset in bits are valid.
> >>>>> And when adding a new subtable at flow_put, the subtable's key
> >>>>> size can be smaller with only the needed fields.
> >>>>>
> >>>>> But in the case of XDP/eBPF, we have to statically allocated fixed
> >>>>> key size (the subtbl_template has key as struct xdp_flow_key), so
> >>>>> each subtable is always having key with full key fields. (not saving
> >>>>> memory space) Moreover with miniflow, it makes the 'mask_key' function
> >>>>> below pretty complicated.
> >>>>
> >>>> Fixed sized subtable is restriction of map-in-map.
> >>>> The benefit to use miniflow I envisioned initially was
> >>>> - compress the key
> >>>> - use existing function to convert flows for XDP
> >>>>
> >>>> The first one is actually not doable due to map-in-map. I hope someday the
> >>>> restriction gets loosen...
> >>>
> >>> On my second thought this might be a good idea.
> >>>
> >>> One problem I hit in my previous prototype is that the flow key is too big.
> >>> ex: struct sw_flow_key in ovs kernel is more than 500B, and moving it to bpf
> >>> some BPF limitation. If we continue adding more fields to struct xdp_flow,
> >>> ex: ipv6, vxlan tunnel, then with miniflow extract, the actually memory usage
> >>> in key is smaller. The key size in subtbl_template can be a value smaller the
> >>> the size of struct xdp_flow.
> >>
> >> Nice idea!
> >> So you mean we can have less sized subtbl_template key because when adding more and
> >> more keys, we probably don't use *all* of keys at the same time?
> >
> > Now I'm a little confused.
> > If you see 'struct flow' in ovs, all the fields have its own space
> > (ex: no union is used for
> > ipv4 and ipv6). So using miniflow extract saves space by only use
> > fields on the packet header.
> > But if you see 'struct sw_flow_key' in ovs kernel, same layer of
> > protocols are using union,
> > (ex: ipv4 and ipv6 are in a union). So no extra space is wasted.
> > So how we define the struct xdp_flow_key makes difference.
>
> Thanks, now I see what you mean.
> Actually I was thinking of further aggressive way to minimize subtables keys.
> As we don't necessarily need to support all of key combinations, we can cap
> the key size at a certain value.
> For example, one flow may need dl_* and ip_* and another flow may need
> tunneling keys etc, but requiring all of them, i.e. dl, ip, tp, ct, tun, ...
> and every other key, is unlikely. So just cap the key size at a reasonable value,
> and in case some flow requires more size, we can just return error like EOPNOTSUPP
> for such flow.
>
> What do you think?

Yes, I think that's a good idea.
>
> >
> > Or here is another idea.
> > We can also do on-demand parsing/key_extract here.
> > 1) keep a list of key_fields = {empty}
> > 2) when xdp_flow_put, we know which fields are needed to extract
> >      ex: key_fields = {src_mac, dst_mac}
> > 3) then at the xdp_miniflow_extract we can skip/return early after
> >      the L2 fields are parsed.
>
> So in this case we need to keep track of all of keys used in any subtable, right?
> I'm a bit concerned about complexity introduced by this mechanism.

right, that's might be too complicated for initial prototype.
>
> >>> But when adding more fields, I'm not sure whether we will hit some limitation
> >>> of BPF at xdp_miniflow_extract.
> >>> So miniflow can save key size but complicate the key extraction.
> >>> Without miniflow, key size is large but key extract and match are simpler.
> >>> What do you think?
> >>
> >> I cannot think of any limitation xdp_miniflow_extract will hit. Do you have insns
> >> limit in mind?
> > yes.
>
> As xdp_miniflow_extract does not handle loop, I'm not so concerned.

I see, thanks!
William


More information about the dev mailing list