[ovs-dev] [PATCH v3 3/4] netdev-offload: Add xdp flow api provider

Toshiaki Makita toshiaki.makita1 at gmail.com
Sun Jul 26 16:55:11 UTC 2020


William,

On 2020/07/22 17:46, Toshiaki Makita wrote:
> On 2020/07/22 3:10, William Tu wrote:
>> Thanks for the patch. My comments below:
>>
>> On Mon, Jun 29, 2020 at 8:30 AM Toshiaki Makita
>> <toshiaki.makita1 at gmail.com> wrote:
...
>>> diff --git a/lib/automake.mk b/lib/automake.mk
>>> index 86940ccd2..1fa1371f3 100644
>>> --- a/lib/automake.mk
>>> +++ b/lib/automake.mk
>>> @@ -421,10 +421,14 @@ endif
>>>
>>>   if HAVE_AF_XDP
>>>   lib_libopenvswitch_la_SOURCES += \
>>> +       lib/bpf-util.c \
>>> +       lib/bpf-util.h \
>>>          lib/netdev-afxdp-pool.c \
>>>          lib/netdev-afxdp-pool.h \
>>>          lib/netdev-afxdp.c \
>>> -       lib/netdev-afxdp.h
>>> +       lib/netdev-afxdp.h \
>>> +       lib/netdev-offload-xdp.c \
>>> +       lib/netdev-offload-xdp.h
>>>   endif
>>
>> How about doing s.t like:
>> --enable-afxdp: the current one on master without the xdp offload program
>> --enable-afxdp-with-bpf: the afxdp one plus your xdp offload program
>>
>> So that when users only --enable-afxdp, it doesn't need to compile in the
>> lib/netdev-offload-xdp.* and bpf/*
> 
> Let me clarify it.
> 
> Currently,
> 
> --enable-afxdp: build lib/netdev-afxdp* and lib/netdev-offload-xdp*
> --enable-bpf: build bpf/*
> 
> Do you sugguest this?
> 
> --enable-afxdp: build lib/netdev-afxdp*
> --enable-afxdp-with-bpf: build lib/netdev-afxdp*, lib/netdev-offload-xdp*, and bpf/*

Maybe we should not introduce this kind of overlapping?
(enable-afxdp-with-bpf should not include enable-afxdp)

How about this?

--enable-afxdp: build lib/netdev-afxdp*
--enable-xdp-offload: build lib/netdev-offload-xdp*
--enable-bpf: build bpf/*

I thought the following is also possible:

--enable-afxdp: build lib/netdev-afxdp*
--enable-afxdp=offload: build lib/netdev-afxdp* and lib/netdev-offload-xdp*
--enable-bpf: build bpf/*

But theoretically xdp offload can be enabled without afxdp in the future,
so I guess a different build switch, enable-xdp-offload, is better.

>>> +static int
>>> +xdp_preload(struct netdev *netdev, struct bpf_object *obj)
...
>>> +    if (ovsthread_once_start(&output_map_once)) {
>>> +        /* XXX: close output_map_fd somewhere? */
>> maybe at uninit_flow_api, we have to check whether there is
>> still a netdev using linux_xdp offload. And if not, close this map?
> 
> Then we need to cancel the ovsthread_once at that time?
> I could not find such an api.
> 
> I feel like we should close the map when shutting down the ovs-vswitchd, but not so 
> important
> as the process exits anyway.

I'll just drop this comment. This should not be necessary.

>>> +static int
>>> +netdev_xdp_flow_put(struct netdev *netdev, struct match *match_,
...
>>> +        memcpy(pentry, entry, netdev_info->subtable_mask_size);
>>> +        pidx = idx;
>>> +        idx = entry->next;
>> question about this for loop:
>> why do we need to maintain ->next for the struct
>> xdp_subtable_mask_header *entry?
>> I thought we have a fixed-sized array of subtable.
>> And all we need to do is go over all the valid index of the array,
>> until finding a match.
> 
> I think you are right.
> This code is ported from xdp_flow, where order is necessary.
> I guess ovs-vswitchd does not care the order of each masks.
> Will double check it.

After some more consideration, I'm thinking using ->next to maintain a list is better.

As you say, indeed we can maintain an array to hold masks list.
But in that case we need to have a feature to invalidate an entry if we don't 
replace the entire array on deletion of an entry.
Also to avoid too sparse array, we probably should implement deletion by swapping 
the entry to be deleted and the last entry. Then, we need to invalidate an entry 
temporarily, since we don't want to access the entry while it is being updated.

Think about something like the following entry to do invalidation:

	struct table_entry {
		struct entry_value value;
		uint8_t valid;
	}

We should update ->valid field and ->value field separately in order not to let CPU 
reorder memory read (bpf does not provide memory barriers).

in vswitchd:

	entry->valid = false;
	bpf_map_update_elem(map, idx, entry);
	entry->value.x = x;
	entry->value.y = y;
	...
	bpf_map_update_elem(map, idx, entry);
	entry->valid = true;
	bpf_map_update_elem(map, idx, entry);

looks redundant?

Thanks,
Toshiaki Makita


More information about the dev mailing list