[ovs-dev] [PATCHv16 2/2] netdev-afxdp: add new netdev type for AF_XDP.

Ilya Maximets i.maximets at samsung.com
Thu Jul 18 10:21:22 UTC 2019


On 17.07.2019 21:52, William Tu wrote:
> On Wed, Jul 17, 2019 at 9:56 AM Ben Pfaff <blp at ovn.org> wrote:
>>
>> On Fri, Jul 12, 2019 at 04:50:56PM -0700, William Tu wrote:
>>> The patch introduces experimental AF_XDP support for OVS netdev.
>>> AF_XDP, the Address Family of the eXpress Data Path, is a new Linux socket
>>> type built upon the eBPF and XDP technology.  It is aims to have comparable
>>> performance to DPDK but cooperate better with existing kernel's networking
>>> stack.  An AF_XDP socket receives and sends packets from an eBPF/XDP program
>>> attached to the netdev, by-passing a couple of Linux kernel's subsystems
>>> As a result, AF_XDP socket shows much better performance than AF_PACKET
>>> For more details about AF_XDP, please see linux kernel's
>>> Documentation/networking/af_xdp.rst. Note that by default, this feature is
>>> not compiled in.
>>>
>>> Signed-off-by: William Tu <u9012063 at gmail.com>
>>
>> Thanks a lot!  I have a few comments.
>>
>> In pmd_perf_stats, is last_tsc always accessed in a thread-local manner
>> or can it be read cross-thread?  If it can be read (or updated)
>> cross-thread, then it seems unsafe to me on 32-bit architectures, which
>> might reasonably read or write the two halves of it with separate
>> instructions.  If so, then I'd suggest using atomic_uint64_t instead of
>> plain uint64_t.
> 
> I think rdtsc is a per-core register and OVS always reads it, not updated.
> So it's not possible that while in the middle of 32-bit system reading it,
> the other half register gets updated. I'm not 100% sure, will let Ilya comment
> about it.

Only the owning thread has a direct access to 'last_tsc'. It stores all the
information that needs to be exposed to the atomic 'pmd_counters' of the
'struct pmd_perf_stats' or 'cycles' of 'struct dp_netdev_rxq' which could be
accessed from other threads.

And we're not counting TSC related stats for the non-PMD threads because it
makes no much sense anyway.

> 
>>
>> Looking at xmalloc_size_align(), I noticed a couple of things.  First,
>> the syntax (runt ? : 0) uses a GCC extension.  The portable way to write
>> it is (runt ? runt : 0), or just "runt" since it's a bool.  However, I
>> notice that the older code was (runt ? CACHE_LINE_SIZE : 0), and by
>> analogy the newer code should be (runt ? alignment : 0).  Are you
>> convinced that the substitute is correct?
> 
> Yes, this is a mistake. It should be
> (runt ? alignment : 0)
> 
>>
>> I also have a couple of trivia:
>>
>> * One of the conditionals in automake.mk isn't needed because
>>   LIBBPF_LDADD will be defined to empty if !HAVE_AF_XDP.
>>
>> * In dpif-netdev-perf.h, we don't need the casts because the standard C
>>   promotions work fine for us in this case.
>>
>> I'm appending an incremental to address the trivia:
>>
>> diff --git a/lib/automake.mk b/lib/automake.mk
>> index b07bb01c4ef7..7cb1c3373538 100644
>> --- a/lib/automake.mk
>> +++ b/lib/automake.mk
>> @@ -9,15 +9,12 @@ lib_LTLIBRARIES += lib/libopenvswitch.la
>>
>>  lib_libopenvswitch_la_LIBADD = $(SSL_LIBS)
>>  lib_libopenvswitch_la_LIBADD += $(CAPNG_LDADD)
>> +lib_libopenvswitch_la_LIBADD += $(LIBBPF_LDADD)
>>
>>  if WIN32
>>  lib_libopenvswitch_la_LIBADD += ${PTHREAD_LIBS}
>>  endif
>>
>> -if HAVE_AF_XDP
>> -lib_libopenvswitch_la_LIBADD += $(LIBBPF_LDADD)
>> -endif
>> -
>>  lib_libopenvswitch_la_LDFLAGS = \
>>          $(OVS_LTINFO) \
>>          -Wl,--version-script=$(top_builddir)/lib/libopenvswitch.sym \
>> diff --git a/lib/dpif-netdev-perf.h b/lib/dpif-netdev-perf.h
>> index 6b6dfda7db1c..3a5e90b16a5f 100644
>> --- a/lib/dpif-netdev-perf.h
>> +++ b/lib/dpif-netdev-perf.h
>> @@ -192,15 +192,11 @@ static inline uint64_t
>>  rdtsc_syscall(struct pmd_perf_stats *s)
>>  {
>>      struct timespec val;
>> -    uint64_t v;
>> -
>>      if (clock_gettime(CLOCK_MONOTONIC_RAW, &val) != 0) {
>>         return s->last_tsc;
>>      }
>>
>> -    v  = (uint64_t) val.tv_sec * 1000000000LL;
>> -    v += (uint64_t) val.tv_nsec;
>> -
>> +    uint64_t v  = val.tv_sec * UINT64_C(1000000000) + val.tv_nsec;
>>      return s->last_tsc = v;
>>  }
>>  #endif
> 
> Thank you, will add them in v17 patch.
> William
> 
> 


More information about the dev mailing list