[ovs-dev] question about dp_packet lifetime
Darrell Ball
dlu998 at gmail.com
Wed Mar 28 16:36:59 UTC 2018
I hit send too quickly, Alessandro; one clarification inline.
On Wed, Mar 28, 2018 at 9:13 AM, Darrell Ball <dlu998 at gmail.com> wrote:
> Another thing (besides what Ilya mentioned) you might want to do is look at
> OVS patchwork after you submit your patches and, first of all, check that
> they are there.
> Also check that they look like other accepted patches, both overall and for
> chunks of similar code constructs.
>
> https://patchwork.ozlabs.org/project/openvswitch/list/
>
> Check that your patches can be applied on top of an updated master branch
> of OVS.
>
> I did a quick pass over the raw diff and noticed that in many cases you
> are already using lots of OVS APIs, which is good.
>
> A few pointers:
> 1/ Try to use inline functions as much as possible, instead of macros
> 2/ Think about portability - for example, don't make direct calls to
> pthread_ APIs
>
I am specifically referring to the locking APIs, like pthread_spin_*.

> 3/ Create wrappers for new locks that use generic OVS lock APIs
> 4/ Clearly describe any build dependencies, if any, in the install guide
> documentation.
> 5/ Think about portability for parts of the code and look how that is
> handled in other cases.
> 6/ I think it would be helpful for you to describe one or more use cases
> for netmap, for the general user.
> 7/ Think about testing and see what we can automate - we have system
> tests that run with
> make check-kmod and make check-system-userspace
> Existing files are tests/system-traffic.at and tests/system-ovn.at,
> which are shared between the Linux kernel and userspace datapaths
> 8/ You might want to describe some test results, including performance
> numbers, in the cover letter.
>
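To make points 1/ through 3/ concrete, here is a minimal sketch, not taken
from the patch or from OVS. All nm_* names, BURST and BLOCK_SIZE_MACRO are
made up for illustration, and a plain pthread mutex only stands in for
whatever generic lock the wrapper would really sit on. It shows an inline
function replacing an unparenthesized macro, and a lock wrapper that keeps
the underlying primitive out of the callers' view:

```c
/* Sketch only: every name below is hypothetical, not an OVS or netmap API. */
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define BURST 32                     /* Stand-in for a burst-size constant. */
#define BLOCK_SIZE_MACRO BURST * 2   /* Unparenthesized: expands textually. */

/* Inline alternative: evaluated as a value, so operator precedence at the
 * call site cannot change the result. */
static inline uint32_t
nm_block_size(void)
{
    return BURST * 2;
}

/* Lock wrapper: callers never see which primitive sits underneath, so it
 * can be switched to a project-wide lock API in one place. */
struct nm_tx_lock {
    pthread_mutex_t mutex;
};

static inline int
nm_tx_lock_init(struct nm_tx_lock *l)
{
    return pthread_mutex_init(&l->mutex, NULL);
}

static inline void
nm_tx_lock_acquire(struct nm_tx_lock *l)
{
    pthread_mutex_lock(&l->mutex);
}

static inline void
nm_tx_lock_release(struct nm_tx_lock *l)
{
    pthread_mutex_unlock(&l->mutex);
}

static inline void
nm_tx_lock_destroy(struct nm_tx_lock *l)
{
    pthread_mutex_destroy(&l->mutex);
}

/* Returns how many whole blocks fit in 'nbufs' buffers, computed while
 * holding the lock. */
static inline uint32_t
nm_blocks_for(struct nm_tx_lock *l, uint32_t nbufs)
{
    uint32_t n;

    nm_tx_lock_acquire(l);
    n = nbufs / nm_block_size();
    nm_tx_lock_release(l);
    return n;
}
```

With BURST at 32, 128 / BLOCK_SIZE_MACRO expands to 128 / 32 * 2 and
evaluates to 8, whereas nm_blocks_for() divides by 64 and returns 2. In OVS
the wrapper bodies would call the existing lock helpers, e.g.
ovs_mutex_lock(), rather than pthread functions directly, so only one place
has to change per platform.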
> Cheers Darrell
>
>
> On Wed, Mar 28, 2018 at 1:50 AM, Alessandro Rosetti <
> alessandro.rosetti at gmail.com> wrote:
>
>> Hi Darrell, Ilya and everyone else,
>>
>> I'm contacting you since you were interested.
>> I posted the patch that implements netmap in OVS by attaching the file to
>> the mail; did I do it wrong?
>> https://mail.openvswitch.org/pipermail/ovs-dev/2018-March/345371.html
>>
>> I'm posting it inline now,
>> sorry for the mess!
>>
>> Alessandro.
>>
>> ----------------------------------------------------------------------
>>
>> diff --git a/acinclude.m4 b/acinclude.m4
>> index d61e37a5e..d9dd9fbd1 100644
>> --- a/acinclude.m4
>> +++ b/acinclude.m4
>> @@ -341,6 +341,36 @@ AC_DEFUN([OVS_CHECK_DPDK], [
>> AM_CONDITIONAL([DPDK_NETDEV], test "$DPDKLIB_FOUND" = true)
>> ])
>>
>> +dnl OVS_CHECK_NETMAP
>> +dnl
>> +dnl Check netmap
>> +AC_DEFUN([OVS_CHECK_NETMAP], [
>> + AC_ARG_WITH([netmap],
>> + [AC_HELP_STRING([--with-netmap], [Enable NETMAP])],
>> + [have_netmap=true])
>> + AC_MSG_CHECKING([whether netmap datapath is enabled])
>> +
>> + if test "$have_netmap" != true || test "$with_netmap" = no; then
>> + AC_MSG_RESULT([no])
>> + else
>> + AC_MSG_RESULT([yes])
>> + NETMAP_FOUND=false
>> + AC_LINK_IFELSE(
>> + [AC_LANG_PROGRAM([#include <net/if.h>
>> + #include<netinet/in.h>
>> + #include<net/netmap.h>
>> + #include<net/netmap_user.h>], [])],
>> + [NETMAP_FOUND=true])
>> + if $NETMAP_FOUND; then
>> + AC_DEFINE([NETMAP_NETDEV], [1], [NETMAP datapath is enabled.])
>> + else
>> + AC_MSG_ERROR([Could not find NETMAP headers])
>> + fi
>> + fi
>> +
>> + AM_CONDITIONAL([NETMAP_NETDEV], test "$NETMAP_FOUND" = true)
>> +])
>> +
>> dnl OVS_GREP_IFELSE(FILE, REGEX, [IF-MATCH], [IF-NO-MATCH])
>> dnl
>> dnl Greps FILE for REGEX. If it matches, runs IF-MATCH, otherwise
>> IF-NO-MATCH.
>> @@ -900,7 +930,7 @@ dnl with or without modifications, as long as this
>> notice is preserved.
>>
>> AC_DEFUN([_OVS_CHECK_CC_OPTION], [dnl
>> m4_define([ovs_cv_name], [ovs_cv_[]m4_translit([$1], [-= ], [__])])dnl
>> - AC_CACHE_CHECK([whether $CC accepts $1], [ovs_cv_name],
>> + AC_CACHE_CHECK([whether $CC accepts $1], [ovs_cv_name],
>> [ovs_save_CFLAGS="$CFLAGS"
>> dnl Include -Werror in the compiler options, because without -Werror
>> dnl clang's GCC-compatible compiler driver does not return a failure
>> @@ -951,7 +981,7 @@ dnl OVS_ENABLE_OPTION([OPTION])
>> dnl Check whether the given C compiler OPTION is accepted.
>> dnl If so, add it to WARNING_FLAGS.
>> dnl Example: OVS_ENABLE_OPTION([-Wdeclaration-after-statement])
>> -AC_DEFUN([OVS_ENABLE_OPTION],
>> +AC_DEFUN([OVS_ENABLE_OPTION],
>> [OVS_CHECK_CC_OPTION([$1], [WARNING_FLAGS="$WARNING_FLAGS $1"])
>> AC_SUBST([WARNING_FLAGS])])
>>
>> diff --git a/configure.ac b/configure.ac
>> index 9940a1a45..24cd4718c 100644
>> --- a/configure.ac
>> +++ b/configure.ac
>> @@ -180,6 +180,7 @@ AC_SUBST(KARCH)
>> OVS_CHECK_LINUX
>> OVS_CHECK_LINUX_TC
>> OVS_CHECK_DPDK
>> +OVS_CHECK_NETMAP
>> OVS_CHECK_PRAGMA_MESSAGE
>> AC_SUBST([OVS_CFLAGS])
>> AC_SUBST([OVS_LDFLAGS])
>> diff --git a/lib/automake.mk b/lib/automake.mk
>> index 5c26e0f33..4ccd9e22a 100644
>> --- a/lib/automake.mk
>> +++ b/lib/automake.mk
>> @@ -134,12 +134,14 @@ lib_libopenvswitch_la_SOURCES = \
>> lib/namemap.c \
>> lib/netdev-dpdk.h \
>> lib/netdev-dummy.c \
>> + lib/netdev-netmap.h \
>> lib/netdev-provider.h \
>> lib/netdev-vport.c \
>> lib/netdev-vport.h \
>> lib/netdev-vport-private.h \
>> lib/netdev.c \
>> lib/netdev.h \
>> + lib/netmap.h \
>> lib/netflow.h \
>> lib/netlink.c \
>> lib/netlink.h \
>> @@ -403,6 +405,15 @@ lib_libopenvswitch_la_SOURCES += \
>> lib/dpdk-stub.c
>> endif
>>
>> +if NETMAP_NETDEV
>> +lib_libopenvswitch_la_SOURCES += \
>> + lib/netmap.c \
>> + lib/netdev-netmap.c
>> +else
>> +lib_libopenvswitch_la_SOURCES += \
>> + lib/netmap-stub.c
>> +endif
>> +
>> if WIN32
>> lib_libopenvswitch_la_SOURCES += \
>> lib/dpif-netlink.c \
>> diff --git a/lib/dp-packet.c b/lib/dp-packet.c
>> index 443c22504..e917e6d6a 100644
>> --- a/lib/dp-packet.c
>> +++ b/lib/dp-packet.c
>> @@ -92,6 +92,7 @@ dp_packet_use_const(struct dp_packet *b, const void
>> *data, size_t size)
>> dp_packet_set_size(b, size);
>> }
>>
>> +
>> /* Initializes 'b' as an empty dp_packet that contains the 'allocated'
>> bytes.
>> * DPDK allocated dp_packet and *data is allocated from one continous
>> memory
>> * region as part of memory pool, so in memory data start right after
>> @@ -105,6 +106,19 @@ dp_packet_init_dpdk(struct dp_packet *b, size_t
>> allocated)
>> b->source = DPBUF_DPDK;
>> }
>>
>> +/* Initializes 'b' as a dp_packet whose data points to a netmap buffer
>> of size
>> + * 'size' bytes. */
>> +#ifdef NETMAP_NETDEV
>> +void
>> +dp_packet_init_netmap(struct dp_packet *b, void *data, size_t size)
>> +{
>> + b->source = DPBUF_NETMAP;
>> + dp_packet_set_base(b, data);
>> + dp_packet_set_data(b, data);
>> + dp_packet_set_size(b, size);
>> +}
>> +#endif
>> +
>> /* Initializes 'b' as an empty dp_packet with an initial capacity of
>> 'size'
>> * bytes. */
>> void
>> @@ -125,6 +139,11 @@ dp_packet_uninit(struct dp_packet *b)
>> /* If this dp_packet was allocated by DPDK it must have been
>> * created as a dp_packet */
>> free_dpdk_buf((struct dp_packet*) b);
>> +#endif
>> + } else if (b->source == DPBUF_NETMAP) {
>> +#ifdef NETMAP_NETDEV
>> + /* If this dp_packet was allocated by NETMAP, release it. */
>> + netmap_free_packet(b);
>> #endif
>> }
>> }
>> @@ -241,6 +260,9 @@ dp_packet_resize__(struct dp_packet *b, size_t
>> new_headroom, size_t new_tailroom
>> case DPBUF_DPDK:
>> OVS_NOT_REACHED();
>>
>> + case DPBUF_NETMAP:
>> + OVS_NOT_REACHED();
>> +
>> case DPBUF_MALLOC:
>> if (new_headroom == dp_packet_headroom(b)) {
>> new_base = xrealloc(dp_packet_base(b), new_allocated);
>> diff --git a/lib/dp-packet.h b/lib/dp-packet.h
>> index 21c8ca525..bd7832533 100644
>> --- a/lib/dp-packet.h
>> +++ b/lib/dp-packet.h
>> @@ -26,6 +26,7 @@
>> #endif
>>
>> #include "netdev-dpdk.h"
>> +#include "netdev-netmap.h"
>> #include "openvswitch/list.h"
>> #include "packets.h"
>> #include "util.h"
>> @@ -42,6 +43,7 @@ enum OVS_PACKED_ENUM dp_packet_source {
>> DPBUF_DPDK, /* buffer data is from DPDK allocated
>> memory.
>> * ref to dp_packet_init_dpdk() in
>> dp-packet.c.
>> */
>> + DPBUF_NETMAP, /* Buffers are from netmap allocated
>> memory. */
>> };
>>
>> #define DP_PACKET_CONTEXT_SIZE 64
>> @@ -60,6 +62,9 @@ struct dp_packet {
>> uint32_t size_; /* Number of bytes in use. */
>> uint32_t rss_hash; /* Packet hash. */
>> bool rss_hash_valid; /* Is the 'rss_hash' valid? */
>> +#endif
>> +#ifdef NETMAP_NETDEV
>> + uint32_t buf_idx; /* Netmap slot index. */
>> #endif
>> enum dp_packet_source source; /* Source of memory allocated as
>> 'base'. */
>>
>> @@ -115,6 +120,7 @@ void dp_packet_use_stub(struct dp_packet *, void *,
>> size_t);
>> void dp_packet_use_const(struct dp_packet *, const void *, size_t);
>>
>> void dp_packet_init_dpdk(struct dp_packet *, size_t allocated);
>> +void dp_packet_init_netmap(struct dp_packet *, void *, size_t);
>>
>> void dp_packet_init(struct dp_packet *, size_t);
>> void dp_packet_uninit(struct dp_packet *);
>> @@ -173,6 +179,13 @@ dp_packet_delete(struct dp_packet *b)
>> * created as a dp_packet */
>> free_dpdk_buf((struct dp_packet*) b);
>> return;
>> + } else if (b->source == DPBUF_NETMAP) {
>> + /* It was allocated by a netdev_netmap, it will be marked
>> + * for reuse. */
>> +#ifdef NETMAP_NETDEV
>> + netmap_free_packet(b);
>> +#endif
>> + return;
>> }
>>
>> dp_packet_uninit(b);
>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>> index b07fc6b8b..af81c992b 100644
>> --- a/lib/dpif-netdev.c
>> +++ b/lib/dpif-netdev.c
>> @@ -4119,11 +4119,14 @@ reload:
>>
>> /* List port/core affinity */
>> for (i = 0; i < poll_cnt; i++) {
>> - VLOG_DBG("Core %d processing port \'%s\' with queue-id %d\n",
>> - pmd->core_id, netdev_rxq_get_name(poll_list[i].rxq->rx),
>> - netdev_rxq_get_queue_id(poll_list[i].rxq->rx));
>> - /* Reset the rxq current cycles counter. */
>> - dp_netdev_rxq_set_cycles(poll_list[i].rxq, RXQ_CYCLES_PROC_CURR,
>> 0);
>> + VLOG_DBG("Core %d processing port \'%s\' with queue-id %d\n",
>> + pmd->core_id, netdev_rxq_get_name(poll_list[
>> i].rxq->rx),
>> + netdev_rxq_get_queue_id(poll_list[i].rxq->rx));
>> + /* Reset the rxq current cycles counter. */
>> + dp_netdev_rxq_set_cycles(poll_list[i].rxq,
>> RXQ_CYCLES_PROC_CURR, 0);
>> +#ifdef NETMAP_NETDEV
>> + netmap_init_port(poll_list[i].rxq->rx);
>> +#endif
>> }
>>
>> if (!poll_cnt) {
>> diff --git a/lib/netdev-netmap.c b/lib/netdev-netmap.c
>> new file mode 100644
>> index 000000000..87b292895
>> --- /dev/null
>> +++ b/lib/netdev-netmap.c
>> @@ -0,0 +1,1014 @@
>> +#include <config.h>
>> +
>> +#include <errno.h>
>> +#include <math.h>
>> +#include <net/if.h>
>> +#include <netinet/in.h>
>> +#include <net/netmap.h>
>> +#define NETMAP_WITH_LIBS
>> +#include <net/netmap_user.h>
>> +#include <sys/ioctl.h>
>> +#include <sys/syscall.h>
>> +
>> +#include "dpif.h"
>> +#include "netdev.h"
>> +#include "netdev-provider.h"
>> +#include "netmap.h"
>> +#include "netdev-netmap.h"
>> +#include "openvswitch/list.h"
>> +#include "openvswitch/poll-loop.h"
>> +#include "openvswitch/vlog.h"
>> +#include "ovs-thread.h"
>> +#include "packets.h"
>> +#include "smap.h"
>> +
>> +#define DP_BLOCK_SIZE (NETDEV_MAX_BURST * 2)
>> +#define DEFAULT_RSYNC_INTVAL 5
>> +
>> +VLOG_DEFINE_THIS_MODULE(netdev_netmap);
>> +
>> +static struct vlog_rate_limit rl OVS_UNUSED = VLOG_RATE_LIMIT_INIT(5,
>> 100);
>> +
>> +struct netdev_netmap {
>> + struct netdev up;
>> + struct nm_desc *nmd;
>> +
>> + uint64_t timestamp;
>> + uint32_t rxsync_intval;
>> +
>> + struct ovs_list list_node;
>> + long tid;
>> + struct nm_alloc *nma;
>> +
>> + struct ovs_mutex mutex OVS_ACQ_AFTER(netmap_mutex);
>> + pthread_spinlock_t tx_lock;
>> +
>> + struct netdev_stats stats;
>> + struct eth_addr hwaddr;
>> + enum netdev_flags flags;
>> +
>> + int mtu;
>> + int requested_mtu;
>> +};
>> +
>> +struct netdev_rxq_netmap {
>> + struct netdev_rxq up;
>> +};
>> +
>> +static void netdev_netmap_destruct(struct netdev *netdev);
>> +
>> +static bool
>> +is_netmap_class(const struct netdev_class *class)
>> +{
>> + return class->destruct == netdev_netmap_destruct;
>> +}
>> +
>> +static struct netdev_netmap *
>> +netdev_netmap_cast(const struct netdev *netdev)
>> +{
>> + ovs_assert(is_netmap_class(netdev_get_class(netdev)));
>> + return CONTAINER_OF(netdev, struct netdev_netmap, up);
>> +}
>> +
>> +static struct netdev_rxq_netmap *
>> +netdev_rxq_netmap_cast(const struct netdev_rxq *rx)
>> +{
>> + ovs_assert(is_netmap_class(netdev_get_class(rx->netdev)));
>> + return CONTAINER_OF(rx, struct netdev_rxq_netmap, up);
>> +}
>> +
>> +static struct ovs_mutex netmap_mutex = OVS_MUTEX_INITIALIZER;
>> +
>> +/* Blocks are used to store DP_BLOCK_SIZE preallocated netmap dp_packets.
>> + * During receive operation, dp_packets are allocated by moving them
>> from a
>> + * block to a dp_batch. A block is refilled when packets are freed.
>> + * Each netmap dp_packet has source type set to DPBUF_NETMAP, with
>> buf_idx
>> + * identifying a netmap buffer. Packets in the blocks (or in flight
>> within OVS)
>> + * are not attached to any netmap ring, i.e. their buf_idx is not stored
>> in
>> + * any netmap slot. On receive or transmit, the netmap buffer owned by a
>> + * dp_packet is swapped with one attached to a receive/transmit ring
>> slot,
>> + * by simply swapping the buf_idx values. */
>> +struct nm_block {
>> + struct ovs_list node; /* Blocks can be chained
>> + * in a list. */
>> + struct dp_packet* packets[DP_BLOCK_SIZE]; /* Array of dp_packets. */
>> + uint16_t idx; /* Array index of the
>> current
>> + * packet. */
>> +};
>> +
>> +enum nm_block_type {
>> + NM_BLOCK_TYPE_PUT = 0,
>> + NM_BLOCK_TYPE_GET = 1,
>> +};
>> +
>> +/* Global data structures of the netmap dp_packet allocator. */
>> +static struct nm_runtime {
>> + struct ovs_list port_list; /* List of all netmap netdevs. */
>> + struct ovs_list block_list[2]; /* Lists for dp_packet blocks: one for
>> + * empty and one for full ones. */
>> + void *mem;
>> + uint16_t memid;
>> + uint32_t memsize;
>> + uint32_t nextrabufs;
>> +} nmr = { 0 };
>> +
>> +/* Each thread uses a pair of blocks for allocations and deallocations.
>> */
>> +struct nm_alloc {
>> + struct nm_block *block[2]; /* Blocks used by TX/RX to
>> allocate/deallocate
>> + * dp_packets. */
>> +};
>> +
>> +/* Thread local allocators for packet allocations/deallocations */
>> +DEFINE_STATIC_PER_THREAD_DATA(struct nm_alloc, nma, { 0 });
>> +#define NMA nma_get()
>> +#define PUTB nma_get()->block[NM_BLOCK_TYPE_PUT]
>> +#define GETB nma_get()->block[NM_BLOCK_TYPE_GET]
>> +
>> +/* Creates a new block.
>> + * The block can be empty or initialized with new dp_packets associated
>> to
>> + * netmap buffers not attached to a netmap ring. */
>> +static struct nm_block*
>> +nm_block_new(struct nm_desc *nmd) {
>> + struct nm_block *block;
>> +
>> + block = xmalloc(sizeof(struct nm_block));
>> + block->idx = 0;
>> + ovs_list_init(&block->node);
>> +
>> + if (nmd) {
>> + struct dp_packet *packet;
>> + struct netmap_ring *ring = NETMAP_RXRING(nmd->nifp, 0);
>> + uint32_t idx = nmd->nifp->ni_bufs_head;
>> +
>> + for (int i = 0; idx && i < DP_BLOCK_SIZE;
>> + i++, idx = *(uint32_t *)NETMAP_BUF(ring, idx)) {
>> + packet = dp_packet_new(0);
>> + packet->buf_idx = idx;
>> + packet->source = DPBUF_NETMAP;
>> + block->packets[block->idx++] = packet;
>> + }
>> +
>> + nmd->nifp->ni_bufs_head = idx;
>> + }
>> +
>> + return block;
>> +}
>> +
>> +/* Swaps blocks from nm_runtime in order to replace the current block
>> with
>> + * an empty or full block.
>> + * If we want GETB to be swapped with a block filled with dp_packets we
>> will
>> + * specify NM_BLOCK_TYPE_GET.
>> + * If we want PUTB to be swapped with an empty block we will
>> + * specify NM_BLOCK_TYPE_PUT. */
>> +static void
>> +nm_block_swap_global(enum nm_block_type type) {
>> + struct nm_block **bselect = NULL;
>> + struct nm_block *bswap = NULL, *btmp;
>> +
>> + ovs_mutex_lock(&netmap_mutex);
>> +
>> + bselect = &(NMA->block[type]);
>> +
>> + /* Try to pop a block from the correct list */
>> + if (!ovs_list_is_empty(&nmr.block_list[type])) {
>> + bswap = CONTAINER_OF(ovs_list_pop_front(&nmr.block_list[type]),
>> + struct nm_block, node);
>> + } else {
>> + bswap = nm_block_new(NULL);
>> + }
>> +
>> + /* Swap blocks. */
>> + if (OVS_LIKELY(bswap)) {
>> + btmp = *bselect;
>> + *bselect = bswap;
>> + /* If the current block is empty it will be pushed to the empty
>> list
>> + * and vice versa if it is not empty. */
>> + type = btmp->idx ? NM_BLOCK_TYPE_GET : NM_BLOCK_TYPE_PUT;
>> + ovs_list_push_back(&nmr.block_list[type], &btmp->node);
>> + }
>> +
>> + ovs_mutex_unlock(&netmap_mutex);
>> +}
>> +
>> +/* Swap the two blocks of the local allocator. */
>> +static void
>> +nm_block_swap_local(void) {
>> + struct nm_block* block = GETB;
>> + GETB = PUTB;
>> + PUTB = block;
>> +}
>> +
>> +/* Frees a block from memory.
>> + * If nmd is specified we will return extra buffers to this
>> + * nm_desc if the block contains any dp_packet. */
>> +static void
>> +nm_block_free(struct nm_block* b, struct nm_desc *nmd) {
>> + if (b) {
>> + if (nmd) {
>> + struct netmap_ring *ring = NETMAP_RXRING(nmd->nifp, 0);
>> +
>> + for (int i = 0; i < b->idx; i++) {
>> + struct dp_packet *packet = b->packets[i];
>> + if (packet) {
>> + uint32_t *e = (uint32_t *) NETMAP_BUF(ring,
>> packet->buf_idx);
>> + *e = nmd->nifp->ni_bufs_head;
>> + nmd->nifp->ni_bufs_head = packet->buf_idx;
>> + free(packet);
>> + }
>> + }
>> + }
>> +
>> + free(b);
>> + }
>> +}
>> +
>> +/* Set up the port by checking if any other port has already been opened.
>> + * Prepare blocks of dp_packets. */
>> +static int
>> +netmap_setup_port(struct nm_desc *nmd) {
>> + ovs_mutex_lock(&netmap_mutex);
>> +
>> + if (ovs_list_size(&nmr.port_list)) {
>> + /* Netmap memory has already been set up, check if the new port
>> uses
>> + * the same memid */
>> + if (nmr.memid != nmd->req.nr_arg2) {
>> + VLOG_WARN("unable to add this port, it has a new mem_id
>> (%x->%x)",
>> + nmr.memid, nmd->req.nr_arg2);
>> + ovs_mutex_unlock(&netmap_mutex);
>> + return 1;
>> + }
>> + } else {
>> + /* We are initializing the first Netmap port: setup Netmap memory
>> + * to this process. */
>> + nmr.memid = nmd->req.nr_arg2;
>> + nmr.memsize = nmd->req.nr_memsize;
>> + nmr.mem = mmap(0, nmr.memsize, PROT_WRITE | PROT_READ,
>> + MAP_SHARED, nmd->fd, 0);
>> +
>> + if (nmr.mem == MAP_FAILED) {
>> + VLOG_WARN("mmap has failed!");
>> + ovs_mutex_unlock(&netmap_mutex);
>> + return 1;
>> + }
>> + }
>> +
>> + /* Now we can set up the following nmd fields */
>> + {
>> + struct netmap_if *nifp;
>> +
>> + nmd->memsize = nmr.memsize;
>> + nmd->mem = nmr.mem;
>> + nifp = NETMAP_IF(nmd->mem, nmd->req.nr_offset);
>> + *(struct netmap_if **)(uintptr_t)&(nmd->nifp) = nifp;
>> + }
>> +
>> + /* Allocate a number of blocks containing dp_packets. The total
>> number
>> + * of extrabuffers to be used is a multiple of the blocksize */
>> + uint32_t nextrabufs = nmd->req.nr_arg3 & ~(DP_BLOCK_SIZE-1);
>> + struct nm_block *block;
>> + for (int i = 0 ; i < (nextrabufs/DP_BLOCK_SIZE); i++) {
>> + block = nm_block_new(nmd);
>> + ovs_list_push_back(&nmr.block_list[NM_BLOCK_TYPE_GET],
>> &block->node);
>> + }
>> +
>> + ovs_mutex_unlock(&netmap_mutex);
>> +
>> + return 0;
>> +}
>> +
>> +/* This function initializes some variables and has to be called in the
>> pmd
>> + * thread reload.
>> + * Thanks to this we can initialize thread local blocks and recognize
>> + * if there are other ports using our thread-id. */
>> +void
>> +netmap_init_port(struct netdev_rxq *rxq) {
>> +
>> + ovs_mutex_lock(&netmap_mutex);
>> +
>> + if(is_netmap_class(netdev_get_class(rxq->netdev))) {
>> + struct netdev_netmap *dev = netdev_netmap_cast(rxq->netdev);
>> + dev->tid = syscall(SYS_gettid);
>> + dev->nma = NMA;
>> + }
>> +
>> + /* We need to initialize new blocks in the local allocator */
>> + if (!GETB) {
>> + GETB = nm_block_new(NULL);
>> + }
>> +
>> + if (!PUTB) {
>> + PUTB = nm_block_new(NULL);
>> + }
>> +
>> + ovs_mutex_unlock(&netmap_mutex);
>> +}
>> +
>> +/* This function is called upon dp_packet deallocation. The pointer is
>> not
>> + * deallocated but saved in an nm_block that has free space. */
>> +void
>> +netmap_free_packet(struct dp_packet* packet) {
>> + struct nm_block* block = PUTB;
>> +
>> + if (OVS_UNLIKELY(block->idx == (DP_BLOCK_SIZE - 1))) {
>> + block = GETB;
>> + if (OVS_UNLIKELY(block->idx == (DP_BLOCK_SIZE - 1))) {
>> + nm_block_swap_global(NM_BLOCK_TYPE_PUT);
>> + block = PUTB;
>> + }
>> + }
>> +
>> + block->packets[block->idx++] = packet;
>> +}
>> +
>> +/* Allocate 'n' dp_packets to the batch. This operation might require
>> + * multiple memcpy operations. If no thread local nm_block has data we
>> need
>> + * to ask for a new block to the nm_runtime. */
>> +static int
>> +netmap_alloc_packets(struct dp_packet_batch* b, size_t n) {
>> + struct nm_block* block;
>> + size_t step, tot = 0, s;
>> +
>> + for (step = 0; step < 3; step++) {
>> + block = GETB;
>> + s = MIN(n, block->idx);
>> + memcpy(&b->packets[tot], &block->packets[block->idx - s],
>> + s * sizeof(struct dp_packet*));
>> + block->idx -= s;
>> + tot += s;
>> + n -= s;
>> +
>> + if (n == 0) {
>> + break;
>> + } else if (OVS_LIKELY(step == 0)) {
>> + nm_block_swap_local();
>> + } else {
>> + nm_block_swap_global(NM_BLOCK_TYPE_GET);
>> + }
>> + }
>> +
>> + return tot;
>> +}
>> +
>> +/* Set up some values from the configuration. */
>> +void
>> +netmap_init_config(const struct smap *ovs_other_config) {
>> + nmr.nextrabufs = (uint32_t)
>> + smap_get_int(ovs_other_config, "netmap-nextrabufs",
>> DP_BLOCK_SIZE);
>> +
>> + nmr.nextrabufs &= ~(DP_BLOCK_SIZE-1);
>> +
>> + VLOG_INFO("nextrabufs: %d", nmr.nextrabufs);
>> +}
>> +
>> +static struct netdev_rxq *
>> +netdev_netmap_rxq_alloc(void)
>> +{
>> + struct netdev_rxq_netmap *rx = xzalloc(sizeof *rx);
>> + return &rx->up;
>> +}
>> +
>> +static int
>> +netdev_netmap_rxq_construct(struct netdev_rxq *rxq OVS_UNUSED)
>> +{
>> + /* Nothing to do here */
>> + return 0;
>> +}
>> +
>> +static void
>> +netdev_netmap_rxq_destruct(struct netdev_rxq *rxq OVS_UNUSED)
>> +{
>> + /* Nothing to do here */
>> + return;
>> +}
>> +
>> +static void
>> +netdev_netmap_rxq_dealloc(struct netdev_rxq *rxq)
>> +{
>> + struct netdev_rxq_netmap *rx = netdev_rxq_netmap_cast(rxq);
>> + free(rx);
>> +}
>> +
>> +static struct netdev *
>> +netdev_netmap_alloc(void)
>> +{
>> + struct netdev_netmap *dev;
>> +
>> + dev = (struct netdev_netmap *) xzalloc(sizeof *dev);
>> + if (dev) {
>> + return &dev->up;
>> + }
>> +
>> + return NULL;
>> +}
>> +
>> +static int
>> +netdev_netmap_construct(struct netdev *netdev)
>> +{
>> + struct netdev_netmap *dev = netdev_netmap_cast(netdev);
>> + const char *ifname = netdev_get_name(netdev);
>> +
>> + struct nmreq req;
>> + memset(&req, 0 , sizeof(req));
>> + req.nr_arg3 = nmr.nextrabufs;
>> +
>> + /* Open Netmap port requesting a number of extrabuffers. We also
>> avoid
>> + * mmapping netmap memory here. */
>> + dev->nmd = nm_open(ifname, &req, NM_OPEN_NO_MMAP, NULL);
>> +
>> + if (!dev->nmd) {
>> + if (!errno) {
>> + VLOG_WARN("opening port \"%s\" failed: not a netmap port",
>> ifname);
>> + } else {
>> + VLOG_WARN("opening port \"%s\" failed: %s", ifname,
>> + ovs_strerror(errno));
>> + }
>> + return EINVAL;
>> + } else {
>> + VLOG_INFO("opening port \"%s\"", ifname);
>> + }
>> +
>> + /* Check if we have enough extra buffers to create a nm_block. */
>> + if (dev->nmd->req.nr_arg3 < DP_BLOCK_SIZE) {
>> + VLOG_WARN("not enough extra buffers(%d/%d), closing port",
>> + dev->nmd->req.nr_arg3, DP_BLOCK_SIZE);
>> + nm_close(dev->nmd);
>> + return EINVAL;
>> + }
>> +
>> + /* Possibly mmap netmap memory, initialize the nm_desc, nm_runtime.
>> + * Allocate some nm_blocks using the extrabuffers given to this
>> port. */
>> + if (netmap_setup_port(dev->nmd)) {
>> + VLOG_WARN("could not setup \"%s\" port", ifname);
>> + nm_close(dev->nmd);
>> + return EINVAL;
>> + }
>> +
>> + ovs_list_init(&dev->list_node);
>> + ovs_mutex_lock(&netmap_mutex);
>> + ovs_list_push_front(&nmr.port_list, &dev->list_node);
>> + ovs_mutex_unlock(&netmap_mutex);
>> +
>> + ovs_mutex_init(&dev->mutex);
>> + pthread_spin_init(&dev->tx_lock, PTHREAD_PROCESS_SHARED);
>> + eth_addr_random(&dev->hwaddr);
>> + dev->flags = NETDEV_UP | NETDEV_PROMISC;
>> + dev->timestamp = netmap_rdtsc();
>> + dev->rxsync_intval = DEFAULT_RSYNC_INTVAL;
>> + dev->requested_mtu = NETMAP_RXRING(dev->nmd->nifp, 0)->nr_buf_size;
>> + netdev_request_reconfigure(netdev);
>> +
>> + return 0;
>> +}
>> +
>> +static void
>> +netdev_netmap_destruct(struct netdev *netdev)
>> +{
>> + struct netdev_netmap *dev = netdev_netmap_cast(netdev);
>> + struct nm_block* b;
>> +
>> + ovs_mutex_lock(&netmap_mutex);
>> + VLOG_INFO("closing port \"%s\"", (const char*)
>> netdev_get_name(netdev));
>> +
>> + ovs_list_remove(&dev->list_node);
>> +
>> + /* A netmap netdev is being removed.
>> + * If this is the last netmap port we remove all blocks. */
>> + if (!ovs_list_size(&nmr.port_list)) {
>> + LIST_FOR_EACH_POP(b, node, &nmr.block_list[NM_BLOCK_TYPE_PUT]) {
>> + nm_block_free(b, dev->nmd);
>> + }
>> +
>> + LIST_FOR_EACH_POP(b, node, &nmr.block_list[NM_BLOCK_TYPE_GET]) {
>> + nm_block_free(b, dev->nmd);
>> + }
>> + } else {
>> + struct netdev_netmap *d;
>> + enum nm_block_type type;
>> + int last_thread_port = true;
>> +
>> + /* Check if there are other netmap ports using the same thread
>> id. */
>> + LIST_FOR_EACH(d, list_node, &nmr.port_list) {
>> + if (dev->tid == d->tid) {
>> + last_thread_port = false;
>> + break;
>> + }
>> + }
>> +
>> + /* If there are no ports using this thread id we return thread
>> local
>> + * blocks to the global allocator nm_runtime. */
>> + if (last_thread_port) {
>> + b = dev->nma->block[NM_BLOCK_TYPE_PUT];
>> + type = b->idx ? NM_BLOCK_TYPE_GET : NM_BLOCK_TYPE_PUT;
>> + ovs_list_push_front(&nmr.block_list[type], &b->node);
>> + dev->nma->block[NM_BLOCK_TYPE_PUT] = NULL;
>> +
>> + b = dev->nma->block[NM_BLOCK_TYPE_GET];
>> + type = b->idx ? NM_BLOCK_TYPE_GET : NM_BLOCK_TYPE_PUT;
>> + ovs_list_push_front(&nmr.block_list[type], &b->node);
>> + dev->nma->block[NM_BLOCK_TYPE_GET] = NULL;
>> + }
>> +
>> + /* We will now try to free a number of blocks equal to the blocks
>> + * allocated when the port was created.
>> + * Each block is then freed returning the extra bufs to the
>> nm_desc. */
>> + int nblocks = nmr.nextrabufs / DP_BLOCK_SIZE;
>> + LIST_FOR_EACH_POP(b, node, &nmr.block_list[NM_BLOCK_TYPE_GET]) {
>> + nm_block_free(b, dev->nmd);
>> + if (!--nblocks) {
>> + break;
>> + }
>> + }
>> +
>> + if (!ovs_list_is_empty(&nmr.block_list[NM_BLOCK_TYPE_PUT])) {
>> + struct ovs_list *list_node = ovs_list_pop_front(
>> + &nmr.block_list[NM_BLOCK_TYPE_
>> PUT]);
>> + b = CONTAINER_OF(list_node, struct nm_block, node);
>> + nm_block_free(b, dev->nmd);
>> + }
>> + }
>> +
>> + ovs_mutex_unlock(&netmap_mutex);
>> +
>> + /* Now we can close the port. */
>> + nm_close(dev->nmd);
>> +}
>> +
>> +static void
>> +netdev_netmap_dealloc(struct netdev *netdev)
>> +{
>> + struct netdev_netmap *dev = netdev_netmap_cast(netdev);
>> +
>> + ovs_mutex_destroy(&dev->mutex);
>> + pthread_spin_destroy(&dev->tx_lock);
>> +
>> + free(dev);
>> +}
>> +
>> +static int
>> +netdev_netmap_class_init(void)
>> +{
>> + static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER;
>> +
>> + if (ovsthread_once_start(&once)) {
>> + ovs_list_init(&nmr.block_list[NM_BLOCK_TYPE_PUT]);
>> + ovs_list_init(&nmr.block_list[NM_BLOCK_TYPE_GET]);
>> + ovs_list_init(&nmr.port_list);
>> + ovsthread_once_done(&once);
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +static int
>> +netdev_netmap_reconfigure(struct netdev *netdev)
>> +{
>> + struct netdev_netmap *dev = netdev_netmap_cast(netdev);
>> + int err = 0;
>> +
>> + ovs_mutex_lock(&dev->mutex);
>> +
>> + if (dev->mtu == dev->requested_mtu) {
>> + /* Reconfiguration is unnecessary */
>> + goto out;
>> + }
>> +
>> + dev->mtu = dev->requested_mtu;
>> + netdev_change_seq_changed(netdev);
>> +
>> +out:
>> + ovs_mutex_unlock(&dev->mutex);
>> + return err;
>> +}
>> +
>> +static int
>> +netdev_netmap_get_config(const struct netdev *netdev, struct smap *args)
>> +{
>> + struct netdev_netmap *dev = netdev_netmap_cast(netdev);
>> +
>> + ovs_mutex_lock(&dev->mutex);
>> + smap_add_format(args, "mtu", "%d", dev->mtu);
>> + ovs_mutex_unlock(&dev->mutex);
>> +
>> + return 0;
>> +}
>> +
>> +static int
>> +netdev_netmap_set_config(struct netdev *netdev, const struct smap *args,
>> + char **errp OVS_UNUSED)
>> +{
>> + struct netdev_netmap *dev = netdev_netmap_cast(netdev);
>> +
>> + ovs_mutex_lock(&dev->mutex);
>> + dev->rxsync_intval = smap_get_int(args, "rxsync-intval",
>> + DEFAULT_RSYNC_INTVAL);
>> + ovs_mutex_unlock(&dev->mutex);
>> +
>> + return 0;
>> +}
>> +
>> +static inline void
>> +netmap_rxsync(struct netdev_netmap *dev)
>> +{
>> + uint64_t now = netmap_rdtsc();
>> + unsigned int diff = TSC2US(now - dev->timestamp);
>> +
>> + if (diff < dev->rxsync_intval) {
>> + /* skipping rxsync */
>> + return;
>> + }
>> +
>> + ioctl(dev->nmd->fd, NIOCRXSYNC, NULL);
>> +
>> + /* update current timestamp */
>> + dev->timestamp = now;
>> +}
>> +
>> +static inline void
>> +netmap_swap_slot(struct dp_packet *packet, struct netmap_slot *s) {
>> + uint32_t idx;
>> +
>> + idx = s->buf_idx;
>> + s->buf_idx = packet->buf_idx;
>> + s->flags |= NS_BUF_CHANGED;
>> + packet->buf_idx = idx;
>> +}
>> +
>> +static int
>> +netdev_netmap_send(struct netdev *netdev, int qid OVS_UNUSED,
>> + struct dp_packet_batch *batch, bool concurrent_txq)
>> +{
>> + struct netdev_netmap *dev = netdev_netmap_cast(netdev);
>> + struct nm_desc *nmd = dev->nmd;
>> + uint16_t r, nrings = dev->nmd->nifp->ni_tx_rings;
>> + uint32_t budget = batch->count, count = 0;
>> + bool again = false;
>> +
>> + if (OVS_UNLIKELY(!(dev->flags & NETDEV_UP))) {
>> + dp_packet_delete_batch(batch, true);
>> + return 0;
>> + }
>> +
>> + if (OVS_UNLIKELY(concurrent_txq)) {
>> + pthread_spin_lock(&dev->tx_lock);
>> + }
>> +
>> +try_again:
>> + for (r = 0; r < nrings; r++) {
>> + struct netmap_ring *ring;
>> + uint32_t head, space;
>> +
>> + ring = NETMAP_TXRING(nmd->nifp, nmd->cur_tx_ring);
>> + space = nm_ring_space(ring); /* Available slots in this ring. */
>> + head = ring->head;
>> +
>> + if (space > budget) {
>> + space = budget;
>> + }
>> + budget -= space;
>> +
>> + /* Transmit as much as possible in this ring. */
>> + while (space--) {
>> + struct netmap_slot *ts = &ring->slot[head];
>> + struct dp_packet *packet = batch->packets[count++];
>> +
>> + ts->len = dp_packet_get_send_len(packet);
>> +
>> + if (OVS_UNLIKELY(packet->source != DPBUF_NETMAP)) {
>> + /* send packet copying data to the netmap slot */
>> + memcpy(NETMAP_BUF(ring, ts->buf_idx),
>> + dp_packet_data(packet), ts->len);
>> + } else {
>> + /* send packet using zerocopy */
>> + netmap_swap_slot(packet, ts);
>> + }
>> +
>> + head = nm_ring_next(ring, head);
>> + }
>> +
>> + ring->head = ring->cur = head;
>> +
>> + /* We may have exhausted the budget */
>> + if (OVS_LIKELY(!budget)) {
>> + break;
>> + }
>> +
>> + /* We still have packets to send, select next ring. */
>> + if (OVS_UNLIKELY(++dev->nmd->cur_tx_ring == nrings)) {
>> + nmd->cur_tx_ring = 0;
>> + }
>> + }
>> +
>> + ioctl(dev->nmd->fd, NIOCTXSYNC, NULL);
>> +
>> + if (OVS_UNLIKELY(!count && !again)) {
>> + again = true;
>> + goto try_again;
>> + }
>> +
>> + dp_packet_delete_batch(batch, true);
>> +
>> + if (OVS_UNLIKELY(concurrent_txq)) {
>> + pthread_spin_unlock(&dev->tx_lock);
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +static int
>> +netdev_netmap_rxq_recv(struct netdev_rxq *rxq, struct dp_packet_batch
>> *batch)
>> +{
>> + struct netdev_netmap *dev = netdev_netmap_cast(rxq->netdev);
>> + struct nm_desc *nmd = dev->nmd;
>> + uint16_t r, nrings = nmd->nifp->ni_rx_rings;
>> + uint32_t budget = 0;
>> +
>> + if (OVS_UNLIKELY(!(dev->flags & NETDEV_UP))) {
>> + return EAGAIN;
>> + }
>> +
>> + /* check how much we can receive */
>> + for (r = nmd->first_rx_ring; r < nrings; r++) {
>> + budget += nm_ring_space(NETMAP_RXRING(nmd->nifp, r));
>> + }
>> +
>> + /* sync if there is no packet */
>> + if (budget == 0) {
>> + netmap_rxsync(dev);
>> + return EAGAIN;
>> + }
>> +
>> + /* allocate the batch */
>> + budget = netmap_alloc_packets(batch, MIN(budget, NETDEV_MAX_BURST));
>> +
>> + for (r = 0; r < nrings; r++) {
>> + struct netmap_ring *ring;
>> + uint32_t head, space;
>> +
>> + ring = NETMAP_RXRING(nmd->nifp, nmd->cur_rx_ring);
>> + head = ring->head;
>> + space = nm_ring_space(ring);
>> +
>> + if (space > budget) {
>> + space = budget;
>> + }
>> + budget -= space;
>> +
>> + /* Receive as much as possible from this ring. */
>> + while (space--) {
>> + struct netmap_slot *rs = &ring->slot[head];
>> + struct dp_packet *packet = batch->packets[batch->count++];
>> + dp_packet_init_netmap(packet, NETMAP_BUF(ring, rs->buf_idx),
>> + rs->len);
>> + /* receiving from a netmap port we can always zero copy here. */
>> + netmap_swap_slot(packet, rs);
>> + head = nm_ring_next(ring, head);
>> + }
>> +
>> + ring->cur = ring->head = head;
>> +
>> + /* check if the batch has been filled. */
>> + if (!budget) {
>> + break;
>> + }
>> +
>> + /* batch isn't full, try to receive on other rings. */
>> + if (OVS_UNLIKELY(++nmd->cur_rx_ring == nrings)) {
>> + nmd->cur_rx_ring = 0;
>> + }
>> + }
>> +
>> + dp_packet_batch_init_packet_fields(batch);
>> +
>> + return 0;
>> +}
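
The receive loop above splits a packet budget across the rings in round-robin order, resuming from the ring it used last. A minimal standalone sketch of that accounting follows, with a plain array standing in for the netmap rings. The name take_from_rings and the array model are illustrative assumptions, not part of the patch or of the netmap API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: avail[i] is the number of ready slots in ring i.
 * Takes up to 'budget' slots, starting at ring *cur and moving round-robin.
 * Returns the number of slots taken.  *cur is left on the ring where the
 * budget ran out, or back on the starting ring if every ring was visited. */
static uint32_t
take_from_rings(uint32_t *avail, uint16_t nrings, uint16_t *cur,
                uint32_t budget)
{
    uint32_t taken = 0;

    assert(nrings == 0 || *cur < nrings);

    for (uint16_t r = 0; r < nrings; r++) {
        uint32_t space = avail[*cur];

        if (space > budget) {
            space = budget;
        }
        budget -= space;
        avail[*cur] -= space;
        taken += space;

        if (!budget) {
            break;              /* Budget exhausted: stay on this ring. */
        }
        if (++*cur == nrings) {
            *cur = 0;           /* Wrap around to the first ring. */
        }
    }
    return taken;
}
```

Because the cursor only advances when the current ring could not satisfy the remaining budget, a busy ring keeps being drained before the next one is touched, which may or may not be the fairness you want across rings.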
>> +
>> +static int
>> +netdev_netmap_get_ifindex(const struct netdev *netdev)
>> +{
>> + struct netdev_netmap *dev = netdev_netmap_cast(netdev);
>> +
>> + ovs_mutex_lock(&dev->mutex);
>> + /* Calculate hash from the netdev name. Ensure that ifindex is a 24-bit
>> + * positive integer to meet RFC 2863 recommendations.
>> + */
>> + int ifindex = hash_string(netdev->name, 0) % 0xfffffe + 1;
>> + ovs_mutex_unlock(&dev->mutex);
>> +
>> + return ifindex;
>> +}
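
The modulo followed by "+ 1" in the function above maps any 32-bit hash into the range [1, 0xfffffe], so the result is never zero and always fits in 24 bits. A small sketch of just that mapping is below. It uses 32-bit FNV-1a as a stand-in for hash_string() purely to stay self-contained, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in hash (32-bit FNV-1a).  The patch uses OVS's hash_string(); this
 * substitute only keeps the sketch self-contained. */
static uint32_t
fnv1a_hash(const char *s)
{
    uint32_t h = 2166136261u;

    while (*s) {
        h ^= (uint8_t) *s++;
        h *= 16777619u;
    }
    return h;
}

/* Maps a name to a pseudo-ifindex in the range [1, 0xfffffe]. */
static int
name_to_ifindex(const char *name)
{
    assert(name != NULL);
    return (int) (fnv1a_hash(name) % 0xfffffe) + 1;
}
```

Two different names can still collide on the same value, so this gives a stable identifier, not a guaranteed unique one.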
>> +
>> +static int
>> +netdev_netmap_get_mtu(const struct netdev *netdev, int *mtu)
>> +{
>> + struct netdev_netmap *dev = netdev_netmap_cast(netdev);
>> +
>> + ovs_mutex_lock(&dev->mutex);
>> + *mtu = dev->mtu;
>> + ovs_mutex_unlock(&dev->mutex);
>> +
>> + return 0;
>> +}
>> +
>> +static int
>> +netdev_netmap_set_mtu(struct netdev *netdev, int mtu)
>> +{
>> + struct netdev_netmap *dev = netdev_netmap_cast(netdev);
>> +
>> + if (mtu > NETMAP_RXRING(dev->nmd->nifp, 0)->nr_buf_size
>> + || mtu < ETH_HEADER_LEN) {
>> + VLOG_WARN("%s: unsupported MTU %d", dev->up.name, mtu);
>> + return EINVAL;
>> + }
>> +
>> + ovs_mutex_lock(&dev->mutex);
>> + if (dev->requested_mtu != mtu) {
>> + dev->requested_mtu = mtu;
>> + netdev_request_reconfigure(netdev);
>> + }
>> + ovs_mutex_unlock(&dev->mutex);
>> +
>> + return 0;
>> +}
>> +
>> +static int
>> +netdev_netmap_set_etheraddr(struct netdev *netdev, const struct eth_addr mac)
>> +{
>> + struct netdev_netmap *dev = netdev_netmap_cast(netdev);
>> +
>> + ovs_mutex_lock(&dev->mutex);
>> + dev->hwaddr = mac;
>> + netdev_change_seq_changed(netdev);
>> + ovs_mutex_unlock(&dev->mutex);
>> +
>> + return 0;
>> +}
>> +
>> +static int
>> +netdev_netmap_get_etheraddr(const struct netdev *netdev, struct eth_addr *mac)
>> +{
>> + struct netdev_netmap *dev = netdev_netmap_cast(netdev);
>> +
>> + ovs_mutex_lock(&dev->mutex);
>> + *mac = dev->hwaddr;
>> + ovs_mutex_unlock(&dev->mutex);
>> +
>> + return 0;
>> +}
>> +
>> +static int
>> +netdev_netmap_update_flags(struct netdev *netdev,
>> + enum netdev_flags off, enum netdev_flags on,
>> + enum netdev_flags *old_flagsp)
>> +{
>> + struct netdev_netmap *dev = netdev_netmap_cast(netdev);
>> +
>> + /* Validate before locking so the error path cannot leak the mutex. */
>> + if ((off | on) & ~(NETDEV_UP | NETDEV_PROMISC)) {
>> + return EINVAL;
>> + }
>> +
>> + ovs_mutex_lock(&dev->mutex);
>> + *old_flagsp = dev->flags;
>> + dev->flags |= on;
>> + dev->flags &= ~off;
>> +
>> + ovs_mutex_unlock(&dev->mutex);
>> +
>> + return 0;
>> +}
>> +
>> +static int
>> +netdev_netmap_get_carrier(const struct netdev *netdev, bool *carrier)
>> +{
>> + struct netdev_netmap *dev = netdev_netmap_cast(netdev);
>> +
>> + ovs_mutex_lock(&dev->mutex);
>> + *carrier = true;
>> + ovs_mutex_unlock(&dev->mutex);
>> +
>> + return 0;
>> +}
>> +
>> +static int
>> +netdev_netmap_get_stats(const struct netdev *netdev, struct netdev_stats *stats)
>> +{
>> + struct netdev_netmap *dev = netdev_netmap_cast(netdev);
>> +
>> + ovs_mutex_lock(&dev->mutex);
>> + stats->tx_packets = dev->stats.tx_packets;
>> + stats->tx_bytes = dev->stats.tx_bytes;
>> + stats->rx_packets = dev->stats.rx_packets;
>> + stats->rx_bytes = dev->stats.rx_bytes;
>> + ovs_mutex_unlock(&dev->mutex);
>> +
>> + return 0;
>> +}
>> +
>> +static int
>> +netdev_netmap_get_status(const struct netdev *netdev, struct smap *args)
>> +{
>> + struct netdev_netmap *dev = netdev_netmap_cast(netdev);
>> +
>> + ovs_mutex_lock(&dev->mutex);
>> + smap_add_format(args, "mtu", "%d", dev->mtu);
>> + ovs_mutex_unlock(&dev->mutex);
>> +
>> + return 0;
>> +}
>> +
>> +#define NETDEV_NETMAP_CLASS(NAME, PMD, INIT, CONSTRUCT, DESTRUCT, SET_CONFIG, \
>> + SET_TX_MULTIQ, SEND, SEND_WAIT, GET_CARRIER, GET_STATS, GET_FEATURES, \
>> + GET_STATUS, RECONFIGURE, RXQ_RECV, RXQ_WAIT) \
>> +{ \
>> + NAME, \
>> + PMD, /* is_pmd */ \
>> + INIT, /* init */ \
>> + NULL, /* netdev_netmap_run */ \
>> + NULL, /* netdev_netmap_wait */ \
>> + netdev_netmap_alloc, \
>> + CONSTRUCT, \
>> + DESTRUCT, \
>> + netdev_netmap_dealloc, \
>> + netdev_netmap_get_config, \
>> + SET_CONFIG, \
>> + NULL, /* get_tunnel_config */ \
>> + NULL, /* build header */ \
>> + NULL, /* push header */ \
>> + NULL, /* pop header */ \
>> + NULL, /* get numa id */ \
>> + SET_TX_MULTIQ, /* tx multiq */ \
>> + SEND, /* send */ \
>> + SEND_WAIT, \
>> + netdev_netmap_set_etheraddr, \
>> + netdev_netmap_get_etheraddr, \
>> + netdev_netmap_get_mtu, \
>> + netdev_netmap_set_mtu, \
>> + netdev_netmap_get_ifindex, \
>> + GET_CARRIER, \
>> + NULL, /* get_carrier_resets */ \
>> + NULL, /* get_miimon */ \
>> + GET_STATS, \
>> + NULL, /* get_custom_stats */ \
>> + \
>> + GET_FEATURES, /* get_features */ \
>> + NULL, /* set_advertisements */ \
>> + NULL, /* get_pt_mode */ \
>> + \
>> + NULL, /* set_policing */ \
>> + NULL, /* get_qos_types */ \
>> + NULL, /* get_qos_capabilities */ \
>> + NULL, /* get_qos */ \
>> + NULL, /* set_qos */ \
>> + NULL, /* get_queue */ \
>> + NULL, /* set_queue */ \
>> + NULL, /* delete_queue */ \
>> + NULL, /* get_queue_stats */ \
>> + NULL, /* queue_dump_start */ \
>> + NULL, /* queue_dump_next */ \
>> + NULL, /* queue_dump_done */ \
>> + NULL, /* dump_queue_stats */ \
>> + \
>> + NULL, /* set_in4 */ \
>> + NULL, /* get_addr_list */ \
>> + NULL, /* add_router */ \
>> + NULL, /* get_next_hop */ \
>> + GET_STATUS, \
>> + NULL, /* arp_lookup */ \
>> + \
>> + netdev_netmap_update_flags, \
>> + RECONFIGURE, \
>> + \
>> + netdev_netmap_rxq_alloc, \
>> + netdev_netmap_rxq_construct, \
>> + netdev_netmap_rxq_destruct, \
>> + netdev_netmap_rxq_dealloc, \
>> + RXQ_RECV, \
>> + RXQ_WAIT, \
>> + NULL, /* rxq_drain */ \
>> + NO_OFFLOAD_API \
>> +}
>> +
>> +static const struct netdev_class netmap_class =
>> + NETDEV_NETMAP_CLASS(
>> + "netmap",
>> + true,
>> + netdev_netmap_class_init,
>> + netdev_netmap_construct,
>> + netdev_netmap_destruct,
>> + netdev_netmap_set_config,
>> + NULL,
>> + netdev_netmap_send,
>> + NULL,
>> + netdev_netmap_get_carrier,
>> + netdev_netmap_get_stats,
>> + NULL,
>> + netdev_netmap_get_status,
>> + netdev_netmap_reconfigure,
>> + netdev_netmap_rxq_recv,
>> + NULL);
>> +
>> +void
>> +netdev_netmap_register(void)
>> +{
>> + netdev_register_provider(&netmap_class);
>> +}
>> diff --git a/lib/netdev-netmap.h b/lib/netdev-netmap.h
>> new file mode 100644
>> index 000000000..49fe8c319
>> --- /dev/null
>> +++ b/lib/netdev-netmap.h
>> @@ -0,0 +1,13 @@
>> +#ifndef NETDEV_NETMAP_H
>> +#define NETDEV_NETMAP_H
>> +
>> +struct netdev_rxq;
>> +struct smap;
>> +struct dp_packet;
>> +
>> +void netmap_init_port(struct netdev_rxq *);
>> +void netmap_init_config(const struct smap *);
>> +void netmap_free_packet(struct dp_packet *);
>> +void netdev_netmap_register(void);
>> +
>> +#endif /* netdev-netmap.h */
>> diff --git a/lib/netmap-stub.c b/lib/netmap-stub.c
>> new file mode 100644
>> index 000000000..62f7a06b8
>> --- /dev/null
>> +++ b/lib/netmap-stub.c
>> @@ -0,0 +1,21 @@
>> +#include <config.h>
>> +#include "netmap.h"
>> +
>> +#include "smap.h"
>> +#include "ovs-thread.h"
>> +#include "openvswitch/vlog.h"
>> +
>> +VLOG_DEFINE_THIS_MODULE(netmap);
>> +
>> +void
>> +netmap_init(const struct smap *ovs_other_config)
>> +{
>> + if (smap_get_bool(ovs_other_config, "netmap-init", false)) {
>> + static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER;
>> +
>> + if (ovsthread_once_start(&once)) {
>> + VLOG_ERR("NETMAP not supported in this copy of Open vSwitch.");
>> + ovsthread_once_done(&once);
>> + }
>> + }
>> +}
>> diff --git a/lib/netmap.c b/lib/netmap.c
>> new file mode 100644
>> index 000000000..b4147e0ad
>> --- /dev/null
>> +++ b/lib/netmap.c
>> @@ -0,0 +1,76 @@
>> +#include <config.h>
>> +
>> +#include <fcntl.h>
>> +#include <pthread.h>
>> +#include <stdio.h>
>> +#include <sys/time.h> /* timersub */
>> +#include <stdlib.h>
>> +#include <string.h>
>> +#include <stdint.h>
>> +#include <unistd.h> /* read() */
>> +
>> +#include "dirs.h"
>> +#include "netdev-netmap.h"
>> +#include "netmap.h"
>> +#include "openvswitch/vlog.h"
>> +#include "ovs-thread.h"
>> +#include "smap.h"
>> +
>> +VLOG_DEFINE_THIS_MODULE(netmap);
>> +
>> +/* initialize to avoid a division by 0 */
>> +uint64_t netmap_ticks_per_second = 1000000000; /* set by calibrate_tsc */
>> +
>> +/*
>> + * Runs a short idle loop to estimate the TSC frequency. This assumes a
>> + * constant TSC rate that is synchronized across all CPUs.
>> + * Returns the number of ticks per second.
>> + */
>> +static uint64_t
>> +netmap_calibrate_tsc(void)
>> +{
>> + struct timeval a, b;
>> + uint64_t ta_0, ta_1, tb_0, tb_1, dmax = ~0;
>> + uint64_t da, db, cy = 0;
>> + int i;
>> + for (i=0; i < 3; i++) {
>> + ta_0 = netmap_rdtsc();
>> + gettimeofday(&a, NULL);
>> + ta_1 = netmap_rdtsc();
>> + usleep(20000);
>> + tb_0 = netmap_rdtsc();
>> + gettimeofday(&b, NULL);
>> + tb_1 = netmap_rdtsc();
>> + da = ta_1 - ta_0;
>> + db = tb_1 - tb_0;
>> + if (da + db < dmax) {
>> + cy = (b.tv_sec - a.tv_sec)*1000000 + b.tv_usec - a.tv_usec;
>> + cy = (double)(tb_0 - ta_1)*1000000/(double)cy;
>> + dmax = da + db;
>> + }
>> + }
>> + netmap_ticks_per_second = cy;
>> + return cy;
>> +}
>> +
>> +void
>> +netmap_init(const struct smap *ovs_other_config)
>> +{
>> + static bool enabled = false;
>> +
>> + if (enabled || !ovs_other_config) {
>> + return;
>> + }
>> +
>> + if (smap_get_bool(ovs_other_config, "netmap-init", false)) {
>> + static struct ovsthread_once once_enable = OVSTHREAD_ONCE_INITIALIZER;
>> + if (ovsthread_once_start(&once_enable)) {
>> + netmap_calibrate_tsc();
>> + netmap_init_config(ovs_other_config);
>> + netdev_netmap_register();
>> + enabled = true;
>> + ovsthread_once_done(&once_enable);
>> + VLOG_INFO("NETMAP Enabled");
>> + }
>> + } else {
>> + VLOG_INFO_ONCE("NETMAP Disabled - Use other_config:netmap-init to enable");
>> + }
>> +}
>> diff --git a/lib/netmap.h b/lib/netmap.h
>> new file mode 100644
>> index 000000000..34ff7b7a2
>> --- /dev/null
>> +++ b/lib/netmap.h
>> @@ -0,0 +1,27 @@
>> +#ifndef NETMAP_H
>> +#define NETMAP_H
>> +
>> +#include <stdint.h>
>> +
>> +extern uint64_t netmap_ticks_per_second;
>> +#define US2TSC(x) ((x)*netmap_ticks_per_second/1000000UL)
>> +#define TSC2US(x) ((x)*1000000UL/netmap_ticks_per_second)
>> +
>> +#if 0 /* gcc intrinsic */
>> +#include <x86intrin.h>
>> +#define rdtsc __rdtsc
>> +#else
>> +static inline uint64_t
>> +netmap_rdtsc(void)
>> +{
>> + uint32_t hi, lo;
>> + __asm__ __volatile__ ("rdtsc" : "=a"(lo), "=d"(hi));
>> + return (uint64_t)lo | ((uint64_t)hi << 32);
>> +}
>> +#endif
>> +
>> +struct smap;
>> +
>> +void netmap_init(const struct smap *ovs_other_config);
>> +
>> +#endif /* netmap.h */
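
US2TSC() and TSC2US() multiply before dividing, so the intermediate product can overflow 64 bits for large arguments (with a 3 GHz TSC, US2TSC() wraps once x exceeds roughly 6e9 microseconds). Below is a sketch of the same conversions as inline functions that split the computation into a whole part and a remainder. The names us_to_tsc and tsc_to_us are hypothetical, and passing the tick rate as a parameter instead of reading the global is a choice made only to keep the example self-contained.

```c
#include <assert.h>
#include <stdint.h>

/* Converts microseconds to TSC ticks for a given tick rate.  Splitting the
 * rate into whole and fractional ticks per microsecond avoids computing
 * us * ticks_per_second directly, which overflows much sooner. */
static inline uint64_t
us_to_tsc(uint64_t us, uint64_t ticks_per_second)
{
    assert(ticks_per_second > 0);
    return us * (ticks_per_second / 1000000)
           + us * (ticks_per_second % 1000000) / 1000000;
}

/* Converts TSC ticks to microseconds, rounding down. */
static inline uint64_t
tsc_to_us(uint64_t tsc, uint64_t ticks_per_second)
{
    assert(ticks_per_second > 0);
    return (tsc / ticks_per_second) * 1000000
           + ((tsc % ticks_per_second) * 1000000) / ticks_per_second;
}
```

Both functions return the same rounded-down result as the macros whenever the macros do not overflow.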
>> diff --git a/vswitchd/bridge.c b/vswitchd/bridge.c
>> index d90997e3a..2dfcbb7f6 100644
>> --- a/vswitchd/bridge.c
>> +++ b/vswitchd/bridge.c
>> @@ -38,6 +38,7 @@
>> #include "mac-learning.h"
>> #include "mcast-snooping.h"
>> #include "netdev.h"
>> +#include "netmap.h"
>> #include "nx-match.h"
>> #include "ofproto/bond.h"
>> #include "ofproto/ofproto.h"
>> @@ -2977,6 +2978,7 @@ bridge_run(void)
>> if (cfg) {
>> netdev_set_flow_api_enabled(&cfg->other_config);
>> dpdk_init(&cfg->other_config);
>> + netmap_init(&cfg->other_config);
>> }
>>
>> /* Initialize the ofproto library. This only needs to run once, but
>> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
>> index f899a1976..f6dd6e7b6 100644
>> --- a/vswitchd/vswitch.xml
>> +++ b/vswitchd/vswitch.xml
>> @@ -217,6 +217,46 @@
>> </p>
>> </column>
>>
>> + <column name="other_config" key="netmap-init"
>> + type='{"type": "boolean"}'>
>> + <p>
>> + Set this value to <code>true</code> to enable runtime support for
>> + NETMAP ports. The vswitch must have compile-time support for NETMAP as
>> + well.
>> + </p>
>> + <p>
>> + The default value is <code>false</code>. Changing this value requires
>> + restarting the daemon.
>> + </p>
>> + <p>
>> + If this value is <code>false</code> at startup, any netmap ports which
>> + are configured in the bridge will fail.
>> + </p>
>> + </column>
>> +
>> + <column name="other_config" key="netmap-nextrabufs"
>> + type='{"type": "integer", "minInteger": 32}'>
>> + <p>
>> + Specifies the number of extra buffers to be requested from netmap
>> + when opening each netmap port.
>> + </p>
>> + <p>
>> + Each packet received or transmitted by OVS from/to a netmap port
>> + needs an extra buffer. The OVS netmap runtime needs at least a
>> + batch worth of extra buffers (32 packets) for each port to function
>> + properly. More extra buffers may be necessary if OVS temporarily
>> + stores netmap buffers within its internal queues.
>> + </p>
>> + </column>
>> +
>> + <column name="other_config" key="rxsync-intval"
>> + type='{"type": "integer", "minInteger": 0}'>
>> + <p>
>> + Specifies the minimum time (in microseconds) between two
>> + consecutive rxsync calls issued on a netmap port.
>> + </p>
>> + </column>
>> +
>> <column name="other_config" key="dpdk-init"
>> type='{"type": "boolean"}'>
>> <p>
>>
>>
>> 2018-03-20 15:07 GMT+01:00 Alessandro Rosetti <
>> alessandro.rosetti at gmail.com>:
>>
>>> Hi Darrell,
>>>
>>> I'm developing netmap support for my thesis and I hope it will make it
>>> for OVS 2.10.
>>> In the next few days I'm going to post the first prototype patch, which is
>>> almost ready.
>>>
>>> Thanks to you,
>>> Alessandro
>>>
>>> On 19 Mar 2018 9:26 pm, "Darrell Ball" <dlu998 at gmail.com> wrote:
>>>
>>>> Hi Alessandro
>>>>
>>>> I also think this would be interesting.
>>>> Is netmap integration being actively worked on for OVS 2.10?
>>>>
>>>> Thanks Darrell
>>>>
>>>> On Wed, Feb 7, 2018 at 9:19 AM, Ilya Maximets <i.maximets at samsung.com>
>>>> wrote:
>>>>
>>>>> > Hi,
>>>>>
>>>>> Hi, Alessandro.
>>>>>
>>>>> >
>>>>> > My name is Alessandro Rosetti, and I'm currently adding netmap
>>>>> support to
>>>>> > ovs, following an approach similar to DPDK.
>>>>>
>>>>> Good to know that someone started to work on this. IMHO, it's a good
>>>>> idea.
>>>>> I also wanted to try to implement this someday, but did not have much time.
>>>>>
>>>>> >
>>>>> > I've created a new netdev: netdev_netmap that uses the pmd
>>>>> infrastructure.
>>>>> > The prototype I have seems to work fine (I still need to tune
>>>>> performance,
>>>>> > test optional features, and test more complex topologies.)
>>>>>
>>>>> Cool. Looking forward to your RFC patch-set.
>>>>>
>>>>> >
>>>>> > I have a question about the lifetime of dp_packets.
>>>>> > Is there any guarantee that the dp_packets allocated in a receive
>>>>> callback
>>>>> > (e.g. netdev_netmap_rxq_recv) are consumed by OVS (e.g. dropped,
>>>>> cloned, or
>>>>> > sent to other ports) **before** a subsequent call to the receive
>>>>> callback
>>>>> > (on the same port)?
>>>>> > Or is it possible for dp_packets to be stored somewhere (e.g. in an
>>>>> OVS
>>>>> > internal queue) and live across subsequent invocations of the receive
>>>>> > callback that allocated them?
>>>>>
>>>>> I think there was never such a guarantee, and recent changes in the
>>>>> userspace datapath, namely output packet batching support, completely
>>>>> invalidate this assumption.
>>>>>
>>>>> Please refer to the following commits for details:
>>>>> 009e003 2017-12-14 | dpif-netdev: Output packet batching.
>>>>> c71ea3c 2018-01-15 | dpif-netdev: Time based output batching.
>>>>> 00adb8d 2018-01-15 | docs: Describe output packet batching in DPDK
>>>>> guide.
>>>>>
>>>>> >
>>>>> > I need to know if this is the case to check that my current
>>>>> prototype is
>>>>> > safe.
>>>>> > I use per-port pre-allocation of dp_packets, for maximum
>>>>> performance. I've
>>>>> > seen that DPDK uses its internal allocator to allocate and deallocate
>>>>> > dp_packets, but netmap does not expose one.
>>>>> > Each packet received with netmap is created as a new type dp_packet:
>>>>> > DPBUF_NETMAP. The data points to a netmap buffer (preallocated by the
>>>>> > kernel).
>>>>> > When I receive data (netdev_netmap_rxq_recv) I reuse the dp_packets,
>>>>> > updating the internal pointer and a couple of additional fields
>>>>> > stored inside the dp_packet.
>>>>> > When I have to send data I use zero copy if dp_packet is
>>>>> DPBUF_NETMAP and
>>>>> > copy if it's not.
>>>>> >
>>>>> > Thanks for the help!
>>>>> > Alessandro.
>>>>>
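
Since output batching can keep packets alive across receive calls, a per-port pool of preallocated packets would have to track which entries the datapath still owns before reusing them. A rough sketch of that bookkeeping follows. The struct and function names are hypothetical and deliberately do not use the real dp_packet API, and the linear scan is only for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define POOL_SIZE 4             /* Illustrative; a real port would need more. */

struct pkt {
    bool in_use;                /* Still owned by the datapath. */
    void *data;                 /* Would point at a netmap buffer. */
};

struct pkt_pool {
    struct pkt pkts[POOL_SIZE];
};

/* Hands out a packet that is not currently owned by the datapath, or NULL
 * if every packet is still outstanding (e.g. waiting in an output batch). */
static struct pkt *
pool_get(struct pkt_pool *pool)
{
    for (size_t i = 0; i < POOL_SIZE; i++) {
        if (!pool->pkts[i].in_use) {
            pool->pkts[i].in_use = true;
            return &pool->pkts[i];
        }
    }
    return NULL;
}

/* Called when the datapath is done with the packet (sent or dropped). */
static void
pool_put(struct pkt *p)
{
    assert(p && p->in_use);
    p->in_use = false;
    p->data = NULL;
}
```

With this shape the receive callback shortens its batch when pool_get() returns NULL instead of overwriting a packet that is still queued, and the free hook is the only place where a packet becomes reusable.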
>>>>>
>>>>> _______________________________________________
>>>>> dev mailing list
>>>>> dev at openvswitch.org
>>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>>>>
>>>>
>>>>
>>
>