[ovs-dev] question about dp_packet lifetime

Alessandro Rosetti alessandro.rosetti at gmail.com
Wed Mar 28 17:27:09 UTC 2018


Thank you Darrell and Ilya!

Yes, I thought about wrapping the pthread_spin_ calls in a generic API.
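
A minimal sketch of what such a wrapper could look like, assuming a
hypothetical nm_spin_* naming (these are not existing OVS functions) and a
plain mutex as the fallback on platforms where POSIX spinlocks are not
available:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <unistd.h>

/* Hypothetical wrapper, not an existing OVS API: callers only use
 * nm_spin_*() and never call pthread_spin_*() directly. */
#if defined(_POSIX_SPIN_LOCKS) && _POSIX_SPIN_LOCKS > 0
#define NM_HAVE_SPIN 1
#else
#define NM_HAVE_SPIN 0
#endif

struct nm_spin {
#if NM_HAVE_SPIN
    pthread_spinlock_t lock;    /* Native spinlock where available. */
#else
    pthread_mutex_t lock;       /* Fallback: ordinary mutex. */
#endif
};

static inline int
nm_spin_init(struct nm_spin *s)
{
    assert(s != NULL);
#if NM_HAVE_SPIN
    return pthread_spin_init(&s->lock, PTHREAD_PROCESS_PRIVATE);
#else
    return pthread_mutex_init(&s->lock, NULL);
#endif
}

static inline int
nm_spin_lock(struct nm_spin *s)
{
#if NM_HAVE_SPIN
    return pthread_spin_lock(&s->lock);
#else
    return pthread_mutex_lock(&s->lock);
#endif
}

/* Returns 0 if the lock was taken, an errno value (EBUSY) otherwise. */
static inline int
nm_spin_trylock(struct nm_spin *s)
{
#if NM_HAVE_SPIN
    return pthread_spin_trylock(&s->lock);
#else
    return pthread_mutex_trylock(&s->lock);
#endif
}

static inline int
nm_spin_unlock(struct nm_spin *s)
{
#if NM_HAVE_SPIN
    return pthread_spin_unlock(&s->lock);
#else
    return pthread_mutex_unlock(&s->lock);
#endif
}

static inline int
nm_spin_destroy(struct nm_spin *s)
{
#if NM_HAVE_SPIN
    return pthread_spin_destroy(&s->lock);
#else
    return pthread_mutex_destroy(&s->lock);
#endif
}
```

With something along these lines the netdev code would hold a struct nm_spin
instead of a pthread_spinlock_t, and the platform check stays in one place.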

I'll rework my patch following your tips and resend it properly.
Thanks again!

Alessandro

2018-03-28 18:36 GMT+02:00 Darrell Ball <dlu998 at gmail.com>:

> I hit send too quickly, Alessandro; one clarification inline
>
> On Wed, Mar 28, 2018 at 9:13 AM, Darrell Ball <dlu998 at gmail.com> wrote:
>
>> Another thing to check (besides what Ilya mentioned): after you submit,
>> look at OVS patchwork and confirm that your patches actually show up
>> there.
>> Also check that they look like other accepted patches, both overall and
>> for chunks of similar code constructs.
>>
>> https://patchwork.ozlabs.org/project/openvswitch/list/
>>
>> Check that your patches can be applied on top of an updated master branch
>> of OVS.
>>
>> I did a quick pass over the raw diff and noticed that in many cases you
>> are already using lots of OVS APIs, which is good.
>>
>> A few pointers:
>> 1/ Try to use inline functions as much as possible, instead of macros
>> 2/ Think about portability - Don't use direct calls to pthread_ apis for
>> example
>>
>
> I am specifically referring to the locking apis, like pthread_spin_
>
> 3/ Create wrappers for new locks that use generic OVS lock apis
>> 4/ Clearly describe any build dependencies, if any, in the install guide
>> documentation.
>> 5/ Think about portability for parts of the code and look how that is
>> handled in other cases.
>> 6/ I think it would be helpful for you to describe one or more use cases
>> for netmap, for the general user.
>> 7/ Think about testing and see what we can automate - we have
>> system tests that run with
>>     make check-kmod and make check-system-userspace
>>     Existing files are tests/system-traffic.at and tests/system-ovn.at,
>> which are shared by the Linux and userspace datapaths
>> 8/ You might want to describe some test results, including performance
>> numbers, in the cover letter.
>>
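
As an illustration of tip 1/, here is a small sketch of why an inline function
can be safer than a macro for a derived constant. The names and the value 32
are stand-ins chosen for the example, not taken from OVS:

```c
#include <assert.h>
#include <stddef.h>

#define BURST 32                     /* Stand-in burst size (assumed value). */
#define BLOCK_SIZE_BAD BURST * 2     /* Unparenthesized macro body. */

/* The same value as an inline function: evaluated as a unit and typed. */
static inline size_t
block_size(void)
{
    return BURST * 2;
}

/* The macro expands textually, so this computes (nbufs / 32) * 2. */
static inline size_t
blocks_for_bad(size_t nbufs)
{
    return nbufs / BLOCK_SIZE_BAD;
}

/* The function call is a single operand, so this computes nbufs / 64. */
static inline size_t
blocks_for(size_t nbufs)
{
    return nbufs / block_size();
}
```

For 128 buffers the macro version yields 8 blocks while the inline version
yields the intended 2. Parenthesizing the macro body also fixes this case, but
the inline function rules the mistake out by construction.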
>> Cheers Darrell
>>
>>
>> On Wed, Mar 28, 2018 at 1:50 AM, Alessandro Rosetti <
>> alessandro.rosetti at gmail.com> wrote:
>>
>>> Hi Darrell, Ilya and everyone else,
>>>
>>> I'm contacting you since you were interested.
>>> I posted the patch that implements netmap in OVS as a file attached
>>> to the mail. Was that the wrong way to do it?
>>> https://mail.openvswitch.org/pipermail/ovs-dev/2018-March/345371.html
>>>
>>> I'm posting it inline now,
>>> sorry for the mess!
>>>
>>> Alessandro.
>>>
>>> ----------------------------------------------------------------------
>>>
>>> diff --git a/acinclude.m4 b/acinclude.m4
>>> index d61e37a5e..d9dd9fbd1 100644
>>> --- a/acinclude.m4
>>> +++ b/acinclude.m4
>>> @@ -341,6 +341,36 @@ AC_DEFUN([OVS_CHECK_DPDK], [
>>>    AM_CONDITIONAL([DPDK_NETDEV], test "$DPDKLIB_FOUND" = true)
>>>  ])
>>>
>>> +dnl OVS_CHECK_NETMAP
>>> +dnl
>>> +dnl Check netmap
>>> +AC_DEFUN([OVS_CHECK_NETMAP], [
>>> +  AC_ARG_WITH([netmap],
>>> +              [AC_HELP_STRING([--with-netmap], [Enable NETMAP])],
>>> +              [have_netmap=true])
>>> +  AC_MSG_CHECKING([whether netmap datapath is enabled])
>>> +
>>> +  if test "$have_netmap" != true || test "$with_netmap" = no; then
>>> +    AC_MSG_RESULT([no])
>>> +  else
>>> +    AC_MSG_RESULT([yes])
>>> +    NETMAP_FOUND=false
>>> +    AC_LINK_IFELSE(
>>> +       [AC_LANG_PROGRAM([#include <net/if.h>
>>> +                         #include<netinet/in.h>
>>> +                         #include<net/netmap.h>
>>> +                         #include<net/netmap_user.h>], [])],
>>> +                        [NETMAP_FOUND=true])
>>> +    if $NETMAP_FOUND; then
>>> +        AC_DEFINE([NETMAP_NETDEV], [1], [NETMAP datapath is enabled.])
>>> +    else
>>> +        AC_MSG_ERROR([Could not find NETMAP headers])
>>> +    fi
>>> +  fi
>>> +
>>> +  AM_CONDITIONAL([NETMAP_NETDEV], test "$NETMAP_FOUND" = true)
>>> +])
>>> +
>>>  dnl OVS_GREP_IFELSE(FILE, REGEX, [IF-MATCH], [IF-NO-MATCH])
>>>  dnl
>>>  dnl Greps FILE for REGEX.  If it matches, runs IF-MATCH, otherwise
>>> IF-NO-MATCH.
>>> @@ -900,7 +930,7 @@ dnl with or without modifications, as long as this
>>> notice is preserved.
>>>
>>>  AC_DEFUN([_OVS_CHECK_CC_OPTION], [dnl
>>>    m4_define([ovs_cv_name], [ovs_cv_[]m4_translit([$1], [-= ], [__])])dnl
>>> -  AC_CACHE_CHECK([whether $CC accepts $1], [ovs_cv_name],
>>> +  AC_CACHE_CHECK([whether $CC accepts $1], [ovs_cv_name],
>>>      [ovs_save_CFLAGS="$CFLAGS"
>>>       dnl Include -Werror in the compiler options, because without
>>> -Werror
>>>       dnl clang's GCC-compatible compiler driver does not return a
>>> failure
>>> @@ -951,7 +981,7 @@ dnl OVS_ENABLE_OPTION([OPTION])
>>>  dnl Check whether the given C compiler OPTION is accepted.
>>>  dnl If so, add it to WARNING_FLAGS.
>>>  dnl Example: OVS_ENABLE_OPTION([-Wdeclaration-after-statement])
>>> -AC_DEFUN([OVS_ENABLE_OPTION],
>>> +AC_DEFUN([OVS_ENABLE_OPTION],
>>>    [OVS_CHECK_CC_OPTION([$1], [WARNING_FLAGS="$WARNING_FLAGS $1"])
>>>     AC_SUBST([WARNING_FLAGS])])
>>>
>>> diff --git a/configure.ac b/configure.ac
>>> index 9940a1a45..24cd4718c 100644
>>> --- a/configure.ac
>>> +++ b/configure.ac
>>> @@ -180,6 +180,7 @@ AC_SUBST(KARCH)
>>>  OVS_CHECK_LINUX
>>>  OVS_CHECK_LINUX_TC
>>>  OVS_CHECK_DPDK
>>> +OVS_CHECK_NETMAP
>>>  OVS_CHECK_PRAGMA_MESSAGE
>>>  AC_SUBST([OVS_CFLAGS])
>>>  AC_SUBST([OVS_LDFLAGS])
>>> diff --git a/lib/automake.mk b/lib/automake.mk
>>> index 5c26e0f33..4ccd9e22a 100644
>>> --- a/lib/automake.mk
>>> +++ b/lib/automake.mk
>>> @@ -134,12 +134,14 @@ lib_libopenvswitch_la_SOURCES = \
>>>   lib/namemap.c \
>>>   lib/netdev-dpdk.h \
>>>   lib/netdev-dummy.c \
>>> + lib/netdev-netmap.h \
>>>   lib/netdev-provider.h \
>>>   lib/netdev-vport.c \
>>>   lib/netdev-vport.h \
>>>   lib/netdev-vport-private.h \
>>>   lib/netdev.c \
>>>   lib/netdev.h \
>>> + lib/netmap.h \
>>>   lib/netflow.h \
>>>   lib/netlink.c \
>>>   lib/netlink.h \
>>> @@ -403,6 +405,15 @@ lib_libopenvswitch_la_SOURCES += \
>>>   lib/dpdk-stub.c
>>>  endif
>>>
>>> +if NETMAP_NETDEV
>>> +lib_libopenvswitch_la_SOURCES += \
>>> + lib/netmap.c \
>>> + lib/netdev-netmap.c
>>> +else
>>> +lib_libopenvswitch_la_SOURCES += \
>>> + lib/netmap-stub.c
>>> +endif
>>> +
>>>  if WIN32
>>>  lib_libopenvswitch_la_SOURCES += \
>>>   lib/dpif-netlink.c \
>>> diff --git a/lib/dp-packet.c b/lib/dp-packet.c
>>> index 443c22504..e917e6d6a 100644
>>> --- a/lib/dp-packet.c
>>> +++ b/lib/dp-packet.c
>>> @@ -92,6 +92,7 @@ dp_packet_use_const(struct dp_packet *b, const void
>>> *data, size_t size)
>>>      dp_packet_set_size(b, size);
>>>  }
>>>
>>> +
>>>  /* Initializes 'b' as an empty dp_packet that contains the 'allocated'
>>> bytes.
>>>   * DPDK allocated dp_packet and *data is allocated from one continous
>>> memory
>>>   * region as part of memory pool, so in memory data start right after
>>> @@ -105,6 +106,19 @@ dp_packet_init_dpdk(struct dp_packet *b, size_t
>>> allocated)
>>>      b->source = DPBUF_DPDK;
>>>  }
>>>
>>> +/* Initializes 'b' as a dp_packet whose data points to a netmap buffer
>>> of size
>>> + * 'size' bytes. */
>>> +#ifdef NETMAP_NETDEV
>>> +void
>>> +dp_packet_init_netmap(struct dp_packet *b, void *data, size_t size)
>>> +{
>>> +    b->source = DPBUF_NETMAP;
>>> +    dp_packet_set_base(b, data);
>>> +    dp_packet_set_data(b, data);
>>> +    dp_packet_set_size(b, size);
>>> +}
>>> +#endif
>>> +
>>>  /* Initializes 'b' as an empty dp_packet with an initial capacity of
>>> 'size'
>>>   * bytes. */
>>>  void
>>> @@ -125,6 +139,11 @@ dp_packet_uninit(struct dp_packet *b)
>>>              /* If this dp_packet was allocated by DPDK it must have been
>>>               * created as a dp_packet */
>>>              free_dpdk_buf((struct dp_packet*) b);
>>> +#endif
>>> +        } else if (b->source == DPBUF_NETMAP) {
>>> +#ifdef NETMAP_NETDEV
>>> +            /* If this dp_packet was allocated by NETMAP, release it. */
>>> +            netmap_free_packet(b);
>>>  #endif
>>>          }
>>>      }
>>> @@ -241,6 +260,9 @@ dp_packet_resize__(struct dp_packet *b, size_t
>>> new_headroom, size_t new_tailroom
>>>      case DPBUF_DPDK:
>>>          OVS_NOT_REACHED();
>>>
>>> +    case DPBUF_NETMAP:
>>> +        OVS_NOT_REACHED();
>>> +
>>>      case DPBUF_MALLOC:
>>>          if (new_headroom == dp_packet_headroom(b)) {
>>>              new_base = xrealloc(dp_packet_base(b), new_allocated);
>>> diff --git a/lib/dp-packet.h b/lib/dp-packet.h
>>> index 21c8ca525..bd7832533 100644
>>> --- a/lib/dp-packet.h
>>> +++ b/lib/dp-packet.h
>>> @@ -26,6 +26,7 @@
>>>  #endif
>>>
>>>  #include "netdev-dpdk.h"
>>> +#include "netdev-netmap.h"
>>>  #include "openvswitch/list.h"
>>>  #include "packets.h"
>>>  #include "util.h"
>>> @@ -42,6 +43,7 @@ enum OVS_PACKED_ENUM dp_packet_source {
>>>      DPBUF_DPDK,                /* buffer data is from DPDK allocated
>>> memory.
>>>                                  * ref to dp_packet_init_dpdk() in
>>> dp-packet.c.
>>>                                  */
>>> +    DPBUF_NETMAP,              /* Buffers are from netmap allocated
>>> memory. */
>>>  };
>>>
>>>  #define DP_PACKET_CONTEXT_SIZE 64
>>> @@ -60,6 +62,9 @@ struct dp_packet {
>>>      uint32_t size_;             /* Number of bytes in use. */
>>>      uint32_t rss_hash;          /* Packet hash. */
>>>      bool rss_hash_valid;        /* Is the 'rss_hash' valid? */
>>> +#endif
>>> +#ifdef NETMAP_NETDEV
>>> +    uint32_t buf_idx;             /* Netmap slot index. */
>>>  #endif
>>>      enum dp_packet_source source;  /* Source of memory allocated as
>>> 'base'. */
>>>
>>> @@ -115,6 +120,7 @@ void dp_packet_use_stub(struct dp_packet *, void *,
>>> size_t);
>>>  void dp_packet_use_const(struct dp_packet *, const void *, size_t);
>>>
>>>  void dp_packet_init_dpdk(struct dp_packet *, size_t allocated);
>>> +void dp_packet_init_netmap(struct dp_packet *, void *, size_t);
>>>
>>>  void dp_packet_init(struct dp_packet *, size_t);
>>>  void dp_packet_uninit(struct dp_packet *);
>>> @@ -173,6 +179,13 @@ dp_packet_delete(struct dp_packet *b)
>>>               * created as a dp_packet */
>>>              free_dpdk_buf((struct dp_packet*) b);
>>>              return;
>>> +        } else if (b->source == DPBUF_NETMAP) {
>>> +            /* It was allocated by a netdev_netmap, it will be marked
>>> +             * for reuse. */
>>> +#ifdef NETMAP_NETDEV
>>> +            netmap_free_packet(b);
>>> +#endif
>>> +            return;
>>>          }
>>>
>>>          dp_packet_uninit(b);
>>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>>> index b07fc6b8b..af81c992b 100644
>>> --- a/lib/dpif-netdev.c
>>> +++ b/lib/dpif-netdev.c
>>> @@ -4119,11 +4119,14 @@ reload:
>>>
>>>      /* List port/core affinity */
>>>      for (i = 0; i < poll_cnt; i++) {
>>> -       VLOG_DBG("Core %d processing port \'%s\' with queue-id %d\n",
>>> -                pmd->core_id, netdev_rxq_get_name(poll_list[
>>> i].rxq->rx),
>>> -                netdev_rxq_get_queue_id(poll_list[i].rxq->rx));
>>> -       /* Reset the rxq current cycles counter. */
>>> -       dp_netdev_rxq_set_cycles(poll_list[i].rxq,
>>> RXQ_CYCLES_PROC_CURR, 0);
>>> +        VLOG_DBG("Core %d processing port \'%s\' with queue-id %d\n",
>>> +                 pmd->core_id, netdev_rxq_get_name(poll_list[
>>> i].rxq->rx),
>>> +                 netdev_rxq_get_queue_id(poll_list[i].rxq->rx));
>>> +        /* Reset the rxq current cycles counter. */
>>> +        dp_netdev_rxq_set_cycles(poll_list[i].rxq,
>>> RXQ_CYCLES_PROC_CURR, 0);
>>> +#ifdef NETMAP_NETDEV
>>> +        netmap_init_port(poll_list[i].rxq->rx);
>>> +#endif
>>>      }
>>>
>>>      if (!poll_cnt) {
>>> diff --git a/lib/netdev-netmap.c b/lib/netdev-netmap.c
>>> new file mode 100644
>>> index 000000000..87b292895
>>> --- /dev/null
>>> +++ b/lib/netdev-netmap.c
>>> @@ -0,0 +1,1014 @@
>>> +#include <config.h>
>>> +
>>> +#include <errno.h>
>>> +#include <math.h>
>>> +#include <net/if.h>
>>> +#include <netinet/in.h>
>>> +#include <net/netmap.h>
>>> +#define NETMAP_WITH_LIBS
>>> +#include <net/netmap_user.h>
>>> +#include <sys/ioctl.h>
>>> +#include <sys/syscall.h>
>>> +
>>> +#include "dpif.h"
>>> +#include "netdev.h"
>>> +#include "netdev-provider.h"
>>> +#include "netmap.h"
>>> +#include "netdev-netmap.h"
>>> +#include "openvswitch/list.h"
>>> +#include "openvswitch/poll-loop.h"
>>> +#include "openvswitch/vlog.h"
>>> +#include "ovs-thread.h"
>>> +#include "packets.h"
>>> +#include "smap.h"
>>> +
>>> +#define DP_BLOCK_SIZE (NETDEV_MAX_BURST * 2)
>>> +#define DEFAULT_RSYNC_INTVAL 5
>>> +
>>> +VLOG_DEFINE_THIS_MODULE(netdev_netmap);
>>> +
>>> +static struct vlog_rate_limit rl OVS_UNUSED = VLOG_RATE_LIMIT_INIT(5,
>>> 100);
>>> +
>>> +struct netdev_netmap {
>>> +    struct netdev up;
>>> +    struct nm_desc *nmd;
>>> +
>>> +    uint64_t timestamp;
>>> +    uint32_t rxsync_intval;
>>> +
>>> +    struct ovs_list list_node;
>>> +    long tid;
>>> +    struct nm_alloc *nma;
>>> +
>>> +    struct ovs_mutex mutex OVS_ACQ_AFTER(netmap_mutex);
>>> +    pthread_spinlock_t tx_lock;
>>> +
>>> +    struct netdev_stats stats;
>>> +    struct eth_addr hwaddr;
>>> +    enum netdev_flags flags;
>>> +
>>> +    int mtu;
>>> +    int requested_mtu;
>>> +};
>>> +
>>> +struct netdev_rxq_netmap {
>>> +    struct netdev_rxq up;
>>> +};
>>> +
>>> +static void netdev_netmap_destruct(struct netdev *netdev);
>>> +
>>> +static bool
>>> +is_netmap_class(const struct netdev_class *class)
>>> +{
>>> +    return class->destruct == netdev_netmap_destruct;
>>> +}
>>> +
>>> +static struct netdev_netmap *
>>> +netdev_netmap_cast(const struct netdev *netdev)
>>> +{
>>> +    ovs_assert(is_netmap_class(netdev_get_class(netdev)));
>>> +    return CONTAINER_OF(netdev, struct netdev_netmap, up);
>>> +}
>>> +
>>> +static struct netdev_rxq_netmap *
>>> +netdev_rxq_netmap_cast(const struct netdev_rxq *rx)
>>> +{
>>> +    ovs_assert(is_netmap_class(netdev_get_class(rx->netdev)));
>>> +    return CONTAINER_OF(rx, struct netdev_rxq_netmap, up);
>>> +}
>>> +
>>> +static struct ovs_mutex netmap_mutex = OVS_MUTEX_INITIALIZER;
>>> +
>>> +/* Blocks are used to store DP_BLOCK_SIZE preallocated netmap
>>> dp_packets.
>>> + * During receive operation, dp_packets are allocated by moving them
>>> from a
>>> + * block to a dp_batch. A block is refilled when packets are freed.
>>> + * Each netmap dp_packet has source type set to DPBUF_NETMAP, with
>>> buf_idx
>>> + * identifying a netmap buffer. Packets in the blocks (or in flight
>>> within OVS)
>>> + * are not attached to any netmap ring, i.e. their buf_idx is not
>>> stored in
>>> + * any netmap slot. On receive or transmit, the netmap buffer owned by a
>>> + * dp_packet is swapped with one attached to a receive/transmit ring
>>> slot,
>>> + * by simply swapping the buf_idx values. */
>>> +struct nm_block {
>>> +    struct ovs_list node;                     /* Blocks can be chained
>>> +                                               * in a list. */
>>> +    struct dp_packet* packets[DP_BLOCK_SIZE]; /* Array of dp_packets. */
>>> +    uint16_t idx;                             /* Array index of the
>>> current
>>> +                                               * packet. */
>>> +};
>>> +
>>> +enum nm_block_type {
>>> +    NM_BLOCK_TYPE_PUT = 0,
>>> +    NM_BLOCK_TYPE_GET = 1,
>>> +};
>>> +
>>> +/* Global data structures of the netmap dp_packet allocator. */
>>> +static struct nm_runtime {
>>> +    struct ovs_list port_list;     /* List of all netmap netdevs. */
>>> +    struct ovs_list block_list[2]; /* Lists for dp_packet blocks: one
>>> for
>>> +                                    * empty and one for full ones. */
>>> +    void *mem;
>>> +    uint16_t memid;
>>> +    uint32_t memsize;
>>> +    uint32_t nextrabufs;
>>> +} nmr = { 0 };
>>> +
>>> +/* Each thread uses a pair of blocks for allocations and deallocations.
>>> */
>>> +struct nm_alloc {
>>> +    struct nm_block *block[2];  /* Blocks used by TX/RX to allocate/
>>> +                                 * deallocate dp_packets. */
>>> +};
>>> +
>>> +/* Thread local allocators for packet allocations/deallocations. */
>>> +DEFINE_STATIC_PER_THREAD_DATA(struct nm_alloc, nma, { 0 });
>>> +#define NMA nma_get()
>>> +#define PUTB nma_get()->block[NM_BLOCK_TYPE_PUT]
>>> +#define GETB nma_get()->block[NM_BLOCK_TYPE_GET]
>>> +
>>> +/* Creates a new block.
>>> + * The block can be empty or initialized with new dp_packets associated
>>> to
>>> + * netmap buffers not attached to a netmap ring. */
>>> +static struct nm_block*
>>> +nm_block_new(struct nm_desc *nmd) {
>>> +    struct nm_block *block;
>>> +
>>> +    block = xmalloc(sizeof(struct nm_block));
>>> +    block->idx = 0;
>>> +    ovs_list_init(&block->node);
>>> +
>>> +    if (nmd) {
>>> +        struct dp_packet *packet;
>>> +        struct netmap_ring *ring = NETMAP_RXRING(nmd->nifp, 0);
>>> +        uint32_t idx = nmd->nifp->ni_bufs_head;
>>> +
>>> +        for (int i = 0; idx && i < DP_BLOCK_SIZE;
>>> +            i++, idx = *(uint32_t *)NETMAP_BUF(ring, idx)) {
>>> +            packet = dp_packet_new(0);
>>> +            packet->buf_idx = idx;
>>> +            packet->source = DPBUF_NETMAP;
>>> +            block->packets[block->idx++] = packet;
>>> +        }
>>> +
>>> +        nmd->nifp->ni_bufs_head = idx;
>>> +    }
>>> +
>>> +    return block;
>>> +}
>>> +
>>> +/* Swaps blocks from nm_runtime in order to replace the current block
>>> + * with an empty or full block.
>>> + * If we want GETB to be swapped with a block filled with dp_packets,
>>> + * we specify NM_BLOCK_TYPE_GET.
>>> + * If we want PUTB to be swapped with an empty block, we specify
>>> + * NM_BLOCK_TYPE_PUT. */
>>> +static void
>>> +nm_block_swap_global(enum nm_block_type type) {
>>> +    struct nm_block **bselect = NULL;
>>> +    struct nm_block *bswap = NULL, *btmp;
>>> +
>>> +    ovs_mutex_lock(&netmap_mutex);
>>> +
>>> +    bselect = &(NMA->block[type]);
>>> +
>>> +    /* Try to pop a block from the correct list. */
>>> +    if (!ovs_list_is_empty(&nmr.block_list[type])) {
>>> +        bswap = CONTAINER_OF(ovs_list_pop_front(&nmr.block_list[type]),
>>> +                        struct nm_block, node);
>>> +    } else {
>>> +        bswap = nm_block_new(NULL);
>>> +    }
>>> +
>>> +    /* Swap blocks. */
>>> +    if (OVS_LIKELY(bswap)) {
>>> +        btmp = *bselect;
>>> +        *bselect = bswap;
>>> +        /* If the current block is empty it is pushed to the empty list,
>>> +         * otherwise to the full list. */
>>> +        type = btmp->idx ? NM_BLOCK_TYPE_GET : NM_BLOCK_TYPE_PUT;
>>> +        ovs_list_push_back(&nmr.block_list[type], &btmp->node);
>>> +    }
>>> +
>>> +    ovs_mutex_unlock(&netmap_mutex);
>>> +}
>>> +
>>> +/* Swap the two blocks of the local allocator. */
>>> +static void
>>> +nm_block_swap_local(void) {
>>> +    struct nm_block* block = GETB;
>>> +    GETB = PUTB;
>>> +    PUTB = block;
>>> +}
>>> +
>>> +/* Frees a block from memory.
>>> + * If nmd is specified we will return extra buffers to this
>>> + * nm_desc if the block contains any dp_packet. */
>>> +static void
>>> +nm_block_free(struct nm_block* b, struct nm_desc *nmd) {
>>> +    if (b) {
>>> +        if (nmd) {
>>> +            struct netmap_ring *ring = NETMAP_RXRING(nmd->nifp, 0);
>>> +
>>> +            for (int i = 0; i < b->idx; i++) {
>>> +                struct dp_packet *packet = b->packets[i];
>>> +                if (packet) {
>>> +                    uint32_t *e = (uint32_t *) NETMAP_BUF(ring,
>>> packet->buf_idx);
>>> +                    *e = nmd->nifp->ni_bufs_head;
>>> +                    nmd->nifp->ni_bufs_head = packet->buf_idx;
>>> +                    free(packet);
>>> +                }
>>> +            }
>>> +        }
>>> +
>>> +        free(b);
>>> +    }
>>> +}
>>> +
>>> +/* Set up the port by checking if any other port has already been
>>> opened.
>>> + * Prepare blocks of dp_packets. */
>>> +static int
>>> +netmap_setup_port(struct nm_desc *nmd) {
>>> +    ovs_mutex_lock(&netmap_mutex);
>>> +
>>> +    if (ovs_list_size(&nmr.port_list)) {
>>> +        /* Netmap memory has already been set up, check if the new port
>>> uses
>>> +         * the same memid */
>>> +        if (nmr.memid != nmd->req.nr_arg2) {
>>> +            VLOG_WARN("unable to add this port, it has a new mem_id
>>> (%x->%x)",
>>> +                    nmr.memid, nmd->req.nr_arg2);
>>> +            ovs_mutex_unlock(&netmap_mutex);
>>> +            return 1;
>>> +        }
>>> +    } else {
>>> +        /* We are initializing the first netmap port: map the netmap
>>> +         * memory into this process. */
>>> +        nmr.memid = nmd->req.nr_arg2;
>>> +        nmr.memsize = nmd->req.nr_memsize;
>>> +        nmr.mem = mmap(0, nmr.memsize, PROT_WRITE | PROT_READ,
>>> +                        MAP_SHARED, nmd->fd, 0);
>>> +
>>> +        if (nmr.mem == MAP_FAILED) {
>>> +            VLOG_WARN("mmap has failed!");
>>> +            ovs_mutex_unlock(&netmap_mutex);
>>> +            return 1;
>>> +        }
>>> +    }
>>> +
>>> +    /* Now we can set up the following nmd fields */
>>> +    {
>>> +        struct netmap_if *nifp;
>>> +
>>> +        nmd->memsize = nmr.memsize;
>>> +        nmd->mem = nmr.mem;
>>> +        nifp = NETMAP_IF(nmd->mem, nmd->req.nr_offset);
>>> +        *(struct netmap_if **)(uintptr_t)&(nmd->nifp) = nifp;
>>> +    }
>>> +
>>> +    /* Allocate a number of blocks containing dp_packets. The total
>>> +     * number of extra buffers used is a multiple of the block size. */
>>> +    uint32_t nextrabufs = nmd->req.nr_arg3 & ~(DP_BLOCK_SIZE-1);
>>> +    struct nm_block *block;
>>> +    for (int i = 0 ; i < (nextrabufs/DP_BLOCK_SIZE); i++) {
>>> +        block = nm_block_new(nmd);
>>> +        ovs_list_push_back(&nmr.block_list[NM_BLOCK_TYPE_GET],
>>> &block->node);
>>> +    }
>>> +
>>> +    ovs_mutex_unlock(&netmap_mutex);
>>> +
>>> +    return 0;
>>> +}
>>> +
>>> +/* This function initializes some variables and has to be called in the
>>> pmd
>>> + * thread reload.
>>> + * Thanks to this we can initialize thread local blocks and recognize
>>> + * if there are other ports using our thread-id. */
>>> +void
>>> +netmap_init_port(struct netdev_rxq *rxq) {
>>> +
>>> +    ovs_mutex_lock(&netmap_mutex);
>>> +
>>> +    if (is_netmap_class(netdev_get_class(rxq->netdev))) {
>>> +        struct netdev_netmap *dev = netdev_netmap_cast(rxq->netdev);
>>> +        dev->tid = syscall(SYS_gettid);
>>> +        dev->nma = NMA;
>>> +    }
>>> +
>>> +    /* We need to initialize new blocks in the local allocator */
>>> +    if (!GETB) {
>>> +        GETB = nm_block_new(NULL);
>>> +    }
>>> +
>>> +    if (!PUTB) {
>>> +        PUTB = nm_block_new(NULL);
>>> +    }
>>> +
>>> +    ovs_mutex_unlock(&netmap_mutex);
>>> +}
>>> +
>>> +/* This function is called upon dp_packet deallocation. The pointer is
>>> + * not deallocated but saved in a nm_block that has free space. */
>>> +void
>>> +netmap_free_packet(struct dp_packet* packet) {
>>> +    struct nm_block* block = PUTB;
>>> +
>>> +    if (OVS_UNLIKELY(block->idx == (DP_BLOCK_SIZE - 1))) {
>>> +        block = GETB;
>>> +        if (OVS_UNLIKELY(block->idx == (DP_BLOCK_SIZE - 1))) {
>>> +            nm_block_swap_global(NM_BLOCK_TYPE_PUT);
>>> +            block = PUTB;
>>> +        }
>>> +    }
>>> +
>>> +    block->packets[block->idx++] = packet;
>>> +}
>>> +
>>> +/* Allocate 'n' dp_packets to the batch. This operation might require
>>> + * multiple memcpy operations. If no thread local nm_block has data we
>>> + * need to ask the nm_runtime for a new block. */
>>> +static int
>>> +netmap_alloc_packets(struct dp_packet_batch* b, size_t n) {
>>> +    struct nm_block* block;
>>> +    size_t step, tot = 0, s;
>>> +
>>> +    for (step = 0; step < 3; step++) {
>>> +        block = GETB;
>>> +        s = MIN(n, block->idx);
>>> +        memcpy(&b->packets[tot], &block->packets[block->idx - s],
>>> +                s * sizeof(struct dp_packet*));
>>> +        block->idx -= s;
>>> +        tot += s;
>>> +        n -= s;
>>> +
>>> +        if (n == 0) {
>>> +            break;
>>> +        } else if (OVS_LIKELY(step == 0)) {
>>> +            nm_block_swap_local();
>>> +        } else {
>>> +            nm_block_swap_global(NM_BLOCK_TYPE_GET);
>>> +        }
>>> +    }
>>> +
>>> +    return tot;
>>> +}
>>> +
>>> +/* Set up some values from the configuration. */
>>> +void
>>> +netmap_init_config(const struct smap *ovs_other_config) {
>>> +    nmr.nextrabufs = (uint32_t)
>>> +        smap_get_int(ovs_other_config, "netmap-nextrabufs",
>>> DP_BLOCK_SIZE);
>>> +
>>> +    nmr.nextrabufs &= ~(DP_BLOCK_SIZE-1);
>>> +
>>> +    VLOG_INFO("nextrabufs: %d", nmr.nextrabufs);
>>> +}
>>> +
>>> +static struct netdev_rxq *
>>> +netdev_netmap_rxq_alloc(void)
>>> +{
>>> +    struct netdev_rxq_netmap *rx = xzalloc(sizeof *rx);
>>> +    return &rx->up;
>>> +}
>>> +
>>> +static int
>>> +netdev_netmap_rxq_construct(struct netdev_rxq *rxq OVS_UNUSED)
>>> +{
>>> +    /* Nothing to do here */
>>> +    return 0;
>>> +}
>>> +
>>> +static void
>>> +netdev_netmap_rxq_destruct(struct netdev_rxq *rxq OVS_UNUSED)
>>> +{
>>> +    /* Nothing to do here */
>>> +    return;
>>> +}
>>> +
>>> +static void
>>> +netdev_netmap_rxq_dealloc(struct netdev_rxq *rxq)
>>> +{
>>> +    struct netdev_rxq_netmap *rx = netdev_rxq_netmap_cast(rxq);
>>> +    free(rx);
>>> +}
>>> +
>>> +static struct netdev *
>>> +netdev_netmap_alloc(void)
>>> +{
>>> +    struct netdev_netmap *dev;
>>> +
>>> +    dev = (struct netdev_netmap *) xzalloc(sizeof *dev);
>>> +    if (dev) {
>>> +        return &dev->up;
>>> +    }
>>> +
>>> +    return NULL;
>>> +}
>>> +
>>> +static int
>>> +netdev_netmap_construct(struct netdev *netdev)
>>> +{
>>> +    struct netdev_netmap *dev = netdev_netmap_cast(netdev);
>>> +    const char *ifname = netdev_get_name(netdev);
>>> +
>>> +    struct nmreq req;
>>> +    memset(&req, 0 , sizeof(req));
>>> +    req.nr_arg3 = nmr.nextrabufs;
>>> +
>>> +    /* Open the netmap port requesting a number of extra buffers. We do
>>> +     * not mmap the netmap memory here. */
>>> +    dev->nmd = nm_open(ifname, &req, NM_OPEN_NO_MMAP, NULL);
>>> +
>>> +    if (!dev->nmd) {
>>> +        if (!errno) {
>>> +            VLOG_WARN("opening port \"%s\" failed: not a netmap port",
>>> ifname);
>>> +        } else {
>>> +            VLOG_WARN("opening port \"%s\" failed: %s", ifname,
>>> +                ovs_strerror(errno));
>>> +        }
>>> +        return EINVAL;
>>> +    } else {
>>> +        VLOG_INFO("opening port \"%s\"", ifname);
>>> +    }
>>> +
>>> +    /* Check if we have enough extra buffers to create a nm_block. */
>>> +    if (dev->nmd->req.nr_arg3 < DP_BLOCK_SIZE) {
>>> +        VLOG_WARN("not enough extra buffers(%d/%d), closing port",
>>> +                dev->nmd->req.nr_arg3, DP_BLOCK_SIZE);
>>> +        nm_close(dev->nmd);
>>> +        return EINVAL;
>>> +    }
>>> +
>>> +    /* Possibly mmap netmap memory, initialize the nm_desc, nm_runtime.
>>> +     * Allocate some nm_blocks using the extrabuffers given to this
>>> port. */
>>> +    if (netmap_setup_port(dev->nmd)) {
>>> +        VLOG_WARN("could not setup \"%s\" port", ifname);
>>> +        nm_close(dev->nmd);
>>> +        return EINVAL;
>>> +    }
>>> +
>>> +    ovs_list_init(&dev->list_node);
>>> +    ovs_mutex_lock(&netmap_mutex);
>>> +    ovs_list_push_front(&nmr.port_list, &dev->list_node);
>>> +    ovs_mutex_unlock(&netmap_mutex);
>>> +
>>> +    ovs_mutex_init(&dev->mutex);
>>> +    pthread_spin_init(&dev->tx_lock, PTHREAD_PROCESS_SHARED);
>>> +    eth_addr_random(&dev->hwaddr);
>>> +    dev->flags = NETDEV_UP | NETDEV_PROMISC;
>>> +    dev->timestamp = netmap_rdtsc();
>>> +    dev->rxsync_intval = DEFAULT_RSYNC_INTVAL;
>>> +    dev->requested_mtu = NETMAP_RXRING(dev->nmd->nifp, 0)->nr_buf_size;
>>> +    netdev_request_reconfigure(netdev);
>>> +
>>> +    return 0;
>>> +}
>>> +
>>> +static void
>>> +netdev_netmap_destruct(struct netdev *netdev)
>>> +{
>>> +    struct netdev_netmap *dev = netdev_netmap_cast(netdev);
>>> +    struct nm_block* b;
>>> +
>>> +    ovs_mutex_lock(&netmap_mutex);
>>> +    VLOG_INFO("closing port \"%s\"", (const char*)
>>> netdev_get_name(netdev));
>>> +
>>> +    ovs_list_remove(&dev->list_node);
>>> +
>>> +    /* A netmap netdev is being removed.
>>> +     * If this is the last netmap port we remove all blocks. */
>>> +    if (!ovs_list_size(&nmr.port_list)) {
>>> +        LIST_FOR_EACH_POP(b, node, &nmr.block_list[NM_BLOCK_TYPE_PUT])
>>> {
>>> +            nm_block_free(b, dev->nmd);
>>> +        }
>>> +
>>> +        LIST_FOR_EACH_POP(b, node, &nmr.block_list[NM_BLOCK_TYPE_GET])
>>> {
>>> +            nm_block_free(b, dev->nmd);
>>> +        }
>>> +    } else {
>>> +        struct netdev_netmap *d;
>>> +        enum nm_block_type type;
>>> +        int last_thread_port = true;
>>> +
>>> +        /* Check if there are other netmap ports using the same thread
>>> id. */
>>> +        LIST_FOR_EACH(d, list_node, &nmr.port_list) {
>>> +            if (dev->tid == d->tid) {
>>> +                last_thread_port = false;
>>> +                break;
>>> +            }
>>> +        }
>>> +
>>> +        /* If there are no ports using this thread id we return thread
>>> local
>>> +         * blocks to the global allocator nm_runtime. */
>>> +        if (last_thread_port) {
>>> +            b = dev->nma->block[NM_BLOCK_TYPE_PUT];
>>> +            type = b->idx ? NM_BLOCK_TYPE_GET : NM_BLOCK_TYPE_PUT;
>>> +            ovs_list_push_front(&nmr.block_list[type], &b->node);
>>> +            dev->nma->block[NM_BLOCK_TYPE_PUT] = NULL;
>>> +
>>> +            b = dev->nma->block[NM_BLOCK_TYPE_GET];
>>> +            type = b->idx ? NM_BLOCK_TYPE_GET : NM_BLOCK_TYPE_PUT;
>>> +            ovs_list_push_front(&nmr.block_list[type], &b->node);
>>> +            dev->nma->block[NM_BLOCK_TYPE_GET] = NULL;
>>> +        }
>>> +
>>> +        /* We will now try to free a number of blocks equal to the
>>> blocks
>>> +         * allocated when the port was created.
>>> +         * Each block is then freed returning the extra bufs to the
>>> nm_desc. */
>>> +        int nblocks = nmr.nextrabufs / DP_BLOCK_SIZE;
>>> +        LIST_FOR_EACH_POP(b, node, &nmr.block_list[NM_BLOCK_TYPE_GET])
>>> {
>>> +            nm_block_free(b, dev->nmd);
>>> +            if (!--nblocks) {
>>> +                break;
>>> +            }
>>> +        }
>>> +
>>> +        if (!ovs_list_is_empty(&nmr.block_list[NM_BLOCK_TYPE_PUT])) {
>>> +            struct ovs_list *list_node = ovs_list_pop_front(
>>> +                                        &nmr.block_list[NM_BLOCK_TYPE_
>>> PUT]);
>>> +            b = CONTAINER_OF(list_node, struct nm_block, node);
>>> +            nm_block_free(b, dev->nmd);
>>> +        }
>>> +    }
>>> +
>>> +    ovs_mutex_unlock(&netmap_mutex);
>>> +
>>> +    /* Now we can close the port. */
>>> +    nm_close(dev->nmd);
>>> +}
>>> +
>>> +static void
>>> +netdev_netmap_dealloc(struct netdev *netdev)
>>> +{
>>> +    struct netdev_netmap *dev = netdev_netmap_cast(netdev);
>>> +
>>> +    ovs_mutex_destroy(&dev->mutex);
>>> +    pthread_spin_destroy(&dev->tx_lock);
>>> +
>>> +    free(dev);
>>> +}
>>> +
>>> +static int
>>> +netdev_netmap_class_init(void)
>>> +{
>>> +    static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER;
>>> +
>>> +    if (ovsthread_once_start(&once)) {
>>> +        ovs_list_init(&nmr.block_list[NM_BLOCK_TYPE_PUT]);
>>> +        ovs_list_init(&nmr.block_list[NM_BLOCK_TYPE_GET]);
>>> +        ovs_list_init(&nmr.port_list);
>>> +        ovsthread_once_done(&once);
>>> +    }
>>> +
>>> +    return 0;
>>> +}
>>> +
>>> +static int
>>> +netdev_netmap_reconfigure(struct netdev *netdev)
>>> +{
>>> +    struct netdev_netmap *dev = netdev_netmap_cast(netdev);
>>> +    int err = 0;
>>> +
>>> +    ovs_mutex_lock(&dev->mutex);
>>> +
>>> +    if (dev->mtu == dev->requested_mtu) {
>>> +        /* Reconfiguration is unnecessary */
>>> +        goto out;
>>> +    }
>>> +
>>> +    dev->mtu = dev->requested_mtu;
>>> +    netdev_change_seq_changed(netdev);
>>> +
>>> +out:
>>> +    ovs_mutex_unlock(&dev->mutex);
>>> +    return err;
>>> +}
>>> +
>>> +static int
>>> +netdev_netmap_get_config(const struct netdev *netdev, struct smap *args)
>>> +{
>>> +    struct netdev_netmap *dev = netdev_netmap_cast(netdev);
>>> +
>>> +    ovs_mutex_lock(&dev->mutex);
>>> +    smap_add_format(args, "mtu", "%d", dev->mtu);
>>> +    ovs_mutex_unlock(&dev->mutex);
>>> +
>>> +    return 0;
>>> +}
>>> +
>>> +static int
>>> +netdev_netmap_set_config(struct netdev *netdev, const struct smap *args,
>>> +                         char **errp OVS_UNUSED)
>>> +{
>>> +    struct netdev_netmap *dev = netdev_netmap_cast(netdev);
>>> +
>>> +    ovs_mutex_lock(&dev->mutex);
>>> +    dev->rxsync_intval = smap_get_int(args, "rxsync-intval",
>>> +            DEFAULT_RSYNC_INTVAL);
>>> +    ovs_mutex_unlock(&dev->mutex);
>>> +
>>> +    return 0;
>>> +}
>>> +
>>> +static inline void
>>> +netmap_rxsync(struct netdev_netmap *dev)
>>> +{
>>> +    uint64_t now = netmap_rdtsc();
>>> +    unsigned int diff = TSC2US(now - dev->timestamp);
>>> +
>>> +    if (diff < dev->rxsync_intval) {
>>> +        /* skipping rxsync */
>>> +        return;
>>> +    }
>>> +
>>> +    ioctl(dev->nmd->fd, NIOCRXSYNC, NULL);
>>> +
>>> +    /* update current timestamp */
>>> +    dev->timestamp = now;
>>> +}
>>> +
>>> +static inline void
>>> +netmap_swap_slot(struct dp_packet *packet, struct netmap_slot *s) {
>>> +    uint32_t idx;
>>> +
>>> +    idx = s->buf_idx;
>>> +    s->buf_idx = packet->buf_idx;
>>> +    s->flags |= NS_BUF_CHANGED;
>>> +    packet->buf_idx = idx;
>>> +}
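For readers less familiar with netmap, the zero-copy trick in
netmap_swap_slot() is that no payload moves: the packet and the ring slot
simply trade buffer indices, and the slot is flagged so the kernel notices
the change. Below is a toy model of that exchange. The toy_* types and
TOY_BUF_CHANGED are hypothetical stand-ins so the snippet compiles without
the netmap headers; they are not the real netmap definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag standing in for NS_BUF_CHANGED. */
#define TOY_BUF_CHANGED 0x1u

struct toy_slot {
    uint32_t buf_idx;   /* Index of the buffer owned by the ring slot. */
    uint16_t flags;
};

struct toy_packet {
    uint32_t buf_idx;   /* Index of the buffer owned by the packet. */
};

/* Exchanges buffer ownership between 'p' and 's' without copying data. */
static void
toy_swap_slot(struct toy_packet *p, struct toy_slot *s)
{
    assert(p && s);
    uint32_t idx = s->buf_idx;

    s->buf_idx = p->buf_idx;
    s->flags |= TOY_BUF_CHANGED;   /* Tell the other side the buffer moved. */
    p->buf_idx = idx;
}
```

After the call the slot holds the buffer that used to belong to the packet,
and vice versa, which is why every in-flight packet ties up one extra buffer.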
>>> +
>>> +static int
>>> +netdev_netmap_send(struct netdev *netdev, int qid OVS_UNUSED,
>>> +                     struct dp_packet_batch *batch, bool concurrent_txq)
>>> +{
>>> +    struct netdev_netmap *dev = netdev_netmap_cast(netdev);
>>> +    struct nm_desc *nmd = dev->nmd;
>>> +    uint16_t r, nrings = dev->nmd->nifp->ni_tx_rings;
>>> +    uint32_t budget = batch->count, count = 0;
>>> +    bool again = false;
>>> +
>>> +    if (OVS_UNLIKELY(!(dev->flags & NETDEV_UP))) {
>>> +        dp_packet_delete_batch(batch, true);
>>> +        return 0;
>>> +    }
>>> +
>>> +    if (OVS_UNLIKELY(concurrent_txq)) {
>>> +        pthread_spin_lock(&dev->tx_lock);
>>> +    }
>>> +
>>> +try_again:
>>> +    for (r = 0; r < nrings; r++) {
>>> +        struct netmap_ring *ring;
>>> +        uint32_t head, space;
>>> +
>>> +        ring = NETMAP_TXRING(nmd->nifp, nmd->cur_tx_ring);
>>> +        space = nm_ring_space(ring); /* Available slots in this ring. */
>>> +        head = ring->head;
>>> +
>>> +        if (space > budget) {
>>> +            space = budget;
>>> +        }
>>> +        budget -= space;
>>> +
>>> +        /* Transmit as much as possible in this ring. */
>>> +        while (space--) {
>>> +            struct netmap_slot *ts = &ring->slot[head];
>>> +            struct dp_packet *packet = batch->packets[count++];
>>> +
>>> +            ts->len = dp_packet_get_send_len(packet);
>>> +
>>> +            if (OVS_UNLIKELY(packet->source != DPBUF_NETMAP)) {
>>> +                /* send packet copying data to the netmap slot */
>>> +                memcpy(NETMAP_BUF(ring, ts->buf_idx),
>>> +                        dp_packet_data(packet), ts->len);
>>> +            } else {
>>> +                /* send packet using zerocopy */
>>> +                netmap_swap_slot(packet, ts);
>>> +            }
>>> +
>>> +            head = nm_ring_next(ring, head);
>>> +        }
>>> +
>>> +        ring->head = ring->cur = head;
>>> +
>>> +        /* We may have exhausted the budget */
>>> +        if (OVS_LIKELY(!budget)) {
>>> +            break;
>>> +        }
>>> +
>>> +        /* We still have packets to send, select next ring. */
>>> +        if (OVS_UNLIKELY(++dev->nmd->cur_tx_ring == nrings)) {
>>> +            nmd->cur_tx_ring = 0;
>>> +        }
>>> +    }
>>> +
>>> +    ioctl(dev->nmd->fd, NIOCTXSYNC, NULL);
>>> +
>>> +    if (OVS_UNLIKELY(!count && !again)) {
>>> +        again = true;
>>> +        goto try_again;
>>> +    }
>>> +
>>> +    dp_packet_delete_batch(batch, true);
>>> +
>>> +    if (OVS_UNLIKELY(concurrent_txq)) {
>>> +        pthread_spin_unlock(&dev->tx_lock);
>>> +    }
>>> +
>>> +    return 0;
>>> +}
>>> +
>>> +static int
>>> +netdev_netmap_rxq_recv(struct netdev_rxq *rxq, struct dp_packet_batch *batch)
>>> +{
>>> +    struct netdev_netmap *dev = netdev_netmap_cast(rxq->netdev);
>>> +    struct nm_desc *nmd = dev->nmd;
>>> +    uint16_t r, nrings = nmd->nifp->ni_rx_rings;
>>> +    uint32_t budget = 0;
>>> +
>>> +    if (OVS_UNLIKELY(!(dev->flags & NETDEV_UP))) {
>>> +        return EAGAIN;
>>> +    }
>>> +
>>> +    /* check how much we can receive */
>>> +    for (r = nmd->first_rx_ring; r < nrings; r++) {
>>> +        budget += nm_ring_space(NETMAP_RXRING(nmd->nifp, r));
>>> +    }
>>> +
>>> +    /* sync if there is no packet */
>>> +    if (budget == 0) {
>>> +        netmap_rxsync(dev);
>>> +        return EAGAIN;
>>> +    }
>>> +
>>> +    /* allocate the batch */
>>> +    budget = netmap_alloc_packets(batch, MIN(budget, NETDEV_MAX_BURST));
>>> +
>>> +    for (r = 0; r < nrings; r++) {
>>> +        struct netmap_ring *ring;
>>> +        uint32_t head, space;
>>> +
>>> +        ring = NETMAP_RXRING(nmd->nifp, nmd->cur_rx_ring);
>>> +        head = ring->head;
>>> +        space = nm_ring_space(ring);
>>> +
>>> +        if (space > budget) {
>>> +            space = budget;
>>> +        }
>>> +        budget -= space;
>>> +
>>> +        /* Receive as much as possible from this ring. */
>>> +        while (space--) {
>>> +            struct netmap_slot *rs = &ring->slot[head];
>>> +            struct dp_packet *packet = batch->packets[batch->count++];
>>> +            dp_packet_init_netmap(packet, NETMAP_BUF(ring, rs->buf_idx),
>>> +                                    rs->len);
>>> +            /* receiving from a netmap port we can always zero copy here. */
>>> +            netmap_swap_slot(packet, rs);
>>> +            head = nm_ring_next(ring, head);
>>> +        }
>>> +
>>> +        ring->cur = ring->head = head;
>>> +
>>> +        /* check if the batch has been filled. */
>>> +        if (!budget) {
>>> +            break;
>>> +        }
>>> +
>>> +        /* batch isn't full, try to receive on other rings. */
>>> +        if (OVS_UNLIKELY(++nmd->cur_rx_ring == nrings)) {
>>> +            nmd->cur_rx_ring = 0;
>>> +        }
>>> +    }
>>> +
>>> +    dp_packet_batch_init_packet_fields(batch);
>>> +
>>> +    return 0;
>>> +}
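Both the send and the receive paths walk the rings the same way: take as much
as the current ring allows, and advance to the next ring only if budget is
left. A compact sketch of that loop, with plain counters in place of netmap
rings, follows. toy_drain_rings() is a hypothetical helper written for this
note and makes no claim about the real ring API.

```c
#include <assert.h>
#include <stdint.h>

/* Takes up to 'budget' items from 'nrings' rings, starting at ring '*cur'
 * and moving on to the next ring only when budget is left over.  'avail[i]'
 * is decremented by the amount taken from ring i.  Returns the total taken
 * and leaves '*cur' on the ring where the next call should start. */
static uint32_t
toy_drain_rings(uint32_t *avail, uint16_t nrings, uint16_t *cur,
                uint32_t budget)
{
    uint32_t taken = 0;

    assert(avail && cur && nrings > 0 && *cur < nrings);

    for (uint16_t r = 0; r < nrings && budget > 0; r++) {
        uint32_t n = avail[*cur] < budget ? avail[*cur] : budget;

        avail[*cur] -= n;
        budget -= n;
        taken += n;

        if (budget > 0) {
            /* Budget left: try the next ring, wrapping around. */
            *cur = (uint16_t) ((*cur + 1) % nrings);
        }
    }
    return taken;
}
```

Because the cursor stays on the ring that satisfied the budget, that ring is
polled first on the next call.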
>>> +
>>> +static int
>>> +netdev_netmap_get_ifindex(const struct netdev *netdev)
>>> +{
>>> +    struct netdev_netmap *dev = netdev_netmap_cast(netdev);
>>> +
>>> +    ovs_mutex_lock(&dev->mutex);
>>> +    /* Calculate hash from the netdev name. Ensure that ifindex is a 24-bit
>>> +     * positive integer to meet RFC 2863 recommendations.
>>> +     */
>>> +    int ifindex = hash_string(netdev->name, 0) % 0xfffffe + 1;
>>> +    ovs_mutex_unlock(&dev->mutex);
>>> +
>>> +    return ifindex;
>>> +}
>>> +
>>> +static int
>>> +netdev_netmap_get_mtu(const struct netdev *netdev, int *mtu)
>>> +{
>>> +    struct netdev_netmap *dev = netdev_netmap_cast(netdev);
>>> +
>>> +    ovs_mutex_lock(&dev->mutex);
>>> +    *mtu = dev->mtu;
>>> +    ovs_mutex_unlock(&dev->mutex);
>>> +
>>> +    return 0;
>>> +}
>>> +
>>> +static int
>>> +netdev_netmap_set_mtu(struct netdev *netdev, int mtu)
>>> +{
>>> +    struct netdev_netmap *dev = netdev_netmap_cast(netdev);
>>> +
>>> +    if (mtu > NETMAP_RXRING(dev->nmd->nifp, 0)->nr_buf_size
>>> +        || mtu < ETH_HEADER_LEN) {
>>> +        VLOG_WARN("%s: unsupported MTU %d\n", dev->up.name, mtu);
>>> +        return EINVAL;
>>> +    }
>>> +
>>> +    ovs_mutex_lock(&dev->mutex);
>>> +    if (dev->requested_mtu != mtu) {
>>> +        dev->requested_mtu = mtu;
>>> +        netdev_request_reconfigure(netdev);
>>> +    }
>>> +    ovs_mutex_unlock(&dev->mutex);
>>> +
>>> +    return 0;
>>> +}
>>> +
>>> +static int
>>> +netdev_netmap_set_etheraddr(struct netdev *netdev, const struct eth_addr mac)
>>> +{
>>> +    struct netdev_netmap *dev = netdev_netmap_cast(netdev);
>>> +
>>> +    ovs_mutex_lock(&dev->mutex);
>>> +    dev->hwaddr = mac;
>>> +    netdev_change_seq_changed(netdev);
>>> +    ovs_mutex_unlock(&dev->mutex);
>>> +
>>> +    return 0;
>>> +}
>>> +
>>> +static int
>>> +netdev_netmap_get_etheraddr(const struct netdev *netdev, struct eth_addr *mac)
>>> +{
>>> +    struct netdev_netmap *dev = netdev_netmap_cast(netdev);
>>> +
>>> +    ovs_mutex_lock(&dev->mutex);
>>> +    *mac = dev->hwaddr;
>>> +    ovs_mutex_unlock(&dev->mutex);
>>> +
>>> +    return 0;
>>> +}
>>> +
>>> +static int
>>> +netdev_netmap_update_flags(struct netdev *netdev,
>>> +                          enum netdev_flags off, enum netdev_flags on,
>>> +                          enum netdev_flags *old_flagsp)
>>> +{
>>> +    struct netdev_netmap *dev = netdev_netmap_cast(netdev);
>>> +
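>>> +    /* Validate the flags before taking the mutex so that the error path
>>> +     * cannot return with dev->mutex still held. */
>>> +    if ((off | on) & ~(NETDEV_UP | NETDEV_PROMISC)) {
>>> +        return EINVAL;
>>> +    }
>>> +
>>> +    ovs_mutex_lock(&dev->mutex);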
>>> +    *old_flagsp = dev->flags;
>>> +    dev->flags |= on;
>>> +    dev->flags &= ~off;
>>> +
>>> +    ovs_mutex_unlock(&dev->mutex);
>>> +
>>> +    return 0;
>>> +}
>>> +
>>> +static int
>>> +netdev_netmap_get_carrier(const struct netdev *netdev, bool *carrier)
>>> +{
>>> +    struct netdev_netmap *dev = netdev_netmap_cast(netdev);
>>> +
>>> +    ovs_mutex_lock(&dev->mutex);
>>> +    *carrier = true;
>>> +    ovs_mutex_unlock(&dev->mutex);
>>> +
>>> +    return 0;
>>> +}
>>> +
>>> +static int
>>> +netdev_netmap_get_stats(const struct netdev *netdev, struct netdev_stats *stats)
>>> +{
>>> +    struct netdev_netmap *dev = netdev_netmap_cast(netdev);
>>> +
>>> +    ovs_mutex_lock(&dev->mutex);
>>> +    stats->tx_packets = dev->stats.tx_packets;
>>> +    stats->tx_bytes = dev->stats.tx_bytes;
>>> +    stats->rx_packets = dev->stats.rx_packets;
>>> +    stats->rx_bytes = dev->stats.rx_bytes;
>>> +    ovs_mutex_unlock(&dev->mutex);
>>> +
>>> +    return 0;
>>> +}
>>> +
>>> +static int
>>> +netdev_netmap_get_status(const struct netdev *netdev, struct smap *args)
>>> +{
>>> +    struct netdev_netmap *dev = netdev_netmap_cast(netdev);
>>> +
>>> +    ovs_mutex_lock(&dev->mutex);
>>> +    smap_add_format(args, "mtu", "%d", dev->mtu);
>>> +    ovs_mutex_unlock(&dev->mutex);
>>> +
>>> +    return 0;
>>> +}
>>> +
>>> +#define NETDEV_NETMAP_CLASS(NAME, PMD, INIT, CONSTRUCT, DESTRUCT, SET_CONFIG, \
>>> +        SET_TX_MULTIQ, SEND, SEND_WAIT, GET_CARRIER, GET_STATS, GET_FEATURES, \
>>> +        GET_STATUS, RECONFIGURE, RXQ_RECV, RXQ_WAIT)        \
>>> +{                                                           \
>>> +    NAME,                                                   \
>>> +    PMD,                        /* is_pmd */                \
>>> +    INIT,                       /* init */                  \
>>> +    NULL,                       /* netdev_netmap_run */     \
>>> +    NULL,                       /* netdev_netmap_wait */    \
>>> +    netdev_netmap_alloc,                                    \
>>> +    CONSTRUCT,                                              \
>>> +    DESTRUCT,                                               \
>>> +    netdev_netmap_dealloc,                                  \
>>> +    netdev_netmap_get_config,                               \
>>> +    SET_CONFIG,                                             \
>>> +    NULL,                       /* get_tunnel_config */     \
>>> +    NULL,                       /* build header */          \
>>> +    NULL,                       /* push header */           \
>>> +    NULL,                       /* pop header */            \
>>> +    NULL,                       /* get numa id */           \
>>> +    SET_TX_MULTIQ,              /* tx multiq */             \
>>> +    SEND,                       /* send */                  \
>>> +    SEND_WAIT,                                              \
>>> +    netdev_netmap_set_etheraddr,                            \
>>> +    netdev_netmap_get_etheraddr,                            \
>>> +    netdev_netmap_get_mtu,                                  \
>>> +    netdev_netmap_set_mtu,                                  \
>>> +    netdev_netmap_get_ifindex,                              \
>>> +    GET_CARRIER,                                            \
>>> +    NULL,                       /* get_carrier_resets */    \
>>> +    NULL,                       /* get_miimon */            \
>>> +    GET_STATS,                                              \
>>> +    NULL,                       /* get_custom_stats */      \
>>> +                                                            \
>>> +    NULL,                       /* get_features */          \
>>> +    NULL,                       /* set_advertisements */    \
>>> +    NULL,                       /* get_pt_mode */           \
>>> +                                                            \
>>> +    NULL,                       /* set_policing */          \
>>> +    NULL,                       /* get_qos_types */         \
>>> +    NULL,                       /* get_qos_capabilities */  \
>>> +    NULL,                       /* get_qos */               \
>>> +    NULL,                       /* set_qos */               \
>>> +    NULL,                       /* get_queue */             \
>>> +    NULL,                       /* set_queue */             \
>>> +    NULL,                       /* delete_queue */          \
>>> +    NULL,                       /* get_queue_stats */       \
>>> +    NULL,                       /* queue_dump_start */      \
>>> +    NULL,                       /* queue_dump_next */       \
>>> +    NULL,                       /* queue_dump_done */       \
>>> +    NULL,                       /* dump_queue_stats */      \
>>> +                                                            \
>>> +    NULL,                       /* set_in4 */               \
>>> +    NULL,                       /* get_addr_list */         \
>>> +    NULL,                       /* add_router */            \
>>> +    NULL,                       /* get_next_hop */          \
>>> +    GET_STATUS,                                             \
>>> +    NULL,                       /* arp_lookup */            \
>>> +                                                            \
>>> +    netdev_netmap_update_flags,                             \
>>> +    RECONFIGURE,                                            \
>>> +                                                            \
>>> +    netdev_netmap_rxq_alloc,                                \
>>> +    netdev_netmap_rxq_construct,                            \
>>> +    netdev_netmap_rxq_destruct,                             \
>>> +    netdev_netmap_rxq_dealloc,                              \
>>> +    RXQ_RECV,                                               \
>>> +    RXQ_WAIT,                                               \
>>> +    NULL,                       /* rxq_drain */             \
>>> +    NO_OFFLOAD_API                                          \
>>> +}
>>> +
>>> +static const struct netdev_class netmap_class =
>>> +    NETDEV_NETMAP_CLASS(
>>> +        "netmap",
>>> +        true,
>>> +        netdev_netmap_class_init,
>>> +        netdev_netmap_construct,
>>> +        netdev_netmap_destruct,
>>> +        netdev_netmap_set_config,
>>> +        NULL,
>>> +        netdev_netmap_send,
>>> +        NULL,
>>> +        netdev_netmap_get_carrier,
>>> +        netdev_netmap_get_stats,
>>> +        NULL,
>>> +        netdev_netmap_get_status,
>>> +        netdev_netmap_reconfigure,
>>> +        netdev_netmap_rxq_recv,
>>> +        NULL);
>>> +
>>> +void
>>> +netdev_netmap_register(void)
>>> +{
>>> +    netdev_register_provider(&netmap_class);
>>> +}
>>> diff --git a/lib/netdev-netmap.h b/lib/netdev-netmap.h
>>> new file mode 100644
>>> index 000000000..49fe8c319
>>> --- /dev/null
>>> +++ b/lib/netdev-netmap.h
>>> @@ -0,0 +1,13 @@
>>> +#ifndef NETDEV_NETMAP_H
>>> +#define NETDEV_NETMAP_H
>>> +
>>> +struct netdev_rxq;
>>> +struct smap;
>>> +struct dp_packet;
>>> +
>>> +void netmap_init_port(struct netdev_rxq *);
>>> +void netmap_init_config(const struct smap *);
>>> +void netmap_free_packet(struct dp_packet *);
>>> +void netdev_netmap_register(void);
>>> +
>>> +#endif /* netdev-netmap.h */
>>> diff --git a/lib/netmap-stub.c b/lib/netmap-stub.c
>>> new file mode 100644
>>> index 000000000..62f7a06b8
>>> --- /dev/null
>>> +++ b/lib/netmap-stub.c
>>> @@ -0,0 +1,21 @@
>>> +#include <config.h>
>>> +#include "netmap.h"
>>> +
>>> +#include "smap.h"
>>> +#include "ovs-thread.h"
>>> +#include "openvswitch/vlog.h"
>>> +
>>> +VLOG_DEFINE_THIS_MODULE(netmap);
>>> +
>>> +void
>>> +netmap_init(const struct smap *ovs_other_config)
>>> +{
>>> +    if (smap_get_bool(ovs_other_config, "netmap-init", false)) {
>>> +        static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER;
>>> +
>>> +        if (ovsthread_once_start(&once)) {
>>> +            VLOG_ERR("NETMAP not supported in this copy of Open vSwitch.");
>>> +            ovsthread_once_done(&once);
>>> +        }
>>> +    }
>>> +}
>>> diff --git a/lib/netmap.c b/lib/netmap.c
>>> new file mode 100644
>>> index 000000000..b4147e0ad
>>> --- /dev/null
>>> +++ b/lib/netmap.c
>>> @@ -0,0 +1,76 @@
>>> +#include <config.h>
>>> +
>>> +#include <fcntl.h>
>>> +#include <pthread.h>
>>> +#include <stdio.h>
>>> +#include <sys/time.h>   /* timersub */
>>> +#include <stdlib.h>
>>> +#include <string.h>
>>> +#include <stdint.h>
>>> +#include <unistd.h> /* read() */
>>> +
>>> +#include "dirs.h"
>>> +#include "netdev-netmap.h"
>>> +#include "netmap.h"
>>> +#include "openvswitch/vlog.h"
>>> +#include "smap.h"
>>> +
>>> +VLOG_DEFINE_THIS_MODULE(netmap);
>>> +
>>> +/* initialize to avoid a division by 0 */
>>> +uint64_t netmap_ticks_per_second = 1000000000; /* set by calibrate_tsc */
>>> +
>>> +/*
>>> + * do an idle loop to compute the clock speed. We expect
>>> + * a constant TSC rate and locked on all CPUs.
>>> + * Returns ticks per second
>>> + */
>>> +static uint64_t
>>> +netmap_calibrate_tsc(void)
>>> +{
>>> +    struct timeval a, b;
>>> +    uint64_t ta_0, ta_1, tb_0, tb_1, dmax = ~0;
>>> +    uint64_t da, db, cy = 0;
>>> +    int i;
>>> +    for (i = 0; i < 3; i++) {
>>> +        ta_0 = netmap_rdtsc();
>>> +        gettimeofday(&a, NULL);
>>> +        ta_1 = netmap_rdtsc();
>>> +        usleep(20000);
>>> +        tb_0 = netmap_rdtsc();
>>> +        gettimeofday(&b, NULL);
>>> +        tb_1 = netmap_rdtsc();
>>> +        da = ta_1 - ta_0;
>>> +        db = tb_1 - tb_0;
>>> +        if (da + db < dmax) {
>>> +            cy = (b.tv_sec - a.tv_sec) * 1000000 + b.tv_usec - a.tv_usec;
>>> +            cy = (double) (tb_0 - ta_1) * 1000000 / (double) cy;
>>> +            dmax = da + db;
>>> +        }
>>> +    }
>>> +    netmap_ticks_per_second = cy;
>>> +    return cy;
>>> +}
>>> +
>>> +void
>>> +netmap_init(const struct smap *ovs_other_config)
>>> +{
>>> +    static bool enabled = false;
>>> +
>>> +    if (enabled || !ovs_other_config) {
>>> +        return;
>>> +    }
>>> +
>>> +    if (smap_get_bool(ovs_other_config, "netmap-init", false)) {
>>> +        static struct ovsthread_once once_enable = OVSTHREAD_ONCE_INITIALIZER;
>>> +        if (ovsthread_once_start(&once_enable)) {
>>> +            netmap_calibrate_tsc();
>>> +            netmap_init_config(ovs_other_config);
>>> +            netdev_netmap_register();
>>> +            enabled = true;
>>> +            ovsthread_once_done(&once_enable);
>>> +            VLOG_INFO("NETMAP Enabled");
>>> +        }
>>> +    } else {
>>> +        VLOG_INFO_ONCE("NETMAP Disabled - Use other_config:netmap-init "
>>> +                       "to enable");
>>> +    }
>>> +}
>>> diff --git a/lib/netmap.h b/lib/netmap.h
>>> new file mode 100644
>>> index 000000000..34ff7b7a2
>>> --- /dev/null
>>> +++ b/lib/netmap.h
>>> @@ -0,0 +1,27 @@
>>> +#ifndef NETMAP_H
>>> +#define NETMAP_H
>>> +
>>> +#include <stdint.h>
>>> +
>>> +extern uint64_t netmap_ticks_per_second;
>>> +#define US2TSC(x) ((x)*netmap_ticks_per_second/1000000UL)
>>> +#define TSC2US(x) ((x)*1000000UL/netmap_ticks_per_second)
>>> +
>>> +#if 0 /* gcc intrinsic */
>>> +#include <x86intrin.h>
>>> +#define netmap_rdtsc __rdtsc
>>> +#else
>>> +static inline uint64_t
>>> +netmap_rdtsc(void)
>>> +{
>>> +    uint32_t hi, lo;
>>> +    __asm__ __volatile__ ("rdtsc" : "=a"(lo), "=d"(hi));
>>> +    return (uint64_t)lo | ((uint64_t)hi << 32);
>>> +}
>>> +#endif
>>> +
>>> +struct smap;
>>> +
>>> +void netmap_init(const struct smap *ovs_other_config);
>>> +
>>> +#endif /* netmap.h */
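One thing that may be worth double-checking in TSC2US(): the expression
multiplies the tick count by 1000000 before dividing, so the 64-bit
intermediate wraps once the tick delta exceeds roughly 1.8e13 (a couple of
hours at a few GHz). A possible overflow-safe variant splits the value into
whole seconds and a remainder first. This is only a sketch: toy_tsc_to_us()
is a hypothetical name, and it assumes ticks_per_second stays well below
1.8e13 so that the remainder term cannot overflow either.

```c
#include <assert.h>
#include <stdint.h>

/* Converts a TSC tick count to microseconds without overflowing the 64-bit
 * intermediate, by converting whole seconds and the sub-second remainder
 * separately.  Assumes 0 < ticks_per_second < about 1.8e13. */
static uint64_t
toy_tsc_to_us(uint64_t ticks, uint64_t ticks_per_second)
{
    assert(ticks_per_second > 0);
    uint64_t secs = ticks / ticks_per_second;
    uint64_t rem = ticks % ticks_per_second;

    return secs * 1000000u + rem * 1000000u / ticks_per_second;
}
```

For small deltas the result matches the straightforward formula, with the
same truncation toward zero.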
>>> diff --git a/vswitchd/bridge.c b/vswitchd/bridge.c
>>> index d90997e3a..2dfcbb7f6 100644
>>> --- a/vswitchd/bridge.c
>>> +++ b/vswitchd/bridge.c
>>> @@ -38,6 +38,7 @@
>>>  #include "mac-learning.h"
>>>  #include "mcast-snooping.h"
>>>  #include "netdev.h"
>>> +#include "netmap.h"
>>>  #include "nx-match.h"
>>>  #include "ofproto/bond.h"
>>>  #include "ofproto/ofproto.h"
>>> @@ -2977,6 +2978,7 @@ bridge_run(void)
>>>      if (cfg) {
>>>          netdev_set_flow_api_enabled(&cfg->other_config);
>>>          dpdk_init(&cfg->other_config);
>>> +        netmap_init(&cfg->other_config);
>>>      }
>>>
>>>      /* Initialize the ofproto library.  This only needs to run once, but
>>> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
>>> index f899a1976..f6dd6e7b6 100644
>>> --- a/vswitchd/vswitch.xml
>>> +++ b/vswitchd/vswitch.xml
>>> @@ -217,6 +217,46 @@
>>>          </p>
>>>        </column>
>>>
>>> +      <column name="other_config" key="netmap-init"
>>> +              type='{"type": "boolean"}'>
>>> +        <p>
>>> +          Set this value to <code>true</code> to enable runtime support for
>>> +          NETMAP ports. The vswitch must have compile-time support for NETMAP as
>>> +          well.
>>> +        </p>
>>> +        <p>
>>> +          The default value is <code>false</code>. Changing this value requires
>>> +          restarting the daemon.
>>> +        </p>
>>> +        <p>
>>> +          If this value is <code>false</code> at startup, any netmap ports which
>>> +          are configured in the bridge will fail.
>>> +        </p>
>>> +      </column>
>>> +
>>> +      <column name="other_config" key="netmap-nextrabufs"
>>> +              type='{"type": "integer", "minInteger": 32}'>
>>> +        <p>
>>> +            Specifies the number of extra buffers to request from netmap
>>> +            when opening each netmap port.
>>> +        </p>
>>> +        <p>
>>> +            Each packet received or transmitted by OVS from/to a netmap port
>>> +            needs an extra buffer. The OVS netmap runtime needs at least a
>>> +            batch worth of extra buffers (32 packets) for each port to function
>>> +            properly. More extra buffers may be necessary if OVS temporarily
>>> +            stores netmap buffers within its internal queues.
>>> +        </p>
>>> +      </column>
>>> +
>>> +      <column name="other_config" key="rxsync-intval"
>>> +              type='{"type": "integer", "minInteger": 0}'>
>>> +        <p>
>>> +            Specifies the minimum time (in microseconds) between two
>>> +            consecutive rxsync calls issued on a netmap port.
>>> +        </p>
>>> +      </column>
>>> +
>>>        <column name="other_config" key="dpdk-init"
>>>                type='{"type": "boolean"}'>
>>>          <p>
>>>
>>>
>>> 2018-03-20 15:07 GMT+01:00 Alessandro Rosetti <alessandro.rosetti at gmail.com>:
>>>
>>>> Hi Darrell,
>>>>
>>>> I'm developing netmap support for my thesis and I hope it will make it
>>>> into OVS 2.10.
>>>> In the next few days I'm going to post the first prototype patch, which is
>>>> almost ready.
>>>>
>>>> Thanks to you,
>>>> Alessandro
>>>>
>>>> On 19 Mar 2018 9:26 pm, "Darrell Ball" <dlu998 at gmail.com> wrote:
>>>>
>>>>> Hi Alessandro
>>>>>
>>>>> I also think this would be interesting.
>>>>> Is netmap integration actively being worked on for OVS 2.10?
>>>>>
>>>>> Thanks Darrell
>>>>>
>>>>> On Wed, Feb 7, 2018 at 9:19 AM, Ilya Maximets <i.maximets at samsung.com> wrote:
>>>>>
>>>>>> > Hi,
>>>>>>
>>>>>> Hi, Alessandro.
>>>>>>
>>>>>> >
>>>>>> >   My name is Alessandro Rosetti, and I'm currently adding netmap support to
>>>>>> > ovs, following an approach similar to DPDK.
>>>>>>
>>>>>> Good to know that someone started to work on this. IMHO, it's a good idea.
>>>>>> I also wanted to try to implement this someday, but did not have much time.
>>>>>>
>>>>>> >
>>>>>> > I've created a new netdev: netdev_netmap that uses the pmd infrastructure.
>>>>>> > The prototype I have seems to work fine (I still need to tune performance,
>>>>>> > test optional features, and test more complex topologies.)
>>>>>>
>>>>>> Cool. Looking forward to your RFC patch set.
>>>>>>
>>>>>> >
>>>>>> > I have a question about the lifetime of dp_packets.
>>>>>> > Is there any guarantee that the dp_packets allocated in a receive callback
>>>>>> > (e.g. netdev_netmap_rxq_recv) are consumed by OVS (e.g. dropped, cloned, or
>>>>>> > sent to other ports) **before** a subsequent call to the receive callback
>>>>>> > (on the same port)?
>>>>>> > Or is it possible for dp_packets to be stored somewhere (e.g. in an OVS
>>>>>> > internal queue) and live across subsequent invocations of the receive
>>>>>> > callback that allocated them?
>>>>>>
>>>>>> I think that there was never such a guarantee, but recent changes in the
>>>>>> userspace datapath, namely output packet batching support, completely
>>>>>> ruined this assumption.
>>>>>>
>>>>>> Please refer to the following commits for details:
>>>>>> 009e003 2017-12-14 | dpif-netdev: Output packet batching.
>>>>>> c71ea3c 2018-01-15 | dpif-netdev: Time based output batching.
>>>>>> 00adb8d 2018-01-15 | docs: Describe output packet batching in DPDK guide.
>>>>>>
>>>>>> >
>>>>>> > I need to know if this is the case to check that my current prototype is
>>>>>> > safe.
>>>>>> > I use per-port pre-allocation of dp_packets, for maximum performance. I've
>>>>>> > seen that DPDK uses its internal allocator to allocate and deallocate
>>>>>> > dp_packets, but netmap does not expose one.
>>>>>> > Each packet received with netmap is created as a new type of dp_packet:
>>>>>> > DPBUF_NETMAP. The data points to a netmap buffer (preallocated by the
>>>>>> > kernel).
>>>>>> > When I receive data (netdev_netmap_rxq_recv) I reuse the dp_packets,
>>>>>> > updating the internal pointer and a couple of additional pieces of
>>>>>> > information stored inside the dp_packet.
>>>>>> > When I have to send data I use zero copy if the dp_packet is DPBUF_NETMAP
>>>>>> > and copy if it's not.
>>>>>> >
>>>>>> > Thanks for the help!
>>>>>> > Alessandro.
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>
>>
>