[ovs-dev] [PATCH] Revert "dpif_netdev: Refactor dp_netdev_pmd_thread structure."

Bodireddy, Bhanuprakash bhanuprakash.bodireddy at intel.com
Mon Nov 27 17:02:03 UTC 2017


>I agree with Ilya here. Adding these cache line markers and re-grouping
>variables to minimize gaps in cache lines is creating a maintenance burden
>without any tangible benefit. I have had to go through the pain of refactoring
>my PMD Performance Metrics patch to the new dp_netdev_pmd_thread
>struct and spent a lot of time analyzing the actual memory layout with GDB
>and playing Tetris with the variables.

Analyzing the memory layout of large structures with gdb is time consuming and not usually recommended.
I would suggest using poke-a-hole (pahole), which helps you understand and fix structure layout in no time.
With pahole it's going to be a lot easier to work with large structures.
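
For example, to dump the layout of this very structure (assuming a build with debug symbols; the binary path is just illustrative):

     $ pahole -C dp_netdev_pmd_thread vswitchd/ovs-vswitchd

pahole prints every member with its offset and size, flags each gap with "/* XXX N bytes hole, try to pack */", and marks the cacheline boundaries, so the holes are visible at a glance.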

>
>There will never be more than a handful of PMDs, so minimizing the gaps does
>not matter from a memory perspective. And whether the individual members
>occupy 4 or 5 cache lines does not matter either, compared to the many
>hundred cache lines touched for EMC and DPCLS lookups of an Rx batch. And
>any optimization done for x86 is not necessarily optimal for other
>architectures.

I agree that an optimization targeted at x86 doesn't necessarily suit ARM due to its different cache line size.
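
To illustrate, a minimal sketch using the PADDED_MEMBERS macro from lib/util.h (the struct itself is made up):

     /* Each PADDED_MEMBERS group is rounded up to CACHE_LINE_SIZE, so 64
      * bytes of members fit exactly when CACHE_LINE_SIZE is 64, but leave
      * a 64-byte hole per group when CACHE_LINE_SIZE is 128 as on ARM. */
     struct example {
         PADDED_MEMBERS(CACHE_LINE_SIZE,
             uint64_t counters[8];       /* 64 bytes of real members. */
         );
     };

Grouping members in 64-byte chunks effectively hard-codes the x86 line size into the layout.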

>
>Finally, even for x86 there is not even a performance improvement. I re-ran
>our standard L3VPN over VXLAN performance PVP test on master and with
>Ilya's revert patch:
>
>Flows   master  reverted
>8,      4.46    4.48
>100,    4.27    4.29
>1000,   4.07    4.07
>2000,   3.68    3.68
>5000,   3.03    3.03
>10000,  2.76    2.77
>20000,  2.64    2.65
>50000,  2.60    2.61
>100000, 2.60    2.61
>500000, 2.60    2.61

What are the CFLAGS in this case? They seem to make a difference. I have added my findings here for a different patch targeted at performance:
      https://mail.openvswitch.org/pipermail/ovs-dev/2017-November/341270.html
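
(For example, whether OVS is built with its default flags or with something like

     ./configure --with-dpdk=$DPDK_BUILD CFLAGS="-O3 -march=native"

can move these numbers; the flags above are purely illustrative, not what was used for the table above.)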

Patches to consider when testing your use case:
     xzalloc_cacheline (sketch below):  https://mail.openvswitch.org/pipermail/ovs-dev/2017-November/341231.html
     (If using output batching)      https://mail.openvswitch.org/pipermail/ovs-dev/2017-November/341230.html
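
For context, the xzalloc_cacheline patch above essentially changes the PMD allocation along these lines (a sketch from memory, not the exact hunk):

     -    pmd = xzalloc(sizeof *pmd);
     +    pmd = xzalloc_cacheline(sizeof *pmd);

with the matching free() becoming free_cacheline(), so that the structure actually starts on a cacheline boundary on systems that provide posix_memalign(). Without that, none of the padding inside the structure can line up with real cache lines.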

- Bhanuprakash.

>
>All in all, I support reverting this change.
>
>Regards, Jan
>
>Acked-by: Jan Scheurich <jan.scheurich at ericsson.com>
>
>> -----Original Message-----
>> From: ovs-dev-bounces at openvswitch.org
>> [mailto:ovs-dev-bounces at openvswitch.org] On Behalf Of Bodireddy,
>> Bhanuprakash
>> Sent: Friday, 24 November, 2017 17:09
>> To: Ilya Maximets <i.maximets at samsung.com>; ovs-dev at openvswitch.org;
>> Ben Pfaff <blp at ovn.org>
>> Cc: Heetae Ahn <heetae82.ahn at samsung.com>
>> Subject: Re: [ovs-dev] [PATCH] Revert "dpif_netdev: Refactor dp_netdev_pmd_thread structure."
>>
>> >On 22.11.2017 20:14, Bodireddy, Bhanuprakash wrote:
>> >>> This reverts commit a807c15796ddc43ba1ffb2a6b0bd2ad4e2b73941.
>> >>>
>> >>> Padding and aligning of dp_netdev_pmd_thread structure members is
>> >>> useless, broken in several ways and only greatly degrades
>> >>> maintainability and extensibility of the structure.
>> >>
>> >> The idea of my earlier patch was to mark the cache lines and reduce
>> >> the holes while still maintaining the grouping of related members in
>> >> this structure.
>> >
>> >Some of the grouping aspects look strange. For example, it looks
>> >illogical that 'exit_latch' is grouped with 'flow_table' but not with
>> >'reload_seq' and other reload-related stuff. It looks strange that
>> >statistics and counters are spread across different groups. So, IMHO,
>> >it's not well grouped.
>>
>> I had to strike a fine balance, and some members may be placed in a
>> different group due to their sizes and importance. Let me think about
>> whether I can make it better.
>>
>> >
>> >> Also cache line marking is a good practice to make someone extra
>> >> cautious when extending or editing important structures.
>> >> Most importantly, I was experimenting with prefetching on this
>> >> structure and needed cache line markers for it.
>> >>
>> >> I see that you are on ARM (I don't have HW to test) and want to know
>> >> if this commit has any negative effect; any numbers would be
>> >> appreciated.
>> >
>> >Basic VM-VM testing shows a stable 0.5% performance improvement with
>> >the revert applied.
>>
>> I did P2P, PVP and PVVP tests with IXIA and haven't noticed any drop on X86.
>>
>> >Padding adds 560 additional bytes of holes.
>> As the cache line on ARM is 128 bytes, it creates holes; I can find a
>> workaround to handle this.
>>
>> >
>> >> More comments inline.
>> >>
>> >>>
>> >>> Issues:
>> >>>
>> >>>    1. It's not working because all the instances of struct
>> >>>       dp_netdev_pmd_thread are allocated by plain malloc. The
>> >>>       memory is not aligned to cachelines -> the structure almost
>> >>>       never starts at an aligned memory address. This means that any
>> >>>       further paddings and alignments inside the structure are
>> >>>       completely useless. For example:
>> >>>
>> >>>       Breakpoint 1, pmd_thread_main
>> >>>       (gdb) p pmd
>> >>>       $49 = (struct dp_netdev_pmd_thread *) 0x1b1af20
>> >>>       (gdb) p &pmd->cacheline1
>> >>>       $51 = (OVS_CACHE_LINE_MARKER *) 0x1b1af60
>> >>>       (gdb) p &pmd->cacheline0
>> >>>       $52 = (OVS_CACHE_LINE_MARKER *) 0x1b1af20
>> >>>       (gdb) p &pmd->flow_cache
>> >>>       $53 = (struct emc_cache *) 0x1b1afe0
>> >>>
>> >>>       All of the above addresses are shifted from the cacheline start by 32B.
>> >>
>> >> If you look below, all the addresses are 64-byte aligned.
>> >>
>> >> (gdb) p pmd
>> >> $1 = (struct dp_netdev_pmd_thread *) 0x7fc1e9b1a040
>> >> (gdb) p &pmd->cacheline0
>> >> $2 = (OVS_CACHE_LINE_MARKER *) 0x7fc1e9b1a040
>> >> (gdb) p &pmd->cacheline1
>> >> $3 = (OVS_CACHE_LINE_MARKER *) 0x7fc1e9b1a080
>> >> (gdb) p &pmd->flow_cache
>> >> $4 = (struct emc_cache *) 0x7fc1e9b1a0c0
>> >> (gdb) p &pmd->flow_table
>> >> $5 = (struct cmap *) 0x7fc1e9fba100
>> >> (gdb) p &pmd->stats
>> >> $6 = (struct dp_netdev_pmd_stats *) 0x7fc1e9fba140
>> >> (gdb) p &pmd->port_mutex
>> >> $7 = (struct ovs_mutex *) 0x7fc1e9fba180
>> >> (gdb) p &pmd->poll_list
>> >> $8 = (struct hmap *) 0x7fc1e9fba1c0
>> >> (gdb) p &pmd->tnl_port_cache
>> >> $9 = (struct hmap *) 0x7fc1e9fba200
>> >> (gdb) p &pmd->stats_zero
>> >> $10 = (unsigned long long (*)[5]) 0x7fc1e9fba240
>> >>
>> >> I tried using xzalloc_cacheline instead of the default xzalloc() here.
>> >> I tried tens of times and always found that the address is 64-byte
>> >> aligned, so it should start at the beginning of a cache line on X86.
>> >> Not sure why the comment "(The memory returned will not be at the
>> >> start of a cache line, though, so don't assume such alignment.)" says
>> >> otherwise?
>> >
>> >Yes, you will always get aligned addresses on your x86 Linux system
>> >that supports the posix_memalign() call. The comment says what it says
>> >because the function resorts to some memory allocation tricks in case
>> >posix_memalign() is not available (Windows, some MacOS, maybe some
>> >Linux systems (not sure)) and the address will not be aligned in this
>> >case.
>>
>> I also verified the other case, when posix_memalign isn't available, and
>> even in that case it returns an address aligned on a CACHE_LINE_SIZE
>> boundary. I will send out a patch to use xzalloc_cacheline for
>> allocating the memory.
>>
>> >
>> >>
>> >>>
>> >>>       Can we fix it properly? NO.
>> >>>       OVS currently doesn't have an appropriate API to allocate aligned
>> >>>       memory. The best candidate is 'xmalloc_cacheline()' but it
>> >>>       clearly states that "The memory returned will not be at the
>> >>>       start of a cache line, though, so don't assume such alignment".
>> >>>       And also, this function will never return aligned memory on
>> >>>       Windows or MacOS.
>> >>>
>> >>>    2. CACHE_LINE_SIZE is not constant. Different architectures have
>> >>>       different cache line sizes, but the code assumes that
>> >>>       CACHE_LINE_SIZE is always equal to 64 bytes. All the structure
>> >>>       members are grouped by 64 bytes and padded to CACHE_LINE_SIZE.
>> >>>       This leads to huge holes in the structures if CACHE_LINE_SIZE
>> >>>       differs from 64. This is the opposite of portability. If I want
>> >>>       good performance of cmap I need to have CACHE_LINE_SIZE equal
>> >>>       to the real cache line size, but I will have huge holes in the
>> >>>       structures. If you take a look at struct rte_mbuf from DPDK,
>> >>>       you'll see that it uses 2 defines: RTE_CACHE_LINE_SIZE and
>> >>>       RTE_CACHE_LINE_MIN_SIZE to avoid holes in the mbuf structure.
>> >>
>> >> I understand that ARM and a few other processors (like OCTEON) have
>> >> 128-byte cache lines.
>> >> But again, I'm curious about the performance impact in your case with
>> >> this new alignment.
>> >>
>> >>>
>> >>>    3. Sizes of system/libc defined types are not constant for all the
>> >>>       systems. For example, sizeof(pthread_mutex_t) == 48 on my
>> >>>       ARMv8 machine, but only 40 on x86. The difference could be
>> >>>       much bigger on Windows or MacOS systems. But the code assumes
>> >>>       that sizeof(struct ovs_mutex) is always 48 bytes. This may lead
>> >>>       to broken alignment/big holes in case of padding, or to wrong
>> >>>       comments about the amount of free pad bytes.
>> >>
>> >> This isn't an issue, as you have already mentioned; it's more an issue
>> >> with the comment that states the number of pad bytes.
>> >> In the case of ARM it would be just 8 pad bytes instead of 16 on X86.
>> >>
>> >>         union {
>> >>                 struct {
>> >>                         struct ovs_mutex port_mutex;     /* 4849984    48 */
>> >>                 };                                       /*          48 */
>> >>                 uint8_t            pad13[64];            /*          64 */
>> >>         };
>> >>
>> >
>> >It's not only about 'port_mutex'. If you take a look at 'flow_mutex',
>> >you will see that it is not even padded. So, increasing the size of
>> >'flow_mutex' shifts all the following padded blocks, and no other
>> >blocks will be properly aligned even if the structure is allocated on
>> >aligned memory.
>> >
>> >>>
>> >>>    4. Sizes of many fields in the structure depend on defines like
>> >>>       DP_N_STATS, PMD_N_CYCLES, EM_FLOW_HASH_ENTRIES and so on.
>> >>>       Any change in these defines or any change in any structure
>> >>>       contained by the thread leads to a not-so-simple refactoring
>> >>>       of the whole dp_netdev_pmd_thread structure. This greatly
>> >>>       reduces maintainability and complicates development of new
>> >>>       features.
>> >>
>> >> I don't think it complicates development; instead, I feel the commit
>> >> gives a clear indication to the developer that the members are
>> >> grouped, aligned, and marked with cacheline markers.
>> >> This makes the developer extra cautious when adding new members so
>> >> that holes can be avoided.
>> >
>> >Starting the rebase of the output batching patch-set, I figured out
>> >that I need to remove 'unsigned long long last_cycles' and add 'struct
>> >dp_netdev_pmd_thread_ctx ctx', which is 8 bytes larger. Could you,
>> >please, suggest where I should place that new structure member and
>> >what to do with the hole left by 'last_cycles'?
>> >
>> >This is not a trivial question, because the already poor grouping will
>> >almost certainly become worse.
>>
>> Aah, I realized now that the batching series doesn't apply cleanly on
>> master. Let me check this and I will send across the changes that
>> should fix this.
>>
>> - Bhanuprakash
>>
>> >>
>> >> Cacheline marking the structure is a good practice and I am sure this
>> >> structure is significant and should be carefully extended in the
>> >> future.
>> >
>> >Not so sure about that.
>> >
>> >>
>> >>>
>> >>>    5. There is no reason to align the flow_cache member because it's
>> >>>       too big and we usually access random entries, from a single
>> >>>       thread only.
>> >>>
>> >>
>> >> I see your point. This patch wasn't done for performance; it was more
>> >> to bring some order to this ever-growing structure.
>> >> During testing I found that for some test cases aligning the
>> >> flow_cache was consistently giving me a 100k+ improvement, and so it
>> >> was added.
>> >
>> >This was a random performance boost. You achieved it without aligned
>> >memory allocation; it was just luck with your system environment.
>> >Using xzalloc_cacheline will likely eliminate this performance
>> >difference or even degrade the performance.
>> >
>> >>
>> >>> So, the padding/alignment only creates the appearance of a
>> >>> performance optimization but does nothing useful in reality. It
>> >>> only complicates maintenance and adds huge holes for non-x86
>> >>> architectures and non-Linux systems. The performance improvement
>> >>> stated in the original commit message must be random and not
>> >>> valuable. I see no performance difference.
>> >>
>> >> I understand that this is causing issues with architectures having
>> >> different cache line sizes, but the majority of them have 64-byte
>> >> cache lines, so this change makes sense.
>> >
>> >I understand that 64-byte cache lines are a lot more widespread. I
>> >also have x86 as a target arch, but still, IMHO, OVS is a
>> >cross-platform application and it should not have platform-dependent
>> >stuff which makes one architecture/platform better and worsens others.
>> >
>> >>
>> >> If you have performance data to prove that this causes severe perf
>> >> degradation, I can think of workarounds for ARM.
>> >>
>> >> - Bhanuprakash.
>> >
>> >
>> >P.S.: If you want to test with CACHE_LINE_SIZE=128 you will have to
>> >apply the following patch to avoid a build-time assert (I'll send it
>> >formally later):
>> >
>> >------------------------------------------------------------------------------
>> >diff --git a/lib/cmap.c b/lib/cmap.c
>> >index 35decea..5b15ecd 100644
>> >--- a/lib/cmap.c
>> >+++ b/lib/cmap.c
>> >@@ -123,12 +123,11 @@ COVERAGE_DEFINE(cmap_shrink);
>> > /* Number of entries per bucket: 7 on 32-bit, 5 on 64-bit. */
>> > #define CMAP_K ((CACHE_LINE_SIZE - 4) / CMAP_ENTRY_SIZE)
>> >
>> >-/* Pad to make a bucket a full cache line in size: 4 on 32-bit, 0 on 64-bit. */
>> >-#define CMAP_PADDING ((CACHE_LINE_SIZE - 4) - (CMAP_K * CMAP_ENTRY_SIZE))
>> >-
>> > /* A cuckoo hash bucket.  Designed to be cache-aligned and exactly one cache
>> >  * line long. */
>> > struct cmap_bucket {
>> >+    /* Padding to make cmap_bucket exactly one cache line long. */
>> >+    PADDED_MEMBERS(CACHE_LINE_SIZE,
>> >     /* Allows readers to track in-progress changes.  Initially zero, each
>> >      * writer increments this value just before and just after each change (see
>> >      * cmap_set_bucket()).  Thus, a reader can ensure that it gets a consistent
>> >@@ -145,11 +144,7 @@ struct cmap_bucket {
>> >      * slots. */
>> >     uint32_t hashes[CMAP_K];
>> >     struct cmap_node nodes[CMAP_K];
>> >-
>> >-    /* Padding to make cmap_bucket exactly one cache line long. */
>> >-#if CMAP_PADDING > 0
>> >-    uint8_t pad[CMAP_PADDING];
>> >-#endif
>> >+    );
>> > };
>> > BUILD_ASSERT_DECL(sizeof(struct cmap_bucket) == CACHE_LINE_SIZE);
>> >
>> >diff --git a/lib/util.h b/lib/util.h
>> >index 3c43c2c..514fdaa 100644
>> >--- a/lib/util.h
>> >+++ b/lib/util.h
>> >@@ -61,7 +61,7 @@ struct Bad_arg_to_ARRAY_SIZE {
>> >
>> > /* This system's cache line size, in bytes.
>> >  * Being wrong hurts performance but not correctness. */
>> >-#define CACHE_LINE_SIZE 64
>> >+#define CACHE_LINE_SIZE 128
>> > BUILD_ASSERT_DECL(IS_POW2(CACHE_LINE_SIZE));
>> >
>> > /* Cacheline marking is typically done using zero-sized array.
>> >------------------------------------------------------------------------------
>> >
>> >
>> >Best regards, Ilya Maximets.
>> >
>> >>
>> >>>
>> >>> Most of the above issues are also true for some other
>> >>> padded/aligned structures like 'struct netdev_dpdk'. They will be
>> >>> treated separately.
>> >>>
>> >>> CC: Bhanuprakash Bodireddy <bhanuprakash.bodireddy at intel.com>
>> >>> CC: Ben Pfaff <blp at ovn.org>
>> >>> Signed-off-by: Ilya Maximets <i.maximets at samsung.com>
>> >>> ---
>> >>> lib/dpif-netdev.c | 160 +++++++++++++++++++++++-------------------------------
>> >>> 1 file changed, 69 insertions(+), 91 deletions(-)
>> >>>
>> >>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>> >>> index 0a62630..6784269 100644
>> >>> --- a/lib/dpif-netdev.c
>> >>> +++ b/lib/dpif-netdev.c
>> >>> @@ -547,31 +547,18 @@ struct tx_port {
>> >>>  * actions in either case.
>> >>>  * */
>> >>> struct dp_netdev_pmd_thread {
>> >>> -    PADDED_MEMBERS_CACHELINE_MARKER(CACHE_LINE_SIZE, cacheline0,
>> >>> -        struct dp_netdev *dp;
>> >>> -        struct cmap_node node;          /* In 'dp->poll_threads'. */
>> >>> -        pthread_cond_t cond;            /* For synchronizing pmd thread
>> >>> -                                           reload. */
>> >>> -    );
>> >>> -
>> >>> -    PADDED_MEMBERS_CACHELINE_MARKER(CACHE_LINE_SIZE, cacheline1,
>> >>> -        struct ovs_mutex cond_mutex;    /* Mutex for condition variable. */
>> >>> -        pthread_t thread;
>> >>> -        unsigned core_id;               /* CPU core id of this pmd thread. */
>> >>> -        int numa_id;                    /* numa node id of this pmd thread. */
>> >>> -    );
>> >>> +    struct dp_netdev *dp;
>> >>> +    struct ovs_refcount ref_cnt;    /* Every reference must be refcount'ed. */
>> >>> +    struct cmap_node node;          /* In 'dp->poll_threads'. */
>> >>> +
>> >>> +    pthread_cond_t cond;            /* For synchronizing pmd thread reload. */
>> >>> +    struct ovs_mutex cond_mutex;    /* Mutex for condition variable. */
>> >>>
>> >>>     /* Per thread exact-match cache.  Note, the instance for cpu core
>> >>>      * NON_PMD_CORE_ID can be accessed by multiple threads, and thusly
>> >>>      * need to be protected by 'non_pmd_mutex'.  Every other instance
>> >>>      * will only be accessed by its own pmd thread. */
>> >>> -    OVS_ALIGNED_VAR(CACHE_LINE_SIZE) struct emc_cache flow_cache;
>> >>> -    struct ovs_refcount ref_cnt;    /* Every reference must be refcount'ed. */
>> >>> -
>> >>> -    /* Queue id used by this pmd thread to send packets on all netdevs if
>> >>> -     * XPS disabled for this netdev. All static_tx_qid's are unique and less
>> >>> -     * than 'cmap_count(dp->poll_threads)'. */
>> >>> -    uint32_t static_tx_qid;
>> >>> +    struct emc_cache flow_cache;
>> >>>
>> >>>     /* Flow-Table and classifiers
>> >>>      *
>> >>> @@ -580,77 +567,68 @@ struct dp_netdev_pmd_thread {
>> >>>      * 'flow_mutex'.
>> >>>      */
>> >>>     struct ovs_mutex flow_mutex;
>> >>> -    PADDED_MEMBERS(CACHE_LINE_SIZE,
>> >>> -        struct cmap flow_table OVS_GUARDED; /* Flow table. */
>> >>> -
>> >>> -        /* One classifier per in_port polled by the pmd */
>> >>> -        struct cmap classifiers;
>> >>> -        /* Periodically sort subtable vectors according to hit frequencies */
>> >>> -        long long int next_optimization;
>> >>> -        /* End of the next time interval for which processing cycles
>> >>> -           are stored for each polled rxq. */
>> >>> -        long long int rxq_next_cycle_store;
>> >>> -
>> >>> -        /* Cycles counters */
>> >>> -        struct dp_netdev_pmd_cycles cycles;
>> >>> -
>> >>> -        /* Used to count cycles. See 'cycles_counter_end()'. */
>> >>> -        unsigned long long last_cycles;
>> >>> -        struct latch exit_latch;        /* For terminating the pmd thread. */
>> >>> -    );
>> >>> -
>> >>> -    PADDED_MEMBERS(CACHE_LINE_SIZE,
>> >>> -        /* Statistics. */
>> >>> -        struct dp_netdev_pmd_stats stats;
>> >>> -
>> >>> -        struct seq *reload_seq;
>> >>> -        uint64_t last_reload_seq;
>> >>> -        atomic_bool reload;             /* Do we need to reload ports? */
>> >>> -        bool isolated;
>> >>> -
>> >>> -        /* Set to true if the pmd thread needs to be reloaded. */
>> >>> -        bool need_reload;
>> >>> -        /* 5 pad bytes. */
>> >>> -    );
>> >>> -
>> >>> -    PADDED_MEMBERS(CACHE_LINE_SIZE,
>> >>> -        struct ovs_mutex port_mutex;    /* Mutex for 'poll_list'
>> >>> -                                           and 'tx_ports'. */
>> >>> -        /* 16 pad bytes. */
>> >>> -    );
>> >>> -    PADDED_MEMBERS(CACHE_LINE_SIZE,
>> >>> -        /* List of rx queues to poll. */
>> >>> -        struct hmap poll_list OVS_GUARDED;
>> >>> -        /* Map of 'tx_port's used for transmission.  Written by the main
>> >>> -         * thread, read by the pmd thread. */
>> >>> -        struct hmap tx_ports OVS_GUARDED;
>> >>> -    );
>> >>> -    PADDED_MEMBERS(CACHE_LINE_SIZE,
>> >>> -        /* These are thread-local copies of 'tx_ports'.  One contains only
>> >>> -         * tunnel ports (that support push_tunnel/pop_tunnel), the other
>> >>> -         * contains ports with at least one txq (that support send).
>> >>> -         * A port can be in both.
>> >>> -         *
>> >>> -         * There are two separate maps to make sure that we don't try to
>> >>> -         * execute OUTPUT on a device which has 0 txqs or PUSH/POP on a
>> >>> -         * non-tunnel device.
>> >>> -         *
>> >>> -         * The instances for cpu core NON_PMD_CORE_ID can be accessed by
>> >>> -         * multiple threads and thusly need to be protected by 'non_pmd_mutex'.
>> >>> -         * Every other instance will only be accessed by its own pmd thread. */
>> >>> -        struct hmap tnl_port_cache;
>> >>> -        struct hmap send_port_cache;
>> >>> -    );
>> >>> -
>> >>> -    PADDED_MEMBERS(CACHE_LINE_SIZE,
>> >>> -        /* Only a pmd thread can write on its own 'cycles' and 'stats'.
>> >>> -         * The main thread keeps 'stats_zero' and 'cycles_zero' as base
>> >>> -         * values and subtracts them from 'stats' and 'cycles' before
>> >>> -         * reporting to the user */
>> >>> -        unsigned long long stats_zero[DP_N_STATS];
>> >>> -        uint64_t cycles_zero[PMD_N_CYCLES];
>> >>> -        /* 8 pad bytes. */
>> >>> -    );
>> >>> +    struct cmap flow_table OVS_GUARDED; /* Flow table. */
>> >>> +
>> >>> +    /* One classifier per in_port polled by the pmd */
>> >>> +    struct cmap classifiers;
>> >>> +    /* Periodically sort subtable vectors according to hit frequencies */
>> >>> +    long long int next_optimization;
>> >>> +    /* End of the next time interval for which processing cycles
>> >>> +       are stored for each polled rxq. */
>> >>> +    long long int rxq_next_cycle_store;
>> >>> +
>> >>> +    /* Statistics. */
>> >>> +    struct dp_netdev_pmd_stats stats;
>> >>> +
>> >>> +    /* Cycles counters */
>> >>> +    struct dp_netdev_pmd_cycles cycles;
>> >>> +
>> >>> +    /* Used to count cicles. See 'cycles_counter_end()' */
>> >>> +    unsigned long long last_cycles;
>> >>> +
>> >>> +    struct latch exit_latch;        /* For terminating the pmd thread. */
>> >>> +    struct seq *reload_seq;
>> >>> +    uint64_t last_reload_seq;
>> >>> +    atomic_bool reload;             /* Do we need to reload ports? */
>> >>> +    pthread_t thread;
>> >>> +    unsigned core_id;               /* CPU core id of this pmd thread. */
>> >>> +    int numa_id;                    /* numa node id of this pmd thread. */
>> >>> +    bool isolated;
>> >>> +
>> >>> +    /* Queue id used by this pmd thread to send packets on all netdevs if
>> >>> +     * XPS disabled for this netdev. All static_tx_qid's are unique and less
>> >>> +     * than 'cmap_count(dp->poll_threads)'. */
>> >>> +    uint32_t static_tx_qid;
>> >>> +
>> >>> +    struct ovs_mutex port_mutex;    /* Mutex for 'poll_list' and 'tx_ports'. */
>> >>> +    /* List of rx queues to poll. */
>> >>> +    struct hmap poll_list OVS_GUARDED;
>> >>> +    /* Map of 'tx_port's used for transmission.  Written by the main thread,
>> >>> +     * read by the pmd thread. */
>> >>> +    struct hmap tx_ports OVS_GUARDED;
>> >>> +
>> >>> +    /* These are thread-local copies of 'tx_ports'.  One contains only tunnel
>> >>> +     * ports (that support push_tunnel/pop_tunnel), the other contains ports
>> >>> +     * with at least one txq (that support send).  A port can be in both.
>> >>> +     *
>> >>> +     * There are two separate maps to make sure that we don't try to execute
>> >>> +     * OUTPUT on a device which has 0 txqs or PUSH/POP on a non-tunnel device.
>> >>> +     *
>> >>> +     * The instances for cpu core NON_PMD_CORE_ID can be accessed by multiple
>> >>> +     * threads, and thusly need to be protected by 'non_pmd_mutex'.  Every
>> >>> +     * other instance will only be accessed by its own pmd thread. */
>> >>> +    struct hmap tnl_port_cache;
>> >>> +    struct hmap send_port_cache;
>> >>> +
>> >>> +    /* Only a pmd thread can write on its own 'cycles' and 'stats'.
>> >>> +     * The main thread keeps 'stats_zero' and 'cycles_zero' as base
>> >>> +     * values and subtracts them from 'stats' and 'cycles' before
>> >>> +     * reporting to the user */
>> >>> +    unsigned long long stats_zero[DP_N_STATS];
>> >>> +    uint64_t cycles_zero[PMD_N_CYCLES];
>> >>> +
>> >>> +    /* Set to true if the pmd thread needs to be reloaded. */
>> >>> +    bool need_reload;
>> >>> };
>> >>>
>> >>> /* Interface to netdev-based datapath. */
>> >>> --
>> >>> 2.7.4
>> >>
>> >>
>> >>
>> >>
>> _______________________________________________
>> dev mailing list
>> dev at openvswitch.org
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev

