[ovs-dev] [PATCH v2] Upcall/Slowpath rate limiter for OVS

Manohar Krishnappa Chidambaraswamy manohar.krishnappa.chidambaraswamy at ericsson.com
Thu Jan 25 06:20:53 UTC 2018


Ben,

On 23/01/18, 2:01 AM, "Ben Pfaff" <blp at ovn.org> wrote:

    I guess I'm probably the wrong one to ultimately review this patch,
    since it's really a DPDK patch.
    
    It's difficult to say for sure that this is really an improvement, since
    it also causes traffic to be dropped (that is, the traffic that needs to
    be slow-pathed).
    
[manu] Thanx for your response. Its true that this also results in 
packet-drops, but these packets belong to new flows that can potentially
take lot more cycles of PMD. Already established flows will *not* see any drops
due to this rate-limiting. If implemented in DPDK, we can’t differentiate packets
belonging to already established vs new flows. To do this, we would need something
in OVS only.

Sorry I didn’t include the description in my v2 patch. Here is the main intent
behind this patch (from v1). Do you think this is reasonable?

“In OVS-DPDK, both fast-path and slow-path execute in the context of a common
thread (i.e, PMD thread), without any partitioning of CPU cycles between the
two. When there is a burst of new flows coming into the data-path, packets
are punted to slow-path in the order they are received and the PMD is busy
for the duration of the upcall. Slow-path processing of a packet consumes
100-200 times the cycles of fast-path handling. As a result, the forwarding
performance of a PMD degrades significantly during an upcall burst. If the
PMD was highly loaded already, it becomes temporarily overloaded and its rx
queues start filling up. If the upcall burst is long enough, packets will be
dropped when rx queues are full. This happens even if the new flows are
unexpected and the slow-path decides to drop the packets.
            
It is likely that most of the packets dropped due to rx queue overflow belong
to established flows that should have been processed by the fast-path. Hence,
the current OVS-DPDK architecture favors the handling of new flows over the
forwarding of established flows. This is generally a sub-optimal approach.
            
Without a limit to the rate of upcalls, OVS-DPDK is vulnerable for DoS attacks.
But even sporadic bursts of e.g. unexpected multicast packets have shown to
cause such packet drops.”
            
    I think that this could be handled in a datapath independent way using
    meters.

[manu] Do you mean using meters in dp-netdev instead of token-bucket. I don’t
know much about meters. Will check.

Thanx
Manu  
    
    On Mon, Jan 15, 2018 at 08:33:31AM +0000, Manohar Krishnappa Chidambaraswamy wrote:
    > Rebased to master.
    > 
    > Could you please review this patch?
    > 
    > v1 of this patch has been sitting idle for quite sometime. I believe this
    > feature is important. Without this feature, an attacker sending packets 
    > belonging to different (unestablished) flows can result in PMDs running only
    > upcalls, reducing the throughput drastically. Huge traffic loss due to this
    > is more like a DOS attack.
    > 
    > Please let me know your thoughts/comments. This is my first patch series in
    > OVS. Sorry if naming of this patch as v2 is not appropriate.
    > 
    > Thanx
    > Manu
    > 
    > v1 of this patch: https://patchwork.ozlabs.org/patch/836737/
    > 
    > Signed-off-by: Manohar K C <manohar.krishnappa.chidambaraswamy at ericsson.com>
    > cc: Jan Scheurich <jan.scheurich at ericsson.com>
    > cc: Ben Pfaff <blp at ovn.org>
    > 
    > ---
    >  Documentation/howto/dpdk.rst      | 21 ++++++++++
    >  debian/libopenvswitch-dev.install |  1 -
    >  debian/libopenvswitch.install     |  1 -
    >  debian/rules                      |  4 +-
    >  lib/dpif-netdev.c                 | 81 +++++++++++++++++++++++++++++++++++++--
    >  vswitchd/vswitch.xml              | 47 +++++++++++++++++++++++
    >  6 files changed, 148 insertions(+), 7 deletions(-)
    > 
    > diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
    > index 587aaed..1db1562 100644
    > --- a/Documentation/howto/dpdk.rst
    > +++ b/Documentation/howto/dpdk.rst
    > @@ -716,3 +716,24 @@ devices to bridge ``br0``. Once complete, follow the below steps:
    >     Check traffic on multiple queues::
    >  
    >         $ cat /proc/interrupts | grep virtio
    > +
    > +Upcall rate limiting
    > +--------------------
    > +ovs-vsctl can be used to enable and configure upcall rate limit parameters.
    > +There are 2 configurable values ``upcall-rate`` and ``upcall-burst`` which
    > +take effect when global enable knob ``upcall-rl`` is set to true.
    > +
    > +Upcall rate should be set using ``upcall-rate`` in packets-per-sec. For
    > +example::
    > +
    > +    $ ovs-vsctl set Open_vSwitch . other_config:upcall-rate=2000
    > +
    > +Upcall burst should be set using ``upcall-burst`` in packets-per-sec. For
    > +example::
    > +
    > +    $ ovs-vsctl set Open_vSwitch . other_config:upcall-burst=2000
    > +
    > +Upcall ratelimit feature should be globally enabled using ``upcall-rl``. For
    > +example::
    > +
    > +    $ ovs-vsctl set Open_vSwitch . other_config:upcall-rl=true
    > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
    > index c7d157a..e2fd92c 100644
    > --- a/lib/dpif-netdev.c
    > +++ b/lib/dpif-netdev.c
    > @@ -101,6 +101,16 @@ static struct shash dp_netdevs OVS_GUARDED_BY(dp_netdev_mutex)
    >  
    >  static struct vlog_rate_limit upcall_rl = VLOG_RATE_LIMIT_INIT(600, 600);
    >  
    > +/* Upcall rate-limit parameters */
    > +static bool upcall_ratelimit;
    > +static unsigned int upcall_rate;
    > +static unsigned int upcall_burst;
    > +
    > +/* TODO: Tune these default values based on upcall perf test */
    > +#define UPCALL_RATELIMIT_DEFAULT false /* Disabled by default */
    > +#define UPCALL_RATE_DEFAULT      1000  /* pps */
    > +#define UPCALL_BURST_DEFAULT     1000  /* pps */
    > +
    >  #define DP_NETDEV_CS_SUPPORTED_MASK (CS_NEW | CS_ESTABLISHED | CS_RELATED \
    >                                       | CS_INVALID | CS_REPLY_DIR | CS_TRACKED \
    >                                       | CS_SRC_NAT | CS_DST_NAT)
    > @@ -340,6 +350,7 @@ enum dp_stat_type {
    >                                     hits */
    >      DP_STAT_SENT_PKTS,          /* Packets that has been sent. */
    >      DP_STAT_SENT_BATCHES,       /* Number of batches sent. */
    > +    DP_STAT_RATELIMIT_DROP,     /* Packets dropped due to upcall policer */
    >      DP_N_STATS
    >  };
    >  
    > @@ -645,6 +656,9 @@ struct dp_netdev_pmd_thread {
    >      unsigned long long stats_zero[DP_N_STATS];
    >      uint64_t cycles_zero[PMD_N_CYCLES];
    >  
    > +    /* Policer to rate limit slow-path */
    > +    struct token_bucket upcall_tb;
    > +
    >      /* Set to true if the pmd thread needs to be reloaded. */
    >      bool need_reload;
    >  };
    > @@ -894,10 +908,11 @@ pmd_info_show_stats(struct ds *reply,
    >      ds_put_format(reply,
    >                    "\temc hits:%llu\n\tmegaflow hits:%llu\n"
    >                    "\tavg. subtable lookups per hit:%.2f\n"
    > -                  "\tmiss:%llu\n\tlost:%llu\n"
    > +                  "\tmiss:%llu\n\tlost:%llu\n\tratelimit drops:%llu\n"
    >                    "\tavg. packets per output batch: %.2f\n",
    >                    stats[DP_STAT_EXACT_HIT], stats[DP_STAT_MASKED_HIT],
    >                    lookups_per_hit, stats[DP_STAT_MISS], stats[DP_STAT_LOST],
    > +                  stats[DP_STAT_RATELIMIT_DROP],
    >                    packets_per_batch);
    >  
    >      if (total_cycles == 0) {
    > @@ -3031,6 +3046,8 @@ dpif_netdev_set_config(struct dpif *dpif, const struct smap *other_config)
    >          smap_get_ullong(other_config, "emc-insert-inv-prob",
    >                          DEFAULT_EM_FLOW_INSERT_INV_PROB);
    >      uint32_t insert_min, cur_min;
    > +    unsigned int rate, burst;
    > +    bool ratelimit;
    >  
    >      if (!nullable_string_is_equal(dp->pmd_cmask, cmask)) {
    >          free(dp->pmd_cmask);
    > @@ -3056,6 +3073,36 @@ dpif_netdev_set_config(struct dpif *dpif, const struct smap *other_config)
    >          }
    >      }
    >  
    > +    /* Handle upcall policer params */
    > +    ratelimit = smap_get_bool(other_config, "upcall-rl",
    > +                              UPCALL_RATELIMIT_DEFAULT);
    > +    rate = smap_get_int(other_config, "upcall-rate",
    > +                        UPCALL_RATE_DEFAULT);
    > +    burst = smap_get_int(other_config, "upcall-burst",
    > +                         UPCALL_BURST_DEFAULT);
    > +
    > +    if ((rate != upcall_rate) || (burst != upcall_burst)) {
    > +        VLOG_INFO("Upcall ratelimit params changed : Old - rate=%d burst=%d "
    > +                  ": New - rate=%d burst=%d\n", upcall_rate, upcall_burst,
    > +                  rate, burst);
    > +
    > +        upcall_rate = rate;
    > +        upcall_burst = burst;
    > +
    > +        /*
    > +         * TODO: See if there is a way to reconfig only the policer
    > +         * in each PMD.
    > +         */
    > +        dp_netdev_request_reconfigure(dp);
    > +    }
    > +
    > +    if (ratelimit != upcall_ratelimit) {
    > +        upcall_ratelimit = ratelimit;
    > +
    > +        VLOG_INFO("Upcall ratelimit changed to %s\n",
    > +                  (upcall_ratelimit ? "Enabled" : "Disabled"));
    > +    }
    > +
    >      return 0;
    >  }
    >  
    > @@ -3958,6 +4005,12 @@ dpif_netdev_run(struct dpif *dpif)
    >      ovs_mutex_lock(&dp->port_mutex);
    >      non_pmd = dp_netdev_get_pmd(dp, NON_PMD_CORE_ID);
    >      if (non_pmd) {
    > +        /* Reconfig the upcall policer if params have changed */
    > +        if ((upcall_rate != non_pmd->upcall_tb.rate) ||
    > +            (upcall_burst != non_pmd->upcall_tb.burst)) {
    > +            token_bucket_init(&non_pmd->upcall_tb, upcall_rate, upcall_burst);
    > +        }
    > +
    >          ovs_mutex_lock(&dp->non_pmd_mutex);
    >          cycles_count_start(non_pmd);
    >          HMAP_FOR_EACH (port, node, &dp->ports) {
    > @@ -4154,6 +4207,9 @@ reload:
    >          lc = UINT_MAX;
    >      }
    >  
    > +    /* Initialize upcall policer token bucket with configured params */
    > +    token_bucket_init(&pmd->upcall_tb, upcall_rate, upcall_burst);
    > +
    >      cycles_count_start(pmd);
    >      for (;;) {
    >          for (i = 0; i < poll_cnt; i++) {
    > @@ -4638,6 +4694,10 @@ dp_netdev_configure_pmd(struct dp_netdev_pmd_thread *pmd, struct dp_netdev *dp,
    >          emc_cache_init(&pmd->flow_cache);
    >          pmd_alloc_static_tx_qid(pmd);
    >      }
    > +
    > +    /* Initialize upcall policer token bucket with configured params */
    > +    token_bucket_init(&pmd->upcall_tb, upcall_rate, upcall_burst);
    > +
    >      cmap_insert(&dp->poll_threads, CONST_CAST(struct cmap_node *, &pmd->node),
    >                  hash_int(core_id, 0));
    >  }
    > @@ -5076,7 +5136,7 @@ handle_packet_upcall(struct dp_netdev_pmd_thread *pmd,
    >                       struct dp_packet *packet,
    >                       const struct netdev_flow_key *key,
    >                       struct ofpbuf *actions, struct ofpbuf *put_actions,
    > -                     int *lost_cnt)
    > +                     int *lost_cnt, int *rl_drop_cnt)
    >  {
    >      struct ofpbuf *add_actions;
    >      struct dp_packet_batch b;
    > @@ -5084,6 +5144,18 @@ handle_packet_upcall(struct dp_netdev_pmd_thread *pmd,
    >      ovs_u128 ufid;
    >      int error;
    >  
    > +    /*
    > +     * Grab a token from the upcall policer to enter slowpath. If token
    > +     * is not available, drop and account the packet. This is to
    > +     * rate-limit packets getting into slowpath.
    > +     */
    > +    if (upcall_ratelimit && !token_bucket_withdraw(&pmd->upcall_tb, 1)) {
    > +        dp_packet_delete(packet);
    > +        (*rl_drop_cnt)++;
    > +
    > +        return;
    > +    }
    > +
    >      match.tun_md.valid = false;
    >      miniflow_expand(&key->mf, &match.flow);
    >  
    > @@ -5158,7 +5230,7 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd,
    >      struct dpcls *cls;
    >      struct dpcls_rule *rules[PKT_ARRAY_SIZE];
    >      struct dp_netdev *dp = pmd->dp;
    > -    int miss_cnt = 0, lost_cnt = 0;
    > +    int miss_cnt = 0, lost_cnt = 0, rl_drop_cnt = 0;
    >      int lookup_cnt = 0, add_lookup_cnt;
    >      bool any_miss;
    >      size_t i;
    > @@ -5202,7 +5274,7 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd,
    >  
    >              miss_cnt++;
    >              handle_packet_upcall(pmd, packet, &keys[i], &actions,
    > -                                 &put_actions, &lost_cnt);
    > +                                 &put_actions, &lost_cnt, &rl_drop_cnt);
    >          }
    >  
    >          ofpbuf_uninit(&actions);
    > @@ -5235,6 +5307,7 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd,
    >      dp_netdev_count_packet(pmd, DP_STAT_LOOKUP_HIT, lookup_cnt);
    >      dp_netdev_count_packet(pmd, DP_STAT_MISS, miss_cnt);
    >      dp_netdev_count_packet(pmd, DP_STAT_LOST, lost_cnt);
    > +    dp_netdev_count_packet(pmd, DP_STAT_RATELIMIT_DROP, rl_drop_cnt);
    >  }
    >  
    >  /* Packets enter the datapath from a port (or from recirculation) here.
    > diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
    > index 58c0ebd..bc77c35 100644
    > --- a/vswitchd/vswitch.xml
    > +++ b/vswitchd/vswitch.xml
    > @@ -412,6 +412,53 @@
    >          </p>
    >        </column>
    >  
    > +      <column name="other_config" key="upcall-rl"
    > +              type='{"type": "boolean"}'>
    > +        <p>
    > +          Set this value to <code>true</code> to enable upcall rate-limiting.
    > +          The upcall parameters like rate and burst will be ignored, if this is
    > +          not set.
    > +        </p>
    > +        <p>
    > +          The default value is <code>false</code> and upcall rate-limiting will
    > +          be disabled.
    > +        </p>
    > +      </column>
    > +
    > +      <column name="other_config" key="upcall-rate"
    > +        type='{"type": "integer", "minInteger": 0, "maxInteger": 4294967295}'>
    > +        <p>
    > +          Specifies the rate of upcalls in packets-per-second that is to be
    > +          allowed. For example, if the value is 10000, then those many upcalls
    > +          (for packets) are allowed per second in each of the packet polling
    > +          thread (PMD or non-PMD).
    > +        </p>
    > +        <p>
    > +          A value of <code>0</code> means, no upcalls would be allowed i.e,
    > +          upcall will be disabled. This is mainly for debugging.
    > +        </p>
    > +        <p>
    > +          The default value is 1000.
    > +        </p>
    > +      </column>
    > +
    > +      <column name="other_config" key="upcall-burst"
    > +        type='{"type": "integer", "minInteger": 0, "maxInteger": 4294967295}'>
    > +        <p>
    > +          Specifies the maximum burst of upcalls in packets-per-second that is
    > +          to be allowed. For example, if the value is 15000, then a maximum
    > +          burst of 15000 upcalls (for packets) are allowed per second in each
    > +          of the packet polling thread (PMD or non-PMD).
    > +        </p>
    > +        <p>
    > +          A value of <code>0</code> means, no upcalls would be allowed i.e,
    > +          upcall will be disabled. This is mainly for debugging.
    > +        </p>
    > +        <p>
    > +          The default value is 1000.
    > +        </p>
    > +      </column>
    > +
    >        <column name="other_config" key="vlan-limit"
    >                type='{"type": "integer", "minInteger": 0}'>
    >          <p>
    > -- 
    > 1.9.1
    > 
    > 
    > 
    



More information about the dev mailing list