[ovs-dev] [PATCH v2] Upcall/Slowpath rate limiter for OVS

Jan Scheurich jan.scheurich at ericsson.com
Thu Feb 15 22:42:02 UTC 2018


Hi Ben,

This patch is about rate-limiting miss upcalls. Could you elaborate how you suggest to use the metering datapath action to realize that? 

I also think this is a specific issue for the DPDK datapath as only the miss upcalls are executed in-line in the PMD thread so that mass upcalls directly impact the datapath performance. The kernel datapath (not sure about Windows) doesn't have the same issue as upcalls are transferred to the user-space through netlink.

Thanks, Jan

> -----Original Message-----
> From: ovs-dev-bounces at openvswitch.org [mailto:ovs-dev-bounces at openvswitch.org] On Behalf Of Ben Pfaff
> Sent: Thursday, 15 February, 2018 19:13
> To: Manohar Krishnappa Chidambaraswamy <manohar.krishnappa.chidambaraswamy at ericsson.com>
> Cc: dev at openvswitch.org
> Subject: Re: [ovs-dev] [PATCH v2] Upcall/Slowpath rate limiter for OVS
> 
> Metering is implemented as a datapath action so I think you can share
> the mechanism instead of inventing a new one.
> 
> On Thu, Feb 15, 2018 at 05:57:58PM +0000, Manohar Krishnappa Chidambaraswamy wrote:
> > [Adding to my previous reply]
> >
> > Ben,
> >
> > “[manu] Do you mean using meters in dp-netdev instead of token-bucket. I don’t know much about meters. Will check.”
> >
> > I checked about meters. For limiting flows hitting slowpath, meters can’t be used. Since meter is an action, new flows would still hit the
> slow-path before meter-action can be executed.
> >
> > Thanx
> > Manu
> >
> > On 25/01/18, 11:52 AM, "ovs-dev-bounces at openvswitch.org on behalf of Manohar Krishnappa Chidambaraswamy" <ovs-dev-
> bounces at openvswitch.org on behalf of manohar.krishnappa.chidambaraswamy at ericsson.com> wrote:
> >
> >     Ben,
> >
> >     On 23/01/18, 2:01 AM, "Ben Pfaff" <blp at ovn.org> wrote:
> >
> >         I guess I'm probably the wrong one to ultimately review this patch,
> >         since it's really a DPDK patch.
> >
> >         It's difficult to say for sure that this is really an improvement, since
> >         it also causes traffic to be dropped (that is, the traffic that needs to
> >         be slow-pathed).
> >
> >     [manu] Thanx for your response. Its true that this also results in
> >     packet-drops, but these packets belong to new flows that can potentially
> >     take lot more cycles of PMD. Already established flows will *not* see any drops
> >     due to this rate-limiting. If implemented in DPDK, we can’t differentiate packets
> >     belonging to already established vs new flows. To do this, we would need something
> >     in OVS only.
> >
> >     Sorry I didn’t include the description in my v2 patch. Here is the main intent
> >     behind this patch (from v1). Do you think this is reasonable?
> >
> >     “In OVS-DPDK, both fast-path and slow-path execute in the context of a common
> >     thread (i.e, PMD thread), without any partitioning of CPU cycles between the
> >     two. When there is a burst of new flows coming into the data-path, packets
> >     are punted to slow-path in the order they are received and the PMD is busy
> >     for the duration of the upcall. Slow-path processing of a packet consumes
> >     100-200 times the cycles of fast-path handling. As a result, the forwarding
> >     performance of a PMD degrades significantly during an upcall burst. If the
> >     PMD was highly loaded already, it becomes temporarily overloaded and its rx
> >     queues start filling up. If the upcall burst is long enough, packets will be
> >     dropped when rx queues are full. This happens even if the new flows are
> >     unexpected and the slow-path decides to drop the packets.
> >
> >     It is likely that most of the packets dropped due to rx queue overflow belong
> >     to established flows that should have been processed by the fast-path. Hence,
> >     the current OVS-DPDK architecture favors the handling of new flows over the
> >     forwarding of established flows. This is generally a sub-optimal approach.
> >
> >     Without a limit to the rate of upcalls, OVS-DPDK is vulnerable for DoS attacks.
> >     But even sporadic bursts of e.g. unexpected multicast packets have shown to
> >     cause such packet drops.”
> >
> >         I think that this could be handled in a datapath independent way using
> >         meters.
> >
> >     [manu] Do you mean using meters in dp-netdev instead of token-bucket. I don’t
> >     know much about meters. Will check.
> >
> >     Thanx
> >     Manu
> >
> >         On Mon, Jan 15, 2018 at 08:33:31AM +0000, Manohar Krishnappa Chidambaraswamy wrote:
> >         > Rebased to master.
> >         >
> >         > Could you please review this patch?
> >         >
> >         > v1 of this patch has been sitting idle for quite sometime. I believe this
> >         > feature is important. Without this feature, an attacker sending packets
> >         > belonging to different (unestablished) flows can result in PMDs running only
> >         > upcalls, reducing the throughput drastically. Huge traffic loss due to this
> >         > is more like a DOS attack.
> >         >
> >         > Please let me know your thoughts/comments. This is my first patch series in
> >         > OVS. Sorry if naming of this patch as v2 is not appropriate.
> >         >
> >         > Thanx
> >         > Manu
> >         >
> >         > v1 of this patch: https://patchwork.ozlabs.org/patch/836737/
> >         >
> >         > Signed-off-by: Manohar K C <manohar.krishnappa.chidambaraswamy at ericsson.com>
> >         > cc: Jan Scheurich <jan.scheurich at ericsson.com>
> >         > cc: Ben Pfaff <blp at ovn.org>
> >         >
> >         > ---
> >         >  Documentation/howto/dpdk.rst      | 21 ++++++++++
> >         >  debian/libopenvswitch-dev.install |  1 -
> >         >  debian/libopenvswitch.install     |  1 -
> >         >  debian/rules                      |  4 +-
> >         >  lib/dpif-netdev.c                 | 81 +++++++++++++++++++++++++++++++++++++--
> >         >  vswitchd/vswitch.xml              | 47 +++++++++++++++++++++++
> >         >  6 files changed, 148 insertions(+), 7 deletions(-)
> >         >
> >         > diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
> >         > index 587aaed..1db1562 100644
> >         > --- a/Documentation/howto/dpdk.rst
> >         > +++ b/Documentation/howto/dpdk.rst
> >         > @@ -716,3 +716,24 @@ devices to bridge ``br0``. Once complete, follow the below steps:
> >         >     Check traffic on multiple queues::
> >         >
> >         >         $ cat /proc/interrupts | grep virtio
> >         > +
> >         > +Upcall rate limiting
> >         > +--------------------
> >         > +ovs-vsctl can be used to enable and configure upcall rate limit parameters.
> >         > +There are 2 configurable values ``upcall-rate`` and ``upcall-burst`` which
> >         > +take effect when global enable knob ``upcall-rl`` is set to true.
> >         > +
> >         > +Upcall rate should be set using ``upcall-rate`` in packets-per-sec. For
> >         > +example::
> >         > +
> >         > +    $ ovs-vsctl set Open_vSwitch . other_config:upcall-rate=2000
> >         > +
> >         > +Upcall burst should be set using ``upcall-burst`` in packets-per-sec. For
> >         > +example::
> >         > +
> >         > +    $ ovs-vsctl set Open_vSwitch . other_config:upcall-burst=2000
> >         > +
> >         > +Upcall ratelimit feature should be globally enabled using ``upcall-rl``. For
> >         > +example::
> >         > +
> >         > +    $ ovs-vsctl set Open_vSwitch . other_config:upcall-rl=true
> >         > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> >         > index c7d157a..e2fd92c 100644
> >         > --- a/lib/dpif-netdev.c
> >         > +++ b/lib/dpif-netdev.c
> >         > @@ -101,6 +101,16 @@ static struct shash dp_netdevs OVS_GUARDED_BY(dp_netdev_mutex)
> >         >
> >         >  static struct vlog_rate_limit upcall_rl = VLOG_RATE_LIMIT_INIT(600, 600);
> >         >
> >         > +/* Upcall rate-limit parameters */
> >         > +static bool upcall_ratelimit;
> >         > +static unsigned int upcall_rate;
> >         > +static unsigned int upcall_burst;
> >         > +
> >         > +/* TODO: Tune these default values based on upcall perf test */
> >         > +#define UPCALL_RATELIMIT_DEFAULT false /* Disabled by default */
> >         > +#define UPCALL_RATE_DEFAULT      1000  /* pps */
> >         > +#define UPCALL_BURST_DEFAULT     1000  /* pps */
> >         > +
> >         >  #define DP_NETDEV_CS_SUPPORTED_MASK (CS_NEW | CS_ESTABLISHED | CS_RELATED \
> >         >                                       | CS_INVALID | CS_REPLY_DIR | CS_TRACKED \
> >         >                                       | CS_SRC_NAT | CS_DST_NAT)
> >         > @@ -340,6 +350,7 @@ enum dp_stat_type {
> >         >                                     hits */
> >         >      DP_STAT_SENT_PKTS,          /* Packets that has been sent. */
> >         >      DP_STAT_SENT_BATCHES,       /* Number of batches sent. */
> >         > +    DP_STAT_RATELIMIT_DROP,     /* Packets dropped due to upcall policer */
> >         >      DP_N_STATS
> >         >  };
> >         >
> >         > @@ -645,6 +656,9 @@ struct dp_netdev_pmd_thread {
> >         >      unsigned long long stats_zero[DP_N_STATS];
> >         >      uint64_t cycles_zero[PMD_N_CYCLES];
> >         >
> >         > +    /* Policer to rate limit slow-path */
> >         > +    struct token_bucket upcall_tb;
> >         > +
> >         >      /* Set to true if the pmd thread needs to be reloaded. */
> >         >      bool need_reload;
> >         >  };
> >         > @@ -894,10 +908,11 @@ pmd_info_show_stats(struct ds *reply,
> >         >      ds_put_format(reply,
> >         >                    "\temc hits:%llu\n\tmegaflow hits:%llu\n"
> >         >                    "\tavg. subtable lookups per hit:%.2f\n"
> >         > -                  "\tmiss:%llu\n\tlost:%llu\n"
> >         > +                  "\tmiss:%llu\n\tlost:%llu\n\tratelimit drops:%llu\n"
> >         >                    "\tavg. packets per output batch: %.2f\n",
> >         >                    stats[DP_STAT_EXACT_HIT], stats[DP_STAT_MASKED_HIT],
> >         >                    lookups_per_hit, stats[DP_STAT_MISS], stats[DP_STAT_LOST],
> >         > +                  stats[DP_STAT_RATELIMIT_DROP],
> >         >                    packets_per_batch);
> >         >
> >         >      if (total_cycles == 0) {
> >         > @@ -3031,6 +3046,8 @@ dpif_netdev_set_config(struct dpif *dpif, const struct smap *other_config)
> >         >          smap_get_ullong(other_config, "emc-insert-inv-prob",
> >         >                          DEFAULT_EM_FLOW_INSERT_INV_PROB);
> >         >      uint32_t insert_min, cur_min;
> >         > +    unsigned int rate, burst;
> >         > +    bool ratelimit;
> >         >
> >         >      if (!nullable_string_is_equal(dp->pmd_cmask, cmask)) {
> >         >          free(dp->pmd_cmask);
> >         > @@ -3056,6 +3073,36 @@ dpif_netdev_set_config(struct dpif *dpif, const struct smap *other_config)
> >         >          }
> >         >      }
> >         >
> >         > +    /* Handle upcall policer params */
> >         > +    ratelimit = smap_get_bool(other_config, "upcall-rl",
> >         > +                              UPCALL_RATELIMIT_DEFAULT);
> >         > +    rate = smap_get_int(other_config, "upcall-rate",
> >         > +                        UPCALL_RATE_DEFAULT);
> >         > +    burst = smap_get_int(other_config, "upcall-burst",
> >         > +                         UPCALL_BURST_DEFAULT);
> >         > +
> >         > +    if ((rate != upcall_rate) || (burst != upcall_burst)) {
> >         > +        VLOG_INFO("Upcall ratelimit params changed : Old - rate=%d burst=%d "
> >         > +                  ": New - rate=%d burst=%d\n", upcall_rate, upcall_burst,
> >         > +                  rate, burst);
> >         > +
> >         > +        upcall_rate = rate;
> >         > +        upcall_burst = burst;
> >         > +
> >         > +        /*
> >         > +         * TODO: See if there is a way to reconfig only the policer
> >         > +         * in each PMD.
> >         > +         */
> >         > +        dp_netdev_request_reconfigure(dp);
> >         > +    }
> >         > +
> >         > +    if (ratelimit != upcall_ratelimit) {
> >         > +        upcall_ratelimit = ratelimit;
> >         > +
> >         > +        VLOG_INFO("Upcall ratelimit changed to %s\n",
> >         > +                  (upcall_ratelimit ? "Enabled" : "Disabled"));
> >         > +    }
> >         > +
> >         >      return 0;
> >         >  }
> >         >
> >         > @@ -3958,6 +4005,12 @@ dpif_netdev_run(struct dpif *dpif)
> >         >      ovs_mutex_lock(&dp->port_mutex);
> >         >      non_pmd = dp_netdev_get_pmd(dp, NON_PMD_CORE_ID);
> >         >      if (non_pmd) {
> >         > +        /* Reconfig the upcall policer if params have changed */
> >         > +        if ((upcall_rate != non_pmd->upcall_tb.rate) ||
> >         > +            (upcall_burst != non_pmd->upcall_tb.burst)) {
> >         > +            token_bucket_init(&non_pmd->upcall_tb, upcall_rate, upcall_burst);
> >         > +        }
> >         > +
> >         >          ovs_mutex_lock(&dp->non_pmd_mutex);
> >         >          cycles_count_start(non_pmd);
> >         >          HMAP_FOR_EACH (port, node, &dp->ports) {
> >         > @@ -4154,6 +4207,9 @@ reload:
> >         >          lc = UINT_MAX;
> >         >      }
> >         >
> >         > +    /* Initialize upcall policer token bucket with configured params */
> >         > +    token_bucket_init(&pmd->upcall_tb, upcall_rate, upcall_burst);
> >         > +
> >         >      cycles_count_start(pmd);
> >         >      for (;;) {
> >         >          for (i = 0; i < poll_cnt; i++) {
> >         > @@ -4638,6 +4694,10 @@ dp_netdev_configure_pmd(struct dp_netdev_pmd_thread *pmd, struct dp_netdev *dp,
> >         >          emc_cache_init(&pmd->flow_cache);
> >         >          pmd_alloc_static_tx_qid(pmd);
> >         >      }
> >         > +
> >         > +    /* Initialize upcall policer token bucket with configured params */
> >         > +    token_bucket_init(&pmd->upcall_tb, upcall_rate, upcall_burst);
> >         > +
> >         >      cmap_insert(&dp->poll_threads, CONST_CAST(struct cmap_node *, &pmd->node),
> >         >                  hash_int(core_id, 0));
> >         >  }
> >         > @@ -5076,7 +5136,7 @@ handle_packet_upcall(struct dp_netdev_pmd_thread *pmd,
> >         >                       struct dp_packet *packet,
> >         >                       const struct netdev_flow_key *key,
> >         >                       struct ofpbuf *actions, struct ofpbuf *put_actions,
> >         > -                     int *lost_cnt)
> >         > +                     int *lost_cnt, int *rl_drop_cnt)
> >         >  {
> >         >      struct ofpbuf *add_actions;
> >         >      struct dp_packet_batch b;
> >         > @@ -5084,6 +5144,18 @@ handle_packet_upcall(struct dp_netdev_pmd_thread *pmd,
> >         >      ovs_u128 ufid;
> >         >      int error;
> >         >
> >         > +    /*
> >         > +     * Grab a token from the upcall policer to enter slowpath. If token
> >         > +     * is not available, drop and account the packet. This is to
> >         > +     * rate-limit packets getting into slowpath.
> >         > +     */
> >         > +    if (upcall_ratelimit && !token_bucket_withdraw(&pmd->upcall_tb, 1)) {
> >         > +        dp_packet_delete(packet);
> >         > +        (*rl_drop_cnt)++;
> >         > +
> >         > +        return;
> >         > +    }
> >         > +
> >         >      match.tun_md.valid = false;
> >         >      miniflow_expand(&key->mf, &match.flow);
> >         >
> >         > @@ -5158,7 +5230,7 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd,
> >         >      struct dpcls *cls;
> >         >      struct dpcls_rule *rules[PKT_ARRAY_SIZE];
> >         >      struct dp_netdev *dp = pmd->dp;
> >         > -    int miss_cnt = 0, lost_cnt = 0;
> >         > +    int miss_cnt = 0, lost_cnt = 0, rl_drop_cnt = 0;
> >         >      int lookup_cnt = 0, add_lookup_cnt;
> >         >      bool any_miss;
> >         >      size_t i;
> >         > @@ -5202,7 +5274,7 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd,
> >         >
> >         >              miss_cnt++;
> >         >              handle_packet_upcall(pmd, packet, &keys[i], &actions,
> >         > -                                 &put_actions, &lost_cnt);
> >         > +                                 &put_actions, &lost_cnt, &rl_drop_cnt);
> >         >          }
> >         >
> >         >          ofpbuf_uninit(&actions);
> >         > @@ -5235,6 +5307,7 @@ fast_path_processing(struct dp_netdev_pmd_thread *pmd,
> >         >      dp_netdev_count_packet(pmd, DP_STAT_LOOKUP_HIT, lookup_cnt);
> >         >      dp_netdev_count_packet(pmd, DP_STAT_MISS, miss_cnt);
> >         >      dp_netdev_count_packet(pmd, DP_STAT_LOST, lost_cnt);
> >         > +    dp_netdev_count_packet(pmd, DP_STAT_RATELIMIT_DROP, rl_drop_cnt);
> >         >  }
> >         >
> >         >  /* Packets enter the datapath from a port (or from recirculation) here.
> >         > diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
> >         > index 58c0ebd..bc77c35 100644
> >         > --- a/vswitchd/vswitch.xml
> >         > +++ b/vswitchd/vswitch.xml
> >         > @@ -412,6 +412,53 @@
> >         >          </p>
> >         >        </column>
> >         >
> >         > +      <column name="other_config" key="upcall-rl"
> >         > +              type='{"type": "boolean"}'>
> >         > +        <p>
> >         > +          Set this value to <code>true</code> to enable upcall rate-limiting.
> >         > +          The upcall parameters like rate and burst will be ignored, if this is
> >         > +          not set.
> >         > +        </p>
> >         > +        <p>
> >         > +          The default value is <code>false</code> and upcall rate-limiting will
> >         > +          be disabled.
> >         > +        </p>
> >         > +      </column>
> >         > +
> >         > +      <column name="other_config" key="upcall-rate"
> >         > +        type='{"type": "integer", "minInteger": 0, "maxInteger": 4294967295}'>
> >         > +        <p>
> >         > +          Specifies the rate of upcalls in packets-per-second that is to be
> >         > +          allowed. For example, if the value is 10000, then those many upcalls
> >         > +          (for packets) are allowed per second in each of the packet polling
> >         > +          thread (PMD or non-PMD).
> >         > +        </p>
> >         > +        <p>
> >         > +          A value of <code>0</code> means, no upcalls would be allowed i.e,
> >         > +          upcall will be disabled. This is mainly for debugging.
> >         > +        </p>
> >         > +        <p>
> >         > +          The default value is 1000.
> >         > +        </p>
> >         > +      </column>
> >         > +
> >         > +      <column name="other_config" key="upcall-burst"
> >         > +        type='{"type": "integer", "minInteger": 0, "maxInteger": 4294967295}'>
> >         > +        <p>
> >         > +          Specifies the maximum burst of upcalls in packets-per-second that is
> >         > +          to be allowed. For example, if the value is 15000, then a maximum
> >         > +          burst of 15000 upcalls (for packets) are allowed per second in each
> >         > +          of the packet polling thread (PMD or non-PMD).
> >         > +        </p>
> >         > +        <p>
> >         > +          A value of <code>0</code> means, no upcalls would be allowed i.e,
> >         > +          upcall will be disabled. This is mainly for debugging.
> >         > +        </p>
> >         > +        <p>
> >         > +          The default value is 1000.
> >         > +        </p>
> >         > +      </column>
> >         > +
> >         >        <column name="other_config" key="vlan-limit"
> >         >                type='{"type": "integer", "minInteger": 0}'>
> >         >          <p>
> >         > --
> >         > 1.9.1
> >         >
> >         >
> >         >
> >
> >
> >     _______________________________________________
> >     dev mailing list
> >     dev at openvswitch.org
> >     https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> >
> >
> _______________________________________________
> dev mailing list
> dev at openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev


More information about the dev mailing list