[ovs-dev] [RFC PATCH 1/2] sflow: introduce egress flow sampling

Weglicki, MichalX michalx.weglicki at intel.com
Mon Sep 4 09:28:45 UTC 2017


Hello Neil, 

NAT can be configured as range of addresses, so it is impossible to 
Get this information before actual translation happens. 

In general I don't see any problem with creating egress action 
as it works in ipfix as default configuration along ingress 
action. When it comes to caching packet, I don't know yet exactly 
how it should implemented, but as long as this is optional sFLOW
feature which would enable only during specific NAT case, I don't 
really see big impact as well. Also it gives us possible improvements 
in the future, as there could be other cases where some of the 
data fields can't be read on ingress. As you confirmed that it would
work according to sFLOW specs, I really think it is worth it. 

Br, 
Michal. 

> -----Original Message-----
> From: Neil McKee [mailto:neil.mckee at inmon.com]
> Sent: Thursday, August 31, 2017 8:10 PM
> To: Weglicki, MichalX <michalx.weglicki at intel.com>
> Cc: dev at openvswitch.org
> Subject: Re: [ovs-dev] [RFC PATCH 1/2] sflow: introduce egress flow sampling
> 
> Yes.  Any solution that samples the original packet and annotates it
> with accurate information about its forwarding will conform to the
> spec.   But anything you do that touches the chain of actions in more
> than one place is likely to be problematic...
> 
> For example,  one possible approach that OVS *might* have taken from
> the start was to (1) mark the packet for sampling at ingress (2)
> accumulate forwarding information in a buffer as it goes through each
> action,  then (3) send an upcall with original-packet+forwarding-info
> at the point when all the actions were completed.  At first glance
> this seems clean because there is only one upcall and no awkward
> rendez-vous,  but actually it is making assumptions that the datapath
> is happy to buffer information at any stage,  and that it will process
> each packet to completion.  Suppose there were a datapath
> implementation (in hardware or software) where it made sense to
> pipeline the datapath with batches of packets?  It's not hard to
> imagine scenarios where those assumptions would turn into serious
> constraints.
> 
> I completely agree with you that the NAT detail is a high-value
> measurement,  but the radical simplicity of the current implementation
> is important,  so making changes to the datapath should be weighed
> against that.  One question is:  would it help if the sFlow feed
> included at least the outline of the NAT translation that is known
> from the actions-list? Or is there some other way to look up the NAT
> translation table from user-space?  Perhaps similar to the way the
> host-sflow agent annotates samples with TCP performance metrics here:
> https://github.com/sflow/host-sflow/blob/v2.0.11/src/Linux/mod_tcp.c#L512-L631
> 
> Neil
> 
> ------
> Neil McKee
> InMon Corp.
> http://www.inmon.com
> 
> 
> On Thu, Aug 31, 2017 at 5:38 AM, Weglicki, MichalX
> <michalx.weglicki at intel.com> wrote:
> > Hello Neil,
> >
> > The problem is that to fill NAT translation correctly through
> > extended_nat we need sample packet before and after the translation.
> > I understand that such information (possibly) could be analyzed by
> > collector based on information from two switches, however I think that
> > correctly getting this information at one point is beneficial.
> >
> > The approach which Przemek took is similar to ipfix, where default
> > configuration per bridge is to have packet sampled on ingress and egress.
> > This allows to support all the middlebox functions (e.g. NAT) that ipfix
> > defines. I think it should also be supported in OVS-sFLOW.
> > I wasn't aware that same packet can't be sampled on ingreess
> > and egress, according to specification it can have to be sampled on
> > igress OR egress - I didn't find any clear statement forbidding it,
> > but I'm sure it is there, as you said.
> >
> > What if I would go into some hybrid approach, when all sampling
> > would happen on ingress, but there would be optional egress action
> > which fills additional counters (like extended_nat) and sent it when needed.
> > So the logic would be:
> > - When ingress action is executed, all counters are calculated, if any
> >   of the counters can't be retrieved at this point, packet is buffered, and
> >   marked for egress sampling with additional information what information
> >  is missing.
> > - Then on egress when action is executed, requested function is
> >   calculated, filled, and then sent to collector.
> > - In all other cases, egress action will just ignore packet, as it would be
> >   already sampled on ingress action.
> >
> > Do you think such approach would work?
> >
> > Br,
> > Michal.
> >
> >> -----Original Message-----
> >> From: Neil McKee [mailto:neil.mckee at inmon.com]
> >> Sent: Monday, August 28, 2017 9:47 PM
> >> To: Weglicki, MichalX <michalx.weglicki at intel.com>
> >> Cc: dev at openvswitch.org
> >> Subject: Re: [ovs-dev] [RFC PATCH 1/2] sflow: introduce egress flow sampling
> >>
> >> If you add an egress sampling action like this it ends up violating
> >> the sFlow spec.   To understand why,  first see this thread back in
> >> 2013 when we switched from per-interface datasources to per-bridge
> >> datasources:
> >> https://mail.openvswitch.org/pipermail/ovs-dev/2013-April/270575.html
> >>
> >> Having the packet-sampling datasource be the bridge was very helpful.
> >> It meant we didn't have to add a new datasource for every new input
> >> port,  and didn't have to worry about input ports that had no ifIndex.
> >> However a single datasource cannot sample the same "Packet Flow" twice
> >> (see sFlow spec) so the same bridge datasource cannot sample on both
> >> ingress and egress.
> >>
> >> If we took the radical step of going back to per-interface
> >> datasources,  then it would be allowable,  but that is not tempting.
> >> For one thing we would have to make sure that the sampled packet
> >> header was fully decided and knowable at that point.   If,  for
> >> example,  the MAC source address of the egress packet is not yet
> >> determined then inserting a made-up layer-2 header would be
> >> problematic  (this is actually a known problem with existing
> >> egress-sampling implementations -- when packet samples are reported
> >> that never actually existed in the network it can pollute the analysis
> >> of address-mappings, host-location, traffic matrix de-duplication and
> >> more).
> >>
> >> A guiding principle of sFlow is that the agent should be radically
> >> simple and implemented everywhere.  The NAT addressing should be
> >> visible at the next downstream switch port.   I concede that it can be
> >> very helpful when the NAT info is included by the device doing the
> >> translation,  but if you examine the packet samples from each
> >> observation point you should still be able to associate the internal
> >> and external flows at the sFlow collector.  Does that work for your
> >> use-case?
> >>
> >> Neil
> >>
> >>
> >> ------
> >> Neil McKee
> >> InMon Corp.
> >> http://www.inmon.com
> >>
> >>
> >> On Thu, Aug 24, 2017 at 5:33 AM, Michal Weglicki
> >> <michalx.weglicki at intel.com> wrote:
> >> > Currently, sFlow implementation in Open vSwitch samples packets only on igress.
> >> > While it works great for majority of cases, it can be troublesome to expose
> >> > some of extended data structures. Ingress sampling is done on initial reception
> >> > of packet, which means that at this stage required information might not be
> >> > available yet.
> >> >
> >> > For instance, let's consider many-to-many source NAT translation configuration.
> >> > During ingress sampling, only range of IP addresses that were specified by the
> >> > user are known, but not the IP address that will be eventually selected for
> >> > translation. This is an issue for sFlow, which will prevent us from exporting
> >> > important data structure - extended NAT.
> >> >
> >> > To address this issue, sFlow egress sampling is introduced. The idea behind it,
> >> > is similar to IPFIX per-bridge sampling. User can configure whether packets
> >> > should be sampled on ingress and/or egress. In order to control sampling
> >> > configuration two other_config options are added to sFlow table:
> >> >
> >> > - enable-input-sampling
> >> > - enable-output-sampling
> >> >
> >> > Following patch series breaks UTs and should not be merged.
> >> > My intention is to get initial feedback regarding this feature. Assuming that
> >> > feedback is positive, I'll submit next version of the patch with integrated
> >> > UTs.
> >> >
> >> > Signed-off-by: Przemyslaw Szczerbik <przemyslawx.szczerbik at intel.com>
> >> > ---
> >> >  ofproto/ofproto-dpif-sflow.c | 30 +++++++++++++++++++++-
> >> >  ofproto/ofproto-dpif-sflow.h |  2 ++
> >> >  ofproto/ofproto-dpif-xlate.c | 61 ++++++++++++++++++++++++++++++++++++--------
> >> >  ofproto/ofproto.h            |  2 ++
> >> >  utilities/ovs-vsctl.8.in     |  9 ++++---
> >> >  vswitchd/bridge.c            |  5 ++++
> >> >  vswitchd/vswitch.ovsschema   |  5 +++-
> >> >  vswitchd/vswitch.xml         | 12 +++++++++
> >> >  8 files changed, 109 insertions(+), 17 deletions(-)
> >> >
> >> > diff --git a/ofproto/ofproto-dpif-sflow.c b/ofproto/ofproto-dpif-sflow.c
> >> > index 65a2003..b92f681 100644
> >> > --- a/ofproto/ofproto-dpif-sflow.c
> >> > +++ b/ofproto/ofproto-dpif-sflow.c
> >> > @@ -100,7 +100,9 @@ ofproto_sflow_options_equal(const struct ofproto_sflow_options *a,
> >> >              && a->header_len == b->header_len
> >> >              && a->sub_id == b->sub_id
> >> >              && nullable_string_is_equal(a->agent_device, b->agent_device)
> >> > -            && nullable_string_is_equal(a->control_ip, b->control_ip));
> >> > +            && nullable_string_is_equal(a->control_ip, b->control_ip)
> >> > +            && a->enable_input_sampling == b->enable_input_sampling
> >> > +            && a->enable_output_sampling == b->enable_output_sampling);
> >> >  }
> >> >
> >> >  static struct ofproto_sflow_options *
> >> > @@ -573,6 +575,32 @@ dpif_sflow_get_probability(const struct dpif_sflow *ds) OVS_EXCLUDED(mutex)
> >> >      return probability;
> >> >  }
> >> >
> >> > +bool
> >> > +dpif_sflow_get_input_sampling(const struct dpif_sflow *ds)
> >> > +    OVS_EXCLUDED(mutex)
> >> > +{
> >> > +    bool ret = false;
> >> > +    ovs_mutex_lock(&mutex);
> >> > +    if (ds->options) {
> >> > +        ret = ds->options->enable_input_sampling;
> >> > +    }
> >> > +    ovs_mutex_unlock(&mutex);
> >> > +    return ret;
> >> > +}
> >> > +
> >> > +bool
> >> > +dpif_sflow_get_output_sampling(const struct dpif_sflow *ds)
> >> > +    OVS_EXCLUDED(mutex)
> >> > +{
> >> > +    bool ret = false;
> >> > +    ovs_mutex_lock(&mutex);
> >> > +    if (ds->options) {
> >> > +        ret = ds->options->enable_output_sampling;
> >> > +    }
> >> > +    ovs_mutex_unlock(&mutex);
> >> > +    return ret;
> >> > +}
> >> > +
> >> >  void
> >> >  dpif_sflow_unref(struct dpif_sflow *ds) OVS_EXCLUDED(mutex)
> >> >  {
> >> > diff --git a/ofproto/ofproto-dpif-sflow.h b/ofproto/ofproto-dpif-sflow.h
> >> > index 014e6cc..dd002fa 100644
> >> > --- a/ofproto/ofproto-dpif-sflow.h
> >> > +++ b/ofproto/ofproto-dpif-sflow.h
> >> > @@ -56,6 +56,8 @@ struct dpif_sflow *dpif_sflow_ref(const struct dpif_sflow *);
> >> >  void dpif_sflow_unref(struct dpif_sflow *);
> >> >
> >> >  uint32_t dpif_sflow_get_probability(const struct dpif_sflow *);
> >> > +bool dpif_sflow_get_input_sampling(const struct dpif_sflow *);
> >> > +bool dpif_sflow_get_output_sampling(const struct dpif_sflow *);
> >> >
> >> >  void dpif_sflow_set_options(struct dpif_sflow *,
> >> >                              const struct ofproto_sflow_options *);
> >> > diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
> >> > index 973e760..c9bbf3c 100644
> >> > --- a/ofproto/ofproto-dpif-xlate.c
> >> > +++ b/ofproto/ofproto-dpif-xlate.c
> >> > @@ -2903,23 +2903,60 @@ compose_sample_action(struct xlate_ctx *ctx,
> >> >
> >> >  /* If sFLow is not enabled, returns 0 without doing anything.
> >> >   *
> >> > - * If sFlow is enabled, appends a template "sample" action to the ODP actions
> >> > - * in 'ctx'.  This action is a template because some of the information needed
> >> > - * to fill it out is not available until flow translation is complete.  In this
> >> > - * case, this functions returns an offset, which is always nonzero, to pass
> >> > - * later to fix_sflow_action() to fill in the rest of the template. */
> >> > + * If sFlow is enabled, appends a "sample" action to the ODP actions in 'ctx'.
> >> > + *
> >> > + * For sFlow input sampling this action is a template because some of the
> >> > + * information needed to fill it out is not available until flow translation is
> >> > + * complete.  In this case, this functions returns an offset, which is always
> >> > + * nonzero, to pass later to fix_sflow_action() to fill in the rest of the
> >> > + * template.
> >> > + *
> >> > + * In case of sFlow output sampling, "sample" action doesn't require calling
> >> > + * fix_sflow_action(), thus this function always returns 0. */
> >> >  static size_t
> >> > -compose_sflow_action(struct xlate_ctx *ctx)
> >> > +compose_sflow_action(struct xlate_ctx *ctx, odp_port_t output_odp_port)
> >> >  {
> >> >      struct dpif_sflow *sflow = ctx->xbridge->sflow;
> >> > +    uint32_t output_ifindex = 0;
> >> > +    ovs_be16 dst_vlan_tci = 0;
> >> > +    size_t cookie_offset;
> >> > +
> >> >      if (!sflow || ctx->xin->flow.in_port.ofp_port == OFPP_NONE) {
> >> >          return 0;
> >> >      }
> >> >
> >> > -    union user_action_cookie cookie = { .type = USER_ACTION_COOKIE_SFLOW };
> >> > -    return compose_sample_action(ctx, dpif_sflow_get_probability(sflow),
> >> > -                                 &cookie, sizeof cookie.sflow, ODPP_NONE,
> >> > -                                 true);
> >> > +    /* For input case, output_odp_port is ODPP_NONE, which is an invalid port
> >> > +     * number. */
> >> > +    if (output_odp_port == ODPP_NONE &&
> >> > +        !dpif_sflow_get_input_sampling(sflow)) {
> >> > +        return 0;
> >> > +    }
> >> > +
> >> > +    /* For output case, output_odp_port is valid. */
> >> > +    if (output_odp_port != ODPP_NONE) {
> >> > +        if (!dpif_sflow_get_output_sampling(sflow)) {
> >> > +            return 0;
> >> > +        }
> >> > +        output_ifindex = dpif_sflow_odp_port_to_ifindex(ctx->xbridge->sflow,
> >> > +                                                        output_odp_port);
> >> > +        dst_vlan_tci = ctx->base_flow.vlans[0].tci;
> >> > +
> >> > +    }
> >> > +
> >> > +    union user_action_cookie cookie = {
> >> > +        .sflow = {
> >> > +            .type = USER_ACTION_COOKIE_SFLOW,
> >> > +            .output = output_ifindex,
> >> > +            .vlan_tci = dst_vlan_tci
> >> > +        }
> >> > +    };
> >> > +    cookie_offset = compose_sample_action(ctx,
> >> > +                                          dpif_sflow_get_probability(sflow),
> >> > +                                          &cookie, sizeof cookie.sflow,
> >> > +                                          ODPP_NONE, true);
> >> > +
> >> > +    /* Return cookie offset only for input sampling */
> >> > +    return output_odp_port == ODPP_NONE ? cookie_offset : 0;
> >> >  }
> >> >
> >> >  /* If flow IPFIX is enabled, make sure IPFIX flow sample action
> >> > @@ -3836,6 +3873,7 @@ compose_output_action__(struct xlate_ctx *ctx, ofp_port_t ofp_port,
> >> >                                  odp_tnl_port);
> >> >
> >> >          } else {
> >> > +            compose_sflow_action(ctx, out_port);
> >> >              /* Tunnel push-pop action is not compatible with
> >> >               * IPFIX action. */
> >> >              compose_ipfix_action(ctx, out_port);
> >> > @@ -7065,8 +7103,9 @@ xlate_actions(struct xlate_in *xin, struct xlate_out *xout)
> >> >      } else {
> >> >          /* Sampling is done on initial reception; don't redo after thawing. */
> >> >          unsigned int user_cookie_offset = 0;
> >> > +
> >> >          if (!xin->frozen_state) {
> >> > -            user_cookie_offset = compose_sflow_action(&ctx);
> >> > +            user_cookie_offset = compose_sflow_action(&ctx, ODPP_NONE);
> >> >              compose_ipfix_action(&ctx, ODPP_NONE);
> >> >          }
> >> >          size_t sample_actions_len = ctx.odp_actions->size;
> >> > diff --git a/ofproto/ofproto.h b/ofproto/ofproto.h
> >> > index 1e48e19..0d6566f 100644
> >> > --- a/ofproto/ofproto.h
> >> > +++ b/ofproto/ofproto.h
> >> > @@ -70,6 +70,8 @@ struct ofproto_sflow_options {
> >> >      uint32_t sub_id;
> >> >      char *agent_device;
> >> >      char *control_ip;
> >> > +    bool enable_input_sampling;
> >> > +    bool enable_output_sampling;
> >> >  };
> >> >
> >> >  struct ofproto_ipfix_bridge_exporter_options {
> >> > diff --git a/utilities/ovs-vsctl.8.in b/utilities/ovs-vsctl.8.in
> >> > index e5e565e..6707451 100644
> >> > --- a/utilities/ovs-vsctl.8.in
> >> > +++ b/utilities/ovs-vsctl.8.in
> >> > @@ -694,11 +694,12 @@ the NetFlow record (since it is now unreferenced):
> >> >  .B "ovs\-vsctl clear Bridge br0 netflow"
> >> >  .SS "sFlow"
> >> >  .PP
> >> > -Configure bridge \fBbr0\fR to send sFlow records to a collector on
> >> > -10.0.0.1 at port 6343, using \fBeth1\fR\'s IP address as the source,
> >> > -with specific sampling parameters:
> >> > +Configure bridge \fBbr0\fR to enable input packet flow sampling (sampling on
> >> > +input and output port is enabled by default if not disabled) and send sFlow
> >> > +records to a collector on 10.0.0.1 at port 6343, using \fBeth1\fR\'s IP address
> >> > +as the source, with specific sampling parameters:
> >> >  .IP
> >> > -.B "ovs\-vsctl \-\- \-\-id=@s create sFlow agent=eth1 target=\(rs\(dq10.0.0.1:6343\(rs\(dq header=128 sampling=64 polling=10
> >> \(rs"
> >> > +.B "ovs\-vsctl \-\- \-\-id=@s create sFlow agent=eth1 target=\(rs\(dq10.0.0.1:6343\(rs\(dq header=128 sampling=64
> polling=10
> >> other_config:enable-output-sampling=false \(rs"
> >> >  .IP
> >> >  .B "\-\- set Bridge br0 sflow=@s"
> >> >  .PP
> >> > diff --git a/vswitchd/bridge.c b/vswitchd/bridge.c
> >> > index a8cbae7..8ccab94 100644
> >> > --- a/vswitchd/bridge.c
> >> > +++ b/vswitchd/bridge.c
> >> > @@ -1191,6 +1191,11 @@ bridge_configure_sflow(struct bridge *br, int *sflow_bridge_number)
> >> >          oso.sampling_rate = *cfg->sampling;
> >> >      }
> >> >
> >> > +    oso.enable_input_sampling = smap_get_bool(&cfg->other_config,
> >> > +                                              "enable-input-sampling", true);
> >> > +    oso.enable_output_sampling = smap_get_bool(&cfg->other_config,
> >> > +                                               "enable-output-sampling", true);
> >> > +
> >> >      oso.polling_interval = SFL_DEFAULT_POLLING_INTERVAL;
> >> >      if (cfg->polling) {
> >> >          oso.polling_interval = *cfg->polling;
> >> > diff --git a/vswitchd/vswitch.ovsschema b/vswitchd/vswitch.ovsschema
> >> > index 90e50b6..7d691f7 100644
> >> > --- a/vswitchd/vswitch.ovsschema
> >> > +++ b/vswitchd/vswitch.ovsschema
> >> > @@ -1,6 +1,6 @@
> >> >  {"name": "Open_vSwitch",
> >> >   "version": "7.15.1",
> >> > - "cksum": "3682332033 23608",
> >> > + "cksum": "973392103 23737",
> >> >   "tables": {
> >> >     "Open_vSwitch": {
> >> >       "columns": {
> >> > @@ -475,6 +475,9 @@
> >> >       "columns": {
> >> >         "targets": {
> >> >           "type": {"key": "string", "min": 1, "max": "unlimited"}},
> >> > +       "other_config": {
> >> > +         "type": {"key": "string", "value": "string",
> >> > +                  "min": 0, "max": "unlimited"}},
> >> >         "sampling": {
> >> >           "type": {"key": "integer", "min": 0, "max": 1}},
> >> >         "polling": {
> >> > diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
> >> > index 074535b..40dc42c 100644
> >> > --- a/vswitchd/vswitch.xml
> >> > +++ b/vswitchd/vswitch.xml
> >> > @@ -4962,6 +4962,18 @@
> >> >        <code><var>ip</var>:<var>port</var></code>.
> >> >      </column>
> >> >
> >> > +    <column name="other_config" key="enable-input-sampling"
> >> > +            type='{"type": "boolean"}'>
> >> > +      By default, sFlow sampling is enabled for both ingress and egress. Set
> >> > +      this column to <code>false</code> to disable input sampling.
> >> > +    </column>
> >> > +
> >> > +    <column name="other_config" key="enable-output-sampling"
> >> > +            type='{"type": "boolean"}'>
> >> > +      By default, sFlow sampling is enabled for both ingress and egress. Set
> >> > +      this column to <code>false</code> to disable input sampling.
> >> > +    </column>
> >> > +
> >> >      <group title="Common Columns">
> >> >        The overall purpose of these columns is described under <code>Common
> >> >        Columns</code> at the beginning of this document.
> >> > --
> >> > 1.8.3.1
> >> >
> >> > _______________________________________________
> >> > dev mailing list
> >> > dev at openvswitch.org
> >> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev


More information about the dev mailing list