[ovs-dev] [RFC PATCH 1/2] sflow: introduce egress flow sampling

Neil McKee neil.mckee at inmon.com
Thu Aug 31 18:10:03 UTC 2017


Yes.  Any solution that samples the original packet and annotates it
with accurate information about its forwarding will conform to the
spec.   But anything you do that touches the chain of actions in more
than one place is likely to be problematic...

For example,  one possible approach that OVS *might* have taken from
the start was to (1) mark the packet for sampling at ingress (2)
accumulate forwarding information in a buffer as it goes through each
action,  then (3) send an upcall with original-packet+forwarding-info
at the point when all the actions were completed.  At first glance
this seems clean because there is only one upcall and no awkward
rendez-vous,  but actually it is making assumptions that the datapath
is happy to buffer information at any stage,  and that it will process
each packet to completion.  Suppose there were a datapath
implementation (in hardware or software) where it made sense to
pipeline the datapath with batches of packets?  It's not hard to
imagine scenarios where those assumptions would turn into serious
constraints.

I completely agree with you that the NAT detail is a high-value
measurement,  but the radical simplicity of the current implementation
is important,  so making changes to the datapath should be weighed
against that.  One question is:  would it help if the sFlow feed
included at least the outline of the NAT translation that is known
from the actions-list? Or is there some other way to look up the NAT
translation table from user-space?  Perhaps similar to the way the
host-sflow agent annotates samples with TCP performance metrics here:
https://github.com/sflow/host-sflow/blob/v2.0.11/src/Linux/mod_tcp.c#L512-L631

Neil

------
Neil McKee
InMon Corp.
http://www.inmon.com


On Thu, Aug 31, 2017 at 5:38 AM, Weglicki, MichalX
<michalx.weglicki at intel.com> wrote:
> Hello Neil,
>
> The problem is that to fill NAT translation correctly through
> extended_nat we need sample packet before and after the translation.
> I understand that such information (possibly) could be analyzed by
> collector based on information from two switches, however I think that
> correctly getting this information at one point is beneficial.
>
> The approach which Przemek took is similar to ipfix, where default
> configuration per bridge is to have packet sampled on ingress and egress.
> This allows to support all the middlebox functions (e.g. NAT) that ipfix
> defines. I think it should also be supported in OVS-sFLOW.
> I wasn't aware that same packet can't be sampled on ingreess
> and egress, according to specification it can have to be sampled on
> igress OR egress - I didn't find any clear statement forbidding it,
> but I'm sure it is there, as you said.
>
> What if I would go into some hybrid approach, when all sampling
> would happen on ingress, but there would be optional egress action
> which fills additional counters (like extended_nat) and sent it when needed.
> So the logic would be:
> - When ingress action is executed, all counters are calculated, if any
>   of the counters can't be retrieved at this point, packet is buffered, and
>   marked for egress sampling with additional information what information
>  is missing.
> - Then on egress when action is executed, requested function is
>   calculated, filled, and then sent to collector.
> - In all other cases, egress action will just ignore packet, as it would be
>   already sampled on ingress action.
>
> Do you think such approach would work?
>
> Br,
> Michal.
>
>> -----Original Message-----
>> From: Neil McKee [mailto:neil.mckee at inmon.com]
>> Sent: Monday, August 28, 2017 9:47 PM
>> To: Weglicki, MichalX <michalx.weglicki at intel.com>
>> Cc: dev at openvswitch.org
>> Subject: Re: [ovs-dev] [RFC PATCH 1/2] sflow: introduce egress flow sampling
>>
>> If you add an egress sampling action like this it ends up violating
>> the sFlow spec.   To understand why,  first see this thread back in
>> 2013 when we switched from per-interface datasources to per-bridge
>> datasources:
>> https://mail.openvswitch.org/pipermail/ovs-dev/2013-April/270575.html
>>
>> Having the packet-sampling datasource be the bridge was very helpful.
>> It meant we didn't have to add a new datasource for every new input
>> port,  and didn't have to worry about input ports that had no ifIndex.
>> However a single datasource cannot sample the same "Packet Flow" twice
>> (see sFlow spec) so the same bridge datasource cannot sample on both
>> ingress and egress.
>>
>> If we took the radical step of going back to per-interface
>> datasources,  then it would be allowable,  but that is not tempting.
>> For one thing we would have to make sure that the sampled packet
>> header was fully decided and knowable at that point.   If,  for
>> example,  the MAC source address of the egress packet is not yet
>> determined then inserting a made-up layer-2 header would be
>> problematic  (this is actually a known problem with existing
>> egress-sampling implementations -- when packet samples are reported
>> that never actually existed in the network it can pollute the analysis
>> of address-mappings, host-location, traffic matrix de-duplication and
>> more).
>>
>> A guiding principle of sFlow is that the agent should be radically
>> simple and implemented everywhere.  The NAT addressing should be
>> visible at the next downstream switch port.   I concede that it can be
>> very helpful when the NAT info is included by the device doing the
>> translation,  but if you examine the packet samples from each
>> observation point you should still be able to associate the internal
>> and external flows at the sFlow collector.  Does that work for your
>> use-case?
>>
>> Neil
>>
>>
>> ------
>> Neil McKee
>> InMon Corp.
>> http://www.inmon.com
>>
>>
>> On Thu, Aug 24, 2017 at 5:33 AM, Michal Weglicki
>> <michalx.weglicki at intel.com> wrote:
>> > Currently, sFlow implementation in Open vSwitch samples packets only on igress.
>> > While it works great for majority of cases, it can be troublesome to expose
>> > some of extended data structures. Ingress sampling is done on initial reception
>> > of packet, which means that at this stage required information might not be
>> > available yet.
>> >
>> > For instance, let's consider many-to-many source NAT translation configuration.
>> > During ingress sampling, only range of IP addresses that were specified by the
>> > user are known, but not the IP address that will be eventually selected for
>> > translation. This is an issue for sFlow, which will prevent us from exporting
>> > important data structure - extended NAT.
>> >
>> > To address this issue, sFlow egress sampling is introduced. The idea behind it,
>> > is similar to IPFIX per-bridge sampling. User can configure whether packets
>> > should be sampled on ingress and/or egress. In order to control sampling
>> > configuration two other_config options are added to sFlow table:
>> >
>> > - enable-input-sampling
>> > - enable-output-sampling
>> >
>> > Following patch series breaks UTs and should not be merged.
>> > My intention is to get initial feedback regarding this feature. Assuming that
>> > feedback is positive, I'll submit next version of the patch with integrated
>> > UTs.
>> >
>> > Signed-off-by: Przemyslaw Szczerbik <przemyslawx.szczerbik at intel.com>
>> > ---
>> >  ofproto/ofproto-dpif-sflow.c | 30 +++++++++++++++++++++-
>> >  ofproto/ofproto-dpif-sflow.h |  2 ++
>> >  ofproto/ofproto-dpif-xlate.c | 61 ++++++++++++++++++++++++++++++++++++--------
>> >  ofproto/ofproto.h            |  2 ++
>> >  utilities/ovs-vsctl.8.in     |  9 ++++---
>> >  vswitchd/bridge.c            |  5 ++++
>> >  vswitchd/vswitch.ovsschema   |  5 +++-
>> >  vswitchd/vswitch.xml         | 12 +++++++++
>> >  8 files changed, 109 insertions(+), 17 deletions(-)
>> >
>> > diff --git a/ofproto/ofproto-dpif-sflow.c b/ofproto/ofproto-dpif-sflow.c
>> > index 65a2003..b92f681 100644
>> > --- a/ofproto/ofproto-dpif-sflow.c
>> > +++ b/ofproto/ofproto-dpif-sflow.c
>> > @@ -100,7 +100,9 @@ ofproto_sflow_options_equal(const struct ofproto_sflow_options *a,
>> >              && a->header_len == b->header_len
>> >              && a->sub_id == b->sub_id
>> >              && nullable_string_is_equal(a->agent_device, b->agent_device)
>> > -            && nullable_string_is_equal(a->control_ip, b->control_ip));
>> > +            && nullable_string_is_equal(a->control_ip, b->control_ip)
>> > +            && a->enable_input_sampling == b->enable_input_sampling
>> > +            && a->enable_output_sampling == b->enable_output_sampling);
>> >  }
>> >
>> >  static struct ofproto_sflow_options *
>> > @@ -573,6 +575,32 @@ dpif_sflow_get_probability(const struct dpif_sflow *ds) OVS_EXCLUDED(mutex)
>> >      return probability;
>> >  }
>> >
>> > +bool
>> > +dpif_sflow_get_input_sampling(const struct dpif_sflow *ds)
>> > +    OVS_EXCLUDED(mutex)
>> > +{
>> > +    bool ret = false;
>> > +    ovs_mutex_lock(&mutex);
>> > +    if (ds->options) {
>> > +        ret = ds->options->enable_input_sampling;
>> > +    }
>> > +    ovs_mutex_unlock(&mutex);
>> > +    return ret;
>> > +}
>> > +
>> > +bool
>> > +dpif_sflow_get_output_sampling(const struct dpif_sflow *ds)
>> > +    OVS_EXCLUDED(mutex)
>> > +{
>> > +    bool ret = false;
>> > +    ovs_mutex_lock(&mutex);
>> > +    if (ds->options) {
>> > +        ret = ds->options->enable_output_sampling;
>> > +    }
>> > +    ovs_mutex_unlock(&mutex);
>> > +    return ret;
>> > +}
>> > +
>> >  void
>> >  dpif_sflow_unref(struct dpif_sflow *ds) OVS_EXCLUDED(mutex)
>> >  {
>> > diff --git a/ofproto/ofproto-dpif-sflow.h b/ofproto/ofproto-dpif-sflow.h
>> > index 014e6cc..dd002fa 100644
>> > --- a/ofproto/ofproto-dpif-sflow.h
>> > +++ b/ofproto/ofproto-dpif-sflow.h
>> > @@ -56,6 +56,8 @@ struct dpif_sflow *dpif_sflow_ref(const struct dpif_sflow *);
>> >  void dpif_sflow_unref(struct dpif_sflow *);
>> >
>> >  uint32_t dpif_sflow_get_probability(const struct dpif_sflow *);
>> > +bool dpif_sflow_get_input_sampling(const struct dpif_sflow *);
>> > +bool dpif_sflow_get_output_sampling(const struct dpif_sflow *);
>> >
>> >  void dpif_sflow_set_options(struct dpif_sflow *,
>> >                              const struct ofproto_sflow_options *);
>> > diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
>> > index 973e760..c9bbf3c 100644
>> > --- a/ofproto/ofproto-dpif-xlate.c
>> > +++ b/ofproto/ofproto-dpif-xlate.c
>> > @@ -2903,23 +2903,60 @@ compose_sample_action(struct xlate_ctx *ctx,
>> >
>> >  /* If sFLow is not enabled, returns 0 without doing anything.
>> >   *
>> > - * If sFlow is enabled, appends a template "sample" action to the ODP actions
>> > - * in 'ctx'.  This action is a template because some of the information needed
>> > - * to fill it out is not available until flow translation is complete.  In this
>> > - * case, this functions returns an offset, which is always nonzero, to pass
>> > - * later to fix_sflow_action() to fill in the rest of the template. */
>> > + * If sFlow is enabled, appends a "sample" action to the ODP actions in 'ctx'.
>> > + *
>> > + * For sFlow input sampling this action is a template because some of the
>> > + * information needed to fill it out is not available until flow translation is
>> > + * complete.  In this case, this functions returns an offset, which is always
>> > + * nonzero, to pass later to fix_sflow_action() to fill in the rest of the
>> > + * template.
>> > + *
>> > + * In case of sFlow output sampling, "sample" action doesn't require calling
>> > + * fix_sflow_action(), thus this function always returns 0. */
>> >  static size_t
>> > -compose_sflow_action(struct xlate_ctx *ctx)
>> > +compose_sflow_action(struct xlate_ctx *ctx, odp_port_t output_odp_port)
>> >  {
>> >      struct dpif_sflow *sflow = ctx->xbridge->sflow;
>> > +    uint32_t output_ifindex = 0;
>> > +    ovs_be16 dst_vlan_tci = 0;
>> > +    size_t cookie_offset;
>> > +
>> >      if (!sflow || ctx->xin->flow.in_port.ofp_port == OFPP_NONE) {
>> >          return 0;
>> >      }
>> >
>> > -    union user_action_cookie cookie = { .type = USER_ACTION_COOKIE_SFLOW };
>> > -    return compose_sample_action(ctx, dpif_sflow_get_probability(sflow),
>> > -                                 &cookie, sizeof cookie.sflow, ODPP_NONE,
>> > -                                 true);
>> > +    /* For input case, output_odp_port is ODPP_NONE, which is an invalid port
>> > +     * number. */
>> > +    if (output_odp_port == ODPP_NONE &&
>> > +        !dpif_sflow_get_input_sampling(sflow)) {
>> > +        return 0;
>> > +    }
>> > +
>> > +    /* For output case, output_odp_port is valid. */
>> > +    if (output_odp_port != ODPP_NONE) {
>> > +        if (!dpif_sflow_get_output_sampling(sflow)) {
>> > +            return 0;
>> > +        }
>> > +        output_ifindex = dpif_sflow_odp_port_to_ifindex(ctx->xbridge->sflow,
>> > +                                                        output_odp_port);
>> > +        dst_vlan_tci = ctx->base_flow.vlans[0].tci;
>> > +
>> > +    }
>> > +
>> > +    union user_action_cookie cookie = {
>> > +        .sflow = {
>> > +            .type = USER_ACTION_COOKIE_SFLOW,
>> > +            .output = output_ifindex,
>> > +            .vlan_tci = dst_vlan_tci
>> > +        }
>> > +    };
>> > +    cookie_offset = compose_sample_action(ctx,
>> > +                                          dpif_sflow_get_probability(sflow),
>> > +                                          &cookie, sizeof cookie.sflow,
>> > +                                          ODPP_NONE, true);
>> > +
>> > +    /* Return cookie offset only for input sampling */
>> > +    return output_odp_port == ODPP_NONE ? cookie_offset : 0;
>> >  }
>> >
>> >  /* If flow IPFIX is enabled, make sure IPFIX flow sample action
>> > @@ -3836,6 +3873,7 @@ compose_output_action__(struct xlate_ctx *ctx, ofp_port_t ofp_port,
>> >                                  odp_tnl_port);
>> >
>> >          } else {
>> > +            compose_sflow_action(ctx, out_port);
>> >              /* Tunnel push-pop action is not compatible with
>> >               * IPFIX action. */
>> >              compose_ipfix_action(ctx, out_port);
>> > @@ -7065,8 +7103,9 @@ xlate_actions(struct xlate_in *xin, struct xlate_out *xout)
>> >      } else {
>> >          /* Sampling is done on initial reception; don't redo after thawing. */
>> >          unsigned int user_cookie_offset = 0;
>> > +
>> >          if (!xin->frozen_state) {
>> > -            user_cookie_offset = compose_sflow_action(&ctx);
>> > +            user_cookie_offset = compose_sflow_action(&ctx, ODPP_NONE);
>> >              compose_ipfix_action(&ctx, ODPP_NONE);
>> >          }
>> >          size_t sample_actions_len = ctx.odp_actions->size;
>> > diff --git a/ofproto/ofproto.h b/ofproto/ofproto.h
>> > index 1e48e19..0d6566f 100644
>> > --- a/ofproto/ofproto.h
>> > +++ b/ofproto/ofproto.h
>> > @@ -70,6 +70,8 @@ struct ofproto_sflow_options {
>> >      uint32_t sub_id;
>> >      char *agent_device;
>> >      char *control_ip;
>> > +    bool enable_input_sampling;
>> > +    bool enable_output_sampling;
>> >  };
>> >
>> >  struct ofproto_ipfix_bridge_exporter_options {
>> > diff --git a/utilities/ovs-vsctl.8.in b/utilities/ovs-vsctl.8.in
>> > index e5e565e..6707451 100644
>> > --- a/utilities/ovs-vsctl.8.in
>> > +++ b/utilities/ovs-vsctl.8.in
>> > @@ -694,11 +694,12 @@ the NetFlow record (since it is now unreferenced):
>> >  .B "ovs\-vsctl clear Bridge br0 netflow"
>> >  .SS "sFlow"
>> >  .PP
>> > -Configure bridge \fBbr0\fR to send sFlow records to a collector on
>> > -10.0.0.1 at port 6343, using \fBeth1\fR\'s IP address as the source,
>> > -with specific sampling parameters:
>> > +Configure bridge \fBbr0\fR to enable input packet flow sampling (sampling on
>> > +input and output port is enabled by default if not disabled) and send sFlow
>> > +records to a collector on 10.0.0.1 at port 6343, using \fBeth1\fR\'s IP address
>> > +as the source, with specific sampling parameters:
>> >  .IP
>> > -.B "ovs\-vsctl \-\- \-\-id=@s create sFlow agent=eth1 target=\(rs\(dq10.0.0.1:6343\(rs\(dq header=128 sampling=64 polling=10
>> \(rs"
>> > +.B "ovs\-vsctl \-\- \-\-id=@s create sFlow agent=eth1 target=\(rs\(dq10.0.0.1:6343\(rs\(dq header=128 sampling=64 polling=10
>> other_config:enable-output-sampling=false \(rs"
>> >  .IP
>> >  .B "\-\- set Bridge br0 sflow=@s"
>> >  .PP
>> > diff --git a/vswitchd/bridge.c b/vswitchd/bridge.c
>> > index a8cbae7..8ccab94 100644
>> > --- a/vswitchd/bridge.c
>> > +++ b/vswitchd/bridge.c
>> > @@ -1191,6 +1191,11 @@ bridge_configure_sflow(struct bridge *br, int *sflow_bridge_number)
>> >          oso.sampling_rate = *cfg->sampling;
>> >      }
>> >
>> > +    oso.enable_input_sampling = smap_get_bool(&cfg->other_config,
>> > +                                              "enable-input-sampling", true);
>> > +    oso.enable_output_sampling = smap_get_bool(&cfg->other_config,
>> > +                                               "enable-output-sampling", true);
>> > +
>> >      oso.polling_interval = SFL_DEFAULT_POLLING_INTERVAL;
>> >      if (cfg->polling) {
>> >          oso.polling_interval = *cfg->polling;
>> > diff --git a/vswitchd/vswitch.ovsschema b/vswitchd/vswitch.ovsschema
>> > index 90e50b6..7d691f7 100644
>> > --- a/vswitchd/vswitch.ovsschema
>> > +++ b/vswitchd/vswitch.ovsschema
>> > @@ -1,6 +1,6 @@
>> >  {"name": "Open_vSwitch",
>> >   "version": "7.15.1",
>> > - "cksum": "3682332033 23608",
>> > + "cksum": "973392103 23737",
>> >   "tables": {
>> >     "Open_vSwitch": {
>> >       "columns": {
>> > @@ -475,6 +475,9 @@
>> >       "columns": {
>> >         "targets": {
>> >           "type": {"key": "string", "min": 1, "max": "unlimited"}},
>> > +       "other_config": {
>> > +         "type": {"key": "string", "value": "string",
>> > +                  "min": 0, "max": "unlimited"}},
>> >         "sampling": {
>> >           "type": {"key": "integer", "min": 0, "max": 1}},
>> >         "polling": {
>> > diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
>> > index 074535b..40dc42c 100644
>> > --- a/vswitchd/vswitch.xml
>> > +++ b/vswitchd/vswitch.xml
>> > @@ -4962,6 +4962,18 @@
>> >        <code><var>ip</var>:<var>port</var></code>.
>> >      </column>
>> >
>> > +    <column name="other_config" key="enable-input-sampling"
>> > +            type='{"type": "boolean"}'>
>> > +      By default, sFlow sampling is enabled for both ingress and egress. Set
>> > +      this column to <code>false</code> to disable input sampling.
>> > +    </column>
>> > +
>> > +    <column name="other_config" key="enable-output-sampling"
>> > +            type='{"type": "boolean"}'>
>> > +      By default, sFlow sampling is enabled for both ingress and egress. Set
>> > +      this column to <code>false</code> to disable input sampling.
>> > +    </column>
>> > +
>> >      <group title="Common Columns">
>> >        The overall purpose of these columns is described under <code>Common
>> >        Columns</code> at the beginning of this document.
>> > --
>> > 1.8.3.1
>> >
>> > _______________________________________________
>> > dev mailing list
>> > dev at openvswitch.org
>> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev


More information about the dev mailing list