[ovs-dev] [PATCH] Initial implementation of sFlow.

Justin Pettit jpettit at nicira.com
Thu Jan 7 21:21:46 UTC 2010


Thanks, Ben and Neil.  I'm excited about getting sFlow support integrated into OVS!

A few comments inline:

On Jan 4, 2010, at 1:24 PM, Ben Pfaff wrote:

> -(e.g. NetFlow, RSPAN, ERSPAN, IOS-like CLI), and opens the forwarding
> -functions to programmatic extension and control.
> +(e.g. NetFlow, sFlow, RSPAN, ERSPAN, IOS-like CLI), and opens the

According to the license, I think this should be "sFlow(R)".  I don't think we need to do it throughout the source code...probably just in these high-level "documents".

> -    * Visibility into inter-VM communication via NetFlow, SPAN, and RSPAN
> +    * Visibility into inter-VM communication via NetFlow, sFlow, SPAN,

Same here.

> +	u64 sflow_pool;		/* Packets that could have been sampled. */

I think the sFlow definition for this is only 32-bit.  I'm guessing this is only increasing, since it's never reset any place.  I'm a bit surprised that the counters in sFlow are 32 bit in that case...

By the way, the description in the comment was a bit confusing to me.  The sFlow library is only slightly better, but I do think it helps:

	Total number of packets that could have been sampled (i.e., packets
        skipped by sampling process + total number of samples) 

> + * @sflow_probability: Probability of sampling a packet to the %ODPL_SFLOW
> + * queue, where 0 means never sample, UINT_MAX means always sample, and
> + * other values are intermediate probabilities.

The name and description might indicate to me that the probability is "sflow_probability / 100".  What about a parenthetical comment that the value is "UINT32_MAX / sflow_probability"? 

> +#define ODP_SET_SFLOW_PROBABILITY _IOR('O', 20, int)
> +#define ODP_GET_SFLOW_PROBABILITY _IOW('O', 21, int)

Is there a reason you didn't use 19 and 20?

> +/* Header added to sFlow sampled packet. */
> +struct odp_sflow_sample_header {
> +    __u64 sample_pool;          /* Number of potentially sampled packets. */

The same thing about adding the "i.e." from above.

It looks like the sample_pool in the library is only 32-bits.  If you made it match, the pad wouldn't be necessary.

> +    __u32 n_actions;            /* Number of following "union odp_action"s. */
> +    __u32 reserved;             /* Pad up to 64-bit boundary. */
> +    /* Followed by n_action "union odp_action"s. */

Would it be worth adding that the packet data follows the actions for additional clarity?

> --- a/lib/dpif-provider.h
> +++ b/lib/dpif-provider.h
> @@ -278,6 +278,23 @@ struct dpif_class {
>      * corresponding type when it calls the recv member function. */
>     int (*recv_set_mask)(struct dpif *dpif, int listen_mask);
> 
> +    /* Retrieves 'dpif''s sFlow sampling probability into '*probability'.
> +     * Return value is 0 or a positive errno value.  EOPNOTSUPP indicates that
> +     * the datapath does not support sFlow, as does a null pointer.
> +     *
> +     * A probability of 0 means sample no packets, UINT32_MAX means sample
> +     * every packet, and other values are intermediate probabilities. */

I think it would be useful to describe what that intermediate probability is (i.e., UINT32_MAX / sample_rate).

> +    int (*get_sflow_probability)(const struct dpif *dpif,
> +                                 uint32_t *probability);
> +
> +    /* Sets 'dpif''s sFlow sampling probability to 'probability'.  Return value
> +     * is 0 or a positive errno value.  EOPNOTSUPP indicates that the datapath
> +     * does not support sFlow, as does a null pointer.
> +     *
> +     * A probability of 0 means sample no packets, UINT32_MAX means sample
> +     * every packet, and other values are intermediate probabilities. */

Same here.

> --- a/lib/dpif.c
> +++ b/lib/dpif.c
> @@ -1,5 +1,5 @@
> /*
> - * Copyright (c) 2008, 2009 Nicira Networks.
> + * Copyright (c) 2008, 2009, 2010 Nicira Networks.
>  *
>  * Licensed under the Apache License, Version 2.0 (the "License");
>  * you may not use this file except in compliance with the License.
> @@ -298,6 +298,7 @@ dpif_set_drop_frags(struct dpif *dpif, bool drop_frags)
>     return error;
> }
> 
> +

Just because you love to ding us on these things: I think you've got an extra newline here.  :)

> +/* Retrieve the sFlow sampling probability.  A probability of 0 means sample no
> + * packets, UINT32_MAX means sample every packet, and other values are
> + * intermediate probabilities.

Defining intermediate again.

> +/* Set the sFlow sampling probability.  A probability of 0 means sample no
> + * packets, UINT32_MAX means sample every packet, and other values are
> + * intermediate probabilities.

I know you want to beat me at this point, but I think the definition is non-intuitive to casual readers.

> +static bool
> +ofproto_sflow_options_equal(const struct ofproto_sflow_options *a,
> +                            const struct ofproto_sflow_options *b)
> +{
> +    return (svec_equal(&a->targets, &b->targets)
> +            && a->sampling_rate == b->sampling_rate
> +            && a->polling_interval == b->polling_interval
> +            && a->header_len == b->header_len
> +            && a->sub_id == b->sub_id
> +            && !strcmp(a->agent_device, b->agent_device)
> +            && !strcmp(a->control_ip, b->control_ip));

I believe "agent_device" and "control_ip" can be null based on ofproto_sflow_options_clone() and from the cfg_get_string() calls in bridge.c, which could cause this to puke.

> +static struct ofproto_sflow_options *
> +ofproto_sflow_options_clone(const struct ofproto_sflow_options *old)
> +{
> +    struct ofproto_sflow_options *new = xmemdup(old, sizeof *old);
> +    new->agent_device = old->agent_device ? xstrdup(old->agent_device) : NULL;
> +    new->control_ip = old->control_ip ? xstrdup(old->control_ip) : NULL;
> +    return new;

Did you want to clone the "targets" svec from "options"?

> +static void
> +ofproto_sflow_options_destroy(struct ofproto_sflow_options *options)
> +{
> +    if (options) {
> +        free(options->agent_device);
> +        free(options->control_ip);
> +        free(options);

Did you want to destroy the "targets" svec from "options"?

> +    if (!netdev_get_features(osp->netdev, &current, NULL, NULL, NULL)) {
> +        counters->ifSpeed = netdev_features_to_bps(current);
> +        counters->ifDirection = (netdev_features_is_full_duplex(current)
> +                                 ? 1 : 2);

Since "ifDirection" is overloaded to carry duplex information, I think it would be good to add a comment describing that this is done on purpose.  This is from RFC 3176:

                               /* derived from MAU MIB (RFC 2668)
                                   0 = unknown, 1=full-duplex,
                                   2=half-duplex, 3 = in, 4=out */

> +    } else {
> +        counters->ifSpeed = 100000000;
> +        counters->ifDirection = 1;
> +    }


Given the above information, shouldn't ifDirection be set to 0?

> +    sfl_receiver_set_sFlowRcvrOwner(receiver, "OpenVSwitch sFlow");

Should we spell this out as "Open vSwitch sFlow"?

> +    sfl_receiver_set_sFlowRcvrTimeout(receiver, 0xffffffff);
> +
> +    /* Add a single sampler to represent the whole switch (special <ifIndex>:0

The "whole switch" is a bridge/datapath correct?  This sounds to me like it's representing the entire physical box, which I don't think is true.

> +    /* Set the sampling_rate down in the datapath. */
> +    dpif_set_sflow_probability(os->dpif,
> +                               MAX(1, UINT32_MAX / options->sampling_rate));

Won't this cause a divide by zero crash if someone sets that value in the config file?

> +    if (n_actions > 65536 / sizeof *actions) {

Where did 65536 come from?  In dpif-netdev, it seems like we limit the value to 4096.  It would probably be a good idea to standardize on a size and just define it in datapath-protocol.h.

> +        VLOG_WARN_RL(&rl, "too many actions in sFlow packet (%"PRIu32" > %zu)",
> +                     65536 / sizeof *actions, n_actions);

Same here.

> +    switchElem.flowType.sw.src_vlan = flow.dl_vlan;

My guess is that this is expecting things in host-byte order and does the conversion in the library.  If so, I think you need to do a ntohs() here.

> +    /* Figure out the output ports. */
> +    n_outputs = 0;
> +    for (i = 0; i < n_actions; i++) {
> +        const union odp_action *a = &actions[i];
> +
> +        switch (a->type) {
> +        case ODPAT_OUTPUT:
> +            fs.output = a->output.port;
> +            n_outputs++;
> +            break;
> +
> +        case ODPAT_OUTPUT_GROUP:
> +            n_outputs += (a->output_group.group == DP_GROUP_FLOOD ? os->n_flood
> +                          : a->output_group.group == DP_GROUP_ALL ? os->n_all
> +                          : 0);
> +            break;
...
> +    }
> +    if (n_outputs > 1 || !fs.output) {
> +        /* Setting the high bit means "multiple output ports". */
> +        fs.output = 0x80000000 | n_outputs;
> +    }

I think describing the behavior of the fs.output field a little better would be good, since it's not very intuitive.  Here's the description from RFC 3176:

                                 /* SNMP ifIndex of output interface,
                                     0 if interface is not known.
                                     Set most significant bit to
                                     indicate multiple destination
                                     interfaces (i.e., in case of
                                     broadcast or multicast)
                                     and set lower order bits to
                                     indicate number of destination
                                     interfaces.
                                     Examples:
                                        0x00000002  indicates ifIndex =
                                                    2
                                        0x00000000  ifIndex unknown.
                                        0x80000007  indicates a packet
                                                    sent to 7
                                                    interfaces.
                                        0x80000000  indicates a packet
                                                    sent to an unknown
                                                    number of interfaces
                                                    greater than 1. */

Based on this I don't think the "!fs.output" case will set the correct answer.  Rather than "0x80000000", I think it would be "0x00000000".

> +            oso.agent_device = (char *) cfg_get_string(0, "sflow.%s.agent",
> +                                                       br->name);

The "agent" config parameter doesn't seem to be described in the man page.

> +            oso.control_ip = (char *) cfg_get_string(0,
> +                                                     "bridge.%s.controller.ip",
> +                                                     br->name);

Generally, the local IP address is not explicitly set through this config mechanism.  It doesn't appear that sFlow will get setup if neither this key or the "agent" key is set.  It seems like it would be good for us to attempt to determine this local IP if its not set.

> +By default, 1 out of every 400 packets is sent to the configured sFlow
> +collector.  To override this, set \fBsflow.\fIbridge\fB.sampling\fR to
> +the number of switched packets out of which 1, on average, will be

To me, changing this "1" to "one" would make it read clearer...but maybe that's just me.

--Justin






More information about the dev mailing list