[ovs-dev] [PATCH 3/3] cfm: Expose detailed fault status in the database.

Ben Pfaff blp at nicira.com
Wed Feb 8 18:14:05 UTC 2012


On Tue, Feb 07, 2012 at 07:28:54PM -0800, Ethan Jackson wrote:
> The cfm_fault column of the database is the logical OR of a number
> of reasons that CFM can be in a faulted state.  A controller may
> want to have more specific information in which case it can look at
> the cfm_fault_status column which this patch adds.
> 
> Signed-off-by: Ethan Jackson <ethan at nicira.com>

A map from string to bool is almost like a set of strings that
contains only the keys whose values would be "true".  Did you consider
that representation?  The only difference is that the map form makes
it clear that the "false" keys are supported.

Did you think about writing a function to convert a single CFM_FAULT_*
bit to a string?  Then you would avoid duplicating the string names in
more than one place and you could write ds_put_cfm_fault() and the CFM
fault status code in iface_refresh_cfm_stats() as loops.

If I'm reading the code correctly, turning on the override will now
cause a fault to be written into the database but without anything in
cfm_fault_status.  If I'm right about that, then that's pretty
confusing.  I'd suggest allowing/requiring whoever sets the override
to specify the fault reasons to report, and possibly to report in the
database that the fault status was overridden.

(Should the admin be able to override a CFM to *not* faulted?  That
seems reasonably useful.)

It *might* be slightly cleaner to replace maid_recv and overflow_recv
by a single "recv_fault" or similar that is also a bitmap of
CFM_FAULT_*.  Then you could, instead of:

> +        cfm->fault = 0;
> +
> +        if (cfm->maid_recv) {
> +            cfm->maid_recv = false;
> +            cfm->fault |= CFM_FAULT_MAID;
> +        }
> +
> +        if (cfm->overflow_recv) {
> +            cfm->overflow_recv = false;
> +            cfm->fault |= CFM_FAULT_OVERFLOW;
> +        }

Just do:

        cfm->fault = cfm->recv_fault;
        cfm->recv_fault = 0;

Some more inline comments below:

> @@ -1554,10 +1555,24 @@ iface_refresh_cfm_stats(struct iface *iface)
>      fault = ofproto_port_get_cfm_fault(iface->port->bridge->ofproto,
>                                         iface->ofp_port);
>      if (fault >= 0) {
> +        char *keys[]= {"recv", "maid", "rdi", "loopback", "overflow"};
>          bool fault_bool = fault;
> +        size_t i = 0;
> +
> +        bool vals[ARRAY_SIZE(keys)];
> +
> +        vals[i++] = fault & CFM_FAULT_RECV;
> +        vals[i++] = fault & CFM_FAULT_MAID;
> +        vals[i++] = fault & CFM_FAULT_RDI;
> +        vals[i++] = fault & CFM_FAULT_LOOPBACK;
> +        vals[i++] = fault & CFM_FAULT_OVERFLOW;

If you keep the above as-is, then I'd consider adding:
        assert(i == ARRAY_SIZE(keys));
here.

> diff --git a/vswitchd/vswitch.ovsschema b/vswitchd/vswitch.ovsschema
> index 6f2c458..993dd5d 100644
> --- a/vswitchd/vswitch.ovsschema
> +++ b/vswitchd/vswitch.ovsschema
> @@ -1,6 +1,6 @@
>  {"name": "Open_vSwitch",
> - "version": "6.5.0",
> - "cksum": "2847700438 16419",
> + "version": "7.0.0",
> + "cksum": "3395223101 16535",

Why did you increment the major version?  As far as I can tell, the
new schema is compatible with the old one, because it adds new fields
but does not remove any old ones.

> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
> index 7be7891..812caa4 100644
> --- a/vswitchd/vswitch.xml
> +++ b/vswitchd/vswitch.xml
> @@ -1699,6 +1699,35 @@
>          </p>
>        </column>
>  
> +      <column name="cfm_fault_status" key="recv">
> +        When true, indicates a CFM fault was triggered due to a lack fo CCMs

s/fo/of/

> +        received on the <ref table="Interface"/>.
> +      </column>
> +
> +      <column name="cfm_fault_status" key="rdi">
> +        When true, indicates a CFM fault was triggered due to the reception of
> +        a CCM with the RDI bit flagged.  Endpoints set the RDI bit in their
> +        CCMs when they are not receiving CCMs themselves.  This typically
> +        indicates a unidirectinoal connectivity failure.
> +      </column>
> +
> +      <column name="cfm_fault_status" key="maid">
> +        When true, indicates a CFM fault was triggered due to the reception of
> +        a CCM with a MAID other than the one Open vSwitch uses.

Would you mind adding a few words here describing the implications of
"a MAID other than the one Open vSwitch uses"?

> +      </column>
> +
> +      <column name="cfm_fault_status" key="loopback">
> +        When true, indicates a CFM fault was triggered due to the reception of
> +        a CCM advertising the same MPID configured in the
> +        <ref column="cfm_mpid"/> of this <ref table="Interface"/>  this may

"this" starts a new sentence so use a period and a capital letter
here?

> +        indicate a loop in the network.
> +      </column>
> +
> +      <column name="cfm_fault_status" key="overflow">
> +        When true, indicates a CFM fault was triggered because the CFM module
> +        received CCMs from more remote endpoints than it can keep track of.
> +      </column>



More information about the dev mailing list