[ovs-dev] [PATCH 2/2] secchan: Better tolerate failing controller admission control in fail-open.

Ben Pfaff blp at nicira.com
Tue Sep 15 17:26:44 UTC 2009


Keith Amidon <keith at nicira.com> writes:

> I've a couple of alternative suggestions.  These may already have been
> considered and rejected for good reasons, but I wanted to make sure:
>
>   • Could we depend on the openflow protocol connection establishment protocol?
>     It seems like the controller could reject the openflow connection at
>     three points 1) before any openflow messages are sent (for example if the
>     TLS handshake fails), 2) if the hello handshake does not contain compatible
>     openflow versions, 3) if there is something in the response to the features
>     request the controller doesn't like (for example a duplicate DPID). 
>     Unfortunately, as I read the spec at least, there is no positive
>     acknowledgment of any of these stages.  Instead, the controller sends
>     errors and terminates the connection if it doesn't like how things are
>     proceeding.  However, we could do something like have a user-configurable
>     timeout for which the switch will wait at each of these stages.  If the
>     switch gets past sending the features response (stage 3) and the timeout
>     expires without the controller terminating the connection, the switch could
>     conclude the connection is good and then start using it.

We already handle stages #1 and #2 at a level low enough that the
secchan code won't even know if a connection gets rejected at one
of those stages; the "vconn" code reports it as a failed
connection before it percolates up through rconn into ofproto.

#3 is the trickiest one, as you say.  We could use a timer; we do
use a timer as last resort, in fact: if the connection lasts 30
seconds, we consider it successful.  But we have other heuristics
that generally get the job done much faster (e.g. if the
controller sets up a flow or sends a packet-out, we consider it a
successful connection).
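
To make the heuristics above concrete, here is a rough sketch in C.
It is not the actual secchan/ofproto code: the struct, the enum, and
the function names are all made up for illustration.  Only the 30
second fallback and the flow-setup/packet-out shortcuts come from the
description above.

```c
#include <stdbool.h>

/* Hypothetical message classes, not the real OpenFlow type codes. */
enum msg_kind {
    MSG_ECHO_REQUEST,
    MSG_FEATURES_REQUEST,
    MSG_FLOW_MOD,       /* Controller sets up a flow. */
    MSG_PACKET_OUT      /* Controller sends a packet-out. */
};

/* Last-resort timer: a connection that survives this long counts as
 * accepted. */
#define ADMIT_TIMEOUT_S 30

/* Hypothetical per-connection admission state. */
struct admit_state {
    long long connected_at;     /* Time the connection came up, in seconds. */
    bool admitted;              /* Has the controller accepted us? */
};

static void
admit_init(struct admit_state *s, long long now)
{
    s->connected_at = now;
    s->admitted = false;
}

/* Call for each message received from the controller.  A flow setup or
 * a packet-out is taken as proof of admission; anything else (echo
 * requests, for instance) proves nothing. */
static void
admit_received(struct admit_state *s, enum msg_kind kind)
{
    if (kind == MSG_FLOW_MOD || kind == MSG_PACKET_OUT) {
        s->admitted = true;
    }
}

/* Returns true once the connection is considered successful, either
 * through a fast heuristic or because the timer expired. */
static bool
admit_is_admitted(struct admit_state *s, long long now)
{
    if (!s->admitted && now - s->connected_at >= ADMIT_TIMEOUT_S) {
        s->admitted = true;
    }
    return s->admitted;
}
```

The timer only matters for a controller that stays quiet; any flow
setup or packet-out short-circuits it.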

All this has been the same way for some time.  Months, anyhow.

That's not really the issue here.  We already do a good job of
figuring out whether the controller has accepted us.  The
refinement we're now adding is: until the controller has accepted
us, carry on with setting up flows in fail-open mode.  That's
a little harder, and that's why Jesse and I are proposing these
various tricky little dances.
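
Reduced to its core, the refinement is a single decision, sketched
below in C.  The enum and function names are hypothetical and this is
not the patch itself; the hard part (the "tricky little dances"
around flows that both sides may try to set up) is deliberately left
out.

```c
#include <stdbool.h>

/* Hypothetical view of the controller connection from fail-open's
 * side. */
enum admission_state {
    CTL_DISCONNECTED,   /* No connection to the controller at all. */
    CTL_UNADMITTED,     /* Connected, but not yet accepted. */
    CTL_ADMITTED        /* Controller has accepted the switch. */
};

/* Returns true if fail-open should keep setting up flows itself.
 * Before the refinement this would be true only for
 * CTL_DISCONNECTED; with it, an unadmitted connection is treated the
 * same way. */
static bool
fail_open_should_set_up_flows(enum admission_state state)
{
    return state != CTL_ADMITTED;
}
```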

>   • Could we use OpenFlow echo request/reply messages?  We're using these now
>     to determine the health of the connection when no messages have been sent
>     for a period of time anyway.  We could send one as early as immediately
>     after the initial hello if that isn't a protocol violation and depend on
>     the reply coming within a user configurable time as well.

NOX and Open vSwitch (and OpenFlow ref. impl., I think, too) all
happily reply to echo requests regardless of whether they've
decided to admit the connection, up until the point of that
decision.

> Regardless of these mechanisms, we still probably need a
> mechanism for timing out connections for which the controller
> never sends any data but holds the TCP connection open, etc.
> I'm sure these already exist and would be addressed by the
> changes already suggested, but include them for completeness.

Yes, we already do that, using echo requests and replies if the
OpenFlow handshake has completed, and otherwise with timers if
the SSL or OpenFlow handshake is still in progress.
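
As a rough illustration of that split, here is a sketch in C.  The
types, names, and timeout parameters are assumptions made for the
example and do not correspond to the real rconn/vconn interfaces.

```c
#include <stdbool.h>

/* Hypothetical connection phases. */
enum link_phase {
    PHASE_SSL_HANDSHAKE,
    PHASE_OFP_HANDSHAKE,
    PHASE_ESTABLISHED   /* OpenFlow handshake has completed. */
};

enum liveness_action {
    LIVENESS_OK,            /* Nothing to do. */
    LIVENESS_SEND_ECHO,     /* Idle too long: send an echo request. */
    LIVENESS_DISCONNECT     /* Give up on the connection. */
};

/* Hypothetical per-connection liveness state; times are in seconds. */
struct liveness {
    enum link_phase phase;
    long long phase_started;    /* When the current phase began. */
    long long last_rx;          /* When we last received anything. */
    bool echo_outstanding;      /* Echo request sent, no reply yet. */
    long long echo_sent_at;
};

/* Call whenever any message arrives from the peer. */
static void
liveness_received(struct liveness *lv, long long now)
{
    lv->last_rx = now;
    lv->echo_outstanding = false;
}

/* Decides what to do at time 'now'.  During either handshake a plain
 * timer applies; once established, idleness triggers an echo request
 * and a missing reply triggers a disconnect. */
static enum liveness_action
liveness_check(struct liveness *lv, long long now,
               int handshake_timeout, int probe_interval)
{
    if (lv->phase != PHASE_ESTABLISHED) {
        return (now - lv->phase_started >= handshake_timeout
                ? LIVENESS_DISCONNECT : LIVENESS_OK);
    }
    if (lv->echo_outstanding) {
        return (now - lv->echo_sent_at >= probe_interval
                ? LIVENESS_DISCONNECT : LIVENESS_OK);
    }
    if (now - lv->last_rx >= probe_interval) {
        lv->echo_outstanding = true;
        lv->echo_sent_at = now;
        return LIVENESS_SEND_ECHO;
    }
    return LIVENESS_OK;
}
```

A controller that holds the TCP connection open but never sends
anything is caught by the first branch if it stalls in a handshake,
and by the echo branches afterward.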



