[ovs-dev] [RFC 0/2] 'Graceful restart' of OvS
Mark Michelson
mmichels at redhat.com
Sun Jan 28 14:30:09 UTC 2018
On 01/24/2018 04:02 PM, Aaron Conole wrote:
> Ben Pfaff <blp at ovn.org> writes:
>
>> What I'd really like to start from is a high-level description of how an
>> upgrade would take place. This patch set covers one low-level part of
>> that upgrade, but (as you recognize in your description) there is a
>> bigger set of issues. There have to be handoffs at multiple levels
>> (datapath, controller connections, database connections, ...) and I'd
>> really like to think the whole thing through.
>
> I think there are a few 'types' of upgrade. I had asked this internally
> from the various projects who were requesting this feature, but hadn't
> gotten anything concrete for requirements or use cases. Here's my
> rough guess on what exists for upgrade scenarios:
>
> * 'Solution level' upgrades.
> These are the kinds of upgrades where a large project, which
> integrates Open vSwitch for the networking layer, provides upgrades
> for its nodes. In these cases, I understand that usually the idea is to
> migrate, evacuate, or whatever the correct terminology would be for
> moving VMs and traffic to a 'different node.'
>
> I envision that the 'different node' would be running the upgraded
> version of Open vSwitch userspace (and possibly kernel space)
> already. This scenario is preferred because:
> - It is deterministic (we know how migration / evacuation behaves,
> and we know how a node getting new traffic behaves)
> - It is the recommended way from multiple projects (OpenStack and
> OpenShift)
> - It provides a path to downgrade if something goes wrong (the
> original node hasn't upgraded yet)
>
> It does have some drawbacks. Chiefly, it requires having an available
> node to use as the migration node. I am told that such a requirement
> is a really high burden to place on some of the customers who deploy
> these setups.
>
> * 'Node level' upgrades.
> These are the kinds of upgrades that spawned the RFC. These are
> upgrades where customers want to run 'apt-get upgrade' or 'yum
> upgrade' and have the new software start up, not lose any flows, and
> none of their VMs/pods/containers have to be migrated (no standby
> required).
>
> I don't know what it means to 'not lose any flows,' though. I think
> that for the new software to start up and control the kernel datapath,
> the old software needs to have been shut down. Open vSwitch does
> provide a mechanism to save/restore the OpenFlow rules (the ovs-save
> script), and will do so when a restart is called. That means we can
> make sure that the OpenFlow rules and the datapath rules are preserved
> by applying this series.
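The save/restore cycle that ovs-save performs for OpenFlow rules can be pictured as a round trip through a textual snapshot. Here is a minimal Python sketch of that idea; the flow strings and helper names are invented for illustration, and the real ovs-save script shells out to ovs-ofctl rather than doing anything like this directly:

```python
# Toy model of the ovs-save style save/restore round trip.  The real
# script dumps flows before shutdown and replays them after restart;
# the flow strings below are purely illustrative.

def save_flows(flows):
    """Serialize the flow table to restorable text, one rule per line."""
    return "\n".join(flows)

def restore_flows(snapshot):
    """Rebuild the rule list from the saved text."""
    return [line for line in snapshot.splitlines() if line]

old_table = ["in_port=1,actions=output:2",
             "in_port=2,actions=output:1"]
snapshot = save_flows(old_table)      # taken before the old daemons stop
new_table = restore_flows(snapshot)   # replayed by the new daemons
assert new_table == old_table         # OpenFlow state survives the restart
```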
>
> But, since we need to preserve information, that means additional
> serialization (whether as a shell script in the case of ovs-save, or
> JSON data, or even some kind of binary format), deserialization (even
> if it is just executing a script or series of scripts), and a stable
> format for that information (and if something like the MAC table
> changes, it will impose requirements on the upgrade / downgrade
> formats that need to be used).
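To make the stable-format concern concrete: a snapshot could carry an explicit format version that the restoring side checks before replaying state. Everything below - the field names, the version policy, the MAC-table shape - is hypothetical, not an existing OvS format:

```python
import json

FORMAT_VERSION = 1  # hypothetical snapshot format version

def serialize_state(mac_table):
    """Save internal state (here, a MAC table) with a version tag."""
    return json.dumps({"version": FORMAT_VERSION, "mac_table": mac_table})

def deserialize_state(blob):
    """Restore state, refusing snapshots written in a newer format."""
    state = json.loads(blob)
    if state["version"] > FORMAT_VERSION:
        raise ValueError("snapshot uses a newer format; the downgrade "
                         "path must handle this explicitly")
    return state["mac_table"]

blob = serialize_state({"aa:bb:cc:dd:ee:ff": "port1"})
assert deserialize_state(blob) == {"aa:bb:cc:dd:ee:ff": "port1"}
```

If the MAC table gains a field in a later release, the version bump is what lets the downgrade path detect a snapshot it cannot safely replay.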
>
> I'm not sure what the great advantage is - obviously we can tell
> users "hey just upgrade, even while traffic is running... mostly
> nothing bad will happen?" There isn't a requirement to have a
> migration node, which probably has a real $ benefit to customers who
> have large data centers and don't need to tie up hardware.
>
> * OVN / orchestration upgrades
> I'm not as familiar with OVN - is there anything *active* that gets
> handled? Can whatever orchestration tool just be torn-down and
> restarted without impacting the network (not just OVN, but say some
> neutron API back-end that calls into OVS)?
As far as the data plane is concerned, ovn-controller is responsible for
handling certain types of messages (DHCP, ARP, IPv6 NS, etc.). In
conversations I've had about upgrades, the downtime of a restart of OVN
is not a concern. This is because the types of packets that
ovn-controller handles are infrequent, and even if we did miss a packet
because we are down, the endpoint would eventually resend that packet
type.
I can't speak to how neutron is affected by an OVN restart.
>
> * Any other users who will upgrade?
> I'm not sure. Do we need to classify distros as a different upgrade
> case? Maybe. After all, each distribution packages things a bit
> differently and perhaps layers their cloud offerings, or OpenStack, or
> Kubernetes with Open vSwitch slightly differently. Maybe that can be
> lumped into the other buckets. Maybe each needs to be broken down.
>
> Sorry - it looks like I haven't even come close to an answer for
> anything.
>
>> I guess the other part that I'd like to think through is, what is the
>> actual goal? It's one thing to not lose packet flows but we also need
>> to make sure that the new ovs-vswitchd gets the same OpenFlow flows,
>> etc. and that its internal state (MAC tables etc.) get populated from
>> the old ovs-vswitchd's state, otherwise when the new one takes over
>> there will be blips due to that change.
>
> It's probably good to also understand which blips will always exist
> (there will be some performance degradation while upgrading equivalent
> to XXX, because of the YYY), and which can be handled gracefully.
>
>> The other aspect I'd like to think about is downgrades. One would like
>> to believe that every upgrade goes perfectly, but of course it's not
>> true, and users may be more reluctant to upgrade if they believe that
>> reverting to the previous version is disruptive. I am not sure that
>> downgrades are more difficult, in most ways, but at least they should be
>> considered.
>
> Thanks for this, Ben! It's a lot to digest, and I'll be asking even
> more questions now. :)
>
>> On Fri, Jan 12, 2018 at 02:19:33PM -0500, Aaron Conole wrote:
>>> IMPORTANT: Please remember this is simply a strawman to frame a discussion
>>> around a concept called 'graceful restart.' More to be explained.
>>>
>>> Now that 2.9 work is frozen and the tree will be forked off, I assumed
>>> more extreme and/or interesting ideas might be welcome. As such, here's
>>> something fairly small-ish that provides an interesting behavior called
>>> 'Graceful Restart.' The idea is that when the OvS userspace is being
>>> upgraded, we can leave the existing flows installed in the datapath allowing
>>> existing flows to continue. Once the new versions of the daemons take over,
>>> the standard dump/sweep operations of the revalidator threads will resume
>>> and "Everything Will Just Work(tm)."
>>>
>>> Of course, there are some important corner cases and side effects that
>>> need to be thought out. I've listed the ones I know of here (no particular
>>> order, though):
>>>
>>>
>>> 1. Only the active datapath flows (those installed in the kernel datapath
>>> at the time of 'reload') will remain while the daemons are down. This
>>> means *any* new traffic (possibly even new connections between the same
>>> endpoints) will fail to pass. This even means a ping between endpoints
>>> could start failing (ie: if neighbor entries expire, no ARP/ND can
>>> pass, the neighbor will not be resolved, and sends will fail - unless
>>> those flows are luckily still in the kernel datapath).
>>>
>>> 1a. This also means that some protocol exchanges might *seem* to
>>> work on first glance, but won't actually proceed. I'm thinking
>>> cases where pings are used as 'keep-alives.' That's no different
>>> from the existing system. What will be different is the user expectation.
>>> The expectation with a "graceful" restart may be that no such failures
>>> would exist.
>>>
>>> 2. This is a strong knob that a user may accidentally trigger. If they do,
>>> flows will *NEVER* die from the kernel datapath while the daemons are
>>> running. This might be acceptable to keep around. After all, it isn't
>>> a persistent database entry or anything. The flag only exists for the
>>> lifetime of the userspace process (so a restart can also be an effect
>>> which 'clears' the behavior). I'm not sure if this would be acceptable.
>>>
>>> 3. Traffic will pass with no userspace knowledge for a time. I think this
>>> is okay - after all if the OvS daemon is killed flows will stick around.
>>> However, this behavior would go from "well, sometimes it could happen," to
>>> "we plan and/or expect such to happen."
>>>
>>> 4. This only covers the kernel datapath. Userspace datapath implementations
>>> will still lose the entire datapath during restart.
>>>
>>>
>>> There probably exists a better/more efficient/more functionally appropriate
>>> way of achieving the desired effect. This is simply to spawn some discussion
>>> in the upstream community to see if there's a way to achieve this "graceful
>>> restart" effect (ie: not losing existing packet flow) during planned
>>> outages (upgrades, reloads, etc.)
>>>
>>> Since the implementation is subject to complete and total change, I haven't
>>> written any documentation for this feature yet. I'm saving that work for
>>> another spin after getting some feedback. There may be other
>>> opportunities, for instance, to integrate with something like
>>> ovs-ctl for a system-agnostic implementation.
>>>
>>> Aaron Conole (2):
>>> datapath: prevent deletion of flows / datapaths
>>> rhel: tell ovsctl to freeze the datapath
>>>
>>> lib/dpctl.c | 27 +++++++++
>>> lib/dpif-netdev.c | 2 +
>>> lib/dpif-netlink.c | 65 ++++++++++++++++------
>>> lib/dpif-provider.h | 8 +++
>>> lib/dpif.c | 22 ++++++++
>>> lib/dpif.h | 2 +
>>> .../usr_lib_systemd_system_ovs-vswitchd.service.in | 2 +-
>>> utilities/ovs-ctl.in | 4 ++
>>> 8 files changed, 115 insertions(+), 17 deletions(-)
>>>
>>> --
>>> 2.14.3
>>>
>>> _______________________________________________
>>> dev mailing list
>>> dev at openvswitch.org
>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev