[ovs-dev] [RFC 0/2] 'Graceful restart' of OvS
Mark Michelson
mmichels at redhat.com
Sun Jan 28 14:30:09 UTC 2018
On 01/24/2018 04:02 PM, Aaron Conole wrote:
> Ben Pfaff <blp at ovn.org> writes:
>
>> What I'd really like to start from is a high-level description of how an
>> upgrade would take place. This patch set covers one low-level part of
>> that upgrade, but (as you recognize in your description) there is a
>> bigger set of issues. There have to be handoffs at multiple levels
>> (datapath, controller connections, database connections, ...) and I'd
>> really like to think the whole thing through.
>
> I think there are a few 'types' of upgrade. I had asked this internally
> from the various projects who were requesting this feature, but hadn't
> gotten anything concrete for requirements or use cases. Here's my
> rough guess on what exists for upgrade scenarios:
>
> * 'Solution level' upgrades.
> These are the kinds of upgrades where a large project, which
> integrates Open vSwitch for the networking layer, provides upgrades
> for its nodes. In these cases, I understand that usually the idea is to
> migrate, evacuate, or whatever the correct terminology would be for
> moving VMs and traffic to a 'different node.'
>
> I envision that the 'different node' would be running the upgraded
> version of Open vSwitch userspace (and possibly kernel space)
> already. This scenario is preferred because:
> - It is deterministic (we know how migration / evacuation behaves,
> and we know how a node getting new traffic behaves)
> - It is the recommended way from multiple projects (OpenStack and
> OpenShift)
> - It provides a path to downgrade if something goes wrong (the
> original node hasn't upgraded yet)
>
> It does have some drawbacks. Chiefly, it requires having an available
> node to use as the migration node. I am told that such a requirement
> is a really high burden to place on some of the customers who deploy
> these setups.
>
> * 'Node level' upgrades.
> These are the kinds of upgrades that spawned the RFC. These are
> upgrades where customers want to run 'apt-get upgrade' or 'yum
> upgrade' and have the new software start up, not lose any flows, and
> none of their VMs/pods/containers have to be migrated (no standby
> required).
>
> I don't know what it means to 'not lose any flows,' though. I think
> that for the new software to start up and control the kernel datapath,
> the old software needs to have been shut down. Open vSwitch does
> provide a mechanism to save/restore the OpenFlow rules (the ovs-save
> script), and will do so when a restart is called. That means we can
> make sure that the OpenFlow rules and the datapath rules are preserved
> by applying this series.
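The save/restore cycle that ovs-save performs for OpenFlow rules can be pictured as a round trip through a textual snapshot. Here is a minimal Python sketch of that idea; the flow strings and helper names are invented for illustration, and the real ovs-save script shells out to ovs-ofctl rather than doing anything like this directly:

```python
# Toy model of the ovs-save style save/restore round trip.  The real
# script dumps flows before shutdown and replays them after restart;
# the flow strings below are purely illustrative.

def save_flows(flows):
    """Serialize the flow table to restorable text, one rule per line."""
    return "\n".join(flows)

def restore_flows(snapshot):
    """Rebuild the rule list from the saved text."""
    return [line for line in snapshot.splitlines() if line]

old_table = ["in_port=1,actions=output:2",
             "in_port=2,actions=output:1"]
snapshot = save_flows(old_table)      # taken before the old daemons stop
new_table = restore_flows(snapshot)   # replayed by the new daemons
assert new_table == old_table         # OpenFlow state survives the restart
```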
>
> But, since we need to preserve information, that means additional
> serialization (whether as a shell script in the case of ovs-save, or
> JSON data, or even some kind of binary format), deserialization (even
> if it is just executing a script or series of scripts), and a stable
> format for that information (and if something like the MAC table
> changes, it will impose requirements on the upgrade / downgrade
> formats that need to be used).
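To make the stable-format concern concrete: a snapshot could carry an explicit format version that the restoring side checks before replaying state. Everything below - the field names, the version policy, the MAC-table shape - is hypothetical, not an existing OvS format:

```python
import json

FORMAT_VERSION = 1  # hypothetical snapshot format version

def serialize_state(mac_table):
    """Save internal state (here, a MAC table) with a version tag."""
    return json.dumps({"version": FORMAT_VERSION, "mac_table": mac_table})

def deserialize_state(blob):
    """Restore state, refusing snapshots written in a newer format."""
    state = json.loads(blob)
    if state["version"] > FORMAT_VERSION:
        raise ValueError("snapshot uses a newer format; the downgrade "
                         "path must handle this explicitly")
    return state["mac_table"]

blob = serialize_state({"aa:bb:cc:dd:ee:ff": "port1"})
assert deserialize_state(blob) == {"aa:bb:cc:dd:ee:ff": "port1"}
```

If the MAC table gains a field in a later release, the version bump is what lets the downgrade path detect a snapshot it cannot safely replay.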
>
> I'm not sure what the great advantage is - obviously we can tell
> users "hey just upgrade, even while traffic is running... mostly
> nothing bad will happen?" There isn't a requirement to have a
> migration node, which probably has a real $ benefit to customers who
> have large data centers and don't need to tie up hardware.
>
> * OVN / orchestration upgrades
> I'm not as familiar with OVN - is there anything *active* that gets
> handled? Can whatever orchestration tool just be torn-down and
> restarted without impacting the network (not just OVN, but say some
> neutron API back-end that calls into OVS)?
As far as the data plane is concerned, ovn-controller is responsible for
handling certain types of messages (DHCP, ARP, IPv6 NS, etc.). In
conversations I've had about upgrades, the downtime of a restart of OVN
is not a concern. This is because the types of packets that
ovn-controller handles are infrequent, and even if we did miss a packet
because we are down, the endpoint would eventually resend that packet
type.
I can't speak to how neutron is affected by an OVN restart.
>
> * Any other users who will upgrade?
> I'm not sure. Do we need to classify distros as a different upgrade
> case? Maybe. After all, each distribution packages things a bit
> differently and perhaps layers their cloud offerings, or OpenStack, or
> Kubernetes with Open vSwitch slightly differently. Maybe that can be
> lumped into the other buckets. Maybe each needs to be broken down.
>
> Sorry - it looks like I haven't even come close to an answer for
> anything.
>
>> I guess the other part that I'd like to think through is, what is the
>> actual goal? It's one thing to not lose packet flows but we also need
>> to make sure that the new ovs-vswitchd gets the same OpenFlow flows,
>> etc. and that its internal state (MAC tables etc.) get populated from
>> the old ovs-vswitchd's state, otherwise when the new one takes over
>> there will be blips due to that change.
>
> It's probably good to also understand which blips will always exist
> (there will be some performance degradation while upgrading equivalent
> to XXX, because of the YYY), and which can be handled gracefully.
>
>> The other aspect I'd like to think about is downgrades. One would like
>> to believe that every upgrade goes perfectly, but of course it's not
>> true, and users may be more reluctant to upgrade if they believe that
>> reverting to the previous version is disruptive. I am not sure that
>> downgrades are more difficult, in most ways, but at least they should be
>> considered.
>
> Thanks for this, Ben! It's a lot to digest, and I'll be asking even
> more questions now. :)
>
>> On Fri, Jan 12, 2018 at 02:19:33PM -0500, Aaron Conole wrote:
>>> IMPORTANT: Please remember this is simply a strawman to frame a discussion
>>> around a concept called 'graceful restart.' More to be explained.
>>>
>>> Now that 2.9 work is frozen and the tree will be forked off, I assumed
>>> more extreme and/or interesting ideas might be welcome. As such, here's
>>> something fairly small-ish that provides an interesting behavior called
>>> 'Graceful Restart.' The idea is that when the OvS userspace is being
>>> upgraded, we can leave the existing flows installed in the datapath allowing
>>> existing flows to continue. Once the new versions of the daemons take over,
>>> the standard dump/sweep operations of the revalidator threads will resume
>>> and "Everything Will Just Work(tm)."
>>>
>>> Of course, there are some important corner cases and side effects that
>>> need to be thought out. I've listed the ones I know of here (no particular
>>> order, though):
>>>
>>>
>>> 1. Only the active datapath flows (those installed in the kernel datapath
>>> at the time of 'reload') will remain while the daemons are down. This
>>> means *any* new traffic (possibly even new connections between the same
>>> endpoints) will fail to pass. This even means a ping between endpoints
>>> could start failing (ie: if neighbor entries expire, no ARP/ND can
>>> pass, the neighbor will not be resolved, and sends will fail - unless
>>> those flows are luckily still in the kernel datapath).
>>>
>>> 1a. This also means that some protocol exchanges might *seem* to
>>> work on first glance, but won't actually proceed. I'm thinking
>>> cases where pings are used as 'keep-alives.' That's no different
>>> from the existing system. What will be different is the user expectation.
>>> The expectation with a "graceful" restart may be that no such failures
>>> would exist.
>>>
>>> 2. This is a strong knob that a user may accidentally trigger. If they do,
>>> flows will *NEVER* die from the kernel datapath while the daemons are
>>> running. This might be acceptable to keep around. After all, it isn't
>>> a persistent database entry or anything. The flag only exists for the
>>> lifetime of the userspace process (so a restart can also be an effect
>>> which 'clears' the behavior). I'm not sure if this would be acceptable.
>>>
>>> 3. Traffic will pass with no userspace knowledge for a time. I think this
>>> is okay - after all if the OvS daemon is killed flows will stick around.
>>> However, this behavior would go from "well, sometimes it could happen," to
>>> "we plan and/or expect such to happen."
>>>
>>> 4. This only covers the kernel datapath. Userspace datapath implementations
>>> will still lose the entire datapath during restart.
>>>
>>>
>>> There probably exists a better/more efficient/more functionally appropriate
>>> way of achieving the desired effect. This is simply to spawn some discussion
>>> in the upstream community to see if there's a way to achieve this "graceful
>>> restart" effect (ie: not losing existing packet flow) during planned
>>> outages (upgrades, reloads, etc.)
>>>
>>> Since the implementation is subject to complete and total change, I haven't
>>> written any documentation for this feature yet. I'm saving that work for
>>> another spin after getting some feedback. There may be other
>>> opportunities, for instance, to integrate with something like
>>> ovs-ctl for a system-agnostic implementation.
>>>
>>> Aaron Conole (2):
>>> datapath: prevent deletion of flows / datapaths
>>> rhel: tell ovsctl to freeze the datapath
>>>
>>> lib/dpctl.c | 27 +++++++++
>>> lib/dpif-netdev.c | 2 +
>>> lib/dpif-netlink.c | 65 ++++++++++++++++------
>>> lib/dpif-provider.h | 8 +++
>>> lib/dpif.c | 22 ++++++++
>>> lib/dpif.h | 2 +
>>> .../usr_lib_systemd_system_ovs-vswitchd.service.in | 2 +-
>>> utilities/ovs-ctl.in | 4 ++
>>> 8 files changed, 115 insertions(+), 17 deletions(-)
>>>
>>> --
>>> 2.14.3
>>>
>>> _______________________________________________
>>> dev mailing list
>>> dev at openvswitch.org
>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev