[ovs-dev] [RFC 0/2] dpdk: minor refactor of the initialization step

Mooney, Sean K sean.k.mooney at intel.com
Wed Apr 11 15:54:32 UTC 2018



> -----Original Message-----
> From: Kevin Traynor [mailto:ktraynor at redhat.com]
> Sent: Wednesday, April 11, 2018 4:03 PM
> To: Aaron Conole <aconole at redhat.com>; Mooney, Sean K
> <sean.k.mooney at intel.com>
> Cc: dev at openvswitch.org; Ilya Maximets <i.maximets at samsung.com>; Assaf
> Muller <assaf at redhat.com>
> Subject: Re: [ovs-dev] [RFC 0/2] dpdk: minor refactor of the
> initialization step
> 
> On 04/11/2018 02:21 PM, Aaron Conole wrote:
> > "Mooney, Sean K" <sean.k.mooney at intel.com> writes:
> >
> >>> -----Original Message-----
> >>> From: Aaron Conole [mailto:aconole at redhat.com]
> >>> Sent: Monday, April 9, 2018 4:32 PM
> >>> To: Mooney, Sean K <sean.k.mooney at intel.com>
> >>> Cc: dev at openvswitch.org; Stokes, Ian <ian.stokes at intel.com>; Kevin
> >>> Traynor <ktraynor at redhat.com>; Ilya Maximets
> >>> <i.maximets at samsung.com>; Loftus, Ciara <ciara.loftus at intel.com>;
> >>> Terry Wilson <twilson at redhat.com>; Assaf Muller <assaf at redhat.com>
> >>> Subject: Re: [RFC 0/2] dpdk: minor refactor of the initialization
> >>> step
> >>>
> >>> "Mooney, Sean K" <sean.k.mooney at intel.com> writes:
> >>>
> >>>> So just from a deployment tools point of view I would like to
> point
> >>>> out that This change could break existing workflow that deploy ovs
> >>>> in
> >>> a docker container.
> >>>> Kolla ansible assumes that if the docker ovs_vswitchd container is
> >>>> still running that the is infact running in dpdk mode when we set
> >>>> dpdk-init=true.
> >>>
> >>> Is there a way to test this out and see the behavior?
> >> [Mooney, Sean K] well you could use kolla to deploy ovs-dpdk :) Am
> >> when I wrote the code I relied on the existing behavior.
> >> when kolla ansible is deploying openstack we first deploy the ovsdb.
> >> https://github.com/openstack/kolla-
> ansible/blob/4c39ea7eccd9467757226
> >> 46eec19f9ea5cbe6eb5/ansible/roles/ovs-dpdk/handlers/main.yml#L2-L37
> >> Then we start the ovs-vswitchd container
> >> https://github.com/openstack/kolla-
> ansible/blob/4c39ea7eccd9467757226
> >> 46eec19f9ea5cbe6eb5/ansible/roles/ovs-dpdk/handlers/main.yml#L55-L73
> >> finally we configure the bridges and physical interfaces.
> >> https://github.com/openstack/kolla-
> ansible/blob/4c39ea7eccd9467757226
> >> 46eec19f9ea5cbe6eb5/ansible/roles/ovs-dpdk/handlers/main.yml#L75-L90
> >>
> >> the "- name: Ensuring ovsdpdk bridges are properly setup named" task
> >> does not use --no-wait when creating bridges and adding interfaces
> so
> >> it will fail if the vswitchd is not running. This will result in
> >> ansible stopping to run any futher task on that node and reporting
> >> the error to the user. If for some reason the ensure bridge task
> >> passed the next task that check an ip is assigned to the ovs bride
> >> would fail.
> >
> > Okay - thanks for the pointers, I'll check it out.
> >
> >>>
> >>> It does seem strange that for a possible configuration error we
> >>> abort()
> >> [Mooney, Sean K] why I would expect this to be standard behavior for
> any deamon.
> >> e.g. the damon would validate it config is correct and exit if
> invalid.
> >> If we don't abort the vswitch is in an undefined state. Is is still
> >> using hugepages For example if the eal init fails after they are
> >> allocated.
> >
> > I see.  I think as of 18.11 (which won't come for 6 months), it will
> > always 'fail' in a known state.  But I'll confirm.
> >
> > One of the reasons I want this field is because there will be ways of
> > 'uninitializing'.  I think it's nice to have the ability for the user
> > to dynamically enable *and* disable dpdk.  Also, just from a
> 'selfish'
> > view, I recently was fixing a bug where dpdk initialization would
> fail
> > early with a particular hardware, and it wasn't nice to watch the
> > respawns (racy gdb attach, and coredumps all over the drive).
> >
> >>> running the vswitchd (and with --monitor set, it will continue to
> >>> abort() over and over - so I guess you're also not using the
> monitor
> >>> thread?).  In the case that an abort does happen, does the Kolla
> >>> script distinguish between issues where dpdk setup failed vs. some
> >>> other software issue?
> >>>
> >>>> Can I request that if you make this change you add something along
> >>> the
> >>>> lines of dpdk-init-is-fatal=true/false so that we can explicitly
> >>>> say
> >>> which behavior we want.
> >>>> I would not be surprised if people have built monitoring around
> "is
> >>>> the ovs-vswitchd running"
> >>>
> >>> I think they have, but I don't know that they use it to infer such
> >>> low- level details (meaning a crash implies that dpdk configuration
> >>> is wrong).
> >> [Mooney, Sean K] they don't use is to infer that dpdk configuring is
> >> wronge But rather that some configuration was wrong. Dpdk-init is
> >> currently considered Fatal if it fails so it was treated the same as
> >> any other error that would have caused the vsiwtchd process to exit.
> >> I belive in the opnfv community they used the liveleyness of the
> >> vswitchd process and in the future dpdk keepalive functionality to
> set the datapalne status filed in neutron for the host.
> >> this allows openstack to be aware of dataplane outages.
> >
> > One thing I'll think about more is the dpdk-init.  Maybe, since it is
> > stored as a string in the database, we can add a few other values
> there.
> >
> > For instance,
> >   dpdk-init=try
> >
> > It's like 'true' - only it will continue if there's a failure.  Maybe
> > that works?  Just spit-balling.  Thanks for providing insight into
> the
> > way these behaviors are used.
> >
> 
> It seems like a good idea to me. Preserves the existing behavior and
> gives the opportunity for using in the other model too.
[Mooney, Sean K] yes I tend to agree. Dpdk is opt-in anyway and by adding a new
Value to the existing dpdk-init we get to keep the existing behavior and support
The new behavior without requiring change to existing tooling. Tooling that
Understands the new functionality can then opt in by seting dpdk-init=try when
That support is added.
> 
> >>>
> >>>> To infer at least at a highlevel that "everything is fine" where
> as
> >>>> the log message/db field proposed Here will invalidate that.
> >>>
> >>> I've added Assaf Mueller from our Open Stack team as well - maybe
> he
> >>> has some additional details on those mechanisms outside of Kolla
> >>> (maybe it exists in some kind of director / other software too, as
> >>> you point out).
> >>>
> >>>> it would be ease to check that field but its work that needs to be
> >>>> done in multiple places.
> >>>
> >>> I think such a knob wouldn't be useful.  I believe it would either
> >>> have to be defaulted to 'dpdk-init-is-fatal=true' to abort on
> >>> failure (which most users would want to change making it an
> >>> undesirable default)
> >> [Mooney, Sean K] I would argue against that I would never deploy
> with
> >> dpdk-init-is-fatal=false. If your datapane does not start what is
> the
> >> point of running ovs at all? It will not be able to forward packets.
> >> , or
> >>> the Kolla ansible scripts (and other detection mechanisms for dpdk
> >>> failure - if they exist) would need to change.  Maybe there's
> >>> another approach, though?
> >>>
> >>>>> -----Original Message-----
> >>>>> From: Aaron Conole [mailto:aconole at redhat.com]
> >>>>> Sent: Thursday, April 5, 2018 10:23 PM
> >>>>> To: dev at openvswitch.org
> >>>>> Cc: Stokes, Ian <ian.stokes at intel.com>; Kevin Traynor
> >>>>> <ktraynor at redhat.com>; Ilya Maximets <i.maximets at samsung.com>;
> >>>>> Loftus, Ciara <ciara.loftus at intel.com>; Mooney, Sean K
> >>>>> <sean.k.mooney at intel.com>; Terry Wilson <twilson at redhat.com>
> >>>>> Subject: [RFC 0/2] dpdk: minor refactor of the initialization
> step
> >>>>>
> >>>>> Sometimes, DPDK initialization can fail, but ovs-vswitchd will
> >>>>> abort in that case.  When that occurs, ovs-vswitchd will be
> >>>>> restarted by the monitor and immediately abort.  This is rather
> >>>>> unfriendly to users, who would prefer to possibly correct the
> >>>>> issue or at least, not have lots of processes continually
> spawning.
> >>>>>
> >>>>> This series accepts that rte_eal_init() can and does fail for
> real.
> >>>>> It reflects the initialization status in the database, as well as
> >>>>> adding the DPDK version (where appropriate).
> >>>>>
> >>>>> Submitted as RFC to spawn discussion around the type to reflect
> >>>>> for the initialized information.  Presented here as a boolean -
> >>>>> however, it might be more interesting to be a 'string' and have
> >>>>> more
> >>> elaborate
> >>>>> details (ex: 'failed - ovs_strerror(rte_errno)' or
> 'uninitialized'
> >>> or
> >>>>> 'initialized').
> >>>>>
> >>>>> Aaron Conole (2):
> >>>>>   dpdk: allow init to fail
> >>>>>   dpdk: reflect status and version in the database
> >>>>>
> >>>>>  lib/dpdk-stub.c            | 10 ++++++++++
> >>>>>  lib/dpdk.c                 | 31 +++++++++++++++++++++++++------
> >>>>>  lib/dpdk.h                 |  3 ++-
> >>>>>  vswitchd/bridge.c          |  5 +++++
> >>>>>  vswitchd/vswitch.ovsschema | 11 ++++++++---
> >>>>>  vswitchd/vswitch.xml       | 11 +++++++++++
> >>>>>  6 files changed, 61 insertions(+), 10 deletions(-)
> >>>>>
> >>>>> --
> >>>>> 2.14.3
> >> _______________________________________________
> >> dev mailing list
> >> dev at openvswitch.org
> >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> > _______________________________________________
> > dev mailing list
> > dev at openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> >



More information about the dev mailing list