[ovs-dev] [RFC 0/2] dpdk: minor refactor of the initialization step

Mooney, Sean K sean.k.mooney at intel.com
Tue Apr 10 11:36:39 UTC 2018



> -----Original Message-----
> From: Aaron Conole [mailto:aconole at redhat.com]
> Sent: Monday, April 9, 2018 4:32 PM
> To: Mooney, Sean K <sean.k.mooney at intel.com>
> Cc: dev at openvswitch.org; Stokes, Ian <ian.stokes at intel.com>; Kevin
> Traynor <ktraynor at redhat.com>; Ilya Maximets <i.maximets at samsung.com>;
> Loftus, Ciara <ciara.loftus at intel.com>; Terry Wilson
> <twilson at redhat.com>; Assaf Muller <assaf at redhat.com>
> Subject: Re: [RFC 0/2] dpdk: minor refactor of the initialization step
> 
> "Mooney, Sean K" <sean.k.mooney at intel.com> writes:
> 
> > So just from a deployment tools point of view I would like to point
> > out that This change could break existing workflow that deploy ovs in
> a docker container.
> > Kolla ansible assumes that if the docker ovs_vswitchd container is
> > still running that the is infact running in dpdk mode when we set
> > dpdk-init=true.
> 
> Is there a way to test this out and see the behavior?
[Mooney, Sean K] well you could use kolla to deploy ovs-dpdk :)
Am when I wrote the code I relied on the existing behavior.
when kolla ansible is deploying openstack we first deploy the ovsdb.
https://github.com/openstack/kolla-ansible/blob/4c39ea7eccd946775722646eec19f9ea5cbe6eb5/ansible/roles/ovs-dpdk/handlers/main.yml#L2-L37
Then we start the ovs-vswitchd container
https://github.com/openstack/kolla-ansible/blob/4c39ea7eccd946775722646eec19f9ea5cbe6eb5/ansible/roles/ovs-dpdk/handlers/main.yml#L55-L73
finally we configure the bridges and physical interfaces.
https://github.com/openstack/kolla-ansible/blob/4c39ea7eccd946775722646eec19f9ea5cbe6eb5/ansible/roles/ovs-dpdk/handlers/main.yml#L75-L90

the "- name: Ensuring ovsdpdk bridges are properly setup named" task does not use --no-wait when creating bridges and adding interfaces so it
will fail if the vswitchd is not running. This will result in ansible stopping to run any futher task on that node and reporting the error
to the user. If for some reason the ensure bridge task passed the next task that check an ip is assigned to the ovs bride would fail.
> 
> It does seem strange that for a possible configuration error we abort()
[Mooney, Sean K] why I would expect this to be standard behavior for any deamon.
e.g. the damon would validate it config is correct and exit if invalid.
If we don't abort the vswitch is in an undefined state. Is is still using hugepages
For example if the eal init fails after they are allocated.
> running the vswitchd (and with --monitor set, it will continue to
> abort() over and over - so I guess you're also not using the monitor
> thread?).  In the case that an abort does happen, does the Kolla script
> distinguish between issues where dpdk setup failed vs. some other
> software issue?
> 
> > Can I request that if you make this change you add something along
> the
> > lines of dpdk-init-is-fatal=true/false so that we can explicitly say
> which behavior we want.
> > I would not be surprised if people have built monitoring around "is
> > the ovs-vswitchd running"
> 
> I think they have, but I don't know that they use it to infer such low-
> level details (meaning a crash implies that dpdk configuration is
> wrong).
[Mooney, Sean K] they don't use is to infer that dpdk configuring is wronge
But rather that some configuration was wrong. Dpdk-init is currently considered
Fatal if it fails so it was treated the same as any other error that would have
caused the vsiwtchd process to exit. I belive in the opnfv community they used
the liveleyness of the vswitchd process and in the future dpdk keepalive
functionality to set the datapalne status filed in neutron for the host.
this allows openstack to be aware of dataplane outages.
> 
> > To infer at least at a highlevel that "everything is fine" where as
> > the log message/db field proposed Here will invalidate that.
> 
> I've added Assaf Mueller from our Open Stack team as well - maybe he
> has some additional details on those mechanisms outside of Kolla (maybe
> it exists in some kind of director / other software too, as you point
> out).
> 
> > it would be ease to check that field but its work that needs to be
> > done in multiple places.
> 
> I think such a knob wouldn't be useful.  I believe it would either have
> to be defaulted to 'dpdk-init-is-fatal=true' to abort on failure (which
> most users would want to change making it an undesirable default)
[Mooney, Sean K] I would argue against that I would never deploy with
dpdk-init-is-fatal=false. If your datapane does not start what is the point
of running ovs at all? It will not be able to forward packets.
, or
> the Kolla ansible scripts (and other detection mechanisms for dpdk
> failure - if they exist) would need to change.  Maybe there's another
> approach, though?
> 
> >> -----Original Message-----
> >> From: Aaron Conole [mailto:aconole at redhat.com]
> >> Sent: Thursday, April 5, 2018 10:23 PM
> >> To: dev at openvswitch.org
> >> Cc: Stokes, Ian <ian.stokes at intel.com>; Kevin Traynor
> >> <ktraynor at redhat.com>; Ilya Maximets <i.maximets at samsung.com>;
> >> Loftus, Ciara <ciara.loftus at intel.com>; Mooney, Sean K
> >> <sean.k.mooney at intel.com>; Terry Wilson <twilson at redhat.com>
> >> Subject: [RFC 0/2] dpdk: minor refactor of the initialization step
> >>
> >> Sometimes, DPDK initialization can fail, but ovs-vswitchd will abort
> >> in that case.  When that occurs, ovs-vswitchd will be restarted by
> >> the monitor and immediately abort.  This is rather unfriendly to
> >> users, who would prefer to possibly correct the issue or at least,
> >> not have lots of processes continually spawning.
> >>
> >> This series accepts that rte_eal_init() can and does fail for real.
> >> It reflects the initialization status in the database, as well as
> >> adding the DPDK version (where appropriate).
> >>
> >> Submitted as RFC to spawn discussion around the type to reflect for
> >> the initialized information.  Presented here as a boolean - however,
> >> it might be more interesting to be a 'string' and have more
> elaborate
> >> details (ex: 'failed - ovs_strerror(rte_errno)' or 'uninitialized'
> or
> >> 'initialized').
> >>
> >> Aaron Conole (2):
> >>   dpdk: allow init to fail
> >>   dpdk: reflect status and version in the database
> >>
> >>  lib/dpdk-stub.c            | 10 ++++++++++
> >>  lib/dpdk.c                 | 31 +++++++++++++++++++++++++------
> >>  lib/dpdk.h                 |  3 ++-
> >>  vswitchd/bridge.c          |  5 +++++
> >>  vswitchd/vswitch.ovsschema | 11 ++++++++---
> >>  vswitchd/vswitch.xml       | 11 +++++++++++
> >>  6 files changed, 61 insertions(+), 10 deletions(-)
> >>
> >> --
> >> 2.14.3


More information about the dev mailing list