[ovs-dev] [RFC 0/2] dpdk: minor refactor of the initialization step

Aaron Conole aconole at redhat.com
Wed Apr 11 13:21:05 UTC 2018


"Mooney, Sean K" <sean.k.mooney at intel.com> writes:

>> -----Original Message-----
>> From: Aaron Conole [mailto:aconole at redhat.com]
>> Sent: Monday, April 9, 2018 4:32 PM
>> To: Mooney, Sean K <sean.k.mooney at intel.com>
>> Cc: dev at openvswitch.org; Stokes, Ian <ian.stokes at intel.com>; Kevin
>> Traynor <ktraynor at redhat.com>; Ilya Maximets <i.maximets at samsung.com>;
>> Loftus, Ciara <ciara.loftus at intel.com>; Terry Wilson
>> <twilson at redhat.com>; Assaf Muller <assaf at redhat.com>
>> Subject: Re: [RFC 0/2] dpdk: minor refactor of the initialization step
>> 
>> "Mooney, Sean K" <sean.k.mooney at intel.com> writes:
>> 
>> > So just from a deployment tools point of view I would like to point
>> > out that This change could break existing workflow that deploy ovs in
>> a docker container.
>> > Kolla ansible assumes that if the docker ovs_vswitchd container is
>> > still running that the is infact running in dpdk mode when we set
>> > dpdk-init=true.
>> 
>> Is there a way to test this out and see the behavior?
> [Mooney, Sean K] well you could use kolla to deploy ovs-dpdk :)
> Am when I wrote the code I relied on the existing behavior.
> when kolla ansible is deploying openstack we first deploy the ovsdb.
> https://github.com/openstack/kolla-ansible/blob/4c39ea7eccd946775722646eec19f9ea5cbe6eb5/ansible/roles/ovs-dpdk/handlers/main.yml#L2-L37
> Then we start the ovs-vswitchd container
> https://github.com/openstack/kolla-ansible/blob/4c39ea7eccd946775722646eec19f9ea5cbe6eb5/ansible/roles/ovs-dpdk/handlers/main.yml#L55-L73
> finally we configure the bridges and physical interfaces.
> https://github.com/openstack/kolla-ansible/blob/4c39ea7eccd946775722646eec19f9ea5cbe6eb5/ansible/roles/ovs-dpdk/handlers/main.yml#L75-L90
>
> the "- name: Ensuring ovsdpdk bridges are properly setup named" task
> does not use --no-wait when creating bridges and adding interfaces so
> it
> will fail if the vswitchd is not running. This will result in ansible
> stopping to run any futher task on that node and reporting the error
> to the user. If for some reason the ensure bridge task passed the next
> task that check an ip is assigned to the ovs bride would fail.

Okay - thanks for the pointers, I'll check it out.

>> 
>> It does seem strange that for a possible configuration error we abort()
> [Mooney, Sean K] why I would expect this to be standard behavior for any deamon.
> e.g. the damon would validate it config is correct and exit if invalid.
> If we don't abort the vswitch is in an undefined state. Is is still
> using hugepages
> For example if the eal init fails after they are allocated.

I see.  I think as of 18.11 (which won't come for 6 months), it will
always 'fail' in a known state.  But I'll confirm.

One of the reasons I want this field is because there will be ways of
'uninitializing'.  I think it's nice to have the ability for the user to
dynamically enable *and* disable dpdk.  Also, just from a 'selfish'
view, I recently was fixing a bug where dpdk initialization would fail
early with a particular hardware, and it wasn't nice to watch the
respawns (racy gdb attach, and coredumps all over the drive).

>> running the vswitchd (and with --monitor set, it will continue to
>> abort() over and over - so I guess you're also not using the monitor
>> thread?).  In the case that an abort does happen, does the Kolla script
>> distinguish between issues where dpdk setup failed vs. some other
>> software issue?
>> 
>> > Can I request that if you make this change you add something along
>> the
>> > lines of dpdk-init-is-fatal=true/false so that we can explicitly say
>> which behavior we want.
>> > I would not be surprised if people have built monitoring around "is
>> > the ovs-vswitchd running"
>> 
>> I think they have, but I don't know that they use it to infer such low-
>> level details (meaning a crash implies that dpdk configuration is
>> wrong).
> [Mooney, Sean K] they don't use is to infer that dpdk configuring is wronge
> But rather that some configuration was wrong. Dpdk-init is currently considered
> Fatal if it fails so it was treated the same as any other error that would have
> caused the vsiwtchd process to exit. I belive in the opnfv community they used
> the liveleyness of the vswitchd process and in the future dpdk keepalive
> functionality to set the datapalne status filed in neutron for the host.
> this allows openstack to be aware of dataplane outages.

One thing I'll think about more is the dpdk-init.  Maybe, since it is
stored as a string in the database, we can add a few other values there.

For instance,
  dpdk-init=try

It's like 'true' - only it will continue if there's a failure.  Maybe
that works?  Just spit-balling.  Thanks for providing insight into the
way these behaviors are used.

>> 
>> > To infer at least at a highlevel that "everything is fine" where as
>> > the log message/db field proposed Here will invalidate that.
>> 
>> I've added Assaf Mueller from our Open Stack team as well - maybe he
>> has some additional details on those mechanisms outside of Kolla (maybe
>> it exists in some kind of director / other software too, as you point
>> out).
>> 
>> > it would be ease to check that field but its work that needs to be
>> > done in multiple places.
>> 
>> I think such a knob wouldn't be useful.  I believe it would either have
>> to be defaulted to 'dpdk-init-is-fatal=true' to abort on failure (which
>> most users would want to change making it an undesirable default)
> [Mooney, Sean K] I would argue against that I would never deploy with
> dpdk-init-is-fatal=false. If your datapane does not start what is the point
> of running ovs at all? It will not be able to forward packets.
> , or
>> the Kolla ansible scripts (and other detection mechanisms for dpdk
>> failure - if they exist) would need to change.  Maybe there's another
>> approach, though?
>> 
>> >> -----Original Message-----
>> >> From: Aaron Conole [mailto:aconole at redhat.com]
>> >> Sent: Thursday, April 5, 2018 10:23 PM
>> >> To: dev at openvswitch.org
>> >> Cc: Stokes, Ian <ian.stokes at intel.com>; Kevin Traynor
>> >> <ktraynor at redhat.com>; Ilya Maximets <i.maximets at samsung.com>;
>> >> Loftus, Ciara <ciara.loftus at intel.com>; Mooney, Sean K
>> >> <sean.k.mooney at intel.com>; Terry Wilson <twilson at redhat.com>
>> >> Subject: [RFC 0/2] dpdk: minor refactor of the initialization step
>> >>
>> >> Sometimes, DPDK initialization can fail, but ovs-vswitchd will abort
>> >> in that case.  When that occurs, ovs-vswitchd will be restarted by
>> >> the monitor and immediately abort.  This is rather unfriendly to
>> >> users, who would prefer to possibly correct the issue or at least,
>> >> not have lots of processes continually spawning.
>> >>
>> >> This series accepts that rte_eal_init() can and does fail for real.
>> >> It reflects the initialization status in the database, as well as
>> >> adding the DPDK version (where appropriate).
>> >>
>> >> Submitted as RFC to spawn discussion around the type to reflect for
>> >> the initialized information.  Presented here as a boolean - however,
>> >> it might be more interesting to be a 'string' and have more
>> elaborate
>> >> details (ex: 'failed - ovs_strerror(rte_errno)' or 'uninitialized'
>> or
>> >> 'initialized').
>> >>
>> >> Aaron Conole (2):
>> >>   dpdk: allow init to fail
>> >>   dpdk: reflect status and version in the database
>> >>
>> >>  lib/dpdk-stub.c            | 10 ++++++++++
>> >>  lib/dpdk.c                 | 31 +++++++++++++++++++++++++------
>> >>  lib/dpdk.h                 |  3 ++-
>> >>  vswitchd/bridge.c          |  5 +++++
>> >>  vswitchd/vswitch.ovsschema | 11 ++++++++---
>> >>  vswitchd/vswitch.xml       | 11 +++++++++++
>> >>  6 files changed, 61 insertions(+), 10 deletions(-)
>> >>
>> >> --
>> >> 2.14.3
> _______________________________________________
> dev mailing list
> dev at openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev


More information about the dev mailing list