[ovs-dev] [PATCH v8 0/5] Convert DPDK configuration from command line to DB based

Mooney, Sean K sean.k.mooney at intel.com
Wed Feb 17 00:00:50 UTC 2016


Hi sorry to jump back to this mail as I know the v9 patches are in flight.
I was on vacation for the last few days but my responses are inline.
I'll try to look at the revised patches also but I think you have addressed most of my concerns. 

> -----Original Message-----
> From: Aaron Conole [mailto:aconole at redhat.com]
> Sent: Thursday, February 11, 2016 6:13 PM
> To: Mooney, Sean K <sean.k.mooney at intel.com>
> Cc: Traynor, Kevin <kevin.traynor at intel.com>; dev at openvswitch.org;
> Flavio Leitner <fbl at sysclose.org>
> Subject: Re: [ovs-dev] [PATCH v8 0/5] Convert DPDK configuration from
> command line to DB based
> 
> Hi Sean,
> 
> "Mooney, Sean K" <sean.k.mooney at intel.com> writes:
> 
> > Overall I like this change but as a user of ovs with dpdk and A
> > maintainer of a plugin to deploy it I have some comments inline.
> 
> AWESOME! Seriously, I will probably be bothering you a lot because you
> are the person I want to hear from the most on this patchset. :)
> 
> >> -----Original Message-----
> >> From: Aaron Conole [mailto:aconole at bytheb.org]
> >> Sent: Wednesday, February 10, 2016 1:23 PM
> >> To: Traynor, Kevin
> >> Cc: dev at openvswitch.org; Flavio Leitner; Mooney, Sean K
> >> Subject: Re: [ovs-dev] [PATCH v8 0/5] Convert DPDK configuration from
> >> command line to DB based
> >>
> >> "Traynor, Kevin" <kevin.traynor at intel.com> writes:
> >> >> -----Original Message-----
> >> >> From: Aaron Conole [mailto:aconole at redhat.com]
> >> >> Sent: Friday, January 29, 2016 5:57 PM
> >> >> To: dev at openvswitch.org
> >> >> Cc: Flavio Leitner <fbl at sysclose.org>; Panu Matilainen
> >> >> <pmatilai at redhat.com>; Traynor, Kevin <kevin.traynor at intel.com>;
> >> >> Zoltan Kiss <zoltan.kiss at linaro.org>; Christian Ehrhardt
> >> >> <christian.ehrhardt at canonical.com>
> >> >> Subject: [PATCH v8 0/5] Convert DPDK configuration from command
> >> >> line to DB based
> >> >>
> >> >> Currently, configuration of DPDK parameters is done via the
> >> >> command line through a --dpdk **OPTIONS** -- command line
> >> >> argument. This has a number of challenges, including:
> >> >> * It must be the first option passed to ovs-vswitchd
> >> >> * It breaks from the way most other things are configured in OVS
> >> >> * It doesn't allow an easy way to populate defaults
> >> >>
> >> >>
> >> >> This series brings the following changes to openvswitch:
> >> >> * All DPDK options are taken from the ovs database rather than the
> >> >>   command line
> >> >> * DPDK lcores are optionally auto-assigned to a single core based
> >> >> on
> >> the
> >> >>   bridge coremask.
> >> >> * Updated documentation
> >> >>
> >> >> v2:
> >> >> * Dropped the vhost-user socket configuration options. Those can
> >> >> be
> >> re-added
> >> >>   as an extension
> >> >> * Incorporated feedback from Kevin Traynor.
> >> >>
> >> >> v3:
> >> >> * Went back to a global dpdk-init
> >> >> * Language cleanup and various minor fixes
> >> >>
> >> >> v4:
> >> >> * Added a way to pass arbitrary eal arguments
> >> >>
> >> >> v5:
> >> >> * Restore the socket-mem default, and fix up the ovs-dev.py
> >> >> script,
> >> along
> >> >>   with the manpage for ovsdb-server
> >> >>
> >> >> v6:
> >> >> * Correct a documentation issue with INSTALL.DPDK.md
> >> >> * Correct a non-dpdk enabled OVS incorrect warning variable
> >> >> * Remove an excess whitespace
> >> >>
> >> >> v7:
> >> >> * After testing by Christian with dpdk-alloc-mem
> >> >>
> >> >> v8:
> >> >> * Confirmed ``make check`` operation with and without dpdk.
> >> >>   Retested on live-host
> >> >
> >> > Hi,
> >> >
> >> > I've done some testing on this patchset and I couldn't find any
> >> > issues.
> >>
> >> Cool; does that mean I have your Tested-by? :)
> >>
> >> >  - tested that -c and -n defaults and explicit values are catered
> >> > for
> >> >  - tested dpdk-init=t/f leads to dpdk initialization or not
> >> >  - tested that use of both dpdk-socket-mem and dpdk-alloc-mem is
> >> > caught
> >> >  - tested that a string can be passed in through extra_args
> >> >  - tested the code won't catch using a db entry dpdk-socket-mem and
> >> also
> >> >    putting --socket-mem in extra_args, however dpdk will barf
> >> >
> >> > On command line args vs. db entries vs. a string of args in the db,
> >> > if there is doubt on this then let's debate further. This will
> >> > change how ovs with dpdk is used, so better debate it out and get
> it right.
> >>
> >> I don't think there's any real doubt. I think the approach is the
> >> best way to do this. I have agreement from almost everyone else, I
> think?
> >> Anyone still need to be convinced?
> > From an orchestration point of view the explcit values are nice but
> > form a Deployment point of view I favor only having
> > dpdk_init:true/false and dpdk_extra:<cmd line sting>
> 
> Okay, but I will point out an advantage of doing it the other way means
> saving off DBs and replicating them in other environments. While it's
> true you can do that with just the dpdk_extra flag, it means you have to
> read the extras, find the exact part in the string to change, and write
> back. With enumerated db entries, you can just modify those things you
> care to change. I think it's just a 'getting used to it' rather than
> 'omg a workflow has changed', but that's only my opinion.
> 
> > In particular if is set a value in dpdk_extra and in the future you
> > add a explicit entry in the db to Hold that value I never want it to
> > brake my deployment code which still uses dpdk_extra. Where db entries
> > Exist for an item it would make my life eaiser if they were always set
> > automatically when the vswitchd starts
> 
> Okay, I've rebased my series and included this as patch 5/6. Just doing
> some brief testing now, and I'll repost to the list.
> 
> > e.g. if I never spcify the vhostuser_socket_dir it would be nice if it
> > was set by ovsvswitchd to the compile time default.
> > Similar if I override the default of a value in dpdk_extra then it
> > would be nice if when an explicit value for that item exits In the db
> > that it be updated to match the value specified in the dpdk_extra
> args.
> 
> I would rather not try to do this. Instead, my approach is emit a
> warning that the value in the dpdk_extras string was used, and let the
> user do such management. There's only so much intelligence that I think
> OVS should provide.

For context I am considering the usecase where ovs is managed via openstack,
either directly or via odl. In the odl case OpenStack neutron does not have any
direct access to the host or the ovs instance that it is running. As a result neutron
can only discover information about how ovs is configured by querying the odl topology api
Which has a copy of the ovsdb tables. As such I cannot programmatically retrieve 
that warning and use it to determine which value to use. This is not an issue provided
there is a rule that either the explicit db entry or the dpdk_extra string takes precedence
I just need to know which one to prefer.

The reason I would like this to always be set is, if I have ovs with dpdk support 
and I am running it with the defaults install from the package manager e.g. I do not 
Manually configure the vhostuser_socket_dir, I would still like it to be reported in
the ovsdb. The issue that I would like to address is that the vhostuser_socket_dir
default location depends on os as it is tied to the default rundir location. 
On fedora 20+ it is /var/run/openvswitch on centos 7 and  Ubuntu 14.04+ it
is /usr/var/run/openvswitch. 

When ovs is managed via the OpenStack neutron agent we added a per host config option 
to the agent to specify the vhostuser_socket_dir. This allows openstack neutron to tell nova
what the path should be when spawning a vm. In the case of odl there is no local agent 
on the node and also no config file. Currently as there is no way to discover the vhostuser_socket_dir
via ovsdb or openflow we are forced to assume it is /var/run/openvswitch and that either 
ovs was compiled with --with-rundir=/var/run/openvswitch or that it is started with
 -vhost_sock_dir /var/run/openvswitch

If the vhostuser_socket_dir was always reported in the ovsdb then we could always
Read this and support multiple distros more easily. We can work around this by falling back to
/var/run/openvswitch if the value is not found in the ovsdb but where these values have compilation
Time defaults it would be nice if they were reflected in the ovsdb also. Maybe the compile
Time default could be set as part of ovsdb-tool create? That way the ovsvswitchd can be kept simpler
As it would not have to set them.


> 
> > If this is not the case I cannot true the explicit values in the ovsdb
> > as they may not be valid.
> 
> Not sure what you mean here. Are you saying that you won't know where to
> look to get the real values? I agree - that's why I emit the warning
> saying where the value comes from and hope that such log will be
> watched. Does that sound unreasonable?

I did not phrase that well. Basically if a key is set both in the dpdk
Extra string and as an explicit value I need to know which one reflects the
Actual runtime configuration of ovs. E.g. always prefer the explicit value
Or always prefer the dpdk_extra value.


> 
> >>
> >> > There's one or two of the db entries that may be able to reused
> >> > later for other things e.g. vhostuser socket location, so that
> >> > would be a +
> >> for them.
> >> > Backwards compatibility would be a + for command line args. Daniele
> >> > has mentioned scripting also. I'm sure there's other +/-'s.
> >>
> >> I don't know - scripting vswitchd? I think that sounds a little
> >> strange;
> > It depend on your usecase. I am a developer on neutron integrating ovs
> > with dpkd with Openstack. I also maintain a devstack plugin to deploy
> > ovs with dpdk from source.
> >
> > In keeping with other service in devstack we run the ovsvswitchd in a
> > screen session.
> > https://github.com/openstack/networking-ovs-dpdk/blob/master/devstack/
> > ovs-dpdk/ovs-dpdk-init#L528 we heavily script the compilation and
> > deployment of ovs and treat it like any other ephemeral service.
> > Our ovs-dpdk service file is 700 lines long and we have separate logic
> > to compile and install it.
> >
> > In production sure it won't be started and stopped frequently but when
> > used in a development environment It can  have a very sort lifetime.
> 
> Wow. That sounds kindof horrible, actually. There's a script included
> with OVS which compiles and starts OVS (see utilities/ovs-dev.py). I
> think that should be preferred, and will let you specify the arguments
> you wish, I think.

Actually our code to build and install ovs is only about 30 lines so the
utilities/ovs-dev.py script would not  dramatically reduces our maintenance
burden in that respect but I will look into it. the original deployment code
we have predates dpdk support in ovs and utilities/ovs-dev.py by several months
as we used it internally to deploy ovs-dpdk when dpdk support was being added.

A large portion of the complexity in those 700 lines exist because of how
dpdk physical port have to be added to ovs. Adding dpdk phy ports is my
number 1 useablity issue with ovs-dpdk.

The current condensed workflow is as follows

Map netdev names to pci address for each interface
Bind interface to dpdk driver
Generate ovs port name for each netdev as dpdkX where X is the index into the array
Of dpdk ports sorted by pciaddress
Start ovsdb
Create bridges,set datapath=netdev and add port with --no-wait as ovsvswitchd is not running.
And finally start the ovsvswitchd with the dpdk interface in the eal pci whitelist.
Note: the whitelist is need to ensure that the array of dpdk pmds create in the eal init is the
Same as the one we used earlier to calculate the dpdkX names

If we could simply add an dpdk phy port while the ovsvswitchd is running
By simply using the following workflow it would improve the usability greatly.
Start ovsdb and ovsvswitchd normally
Bind interface to dpdk driver then run
ovs-vsctl add-port <bridge> <arbitrary port name> -- set interface <arbitrary port name> type=dpdk other-config="pci-address=xyz"

or if I was even bolder just
Start ovsdb and ovsvswitchd normally
ovs-vsctl add-port <bridge> <arbitrary port name> -- set interface <arbitrary port name> type=dpdk other-config="pci-address=xyz,driver=igb_uio,olddriver=ixgbe"

though I understand why ovs does not want to manage the bining/unbining of the interface to the dpdk driver but that is
the hardest part to script correctly.

Sorry for the mini rant but I like the direction that this patch chain is takeing ovs configuration
It will improve usability with dpdk a lot. The physical port issue is just one I have been dealing with
Since before ovs had support for dpdk in tree.

> 
> Regardless, sure that script would have to change a bit, but I think
> you'll want to change it anyway (there's a new user feature in OVS, plus
> I'm adding support for vhost-user socket group ids, so the 'sg' will not
> be the advised way of starting ovs-vswitchd, for instance). There have


That really good to hear as that was an annoying pain point to work around.
Will this by default allow read write access to the same group? If not
I would still need to change the umask to 002. It would be nice to be able
To specify the vhost-user socket permissions also in the event that the group
Will not have read write access by default.

> been some new feature developments with OVS that you could leverage to
> get the same effect with less commands. I would suggest just changing
> L511 to do the db sets there. It has the added effect of shrinking L520
> and L528.

Yes this is what I had in mind. We would simply set these values when we
Create the ovsdb. The actually changes required on my side are minimal.

> 
> Not to mention -vhost_sock_dir is not actually an EAL argument (which is
> why I'm changing that whole infrastructure). If that got passed to dpdk
> directly, it would not know what to do with it.
> 
> Anyway, I realize that there could be some breakage, and would happily
> work with you to resolve it :)
> 


Thanks I think the changes should be easy enough. 
The only tricky bit is ensuring I can deploy ovs commits before and after this
Change with the same init script. I am thinking of just greping the vswitch.ovsschema
File for the new dpdk_init value and then fall back to the old logic
if it was not found and use the new logic if it was.


> >> it isn't some kind of ephemeral service that comes and goes. And it's
> >> not like this patch prevents the same kinds of arbitrary commands to
> >> be passed to the EAL (since 4/5 does precisely that). The only change
> >> required is doing ovs-vsctl before ovs-vswitch in the 'starting
> >> vswitchd' case. Is that really a huge deal?
> >>
> >> There's always pros and cons. I haven't heard any explict NAK, or any
> >> explicit ACK. It would be nice for that to happen, since I can't
> >> maintain this series out-of-tree forever, and there are other things
> >> I'd really like to get to (as fun as db entries may be).
> >>
> >> Any suggestions on how to move forward? I was planning on posting a
> >> rebased v9 - should I still do that?
> >>
> >> -Aaron
> >>
> >> > Kevin.
> >> >
> >> >>
> >> >> Aaron Conole (5):
> >> >>   netdev-dpdk: Restore thread affinity after DPDK init
> >> >>   netdev-dpdk: Convert initialization from cmdline to db
> >> >>   netdev-dpdk: Autofill lcore coremask if absent
> >> >>   netdev-dpdk: Allow arbitrary eal arguments
> >> >>   NEWS: Announce the DPDK EAL configuration change
> >> >>
> >> >>  FAQ.md                     |   6 +-
> >> >>  INSTALL.DPDK.md            |  90 ++++++++++---
> >> >>  NEWS                       |   5 +
> >> >>  lib/netdev-dpdk.c          | 327
> >> ++++++++++++++++++++++++++++++++++++++-----
> >> >> --
> >> >>  lib/netdev-dpdk.h          |  22 ++-
> >> >>  utilities/ovs-dev.py       |   7 +-
> >> >>  vswitchd/bridge.c          |   3 +
> >> >>  vswitchd/ovs-vswitchd.8.in |   5 +-
> >> >>  vswitchd/ovs-vswitchd.c    |  25 +---
> >> >>  vswitchd/vswitch.xml       | 128 +++++++++++++++++-
> >> >>  10 files changed, 513 insertions(+), 105 deletions(-)
> >> >>
> >> >> --
> >> >> 2.5.0
> >> >
> >> > _______________________________________________
> >> > dev mailing list
> >> > dev at openvswitch.org
> >> > http://openvswitch.org/mailman/listinfo/dev



More information about the dev mailing list