[ovs-dev] [PATCH v6 7/8] Documentation: Update DPDK doc with Keepalive feature.

Aaron Conole aconole at redhat.com
Wed Jan 24 20:36:29 UTC 2018


Hi Bhanu,

This time around I just reviewed the documentation.  I'll spend some
time on the other patches as I can, but I think at least the
documentation needs more work.

Comments inline.

Bhanuprakash Bodireddy <bhanuprakash.bodireddy at intel.com> writes:

> Keepalive feature is aimed at achieving Fastpath Service Assurance
> in OVS-DPDK deployments. It adds support for monitoring the packet
> processing threads by dispatching heartbeats at regular intervals.

This belongs in the document.

> The implementation uses OvSDB for reporting the health of the PMD threads.

s/implementation/keepalive framework/

> Any external monitoring application can query the OvSDB for status
> at regular intervals (or) subscribe to OvSDB updates.

                                              ^ for

>
> keepalive feature can be enabled through below OVSDB settings.

 ^ The

>
>     enable-keepalive=true
>       - Keepalive feature is disabled by default and should be enabled
>         at startup before ovs-vswitchd daemon is started.
>
>     keepalive-interval="5000"
>       - Timer interval in milliseconds for monitoring the packet
>         processing cores.
>
> When KA is enabled, 'ovs-keepalive' thread shall be spawned that wakes
> up at regular intervals to update the timestamp and status of pmd
> threads in process map. This information shall be read by vswitchd thread

s/shall/will/

> and written in to 'keepalive' column of Open_vSwitch table in OVSDB.
>
> An external monitoring framework like collectd with ovs events support

s/like collectd/(such as collectd)/

> can read (or) subscribe to the datapath status changes in ovsdb. When the state

s/(or)/or/

Everything after 'When the state is updated' is specific to
collectd/ceilometer.  Should it go into a separate document?

> is updated, the collectd shall be notified and will eventually relay the status
> to ceilometer service running in the controller. Below is the high level
> overview of deployment model.
>
>         Compute Node            Controller            Compute Node
>
>         Collectd  <----------> Ceilometer <-------->   Collectd
>
>         OvS DPDK                                       OvS DPDK
>
>         +-----+
>         | VM  |
>         +--+--+
>        \---+---/
>            |
>         +--+---+       +------------+----------+     +------+-------+
>         | OVS  |-----> |   ovsevents plugin    | --> |   collectd   |
>         +--+---+       +------------+----------+     +------+-------+
>
>         +------+-----+     +---------------+------------+     |
>         | Ceilometer | <-- | collectd ceilometer plugin |  <---
>         +------+-----+     +---------------+------------+
>
> Performance impact:
>   No noticeable performance or latency impact is observed with
>   KA feature enabled.
>
> Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy at intel.com>
> ---

A good chunk of this commit message should be in the documentation.
Either as a duplicate, or just moved there.  While commit messages are
meant to describe the 'why' of a change, this commit merely adds
documentation for the keepalive feature.  The actual description of the
feature should live on in the documentation, so that when a user is
trying to figure out what it is, they can read it.

I hope this feature isn't exclusively for ceilometer / collectd.

>  Documentation/howto/dpdk.rst | 112 +++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 112 insertions(+)

Is the dpdk documentation the correct place for this?  Will this
framework ever be applied to more than just PMDs?  Isn't that a goal of
monitoring the health of various tasks?

> diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
> index d123819..e7a2b27 100644
> --- a/Documentation/howto/dpdk.rst
> +++ b/Documentation/howto/dpdk.rst
> @@ -439,6 +439,118 @@ For certain traffic profiles with many parallel flows, it's recommended to set
>  
>  For more information on the EMC refer to :doc:`/intro/install/dpdk` .
>  
> +.. _dpdk_keepalive:
> +
> +Keepalive
> +---------
> +

This needs a bit more description.  I read this section, and I don't
know what the Keepalive(KA) feature is - is it merely a reporting
framework for the last time threads 'checked in'?  Does it take any
actions?  What is a user meant to do with this data?  Are there cases
where this framework gives false positives / negatives?  Downsides to
enabling it?  Alternatives?

Some of that is explained in the commit message - but it really belongs
here.

> +OvS Keepalive(KA) feature is disabled by default. To enable KA feature::
> +
> +    $ ovs-vsctl --no-wait set Open_vSwitch . other_config:enable-keepalive=true
> +
> +The KA feature can't be enabled at run time and should be done at startup
> +before ovs-vswitchd daemon is started.
> +
> +The default timer interval for monitoring packet processing threads is 1000ms.
> +To set a different timer value, run::
> +
> +    $ ovs-vsctl --no-wait set Open_vSwitch . \
> +        other_config:keepalive-interval="5000"
> +

Can this be changed at run-time?  Such information should be included.

> +The events comprise of thread states and the last seen timestamps. The events
> +are written in to process map periodically by keepalive thread.
> +
> +The events in the process map are retrieved by main(vswitchd) thread and
> +updated in to keepalive column of Open_vSwitch table in OVSDB. Any external
> +monitoring application can read the status from OVSDB at intervals or subscribe
> +to the updates so that they get notified when the changes happen on OvSDB.
> +
> +To monitor the datapath status using ovsdb-client, run::

Will this monitor revalidator threads?  RCU related threads?  Is it only
ever datapath status?  Maybe should read 'To monitor the keepalive status'.

> +    $ ovsdb-client monitor Open_vSwitch
> +    $ ovsdb-client monitor Open_vSwitch Open_vSwitch keepalive
> +
> +The datapath thread states are explained below::

s/datapath/keepalive/

> +
> +      KA_STATE_UNUSED  - Not registered to KA framework.

Maybe "KA_STATE_NONE"?  Or better yet - why should we have an enum for
this?  After all... the thread isn't tracked?

> +      KA_STATE_ALIVE   - Thread alive.

--
> +      KA_STATE_MISSING - Thread missed first heartbeat.
> +      KA_STATE_DEAD    - Thread missed two heartbeats.
> +      KA_STATE_GONE    - Thread missed two or more heartbeats and burried.
--

Is it useful to distinguish between 'missed one', 'missed two', and
'missed two or more?'  Further, what does 'burried' mean?  I hope we
don't have to dig a thread out from the under a vast field of
permafrost.

Maybe it's better to have a state for MISSED, and show the missed
counter.

> +      KA_STATE_SLEEP   - Thread is sleeping.
> +
> +To query the datapath status, run::
> +
> +    $ ovs-appctl keepalive/pmd-health-show
> +

The following section seems to be more about installation and
configuration of collectd.  Perhaps that should be split into a
different document?

> +`collectd <https://collectd.org/>`__ has built-in support for DPDK and provides
> +a `ovs_events` and `ovs_stats` plugin that can be enabled to relay the datapath
> +status and the PMD status to OpenStack service `Ceilometer
> +<https://docs.openstack.org/developer/ceilometer/>`__.
> +
> +To install and configure `collectd`, run::
> +
> +    # Clone collectd from Git repository
> +    $ git clone https://github.com/collectd/collectd.git
> +
> +    # configure and install collectd
> +    $ cd collectd
> +    $ ./build.sh
> +    $ ./configure --enable-syslog --enable-logfile --with-libdpdk=/usr
> +    $ make
> +    $ make install
> +
> +`collectd` is installed in ``/opt/collectd`` by default. Edit the configuration
> +file in ``/opt/collectd/etc/collectd.conf`` to enable logfile, dpdkevents
> +and csv plugin::
> +
> +   LoadPlugin logfile
> +   <Plugin logfile>
> +       LogLevel debug
> +       File "/var/log/collectd/collectd.log"
> +       Timestamp true
> +       PrintSeverity false
> +   </Plugin>
> +
> +   <Plugin syslog>
> +       LogLevel info
> +   </Plugin>
> +
> +Enable `ovs_events` plugin and update the plugindetails as below::
> +
> +   LoadPlugin ovs_events
> +
> +   <Plugin ovs_events>
> +     Port "6640"
> +     Address "127.0.0.1"
> +     Socket "/usr/local/var/run/openvswitch/db.sock"
> +     SendNotification true
> +     DispatchValues false
> +   </Plugin>
> +
> +Enable `ovs_stats` plugin and update the plugindetails as below::
> +
> +   LoadPlugin ovs_stats
> +
> +   <Plugin ovs_stats>
> +     Port "6640"
> +     Address "127.0.0.1"
> +     Socket "/usr/local/var/run/openvswitch/db.sock"
> +     Bridges "br0"
> +   </Plugin>
> +
> +Enable ``csv`` plugin as below::
> +
> +   LoadPlugin csv
> +
> +   <Plugin csv>
> +       DataDir "/var/log/collectd/csv"
> +       StoreRates false
> +   </Plugin>
> +
> +With csv plugin enabled, *meter* or *gauge* file is created and timestamp
> +and thread status gets updated which are sent to ceilometer service.
> +
 .. _dpdk-ovs-in-guest:
 
 OVS with DPDK Inside VMs
-- 
2.4.11


More information about the dev mailing list