[ovs-dev] [PATCH v6 0/8] Add OVS DPDK keep-alive functionality.

Bodireddy, Bhanuprakash bhanuprakash.bodireddy at intel.com
Tue Jan 16 10:43:41 UTC 2018


>Hi,
>
>Sorry to jump on this at v6 only, but I skimmed over the code and I am
>struggling to understand what problem you're trying to solve. Yes, I realize
>you want some sort of feedback about the PMD processing, but it's not clear
>to me what exactly you want from it.
>
>This last patchset uses a separate thread just to monitor the PMD threads
>which can update their status in the core busy loop.  I guess it tells you if the
>PMD thread is stuck or not, but not really if it's processing packets.  That's
>again, my question above.
>
>If you need to know if the thread is running, I think any OVS can provide you
>the process stats which should be more reliable and doesn't depend on OVS
>at all.
>
>I appreciate if you could elaborate more on the use-case.

Intel's Service Assurance (SA) team has been working on an SA framework for NFV environments and has defined interfaces
for the base platform (aligned with ETSI GS NFV 002), which includes compute, storage, network, virtual switch, OS and hypervisor.
The core idea is to monitor and detect service-impacting faults on the base platform.
Both reactive and proactive fault detection techniques are employed, and faults are reported to
higher layers for corrective action. Corrective actions, for example migrating
workloads or marking the compute node offline, depend on the policies enforced at those higher layers.

One aspect of the larger SA framework is monitoring virtual switch health. Some of the events of interest here
are link status, OVSDB connection status, packet statistics (drops/errors) and PMD health.

This patch series implements only the *PMD health* monitoring and reporting mechanism; the details are
already in the patches. The other virtual switch events of interest are already implemented as part of the collectd plugin.

On your questions:

> I guess it tells you if the PMD thread is stuck or not, but not really if it's processing packets.  That's
>again, my question above.

The functionality to check whether the PMD is processing packets was implemented back in v3:
https://mail.openvswitch.org/pipermail/ovs-dev/2017-August/336789.html

For easier review, the patch series was split up in v4 to get the basic functionality in first. This is mentioned in the version change log below.
https://mail.openvswitch.org/pipermail/ovs-dev/2017-August/337702.html

>If you need to know if the thread is running, I think any OVS can provide you
>the process stats which should be more reliable and doesn't depend on OVS
>at all.

There is a problem with that approach. In the thread below I simulated the case to show that the stats reported by the OS aren't accurate:
https://mail.openvswitch.org/pipermail/ovs-dev/2017-September/338388.html

Check the details in /proc/[pid]/stat. Even though the PMD thread is stalled, the OS reports the thread as being in the *Running (R)* state.
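
To make the point about the state field concrete, here is a minimal Python sketch that pulls the one-letter state out of a /proc/[pid]/stat style line. The helper names and the sample line are made up for illustration; the per-thread path and the state letters follow proc(5).

```python
import os

# State letters as documented in proc(5); "R" covers both running and runnable.
STATE_NAMES = {
    "R": "running or runnable",
    "S": "interruptible sleep",
    "D": "uninterruptible sleep",
    "T": "stopped",
    "Z": "zombie",
}


def parse_stat_state(stat_line):
    """Return the one-letter state field from a /proc/<pid>/stat style line."""
    # The comm field is wrapped in parentheses and may itself contain spaces
    # or ')', so take everything after the last ')' before splitting.
    _, _, rest = stat_line.rpartition(")")
    return rest.split()[0]


def thread_state(pid, tid):
    """Read the state of one thread of a process (Linux only)."""
    path = os.path.join("/proc", str(pid), "task", str(tid), "stat")
    with open(path) as f:
        return parse_stat_state(f.read())


# Illustrative line, not captured from a real system.
sample = "4242 (pmd (demo)) R 1 4242 4242 0 -1"
state = parse_stat_state(sample)
print(state, "=", STATE_NAMES[state])
```

A thread that has been preempted by a higher-priority task is still runnable, so it keeps reporting R; the state letter alone cannot show that the thread has stopped making progress.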

- Bhanuprakash.

>
>
>On Fri, Dec 08, 2017 at 12:04:19PM +0000, Bhanuprakash Bodireddy wrote:
>> Keepalive feature is aimed at achieving Fastpath Service Assurance in
>> OVS-DPDK deployments. It adds support for monitoring the packet
>> processing threads by dispatching heartbeats at regular intervals.
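
As an illustration of the heartbeat idea described above (this is not the code in the series), here is a minimal Python sketch: each monitored thread marks itself alive from its busy loop, and a monitor checks and clears the marks once per interval. The class and method names are hypothetical, and only two states are modelled; GONE is borrowed from the testing notes further down.

```python
class HeartbeatMonitor:
    """Toy keepalive tracker: one 'seen' flag per registered thread."""

    def __init__(self):
        # thread name -> True if it marked itself alive since the last check
        self._seen = {}

    def register(self, name):
        """Start tracking a thread; it counts as not yet seen."""
        self._seen[name] = False

    def mark_alive(self, name):
        """Called by the monitored thread from its processing loop."""
        self._seen[name] = True

    def check(self):
        """Called once per interval: report each thread, then reset the flags."""
        status = {
            name: ("ALIVE" if seen else "GONE")
            for name, seen in self._seen.items()
        }
        for name in self._seen:
            self._seen[name] = False
        return status


monitor = HeartbeatMonitor()
monitor.register("pmd0")
monitor.register("pmd1")
monitor.mark_alive("pmd0")          # pmd1 is stalled and never reports
first = monitor.check()
second = monitor.check()            # nobody reported during this interval
print(first, second)
```

Because the mark is set by the monitored thread itself, a thread that is runnable but never scheduled shows up as GONE at the next check.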
>>
>> keepalive feature can be enabled through below OVSDB settings.
>>
>>     enable-keepalive=true
>>       - Keepalive feature is disabled by default and should be enabled
>>         at startup before ovs-vswitchd daemon is started.
>>
>>     keepalive-interval="5000"
>>       - Timer interval in milliseconds for monitoring the packet
>>         processing cores.
>>
>> TESTING:
>>     The testing of keepalive is done using stress cmd (simulating the stalls).
>>       - pmd-cpu-mask=0xf [MQ enabled on DPDK ports]
>>       - stress -c 1 &          [tid is usually the __tid + 1 of the output]
>>       - chrt -r -p 99 <tid>    [set realtime priority for stress thread]
>>       - taskset -p 0x8 <tid>   [Pin the stress thread to the core PMD is running]
>>       - PMD thread will be descheduled due to its normal priority and yields
>>         core to stress thread.
>>
>>       - ovs-appctl keepalive/pmd-health-show   [Display that the thread is GONE]
>>       - ./ovsdb/ovsdb-client monitor Open_vSwitch  [Should update the status]
>>
>>       - taskset -p 0x10 <tid>  [This brings back pmd thread to life as stress thread
>>                                 is moved to idle core]
>>
>>       (watch out for stress threads, and carefully pin them to core not to hang your DUTs
>>        during testing).
>> v5 -> v6
>>   * Remove 2 patches from series
>>      - xnanosleep was applied to master as part of high resolution timeout support.
>>      - Extend get_process_info() API was also applied to master earlier.
>>   * Remove KA_STATE_DOZING as it was initially meant to handle Core C states, not needed
>>     for now.
>>   * Fixed ka_destroy(), to fix unit test cases 536, 537.
>>   * A minor performance degradation(0.5%) is observed with Keepalive enabled.
>>     [Tested with loopback case using 1000 IXIA streams/64 byte udp pkts and
>>     1 PMD thread(9.239 vs 9.177Mpps) at 10ms ka-interval timeout]
>>   * Verified with sparse, MSVC compilers(appveyor).
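
For reference, the relative drop implied by the two throughput figures above can be worked out directly. This sketch assumes 9.239 Mpps is the baseline and 9.177 Mpps is the keepalive-enabled run; the changelog does not say which is which.

```python
# Assumption: the first figure is without keepalive, the second with it.
baseline_mpps = 9.239
keepalive_mpps = 9.177

drop_mpps = baseline_mpps - keepalive_mpps
drop_pct = drop_mpps / baseline_mpps * 100

print(f"{drop_mpps:.3f} Mpps lower, {drop_pct:.2f}% of baseline")
```

Under that reading the difference is roughly 0.67% of baseline, the same order as the 0.5% quoted.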
>>
>> v4 -> v5
>>   * Add 3 more patches to the series
>>      - xnanosleep()
>>      - Documentation
>>      - Update to NEWS
>>   * Remove all references to core_id and instead implemented thread based tracking.
>>   * Addressed most of the comments in v4.
>>
>> v3 -> v4
>>   * Split the functionality in to 2 parts. This patch series only updates
>>     PMD status to OVSDB. The incremental patch series to handle false positives,
>>     negatives and more checking and stats.
>>   * Remove code from netdev layer and dependency on rte_keepalive lib.
>>   * Merged few patches and simplified the patch series.
>>   * Timestamp in human readable form.
>>
>> v2 -> v3
>>   * Rebase.
>>   * Verified with dpdk-stable-17.05.1 release.
>>   * Fixed build issues with MSVC and cross checked with appveyor.
>>
>> v1 -> v2
>>   * Rebase
>>   * Drop 01/20 Patch "Consolidate process related APIs" of V1 as it
>>     is already applied as separate patch.
>>
>> RFCv3 -> v1
>>   * Made changes to fix failures in some unit test cases.
>>   * some more code cleanup w.r.t process related APIs.
>>
>> RFCv2 -> RFCv3
>>   * Remove POSIX shared memory block implementation (suggested by Aaron).
>>   * Rework the logic to register and track threads instead of cores. This way
>>     in the future any thread can be registered to KA framework. For now only PMD
>>     threads are tracked (suggested by Aaron).
>>   * Refactor few APIs and further clean up the code.
>>
>> RFCv1 -> RFCv2
>>   * Merged the xml and schema commits to later commit where the actual
>>     implementation is done(suggested by Ben).
>>   * Fix ovs-appctl keepalive/* hang issue when KA disabled.
>>   * Fixed memory leaks with appctl commands for keepalive/pmd-health-show,
>>     pmd-xstats-show.
>>   * Refactor code and fixed APIs dealing with PMD health monitoring.
>>
>>
>> Bhanuprakash Bodireddy (8):
>>   Keepalive: Add initial keepalive configuration.
>>   dpif-netdev: Register packet processing cores to KA framework.
>>   dpif-netdev: Enable heartbeats for DPDK datapath.
>>   keepalive: Retrieve PMD status periodically.
>>   bridge: Update keepalive status in OVSDB.
>>   keepalive: Add support to query keepalive status and statistics.
>>   Documentation: Update DPDK doc with Keepalive feature.
>>   NEWS: Add keepalive support information in NEWS.
>>
>>  Documentation/howto/dpdk.rst | 112 +++++++++
>>  NEWS                         |   2 +
>>  lib/automake.mk              |   2 +
>>  lib/dpif-netdev.c            |  92 ++++++++
>>  lib/keepalive.c              | 552 +++++++++++++++++++++++++++++++++++++++++++
>>  lib/keepalive.h              | 109 +++++++++
>>  lib/ovs-thread.c             |   6 +
>>  lib/ovs-thread.h             |   1 +
>>  lib/util.c                   |  22 ++
>>  lib/util.h                   |   1 +
>>  vswitchd/bridge.c            |  29 +++
>>  vswitchd/vswitch.ovsschema   |   8 +-
>>  vswitchd/vswitch.xml         |  49 ++++
>>  13 files changed, 983 insertions(+), 2 deletions(-)
>>  create mode 100644 lib/keepalive.c
>>  create mode 100644 lib/keepalive.h
>>
>> --
>> 2.4.11
>>
>> _______________________________________________
>> dev mailing list
>> dev at openvswitch.org
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
>--
>Flavio


