[ovs-dev] [PATCH v6 0/8] Add OVS DPDK keep-alive functionality.

Aaron Conole aconole at redhat.com
Wed Jan 24 17:02:01 UTC 2018


"Bodireddy, Bhanuprakash" <bhanuprakash.bodireddy at intel.com> writes:

>>Hi,
>>
>>Sorry to jump on this at v6 only, but I skimmed over the code and I am
>>struggling to understand what problem you're trying to solve. Yes, I realize
>>you want some sort of feedback about the PMD processing, but it's not clear
>>to me what exactly you want from it.
>>
>>This last patchset uses a separate thread just to monitor the PMD threads,
>>which update their status in their core busy loop.  I guess it tells you
>>whether the PMD thread is stuck or not, but not really whether it's
>>processing packets.  That, again, is my question above.
>>
>>If you need to know whether the thread is running, I think the OS can provide
>>you the process stats, which should be more reliable and doesn't depend on
>>OVS at all.
>>
>>I'd appreciate it if you could elaborate more on the use case.
>
> The Intel SA team has been working on a Service Assurance framework for NFV
> environments and has defined interfaces for the base platform (aligned with
> ETSI GS NFV 002), which includes compute, storage, network, virtual switch,
> OS and hypervisor.  The core idea is to monitor and detect service-impacting
> faults on the base platform.  Both reactive and proactive fault detection
> techniques are employed, and faults are reported to higher layers for
> corrective action.  The corrective actions, for example, can be migrating
> workloads or marking the compute node offline, and are based on the policies
> enforced at the higher layers.
>
> One aspect of the larger SA framework is monitoring virtual switch health.
> Some of the events of interest here are link status, OVSDB connection status,
> packet statistics (drops/errors), and PMD health.
>
> This patch series only implements the *PMD health* monitoring and reporting
> mechanism, and the details are already in the patches.  The other virtual
> switch events of interest are already implemented as part of the collectd
> plugin.
>
> On your questions:
>
>> I guess it tells you whether the PMD thread is stuck or not, but not
>> really whether it's processing packets.  That, again, is my question above.
>
> The functionality to check whether the PMD is processing packets was
> implemented back in v3:
> https://mail.openvswitch.org/pipermail/ovs-dev/2017-August/336789.html
>
> For easier review, the patch series was split up in v4 to get the basic
> functionality in first; this is mentioned in the version change log below:
> https://mail.openvswitch.org/pipermail/ovs-dev/2017-August/337702.html
>
>>If you need to know whether the thread is running, I think the OS can provide
>>you the process stats, which should be more reliable and doesn't depend on
>>OVS at all.
>
> There is a problem here, and I simulated a case in the thread below to show
> that the stats reported by the OS aren't accurate:
> https://mail.openvswitch.org/pipermail/ovs-dev/2017-September/338388.html
>
> Check the details in /proc/[pid]/stat: even though the PMD thread is stalled,
> the OS reports the thread in the *Running (R)* state.

Just responding to this point; I expected you to follow up on my statement
when I wrote:

  That said, I suspect that when those times were increasing, they really
  were getting some system / user time.  Was your worker thread
  occasionally making system calls (or even doing sched_yield()?)  In
  those cases, it likely was letting the pmd thread move on.

You are observing that the thread is still running and still getting time.
Your experiment merely shows that the scheduler works as intended.  If you
clock the percentage (as I just did), you'll see that it balances out.  For
example, when I run two RT threads on the same core as a PMD thread (and use
a simple 'top' to show the data):

 5905 root      10 -10 1996516 141900  11128 R 60.1  0.9   3:09.08 pmd20        
 6361 aconole   20   0    7896    104      0 R 19.9  0.0   1:03.46 stress       
 6362 aconole   20   0    7896    104      0 R 19.9  0.0   1:06.95 stress       

Note the CPU utilization percentage.  When I stop these threads:

 5905 root      10 -10 1996516 141900  11128 R 99.7  0.9   4:22.77 pmd20        

That means if you simply calculate the CPU time consumed by the thread as a
percentage of the time available, you can detect stalls by checking whether
that percentage drops below, say, 90%.

This can still be done using procfs.  I reject the assertion that the
kernel doesn't account for process time properly.  That would be a
scheduler bug or an accounting bug.
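
For illustration, something like the following is enough (a rough sketch only;
<pid> and <tid> are placeholders for the ovs-vswitchd pid and the PMD thread's
tid, and the 5 second window and 90% threshold are arbitrary):

    stat=/proc/<pid>/task/<tid>/stat
    hz=$(getconf CLK_TCK)
    t1=$(awk '{sub(/.*\) /, ""); print $12 + $13}' "$stat")  # utime + stime, in ticks
    sleep 5
    t2=$(awk '{sub(/.*\) /, ""); print $12 + $13}' "$stat")
    echo "pmd busy: $(( (t2 - t1) * 100 / (5 * hz) ))%"      # well below 90 => stalled

When the PMD is spinning normally this reads close to 100%; when the RT stress
thread steals the core it drops roughly in proportion, matching the 'top'
output above.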

I disagree with your statement:

  "The other fields related to time in stats file can't be completely
   relied due to below test case."

Did I misunderstand your assertion and test?

> - Bhanuprakash.
>
>>
>>
>>On Fri, Dec 08, 2017 at 12:04:19PM +0000, Bhanuprakash Bodireddy wrote:
>>> The keepalive feature is aimed at achieving Fastpath Service Assurance in
>>> OVS-DPDK deployments.  It adds support for monitoring the packet
>>> processing threads by dispatching heartbeats at regular intervals.
>>>
>>> The keepalive feature can be enabled through the OVSDB settings below.
>>>
>>>     enable-keepalive=true
>>>       - The keepalive feature is disabled by default and should be enabled
>>>         at startup, before the ovs-vswitchd daemon is started.
>>>
>>>     keepalive-interval="5000"
>>>       - Timer interval in milliseconds for monitoring the packet
>>>         processing cores.
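
For reference, a minimal sketch of how these settings would be applied,
assuming they are exposed as other_config keys on the Open_vSwitch table (not
verified against this exact series):

    ovs-vsctl --no-wait set Open_vSwitch . other_config:enable-keepalive=true
    ovs-vsctl --no-wait set Open_vSwitch . other_config:keepalive-interval="5000"

followed by starting ovs-vswitchd, since enable-keepalive is only read at
startup.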
>>>
>>> TESTING:
>>>     Keepalive is tested using the stress command (to simulate stalls).
>>>       - pmd-cpu-mask=0xf [MQ enabled on DPDK ports]
>>>       - stress -c 1 &          [tid is usually the __tid + 1 of the output]
>>>       - chrt -r -p 99 <tid>    [set realtime priority for the stress thread]
>>>       - taskset -p 0x8 <tid>   [pin the stress thread to the core the PMD is
>>>                                 running on]
>>>       - The PMD thread will be descheduled due to its normal priority and
>>>         yields the core to the stress thread.
>>>
>>>       - ovs-appctl keepalive/pmd-health-show   [Displays that the thread is GONE]
>>>       - ./ovsdb/ovsdb-client monitor Open_vSwitch  [Should show the status update]
>>>
>>>       - taskset -p 0x10 <tid>  [This brings the PMD thread back to life, as
>>>                                 the stress thread is moved to an idle core]
>>>
>>>       (Watch out for the stress threads and pin them to cores carefully, so
>>>        as not to hang your DUTs during testing.)
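
A slightly more robust way to script the stall/recovery steps above (a sketch
only; the 0x8/0x10 core masks are just the values used in the steps above, and
the tid lookup assumes the stress worker is the last 'stress' entry ps lists):

    stress -c 1 &
    tid=$(ps -eLo tid,comm | awk '$2 == "stress" {print $1}' | tail -1)
    chrt -r -p 99 "$tid"     # realtime priority for the stress worker
    taskset -p 0x8 "$tid"    # pin onto the PMD core => PMD gets starved
    ovs-appctl keepalive/pmd-health-show
    taskset -p 0x10 "$tid"   # move to an idle core => PMD recovers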
>>>
>>> v5 -> v6
>>>   * Remove 2 patches from the series:
>>>      - xnanosleep was applied to master as part of high resolution timeout
>>>        support.
>>>      - Extend get_process_info() API was also applied to master earlier.
>>>   * Remove KA_STATE_DOZING, as it was initially meant to handle core
>>>     C-states; not needed for now.
>>>   * Fixed ka_destroy() to fix unit test cases 536 and 537.
>>>   * A minor performance degradation (0.5%) is observed with keepalive
>>>     enabled.  [Tested with a loopback case using 1000 IXIA streams of
>>>     64-byte UDP packets and 1 PMD thread (9.239 vs 9.177 Mpps) at a 10 ms
>>>     ka-interval timeout.]
>>>   * Verified with sparse and MSVC compilers (appveyor).
>>>
>>> v4 -> v5
>>>   * Add 3 more patches to the series
>>>      - xnanosleep()
>>>      - Documentation
>>>      - Update to NEWS
>>>   * Remove all references to core_id and instead implement thread-based
>>>     tracking.
>>>   * Addressed most of the comments in v4.
>>>
>>> v3 -> v4
>>>   * Split the functionality into 2 parts.  This patch series only updates
>>>     the PMD status to OVSDB.  An incremental patch series will handle false
>>>     positives, false negatives, and more checks and stats.
>>>   * Remove code from netdev layer and dependency on rte_keepalive lib.
>>>   * Merged a few patches and simplified the patch series.
>>>   * Timestamps in human-readable form.
>>>
>>> v2 -> v3
>>>   * Rebase.
>>>   * Verified with dpdk-stable-17.05.1 release.
>>>   * Fixed build issues with MSVC and cross checked with appveyor.
>>>
>>> v1 -> v2
>>>   * Rebase
>>>   * Drop patch 01/20 of v1, "Consolidate process related APIs", as it was
>>>     already applied as a separate patch.
>>>
>>> RFCv3 -> v1
>>>   * Made changes to fix failures in some unit test cases.
>>>   * Some more code cleanup w.r.t. process-related APIs.
>>>
>>> RFCv2 -> RFCv3
>>>   * Remove the POSIX shared memory block implementation (suggested by
>>>     Aaron).
>>>   * Rework the logic to register and track threads instead of cores.  This
>>>     way, in the future, any thread can be registered with the KA framework.
>>>     For now only PMD threads are tracked (suggested by Aaron).
>>>   * Refactor a few APIs and further clean up the code.
>>>
>>> RFCv1 -> RFCv2
>>>   * Merged the xml and schema commits into the later commit where the
>>>     actual implementation is done (suggested by Ben).
>>>   * Fix ovs-appctl keepalive/* hang issue when KA is disabled.
>>>   * Fixed memory leaks in the appctl commands keepalive/pmd-health-show
>>>     and keepalive/pmd-xstats-show.
>>>   * Refactored code and fixed APIs dealing with PMD health monitoring.
>>>
>>>
>>> Bhanuprakash Bodireddy (8):
>>>   Keepalive: Add initial keepalive configuration.
>>>   dpif-netdev: Register packet processing cores to KA framework.
>>>   dpif-netdev: Enable heartbeats for DPDK datapath.
>>>   keepalive: Retrieve PMD status periodically.
>>>   bridge: Update keepalive status in OVSDB.
>>>   keepalive: Add support to query keepalive status and statistics.
>>>   Documentation: Update DPDK doc with Keepalive feature.
>>>   NEWS: Add keepalive support information in NEWS.
>>>
>>>  Documentation/howto/dpdk.rst | 112 +++++++++
>>>  NEWS                         |   2 +
>>>  lib/automake.mk              |   2 +
>>>  lib/dpif-netdev.c            |  92 ++++++++
>>>  lib/keepalive.c              | 552 +++++++++++++++++++++++++++++++++++++++++++
>>>  lib/keepalive.h              | 109 +++++++++
>>>  lib/ovs-thread.c             |   6 +
>>>  lib/ovs-thread.h             |   1 +
>>>  lib/util.c                   |  22 ++
>>>  lib/util.h                   |   1 +
>>>  vswitchd/bridge.c            |  29 +++
>>>  vswitchd/vswitch.ovsschema   |   8 +-
>>>  vswitchd/vswitch.xml         |  49 ++++
>>>  13 files changed, 983 insertions(+), 2 deletions(-)
>>>  create mode 100644 lib/keepalive.c
>>>  create mode 100644 lib/keepalive.h
>>>
>>> --
>>> 2.4.11
>>>
>>
>>--
>>Flavio
>

