[ovs-dev] [RFE] Event mechanism for a pro-active packet drop detection and recovery

Ben Pfaff blp at ovn.org
Fri Jul 5 19:53:40 UTC 2019

Wow.  There's a lot here.

Some of my reactions:

- It's good to increase visibility.

- I don't know much about the importance of different kinds of
  visibility or what kinds of tools will consume them downstream.  I
  don't know the ultimate goals.

- OVS doesn't currently implement d-bus.  I don't know anything about
  d-bus, such as how much work it is to implement, whether OVS would
  need new library dependencies or how demanding those are, or whether
  it could be cross-platform (i.e. also support Windows).

- One can introduce new individual features for tracking different kinds
  of drops.  One can also introduce different frameworks for reporting
  them and alerting/alarming on them.  I guess that these can probably
  be separated.

- There are multiple levels at which drops can happen or be detected.  I
  wonder whether all of these can be addressed by a single framework.

Do you have an idea for next steps?  Sometimes it helps to have a
specific proposal to discuss, even if it is a straw man.  Maybe writing
something up would help.

On Fri, Jun 28, 2019 at 11:10:14AM +0530, Gowrishankar Muthukrishnan wrote:
> Today (*), when a packet's journey through the data path is disrupted
> and leads to a drop, OVS counters detect it automatically and show it
> at the request of user space commands. Some categories of drops relate
> to interfaces and can be queried from the OVS DB table for that
> interface [2], while others are available in real time in the data path
> through the respective OVS commands (e.g., ovs-appctl coverage/show as
> in [3] and ovs-appctl dpctl/show as in [4]). It is unavoidable that the
> drop stats are split across multiple sources, but at the end of the day
> the user has to query in different ways to figure out:
>   (1) that there is a packet drop
>   (2) the reason for the drop
>   (3) and so misses the precious opportunity to correct available
>       resources in the data path to prevent further drops.
> To ease the difficulty of monitoring this data, we already have tools
> such as collectd [1] to record the events, but IMHO what we have today
> and what we develop upstream are slightly out of sync: collectd can
> learn of packet drops only in the context of the Interface table.
> The other categories of drops (related to QoS, metering, tunnel,
> upcall, recirculation, MTU mismatch, invalid packets, etc.) cannot be
> monitored by collectd because neither an association with the Interface
> table nor a separate table for them exists today.
> There is an indirect association, though: e.g. Flow_Table represents
> all the packet flow rules, so when a packet is dropped, Flow_Table can
> be checked for a drop action, but that is not a unified way to quickly
> detect drops and correct resources. Thanks to our developers, these
> drops are now recorded in some way, but in the field the window to
> recover from the drops easily passes, given that these stats first have
> to be collected, then analysed by experts, and only then can a recovery
> action be applied.
> Also, there could be a pressing need to keep packet drops per million
> (ppm) very low.
> Hence, I would like to request suggestions from experts on how we can
> handle this situation in OVS. My humble ideas are below.
> (1) Unify the data collection into a common place:
>   We can think of having a separate data path table to record the
>   necessary context of a packet (drop reason and its count, to start
>   with). This would need only minimal changes in the ecosystem, e.g.
>   for collectd to sync. The workaround until then is to continue using
>   existing tables wherever possible, adding a statistics row where one
>   does not exist.
> (2) Notify drop very soon or never!
>   Instead of detecting DB record updates (even after (1) above), which
>   lag behind real-time data because of DB transaction latency, why not
>   have OVS generate events to the consuming ecosystem proactively? I can
>   think of D-Bus, for instance, to broadcast packet drop notifications.
>   As a disclaimer, I'm not a D-Bus expert :) but it is just an idea to
>   brainstorm.
>   An analogy in terms of the CLI (though using the D-Bus library would
>   be better). Broadcasting an event for every dropped packet may
>   exhaust resources in the notification chain, so instead OVS should
>   follow guidelines set by the user, e.g. notify only above an allowable
>   drop ppm, or wait for a signal from a registered monitoring agent on
>   D-Bus before enabling the broadcast.
>   OVS: dbus-send --system --dest=net.ovsmon/net.ovsmon.Datapath.SetProperty
>          string:Qfull variant:string:<port_name_that_packet_arrived>
>   Monitor: dbus-monitor type=signal interface="net.ovsmon.Datapath"
>              signal sender=net.ovsmon.Datapath -> dest=:1.102
>              path=/net/ovsmon/Datapath; interface=net.ovsmon.Datapath;
>              member=Qfull
>              string "vhost-port-1"
>   Monitor: dbus-send --system
>              --dest=net.ovsmon/net.ovsmon.Interface.SetProperty
>              string:<port_name> variant:string:"queue_size=<new value>"
>   OVS: <to monitor and apply corrective action>
> If you think this sounds good, I can think further about prototyping
> it for a better demonstration; if not, please suggest a better
> approach.
> (*) The patches below are upstream, either accepted or under review at
> present:
> [1] https://wiki.opnfv.org/display/fastpath/Open+vSwitch+plugins+High+Level+Design
> [2] https://patchwork.ozlabs.org/patch/1123287/
> [3] https://patchwork.ozlabs.org/patch/1111568/
> [4] https://patchwork.ozlabs.org/patch/1115978/
> [5] http://www.openvswitch.org//ovs-vswitchd.conf.db.5.pdf
> The respective developers from the above mail chains are CC'd; others
> are most welcome too. Also, I think ovs-dev is the appropriate ML for
> this discussion.
> Kind regards,
> Gowrishankar M
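
As a straw man for ideas (1) and (2) in the quoted proposal, the sketch
below keeps drop counts in one place, keyed by port and drop reason, and
raises a notification only when the drop rate for a port exceeds a
user-set ppm threshold. It is plain Python and not OVS code: the names
DropMonitor, DropEvent, record() and flush() are hypothetical, and the
notify callback only stands in for whatever transport (D-Bus or anything
else) would carry the event. "Qfull" and "vhost-port-1" are reused from
the example in the proposal.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class DropEvent:
    """One notification: hypothetical shape, not an OVS structure."""
    port: str
    reason: str
    drops: int    # drops for this (port, reason) in the current window
    ppm: float    # those drops per million packets seen on the port


@dataclass
class DropMonitor:
    """Unified drop counters plus a ppm threshold for notifications."""
    threshold_ppm: float
    notify: Callable[[DropEvent], None]  # stand-in for the event transport
    packets: Counter = field(default_factory=Counter)  # port -> packets seen
    drops: Counter = field(default_factory=Counter)    # (port, reason) -> drops

    def record(self, port: str, n_packets: int,
               reason: str = "", n_drops: int = 0) -> None:
        """Account n_packets seen on port, of which n_drops were dropped."""
        self.packets[port] += n_packets
        if n_drops:
            self.drops[(port, reason)] += n_drops

    def flush(self) -> List[DropEvent]:
        """End the window: notify for rates above threshold, then reset."""
        events = []
        for (port, reason), n in sorted(self.drops.items()):
            seen = self.packets[port]
            if seen == 0:
                continue
            ppm = n * 1_000_000 / seen
            if ppm > self.threshold_ppm:
                event = DropEvent(port, reason, n, ppm)
                self.notify(event)
                events.append(event)
        self.packets.clear()
        self.drops.clear()
        return events
```

Calling flush() once per reporting interval bounds the number of
notifications by the number of (port, reason) pairs rather than by the
number of dropped packets, which is the concern raised in the proposal
about exhausting the notification chain.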
