[ovs-discuss] Packet drops with high rate of Packet_In

Fri Nov 22 15:44:41 UTC 2013

Does the controller get any error replies from Open vSwitch?
What's in the ovs-vswitchd log?  (Not in debug mode, that's too big.)

On Fri, Nov 22, 2013 at 04:15:20PM +0100, Anton Matsiuk wrote:
> Dear Ben,
> 
> I figured out that the drops occur inside OVS. I see all packets entering
> one interface of OVS, a Packet_In generated for every packet, then a
> Flow_Mod (or a Packet_Out in other tests) generated and sent for every
> Packet_In by the external controller, and all these rules are installed in
> OVS. Namely, 500 Packet_Ins --> 500 flows in OVS, but only part of the
> ingress packets are processed through their corresponding flow rules and
> leave OVS (dump-ports and dump-flows, in both the kernel and user-space
> modules, show this). Drops occur only above some threshold of Packet_Ins
> per msec, which is why it seems that OVS drops some packets due to buffer
> overloads (or possibly due to expired timeouts for arrived packets).
> 
> I read the logs up to dbg level, but the only thing I found (in
> ovs-vswitchd.log) is that the governor periodically expands its hash table
> in response to the increasing frequency of flow_mods.
> 
> Is there a way to track drops in the internal buffers of OVS, or to debug
> this some other way?
> 
> Or does OVS perhaps drop packets once a timeout expires for a Packet_In
> waiting in a buffer? If so, what is the default value of that timeout?
> 
> -- 
> Best regards,
> Anton Matsiuk
> 
> On 21 November 2013 17:56, Ben Pfaff <blp at nicira.com> wrote:
> 
> > Please don't drop the mailing list.
> >
> > You have begun to narrow down where the drops occur, but it's still not
> > clear exactly where.  I suggest following the troubleshooting procedure
> > in the FAQ.
> >
> > Q: I have a sophisticated network setup involving Open vSwitch, VMs or
> >    multiple hosts, and other components.  The behavior isn't what I
> >    expect.  Help!
> >
> > A: To debug network behavior problems, trace the path of a packet,
> >    hop-by-hop, from its origin in one host to a remote host.  If
> >    that's correct, then trace the path of the response packet back to
> >    the origin.
> >
> >    Usually a simple ICMP echo request and reply ("ping") packet is
> >    good enough.  Start by initiating an ongoing "ping" from the origin
> >    host to a remote host.  If you are tracking down a connectivity
> >    problem, the "ping" will not display any successful output, but
> >    packets are still being sent.  (In this case the packets being sent
> >    are likely ARP rather than ICMP.)
> >
> >    Tools available for tracing include the following:
> >
> >        - "tcpdump" and "wireshark" for observing hops across network
> >          devices, such as Open vSwitch internal devices and physical
> >          wires.
> >
> >        - "ovs-appctl dpif/dump-flows <br>" in Open vSwitch 1.10 and
> >          later or "ovs-dpctl dump-flows <br>" in earlier versions.
> >          These tools allow one to observe the actions being taken on
> >          packets in ongoing flows.
> >
> >          See ovs-vswitchd(8) for "ovs-appctl dpif/dump-flows"
> >          documentation, ovs-dpctl(8) for "ovs-dpctl dump-flows"
> >          documentation, and "Why are there so many different ways to
> >          dump flows?" above for some background.
> >
> >        - "ovs-appctl ofproto/trace" to observe the logic behind how
> >          ovs-vswitchd treats packets.  See ovs-vswitchd(8) for
> >          documentation.  You can find out more details about a given flow
> >          that "ovs-dpctl dump-flows" displays, by cutting and pasting
> >          a flow from the output into an "ovs-appctl ofproto/trace"
> >          command.
> >
> >        - SPAN, RSPAN, and ERSPAN features of physical switches, to
> >          observe what goes on at these physical hops.
> >
> >    Starting at the origin of a given packet, observe the packet at
> >    each hop in turn.  For example, in one plausible scenario, you
> >    might:
> >
> >        1. "tcpdump" the "eth" interface through which an ARP egresses
> >           a VM, from inside the VM.
> >
> >        2. "tcpdump" the "vif" or "tap" interface through which the ARP
> >           ingresses the host machine.
> >
> >        3. Use "ovs-dpctl dump-flows" to spot the ARP flow and observe
> >           the host interface through which the ARP egresses the
> >           physical machine.  You may need to use "ovs-dpctl show" to
> >           interpret the port numbers.  If the output seems surprising,
> >           you can use "ovs-appctl ofproto/trace" to observe details of
> >           how ovs-vswitchd determined the actions in the "ovs-dpctl
> >           dump-flows" output.
> >
> >        4. "tcpdump" the "eth" interface through which the ARP egresses
> >           the physical machine.
> >
> >        5. "tcpdump" the "eth" interface through which the ARP
> >           ingresses the physical machine, at the remote host that
> >           receives the ARP.
> >
> >        6. Use "ovs-dpctl dump-flows" to spot the ARP flow on the
> >           remote host that receives the ARP and observe the VM "vif"
> >           or "tap" interface to which the flow is directed.  Again,
> >           "ovs-dpctl show" and "ovs-appctl ofproto/trace" might help.
> >
> >        7. "tcpdump" the "vif" or "tap" interface to which the ARP is
> >           directed.
> >
> >        8. "tcpdump" the "eth" interface through which the ARP
> >           ingresses a VM, from inside the VM.
> >
> >    It is likely that during one of these steps you will figure out the
> >    problem.  If not, then follow the ARP reply back to the origin, in
> >    reverse.
> >
> >
> > On Thu, Nov 21, 2013 at 04:55:13PM +0100, Anton Matsiuk wrote:
> > > I requested log files up to debug level, namely:
> > > ovs-vswitchd.log
> > > ovs-dpctl.log
> > > ovs-ofctl.log
> > > but none of them shows any messages related to packet drops. All the
> > > statistics show that the correct number of flows was installed and only
> > > part of the packets was processed.
> > > That's why I am asking: are there any other ways (beyond log files) to
> > > track packet drops in input buffers, and possibly to fix them? Or at
> > > least, in which direction should I search for a solution?
> > >
> > >
> > > On 20 November 2013 18:13, Ben Pfaff <blp at nicira.com> wrote:
> > >
> > > > On Wed, Nov 20, 2013 at 12:35:25PM +0100, Anton Matsiuk wrote:
> > > > > I am testing Open vSwitch in the following setup: two hosts are
> > > > > directly connected to OVS, with an external OpenFlow controller.
> > > > > Host1 generates UDP datagrams with sequential ports towards Host2,
> > > > > and Host2 listens for these UDP datagrams. In response to every UDP
> > > > > datagram, OVS generates a Packet_In and the controller sends a
> > > > > Flow_Mod back with L4 granularity (so it installs a separate flow
> > > > > for every pair of UDP port numbers). I send a batch of UDP
> > > > > datagrams from Host1 and count how many of them arrive at Host2. I
> > > > > tried both with a detached controller and with one running on the
> > > > > same machine as OVS. I tested it on different machines (in Mininet
> > > > > and with separate real hosts). I use the out-of-band option for the
> > > > > controller and disable-in-band=true.
> > > > >
> > > > >
> > > > > Beyond a certain number of packets (more than about 300), packet
> > > > > drops are observed. For instance, if I generate 500 UDP packets in
> > > > > 120 ms, only around 350 of them arrive at Host2. (Subsequent packets
> > > > > of the same flow can arrive at Host2, but the first packets of flows
> > > > > always experience drops.)
> > > > >
> > > > >
> > > > > ovs-ofctl dump-aggregate shows that all the flows are installed but
> > > > > only part of the packets are processed through them:
> > > > >
> > > > > NXST_AGGREGATE reply (xid=0x4): packet_count=356 byte_count=42364
> > > > > flow_count=500
> > > > >
> > > > >
> > > > > ovs-ofctl dump-ports also shows that 500 packets arrive on the
> > > > > ingress interface and only 356 leave on the egress interface.
> > > > >
> > > > >
> > > > > ovs-dpctl show -s shows the same: 500 flows installed and 356
> > > > > packets processed.
> > > > >
> > > > >
> > > > > I also tried replacing the Flow_Mods with Packet_Out messages for
> > > > > every packet, but I experienced the same drops. It seems that OVS
> > > > > starts dropping packets after some threshold (or buffer overload).
> > > > >
> > > > >
> > > > > Is there any way to debug these drops, and maybe to adjust ingress
> > > > > buffer sizes (or queue priorities) in order to avoid them?
> > > >
> > > > Yes, I think you will have to do the initial debugging yourself, to
> > > > find out where the drop is occurring.  When you report that back to
> > > > us, we can help you figure out how to fix it.
> > > >
> > >
> > >
> > >
> > > --
> > > Best regards,
> > > Anton Matsiuk
> >
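
For readers who want to repeat the accounting used in this thread, the
figures quoted from "ovs-ofctl dump-aggregate" can be cross-checked with a
short script.  What follows is only a sketch in Python, not part of Open
vSwitch: parse_aggregate is a hypothetical helper name, and the arithmetic
assumes that each installed flow was hit by exactly one packet, as in the
test described above (500 datagrams, one per UDP port pair, sent in 120 ms).

```python
import re


def parse_aggregate(reply: str) -> dict:
    """Collect the decimal key=value counters from an aggregate-stats reply.

    The trailing word boundary skips hexadecimal fields such as xid=0x4.
    """
    return {key: int(value) for key, value in re.findall(r"(\w+)=(\d+)\b", reply)}


# Reply text as quoted in the thread (joined onto one line).
reply = ("NXST_AGGREGATE reply (xid=0x4): packet_count=356 "
         "byte_count=42364 flow_count=500")

stats = parse_aggregate(reply)

# Assumption: one packet per flow, so flows without a matched packet
# correspond to first packets that never reached their flow entry.
missing = stats["flow_count"] - stats["packet_count"]
loss_fraction = missing / stats["flow_count"]

# Offered Packet_In rate for the burst described in the thread.
packets_sent = 500
burst_ms = 120
rate_per_ms = packets_sent / burst_ms

print(f"flows installed:   {stats['flow_count']}")
print(f"packets matched:   {stats['packet_count']}")
print(f"unmatched packets: {missing} ({loss_fraction:.1%})")
print(f"offered rate:      {rate_per_ms:.2f} packets/ms")
```

Repeating the same calculation for several burst sizes would show at which
offered rate the number of unmatched packets stops being zero, which is the
threshold the original report describes.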