[ovs-discuss] Packet drops with high rate of Packet_In
Ben Pfaff
blp at nicira.com
Fri Nov 22 15:44:41 UTC 2013
Does the controller get any error replies from Open vSwitch?
What's in the ovs-vswitchd log? (Not in debug mode, that's too big.)
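
One way to skim the non-debug part of the log is to keep only entries at warning level or above. The sketch below assumes the usual pipe-delimited ovs-vswitchd log layout (timestamp|sequence|module|level|message); the function name is illustrative and is not part of any Open vSwitch tool.

```python
# Sketch: keep only WARN/ERR/EMER entries from an ovs-vswitchd log.
# Assumption: each entry looks like "timestamp|seq|module|LEVEL|message".
SEVERE_LEVELS = {"WARN", "ERR", "EMER"}


def severe_entries(lines):
    """Yield (module, level, message) for entries at WARN level or above."""
    for line in lines:
        fields = line.rstrip("\n").split("|", 4)
        if len(fields) != 5:
            continue  # continuation line or unexpected layout: skip it
        _timestamp, _seq, module, level, message = fields
        if level in SEVERE_LEVELS:
            yield module, level, message
```

It can be fed any iterable of lines, for example an open file object for wherever your installation writes ovs-vswitchd.log.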
On Fri, Nov 22, 2013 at 04:15:20PM +0100, Anton Matsiuk wrote:
> Dear Ben,
>
> I figured out that the drops occur inside OVS. I see all packets entering
> one interface of OVS and a Packet_In generated for every packet. The
> external controller then generates and sends a Flow_Mod (or a Packet_Out
> in other tests) for every Packet_In, and all these rules are installed in
> OVS. Namely, 500 Packet_In --> 500 flows in OVS, but only part of the
> ingress packets are processed through their corresponding flow rules and
> leave OVS (dump-ports and dump-flows, in both the kernel and user-space
> modules, show this). Drops occur only above some threshold of Packet_In
> per msec, which is why it seems that OVS drops some packets due to buffer
> overloads (or perhaps due to expired timeouts for arrived packets).
>
> I read the logs up to dbg level, but the only thing I found (in
> ovs-vswitchd.log) is that the governor periodically expands the hash table
> in response to the increasing frequency of flow_mods.
>
> Is there a way to track drops in the internal buffers of OVS, or to debug
> this somehow?
>
> Or does OVS perhaps drop packets once a timeout expires for a Packet_In
> residing in a buffer? If so, what is the default value of that timeout?
>
> --
> Best regards,
> Anton Matsiuk
>
> On 21 November 2013 17:56, Ben Pfaff <blp at nicira.com> wrote:
>
> > Please don't drop the mailing list.
> >
> > You have begun to narrow down where the drops occur, but it's still not
> > clear exactly where. I suggest following the troubleshooting procedure
> > in the FAQ.
> >
> > Q: I have a sophisticated network setup involving Open vSwitch, VMs or
> > multiple hosts, and other components. The behavior isn't what I
> > expect. Help!
> >
> > A: To debug network behavior problems, trace the path of a packet,
> > hop-by-hop, from its origin in one host to a remote host. If
> > that's correct, then trace the path of the response packet back to
> > the origin.
> >
> > Usually a simple ICMP echo request and reply ("ping") packet is
> > good enough. Start by initiating an ongoing "ping" from the origin
> > host to a remote host. If you are tracking down a connectivity
> > problem, the "ping" will not display any successful output, but
> > packets are still being sent. (In this case the packets being sent
> > are likely ARP rather than ICMP.)
> >
> > Tools available for tracing include the following:
> >
> > - "tcpdump" and "wireshark" for observing hops across network
> > devices, such as Open vSwitch internal devices and physical
> > wires.
> >
> > - "ovs-appctl dpif/dump-flows <br>" in Open vSwitch 1.10 and
> > later or "ovs-dpctl dump-flows <br>" in earlier versions.
> > These tools allow one to observe the actions being taken on
> > packets in ongoing flows.
> >
> > See ovs-vswitchd(8) for "ovs-appctl dpif/dump-flows"
> > documentation, ovs-dpctl(8) for "ovs-dpctl dump-flows"
> > documentation, and "Why are there so many different ways to
> > dump flows?" above for some background.
> >
> > - "ovs-appctl ofproto/trace" to observe the logic behind how
> > ovs-vswitchd treats packets. See ovs-vswitchd(8) for
> > documentation. You can find out more details about a given flow
> > that "ovs-dpctl dump-flows" displays, by cutting and pasting
> > a flow from the output into an "ovs-appctl ofproto/trace"
> > command.
> >
> > - SPAN, RSPAN, and ERSPAN features of physical switches, to
> > observe what goes on at these physical hops.
> >
> > Starting at the origin of a given packet, observe the packet at
> > each hop in turn. For example, in one plausible scenario, you
> > might:
> >
> > 1. "tcpdump" the "eth" interface through which an ARP egresses
> > a VM, from inside the VM.
> >
> > 2. "tcpdump" the "vif" or "tap" interface through which the ARP
> > ingresses the host machine.
> >
> > 3. Use "ovs-dpctl dump-flows" to spot the ARP flow and observe
> > the host interface through which the ARP egresses the
> > physical machine. You may need to use "ovs-dpctl show" to
> > interpret the port numbers. If the output seems surprising,
> > you can use "ovs-appctl ofproto/trace" to observe details of
> > how ovs-vswitchd determined the actions in the "ovs-dpctl
> > dump-flows" output.
> >
> > 4. "tcpdump" the "eth" interface through which the ARP egresses
> > the physical machine.
> >
> > 5. "tcpdump" the "eth" interface through which the ARP
> > ingresses the physical machine, at the remote host that
> > receives the ARP.
> >
> > 6. Use "ovs-dpctl dump-flows" to spot the ARP flow on the
> > remote host that receives the ARP and observe the VM "vif"
> > or "tap" interface to which the flow is directed. Again,
> > "ovs-dpctl show" and "ovs-appctl ofproto/trace" might help.
> >
> > 7. "tcpdump" the "vif" or "tap" interface to which the ARP is
> > directed.
> >
> > 8. "tcpdump" the "eth" interface through which the ARP
> > ingresses a VM, from inside the VM.
> >
> > It is likely that during one of these steps you will figure out the
> > problem. If not, then follow the ARP reply back to the origin, in
> > reverse.
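
The eight-step ARP walk in the FAQ answer above can be written down as an ordered checklist before starting. This is only a sketch: every interface name is a placeholder chosen for illustration, and the helper name is hypothetical. The commands themselves are the ones the FAQ names (tcpdump and ovs-dpctl dump-flows).

```python
# Sketch: print the FAQ's eight-step ARP trace as an ordered checklist.
# All interface names below are placeholders (assumptions), not values
# from this thread; substitute the names used on your hosts.
def arp_trace_plan(vm_if="eth0", host_vif="vif1.0", host_nic="eth0",
                   remote_nic="eth0", remote_vif="vif2.0",
                   remote_vm_if="eth0"):
    """Return (where, command) pairs in the order the FAQ walks the hops."""
    def capture(iface):
        # -n: no name lookups, -e: show link-level headers, filter: ARP only
        return f"tcpdump -n -e -i {iface} arp"

    return [
        ("sending VM", capture(vm_if)),
        ("sending host", capture(host_vif)),
        ("sending host", "ovs-dpctl dump-flows"),
        ("sending host", capture(host_nic)),
        ("receiving host", capture(remote_nic)),
        ("receiving host", "ovs-dpctl dump-flows"),
        ("receiving host", capture(remote_vif)),
        ("receiving VM", capture(remote_vm_if)),
    ]


for step, (where, command) in enumerate(arp_trace_plan(), start=1):
    print(f"{step}. [{where}] {command}")
```

If a dump-flows step looks surprising, the FAQ's advice still applies: use "ovs-dpctl show" to interpret port numbers and "ovs-appctl ofproto/trace" to see how the actions were chosen.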
> >
> >
> > On Thu, Nov 21, 2013 at 04:55:13PM +0100, Anton Matsiuk wrote:
> > > I requested log files up to debug level, namely:
> > > ovs-vswitchd.log
> > > ovs-dpctl.log
> > > ovs-ofctl.log
> > > but none of them shows any messages related to packet drops. All the
> > > statistics show that the correct number of flows was installed and only
> > > part of the packets was processed.
> > > That's why I am asking: are there any other ways (beyond log files) to
> > > track packet drops in input buffers and possibly fix them? Or at least,
> > > in which direction should I search for a solution?
> > >
> > >
> > > On 20 November 2013 18:13, Ben Pfaff <blp at nicira.com> wrote:
> > >
> > > > On Wed, Nov 20, 2013 at 12:35:25PM +0100, Anton Matsiuk wrote:
> > > > > I test Open vSwitch in the following setup: I use 2 hosts directly
> > > > > connected to OVS and an external OpenFlow controller. Host1
> > > > > generates UDP datagrams with sequential ports towards Host2, and
> > > > > Host2 listens for these UDP datagrams. In response to every UDP
> > > > > datagram, OVS generates a Packet_In and the controller sends a
> > > > > Flow_Mod back with L4 granularity (so for every pair of UDP port
> > > > > numbers it installs a separate flow). I send a bunch of UDP
> > > > > datagrams from Host1 and count how many of them arrive at Host2. I
> > > > > tried both with a detached controller and with one running on the
> > > > > same machine as OVS. I tested it on different machines (in Mininet
> > > > > and with separate real hosts). I use the out-of-band option for the
> > > > > controller and disable-in-band=true.
> > > > >
> > > > >
> > > > > Starting at some number of packets (around 300 or more), packet
> > > > > drops are observed. For instance, if I generate 500 UDP packets in
> > > > > 120 ms, only around 350 of them arrive at Host2. (Subsequent packets
> > > > > of the same flow can arrive at Host2, but the first packets of flows
> > > > > always experience drops.)
> > > > >
> > > > >
> > > > > ovs-ofctl dump-aggregate shows that all the flows are installed but
> > > > > only part of the packets are processed through them:
> > > > >
> > > > > NXST_AGGREGATE reply (xid=0x4): packet_count=356 byte_count=42364
> > > > > flow_count=500
> > > > >
> > > > >
> > > > > ovs-ofctl dump-ports also shows that 500 packets arrive on the
> > > > > ingress interface and only 356 leave the egress interface.
> > > > >
> > > > >
> > > > > ovs-dpctl show -s shows the same: 500 flows installed and 356
> > > > > packets processed.
> > > > >
> > > > >
> > > > > Also, I tried replacing Flow_Mods with Packet_Out messages for every
> > > > > packet, but I experienced the same drops. It seems that OVS starts
> > > > > dropping packets after some threshold (or buffer overload).
> > > > >
> > > > >
> > > > > Is there any way to debug these drops, and maybe to adjust ingress
> > > > > buffer sizes (or queue priorities) in order to avoid such drops?
> > > >
> > > > Yes, I think you will have to do the initial debugging yourself, to
> > > > find out where the drop is occurring. When you report that back to us,
> > > > we can help you figure out how to fix it.
> > > >
> > >
> > >
> > >
> > > --
> > > Best regards,
> > > Anton Matsiuk
> >
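
The size of the loss described in the original report can be worked out from the counters quoted in the thread. The sketch below parses the NXST_AGGREGATE reply line as posted and assumes, as the report describes, one datagram per flow and a 500-packet burst sent over 120 ms. The variable and function names are illustrative.

```python
import re

# Counters copied from the ovs-ofctl dump-aggregate reply quoted above.
REPLY = ("NXST_AGGREGATE reply (xid=0x4): packet_count=356 "
         "byte_count=42364 flow_count=500")
SENT_PACKETS = 500      # UDP datagrams generated by Host1 (from the report)
BURST_SECONDS = 0.120   # duration of the burst, 120 ms (from the report)


def parse_counters(reply):
    """Extract the *_count=<integer> fields from an aggregate reply line."""
    pairs = re.findall(r"(\w+_count)=(\d+)", reply)
    return {name: int(value) for name, value in pairs}


counters = parse_counters(REPLY)
dropped = SENT_PACKETS - counters["packet_count"]
drop_fraction = dropped / SENT_PACKETS
offered_rate = SENT_PACKETS / BURST_SECONDS  # packets per second

print(f"flows installed: {counters['flow_count']}")
print(f"dropped {dropped} of {SENT_PACKETS} packets ({drop_fraction:.1%})")
print(f"offered rate: {offered_rate:.0f} packets/s")
```

With the numbers from the thread, 144 of the 500 packets (28.8%) never hit a flow even though all 500 flows were installed, which is consistent with the first packet of some flows being lost before its rule takes effect.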