[ovs-discuss] RFC - OVN end to end packet tracing - ovn-global-trace

Tim Rozet trozet at redhat.com
Wed Jun 10 13:22:48 UTC 2020


On Wed, Jun 10, 2020 at 3:36 AM Dumitru Ceara <dceara at redhat.com> wrote:

> On 6/9/20 3:47 PM, Tim Rozet wrote:
> > Hi Dumitru,
>
> Hi Tim,
>
> > Thanks for the detailed explanation. It makes sense and I would like to
> > comment on a few things you touched on:
> > 1. I do think we need to somehow functionally trigger conntrack when we
> > do ofproto-trace. It's the only way to know what the real session state
> > ends up being, and we need to be able to follow that for some of the
> > complex bugs where packets are getting dropped after they enter a CT
> > based flow.
> > 2. For your ovn-global-trace, it would be great if that could return a
> > json or other parsable format, so that we could build on top of it with
> > a tool + GUI to graphically show where the problem is in the network.
>
> Ack.
>
> > 3. We really need better user guides on this stuff. Your email is the
> > best tutorial I've seen yet :) I didn't even know about the
> > ovs-tcpundump command, or ovn-detrace (until you told me previously). It
> > would be great to add an OVN troubleshooting guide or something to the
> > docs.
> >
>
> I was planning on sending a patch to update the OVN docs but haven't had
> the chance to do it yet.
>
> > As an administrator I would like to have a GUI showing all of the
> > logical switch ports (skydive, as an example, already does this) and
> > then click on a specific port that someone has reported an issue on. At
> > that point I can ask it to tcpdump the traffic coming out of that port.
> > From there, I can select which packet I care about and attempt to do an
> > ovn-global-trace on it, which will then show me where the packet is
> > getting dropped and why. I think this would be the ideal behavior.
> >
>
> That would be cool. Using your example (skydive) though, I guess one
> could also come up with a solution that directly uses the tools already
> existing in OVS/OVN, essentially performing the steps that something
> like ovn-global-trace would do.
>

They could, but I think it would be better off living in OVN and then
being consumed by something above it.


>
> Thanks,
> Dumitru
>
> > Tim Rozet
> > Red Hat CTO Networking Team
> >
> >
> > On Mon, Jun 8, 2020 at 7:53 AM Dumitru Ceara <dceara at redhat.com> wrote:
> >
> >     Hi everyone,
> >
> >     CC-ing the ovn-kubernetes mailing list as I know there's interest in
> >     this there too.
> >
> >     OVN currently has a couple of tools that help with
> >     tracing/tracking/simulating what would happen to packets within OVN.
> >     Some examples:
> >
> >     1. ovn-trace
> >     2. ovs-appctl ofproto/trace ... | ovn-detrace
> >
> >     They're both really useful and provide lots of information but with
> >     both of them it's quite hard to get an overview of the end-to-end
> >     packet processing in OVN for a given packet. Therefore both solutions
> >     have disadvantages when trying to troubleshoot production
> >     deployments. Some examples:
> >
> >     a. ovn-trace will not take into account any potential issues with
> >     translating logical flows to openflow so if there's a bug in the
> >     translation we'll not be able to detect it by looking at ovn-trace
> >     output. There is the --ovs switch but the user would have to somehow
> >     determine on which hypervisor to query for the openflows
> >     corresponding to logical flows/SB entities.
> >
> >     b. "ovs-appctl ofproto/trace ... | ovn-detrace" works quite well when
> >     used on a single node but as soon as traffic gets tunneled to a
> >     different hypervisor the user has to figure out the changes that were
> >     performed on the packet on the source hypervisor and adapt the
> >     packet/flow to include the tunnel information to be used when running
> >     ofproto/trace on the destination hypervisor.
> >
> >     c. both ovn-trace and ofproto/trace support minimal hints to specify
> >     the new conntrack state after conntrack recirculation but that turns
> >     out not to be enough even in simple scenarios when NAT is involved
> >     [0].
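> >
> >     For reference, the hints that exist today look roughly like the
> >     following (addresses, ports and datapath names are made up for
> >     illustration, and the exact option syntax may differ between
> >     versions):
> >
> >     # ovn-trace: assert the conntrack state seen after recirculation.
> >     ovn-trace --ct new sw0 'inport == "p1" &&
> >         eth.src == 00:00:00:00:00:01 && eth.dst == 00:00:00:00:00:02 &&
> >         ip4.src == 10.0.0.2 && ip4.dst == 10.0.0.3 && ip.ttl == 64'
> >
> >     # ofproto/trace: each --ct-next supplies the ct_state for the next
> >     # ct() recirculation; for a reply-direction SYN-ACK the user would
> >     # have to manually assert something like 'est|rpl|trk'.
> >     ovs-appctl ofproto/trace br-int \
> >         'in_port=5,tcp,nw_src=10.0.0.2,nw_dst=10.0.0.3,tp_dst=80' \
> >         --ct-next 'new|trk'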
> >
> >     In a production deployment one of the scenarios one would have to
> >     troubleshoot is:
> >
> >     "Given this OVN deployment on X nodes why isn't this specific
> >     packet/traffic that is received on logical port P1 doesn't
> reach/reach
> >     port P2."
> >
> >     Assuming that point "c" above is addressed somehow (there are a few
> >     suggestions on how to do that [1]) it's still quite a lot of work for
> >     the engineer doing the troubleshooting to gather all the interesting
> >     information. One would probably do something like:
> >
> >     1. connect to the node running the southbound database and get the
> >     chassis where the logical port is bound:
> >
> >     chassis=$(ovn-sbctl --bare --columns chassis list port_binding P1)
> >     hostname=$(ovn-sbctl --bare --columns hostname list chassis $chassis)
> >
> >     2. connect to $hostname and determine the OVS ofport id of the
> >     interface corresponding to P1:
> >
> >     in_port=$(ovs-vsctl --bare --columns ofport find interface external_ids:iface-id=P1)
> >     iface=$(ovs-vsctl --bare --columns name find interface external_ids:iface-id=P1)
> >
> >     3. get a hexdump of the packet to be traced (or the flow), for
> >     example, on $hostname:
> >
> >     flow=$(tcpdump -xx -c 1 -i $iface $pkt_filter | ovs-tcpundump)
> >
> >     4. run ofproto/trace on $hostname (potentially piping output to
> >     ovn-detrace):
> >
> >     ovs-appctl ofproto/trace br-int in_port=$in_port $flow |
> >         ovn-detrace --ovnnb=$NB_CONN --ovnsb=$SB_CONN
> >
> >     5. In the best case the packet is fully processed on the current node
> >     (e.g., is dropped or forwarded out a local VIF).
> >
> >     6. In the worst case the packet needs to be tunneled to a remote
> >     hypervisor for egress on a remote VIF. The engineer needs to identify
> >     in the ofproto/trace output the metadata that would be passed through
> >     the tunnel along with the packet and also the changes that would
> >     happen to the packet payload (e.g. NAT) on the local hypervisor.
> >
> >     7. Determine the hostname of the chassis hosting the remote tunnel
> >     destination based on "tun_dst" from the ofproto/trace output at step
> >     4 above:
> >
> >     chassis_name=$(ovn-sbctl --bare --columns chassis_name find encap ip=$tun_dst)
> >     hostname=$(ovn-sbctl --bare --columns hostname find chassis name=$chassis_name)
> >
> >     8. Rerun the ofproto/trace on the remote chassis (basically go back
> >     to step #4 above).
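> >
> >     To make the above concrete, here is a rough, untested sketch of how
> >     the local-hop part of such a tool could be scripted. The ssh-based
> >     remote execution, $pkt_filter and the single-tunnel-hop assumption
> >     are all simplifications:
> >
> >     #!/bin/bash
> >     # Sketch: trace a packet entering OVN on logical port $1.
> >     P1=$1; pkt_filter=$2
> >     # Step 1: find the chassis/hostname where the logical port is bound.
> >     chassis=$(ovn-sbctl --bare --columns chassis list port_binding "$P1")
> >     hostname=$(ovn-sbctl --bare --columns hostname list chassis "$chassis")
> >     # Step 2: on that host, map the logical port to the local OVS interface.
> >     in_port=$(ssh "$hostname" ovs-vsctl --bare --columns ofport \
> >         find interface external_ids:iface-id="$P1")
> >     iface=$(ssh "$hostname" ovs-vsctl --bare --columns name \
> >         find interface external_ids:iface-id="$P1")
> >     # Steps 3-4: capture one matching packet, then trace it.
> >     flow=$(ssh "$hostname" "tcpdump -xx -c 1 -i $iface $pkt_filter" |
> >         ovs-tcpundump)
> >     ssh "$hostname" "ovs-appctl ofproto/trace br-int in_port=$in_port $flow" |
> >         ovn-detrace --ovnnb="$NB_CONN" --ovnsb="$SB_CONN"
> >     # Steps 6-8 would parse "tun_dst" out of the trace's final
> >     # "Datapath actions:" line (e.g. "set(tunnel(tun_id=0x4,dst=...))"),
> >     # look up the remote chassis with "ovn-sbctl find encap ip=..." and
> >     # repeat the trace there with the tunnel metadata included.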
> >
> >     My initial thought was that all the work above can be automated as
> >     all the information we need is either in the Southbound DB or in the
> >     OVS DB on the hypervisors, and the output of ofproto/trace contains
> >     all the packet modifications and tunnel information we need. I had
> >     started working on a tool, "ovn-global-trace", that would do all the
> >     work above but I hit a few blocking issues:
> >
> >     - point "c" above, i.e., conntrack related packet modifications: this
> >     will require some work in OVS ofproto/trace to either support
> additional
> >     conntrack hints or to actually run the trace against conntrack on
> >     the node.
> >
> >     - if we choose to query conntrack during ofproto/trace we'd probably
> >     need a way to also update the conntrack records the trace is run
> >     against. This would turn out useful for cases when we troubleshoot
> >     session establishment, e.g., with TCP: first run a trace for the SYN
> >     packet, then run a trace for the SYN-ACK packet in the other
> >     direction, but for this second trace we'd need the conntrack entry to
> >     have been created by the initial trace.
> >
> >     - ofproto/trace output is plain text: while a tool could parse the
> >     information from the text output, it would probably be easier if
> >     ofproto/trace dumped the trace information in a structured way
> >     (e.g., json).
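> >
> >     Purely as a strawman (nothing like this exists today), a structured
> >     per-hop trace record could look something like:
> >
> >     {
> >       "hop": 1,
> >       "hostname": "compute-0",
> >       "bridge": "br-int",
> >       "tables": [
> >         {"table": 8, "datapath": "sw0", "stage": "ls_in_port_sec_l2",
> >          "lflow": "inport == \"p1\"", "action": "next;"},
> >         ...
> >       ],
> >       "output": {"type": "tunnel", "tun_dst": "192.0.2.20"}
> >     }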
> >
> >     It would be great to get some feedback from the community about other
> >     aspects that I might have missed regarding end-to-end packet tracing
> >     and how we could aggregate current utilities into a single,
> >     easier-to-use tool like I was hoping "ovn-global-trace" would end up.
> >
> >     Thanks,
> >     Dumitru
> >
> >     [0]
> >     https://patchwork.ozlabs.org/project/openvswitch/patch/1578648883-1145-1-git-send-email-dceara@redhat.com/
> >     [1]
> >     https://mail.openvswitch.org/pipermail/ovs-dev/2020-January/366571.html
> >
> >
>
>