[ovs-discuss] Enable sFlow on br-tun cause OVS 2.5.0 crashed

张东亚 fortitude.zhang at gmail.com
Tue Aug 30 02:08:39 UTC 2016


Hi Neil,

Thanks a lot for your prompt reply.

We are glad that OVS 2.5.0 reports outer tunnel information, which helps
us locate the physical node of a flow and pin down network problems
easily. That is why we are evaluating it. Thanks a lot to you and the
community for the work.

Regarding reproducing the issue: after reading the code, I think it is
also caused by multiple tunnels sharing the same dpif (datapath) port. It
should be reproducible with the following steps:

1. Configure sFlow on an OVS bridge.
2. Configure VXLAN tunnel 1 on the bridge.
3. Configure VXLAN tunnel 2 on the bridge. Since dpif_sflow_add_port calls
dpif_sflow_del_port every time, ds->ports ends up with only one node,
corresponding to tunnel 2, stored under the hash of the same vxlan_sys_4789
port. This DOES cause an inconsistency between the userspace ofport and the
odp port.
4. Remove one of the two tunnels. There is only one node in ds->ports, and
only one odp port shared by both tunnels, so the entry in ds->ports is
removed. At this point ds->ports has no port corresponding to the
vxlan_sys_4789 datapath port.
5. If any sampled packet is then upcalled from kernel space, the lookup
cannot find in_dsp, which makes OVS crash.
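
The steps above can be modelled with a small stand-alone C sketch. It is
not OVS code: struct toy_ds, struct toy_dsp and the toy_* functions are
hypothetical names, a fixed array stands in for the ds->ports hash map, and
the assumption (taken from the description above) is that entries are keyed
only by the datapath port and that add always deletes first.
toy_tunnel_proto applies the same kind of NULL guard as the one-line fix
quoted further down.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_PORTS 8

/* Hypothetical stand-in for one entry of ds->ports; not the OVS struct. */
struct toy_dsp {
    int in_use;
    uint32_t odp_port;    /* datapath port number, used as the lookup key */
    uint32_t ofport;      /* OpenFlow port of the tunnel that was added */
    int tunnel_type;      /* nonzero means "known tunnel type" here */
};

struct toy_ds {
    struct toy_dsp ports[TOY_MAX_PORTS];
};

/* Look up an entry by datapath port; returns NULL when none exists. */
struct toy_dsp *
toy_find(struct toy_ds *ds, uint32_t odp_port)
{
    for (size_t i = 0; i < TOY_MAX_PORTS; i++) {
        if (ds->ports[i].in_use && ds->ports[i].odp_port == odp_port) {
            return &ds->ports[i];
        }
    }
    return NULL;
}

/* Remove the entry for a datapath port, if there is one. */
void
toy_del_port(struct toy_ds *ds, uint32_t odp_port)
{
    struct toy_dsp *dsp = toy_find(ds, odp_port);
    if (dsp) {
        dsp->in_use = 0;
    }
}

/* Add an entry. As assumed above, any entry for the same datapath
 * port is deleted first, so two tunnels can never coexist. */
void
toy_add_port(struct toy_ds *ds, uint32_t odp_port, uint32_t ofport,
             int tunnel_type)
{
    toy_del_port(ds, odp_port);
    for (size_t i = 0; i < TOY_MAX_PORTS; i++) {
        if (!ds->ports[i].in_use) {
            ds->ports[i] =
                (struct toy_dsp) { 1, odp_port, ofport, tunnel_type };
            return;
        }
    }
    assert(0 && "toy table full");
}

/* Number of live entries in the table. */
size_t
toy_count(const struct toy_ds *ds)
{
    size_t n = 0;
    for (size_t i = 0; i < TOY_MAX_PORTS; i++) {
        n += ds->ports[i].in_use ? 1 : 0;
    }
    return n;
}

/* Guarded read: returns 0 ("unknown") instead of dereferencing NULL. */
int
toy_tunnel_proto(struct toy_ds *ds, uint32_t odp_port)
{
    struct toy_dsp *in_dsp = toy_find(ds, odp_port);
    return in_dsp ? in_dsp->tunnel_type : 0;
}
```

Starting from a zeroed struct toy_ds, calling toy_add_port(&ds, 4, 10, 1)
and then toy_add_port(&ds, 4, 11, 1) leaves a single entry (steps 2 and 3);
toy_del_port(&ds, 4) then leaves none (step 4), so toy_find(&ds, 4) returns
NULL while the datapath port still exists (step 5). The port numbers 4, 10
and 11 are arbitrary.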

I hope this helps you fix the problem. I wonder whether we should prevent
the inconsistency between ofport and odp port in ds->ports.
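
One possible way to avoid that inconsistency, sketched here only as an
illustration and not as the actual patch, is to keep one entry per ofport
and to remember the odp port inside it. Removing one tunnel then leaves the
other tunnel's entry in place. All names below (struct alt_ds, struct
alt_dsp, alt_*) are hypothetical, and a fixed array again stands in for
the hash map.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ALT_MAX_PORTS 8

/* Hypothetical entry: one per OpenFlow port, remembering its odp port. */
struct alt_dsp {
    int in_use;
    uint32_t ofport;      /* unique per tunnel, used as the add/del key */
    uint32_t odp_port;    /* may be shared by several tunnels */
    int tunnel_type;
};

struct alt_ds {
    struct alt_dsp ports[ALT_MAX_PORTS];
};

/* Remove only the entry that belongs to this OpenFlow port. */
void
alt_del_port(struct alt_ds *ds, uint32_t ofport)
{
    for (size_t i = 0; i < ALT_MAX_PORTS; i++) {
        if (ds->ports[i].in_use && ds->ports[i].ofport == ofport) {
            ds->ports[i].in_use = 0;
        }
    }
}

/* Add or refresh the entry for one OpenFlow port. Entries of other
 * tunnels that share the same datapath port are left untouched. */
void
alt_add_port(struct alt_ds *ds, uint32_t ofport, uint32_t odp_port,
             int tunnel_type)
{
    alt_del_port(ds, ofport);
    for (size_t i = 0; i < ALT_MAX_PORTS; i++) {
        if (!ds->ports[i].in_use) {
            ds->ports[i] =
                (struct alt_dsp) { 1, ofport, odp_port, tunnel_type };
            return;
        }
    }
    assert(0 && "table full");
}

/* Find any live entry for a datapath port; NULL if none remain. */
struct alt_dsp *
alt_find_by_odp(struct alt_ds *ds, uint32_t odp_port)
{
    for (size_t i = 0; i < ALT_MAX_PORTS; i++) {
        if (ds->ports[i].in_use && ds->ports[i].odp_port == odp_port) {
            return &ds->ports[i];
        }
    }
    return NULL;
}
```

With two tunnels added as ofport 10 and 11 on the same odp port 4,
alt_del_port(&ds, 10) still leaves an entry that alt_find_by_odp(&ds, 4)
can return. The lookup by odp port is a linear scan in this sketch; a real
implementation would need an efficient index, and a NULL check at the call
site is still needed once the last tunnel is removed.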

I have also had a look at ofproto-dpif.at. It seems to test only the case
where a single tunnel is configured, and it does not cover the case where
a packet is sampled on tunnel ingress. I think that might be the reason
the bug was not reproduced there.

2016-08-30 2:04 GMT+08:00 Neil McKee <neil.mckee at inmon.com>:

> I have submitted a patch to fix this.  Sorry for the trouble.
>
> You are right that it is not essential for OVS to report the tunnel
> type.  Just the fact that there was a tunnel at all is the most
> important information because it tells you the source/destination
> addresses you see in the sampled packet are from the "inner" address
> space.  And the full inner+outer details of the tunnel will be visible
> on transit physical switches if they support sFlow too:
> http://sflow.org/products/network.php
> and also visible on downstream OVS bridges where sFlow is enabled.  So
> your measurement system should be able to put together the full
> picture.
>
> (If there is an efficient way to populate the tunnel type when the
> input datapath port is unknown, then we should add that, but my
> first concern is to avoid segfault.)
>
> The sFlow feature has been stable since it was first introduced back
> in 2008,  and has been running at scale in many deployments.  However
> this extension to report tunnel details is relatively new.   I am
> surprised that we did not detect this bug when we tested with OVN,
> but looking at the unit tests that were added at the same time I can
> see that this bug would not be caught by them:
>
> ofproto-dpif - sFlow packet sampling - tunnel set
> ofproto-dpif - sFlow packet sampling - tunnel push
> ofproto-dpif - sFlow packet sampling - MPLS
>
> For details,  see tests/ofproto-dpif.at.  If you want to suggest a new
> test,  please do.
>
> I would also like to reproduce what you are seeing,  so if you can
> share a few more details about your setup,  please do.
>
> For sFlow configuration I suggest you run hsflowd on your servers.  As
> well as reporting a rich set of performance metrics in a standard
> format, it will automatically share its sFlow configuration with the
> local OVS bridges if you include "ovs{ }" in /etc/hsflowd.conf:
> http://sflow.net
>
> The use of dpif_sflow_add_port and dpif_sflow_del_port on a
> configuration change dates back to the original implementation.  It
> doesn't affect OVS ports,  just their representation in the sFlow
> agent.   I believe the intention was to make sure that all internal
> agent state was reset cleanly on a configuration change.   For
> example,  the sequence number for packet and counter samples is
> required to reset to 0.
>
> Regards,
> Neil
>
> ------
> Neil McKee
> InMon Corp.
> http://www.inmon.com
>
>
> On Mon, Aug 29, 2016 at 2:31 AM, 张东亚 <fortitude.zhang at gmail.com> wrote:
> > Hi Neil,
> >
> > There is another question regarding the issue: it seems
> > dpif_sflow_add_port calls dpif_sflow_del_port every time. Is this
> > intended to remove a previously added dsp, since dp_port might be the
> > same for different tunnel ports?
> >
> >
> > 2016-08-29 11:16 GMT+08:00 张东亚 <fortitude.zhang at gmail.com>:
> >>
> >> Hi Neil,
> >>
> >> That seems like a possible fix, because in most cases there will not
> >> be mixed tunnel types, so setting the tunnel type to unknown will not
> >> affect analysis much.
> >>
> >> However, it seems this patch is also present on the master branch, so
> >> I am wondering about the stability and test coverage of the sFlow
> >> feature of OVS.
> >>
> >> Since we are evaluating sFlow for network monitoring in our cloud,
> >> do you have any advice about using that feature with OVS?
> >>
> >>
> >>
> >>
> >> 2016-08-26 18:08 GMT+08:00 Neil McKee <neil.mckee at inmon.com>:
> >>>
> >>> I think it would be OK to do this:
> >>>
> >>> tnlInProto = in_dsp ? dpif_sflow_tunnel_proto(in_dsp->tunnel_type) : 0;
> >>>
> >>> Neil
> >>>
> >>>
> >>> ------
> >>> Neil McKee
> >>> InMon Corp.
> >>> http://www.inmon.com
> >>>
> >>> On Fri, Aug 26, 2016 at 2:30 AM, 张东亚 <fortitude.zhang at gmail.com> wrote:
> >>>>
> >>>> Hi List,
> >>>>
> >>>> Recently we have been testing sFlow on OVS 2.5.0. Since OVS decides
> >>>> whether to sample based on the ingress bridge, we enabled sFlow on
> >>>> br-tun and encountered the following crash:
> >>>>
> >>>> #0  0x0000000000434ca8 in dpif_sflow_received (ds=0x4266100,
> >>>>     packet=packet@entry=0x7f3cb3fc8e18, flow=flow@entry=0x7f3cb3fd8810,
> >>>>     odp_in_port=<optimized out>, cookie=cookie@entry=0x7f3cb3fc3440,
> >>>>     sflow_actions=0x7f3cb3fc36c0) at ofproto/ofproto-dpif-sflow.c:1292
> >>>> #1  0x00000000004366c0 in process_upcall (udpif=udpif@entry=0x23bb570,
> >>>>     upcall=upcall@entry=0x7f3cb3fe1210,
> >>>>     odp_actions=odp_actions@entry=0x7f3cb3fe1280,
> >>>>     wc=wc@entry=0x7f3cb3fe12c0)
> >>>>     at ofproto/ofproto-dpif-upcall.c:1236
> >>>> #2  0x0000000000437087 in recv_upcalls (
> >>>>     handler=<error reading variable: Unhandled dwarf expression opcode 0xfa>,
> >>>>     handler=<error reading variable: Unhandled dwarf expression opcode 0xfa>)
> >>>>     at ofproto/ofproto-dpif-upcall.c:778
> >>>> #3  0x000000000043752a in udpif_upcall_handler (arg=0x24a3490)
> >>>>     at ofproto/ofproto-dpif-upcall.c:696
> >>>> #4  0x00000000004bbc54 in ovsthread_wrapper (aux_=<optimized out>)
> >>>>     at lib/ovs-thread.c:340
> >>>> #5  0x00007f3cb9f96e0e in start_thread ()
> >>>>     from /lib/x86_64-linux-gnu/libpthread.so.0
> >>>> #6  0x00007f3cba2940fd in clone ()
> >>>>     from /lib/x86_64-linux-gnu/libc.so.6
> >>>>
> >>>>
> >>>> After navigating the code, we think the following commit causes the
> >>>> crash:
> >>>>
> >>>> 7321bda384c366ae36bbca445f235a65d8f2b1f8
> >>>>
> >>>> Extend sFlow agent to report tunnel and MPLS structures
> >>>>
> >>>> It seems that the following code assumes in_dsp has a value; however,
> >>>> for a tunnel port, in_dsp will always be NULL since there is only one
> >>>> vxlan_sys_4789 datapath port.
> >>>>
> >>>>
> >>>>     if (flow->tunnel.ip_dst) {
> >>>>         memset(&tnlInElem, 0, sizeof(tnlInElem));
> >>>>         tnlInElem.tag = SFLFLOW_EX_IPV4_TUNNEL_INGRESS;
> >>>>         /* BUG: in_dsp will be NULL... */
> >>>>         tnlInProto = dpif_sflow_tunnel_proto(in_dsp->tunnel_type);
> >>>>         dpif_sflow_tunnel_v4(tnlInProto,
> >>>>                              &flow->tunnel,
> >>>>                              &tnlInElem.flowType.ipv4);
> >>>>         SFLADD_ELEMENT(&fs, &tnlInElem);
> >>>>         if (flow->tunnel.tun_id) {
> >>>>             memset(&vniInElem, 0, sizeof(vniInElem));
> >>>>             vniInElem.tag = SFLFLOW_EX_VNI_INGRESS;
> >>>>             vniInElem.flowType.tunnel_vni.vni
> >>>>                 = ntohll(flow->tunnel.tun_id);
> >>>>             SFLADD_ELEMENT(&fs, &vniInElem);
> >>>>         }
> >>>>     }
> >>>>
> >>>> Has anyone encountered the same issue? I am wondering how we can get
> >>>> the tunnel type from the datapath information.
> >>>>
> >>>> I will try to find a fix later.
> >>>>