[ovs-discuss] Re:Re: [HELP] Question about icmp pkt marked Invalid by userspace conntrack

Darrell Ball dlu998 at gmail.com
Wed Nov 6 15:49:35 UTC 2019


Hi Timo

On Wed, Nov 6, 2019 at 2:06 AM txfh2007 <txfh2007 at aliyun.com> wrote:

> Hi Darrell:
>
> the flow dump result is as below: Please help to check
> BEFORE:
>
>
> ct_state(-new+est-rel-rpl-inv+trk),ct_label(0/0x1),recirc_id(0x1b),in_port(6),packet_type(ns=0,id=0),eth(src=fa:16:3e:12:d7:77,dst=fa:16:3e:33:02:d8),eth_type(0x0800),ipv4(dst=192.168.1.8/255.255.255.248,proto=6,frag=no),tcp_flags(psh|ack),
> packets:18934, bytes:26602222, used:0.000s, flags:P.,
> actions:ct(zone=1),recirc(0x1c)
>
> ct_state(-new+est-rel-rpl-inv+trk),ct_label(0/0x1),recirc_id(0x1c),in_port(6),packet_type(ns=0,id=0),eth(src=fa:16:3e:12:d7:77,dst=fa:16:3e:33:02:d8),eth_type(0x0800),ipv4(src=192.168.1.10,dst=192.168.1.8,proto=6,frag=no),
> packets:5345996, bytes:7676256441, used:0.000s, flags:P., actions:5
>
> AFTER:
>
>
> ct_state(-new+est-rel-rpl-inv+trk),ct_label(0/0x1),recirc_id(0x19),in_port(6),packet_type(ns=0,id=0),eth(src=fa:16:3e:12:d7:77,dst=fa:16:3e:33:02:d8),eth_type(0x0800),ipv4(dst=192.168.1.8/255.255.255.248,proto=6,frag=no),tcp_flags(ack),
> packets:2473174, bytes:3551472384, used:0.136s, flags:.,
> actions:meter(0),ct(zone=1),recirc(0x1a)
>

The meter is applied by the rule above, and the pipeline then continues below
with another pass through the datapath;
this would likely explain the numbers.
Can you double-check the OpenFlow rules and do the metering at the output rule?
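
To make the comparison concrete, here is a minimal Python sketch that scans
datapath flow lines and reports which recirculation pass carries the meter
action. The two flow lines are abbreviated from the AFTER dump quoted in this
mail (match fields trimmed for readability). FLOW_RE, NEXT_RECIRC_RE and
summarize are hypothetical helper names for illustration, not part of OVS.

```python
import re

# Flow lines abbreviated from the AFTER dump in this thread (match fields trimmed).
AFTER_DUMP = """\
recirc_id(0x19),in_port(6),tcp_flags(ack), packets:2473174, bytes:3551472384, used:0.136s, flags:., actions:meter(0),ct(zone=1),recirc(0x1a)
recirc_id(0x1a),in_port(6), packets:5292889, bytes:7599875381, used:0.046s, flags:P., actions:5
"""

# Hypothetical pattern: pulls recirc_id, counters and the action list from one flow line.
FLOW_RE = re.compile(
    r"recirc_id\((?P<recirc>0x[0-9a-f]+)\).*?"
    r"packets:(?P<packets>\d+), bytes:(?P<bytes>\d+).*?"
    r"actions:(?P<actions>\S+)"
)
# Finds the recirculation id a flow hands the packet to, if any.
NEXT_RECIRC_RE = re.compile(r"recirc\((0x[0-9a-f]+)\)")


def summarize(dump):
    """Return one dict per flow line: its pass, counters, and whether it meters."""
    rows = []
    for line in dump.splitlines():
        m = FLOW_RE.search(line)
        if m is None:
            continue  # skip lines that are not datapath flows
        actions = m.group("actions")
        nxt = NEXT_RECIRC_RE.search(actions)
        rows.append({
            "recirc_id": m.group("recirc"),
            "packets": int(m.group("packets")),
            "bytes": int(m.group("bytes")),
            "has_meter": "meter(" in actions,
            "next_recirc": nxt.group(1) if nxt else None,
        })
    return rows


for row in summarize(AFTER_DUMP):
    where = "meters here" if row["has_meter"] else "no meter"
    print(row["recirc_id"], "->", row["next_recirc"], where, row["bytes"], "bytes")
```

On these two lines the sketch reports the meter on the recirc_id(0x19) pass,
which then recirculates to 0x1a, while the flow that actually outputs to port 5
has no meter action.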



> ct_state(-new+est-rel-rpl-inv+trk),ct_label(0/0x1),recirc_id(0x1a),in_port(6),packet_type(ns=0,id=0),eth(src=fa:16:3e:12:d7:77,dst=fa:16:3e:33:02:d8),eth_type(0x0800),ipv4(src=192.168.1.10,dst=192.168.1.8,proto=6,frag=no),
> packets:5292889, bytes:7599875381, used:0.046s, flags:P., actions:5
>
>
>
> meter rate is 1Gbps, iperf result is around 800Mbps
>
> [  5]  95.00-96.00  sec   104 MBytes   869 Mbits/sec
> [  5]  96.00-97.00  sec  79.4 MBytes   666 Mbits/sec
> [  5]  97.00-98.00  sec   107 MBytes   896 Mbits/sec
> [  5]  98.00-99.00  sec  75.4 MBytes   632 Mbits/sec
> [  5]  99.00-100.00 sec  98.3 MBytes   824 Mbits/sec
> [  5] 100.00-100.04 sec  0.00 Bytes  0.00 bits/sec
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval           Transfer     Bandwidth
> [  5]   0.00-100.04 sec  0.00 Bytes  0.00 bits/sec                  sender
> [  5]   0.00-100.04 sec  9.29 GBytes   798 Mbits/sec                  receiver
>
>
>
> ------------------------------------------------------------------
> From: Darrell Ball <dlu998 at gmail.com>
> Sent: Wednesday, November 6, 2019 02:46
> To: txfh2007 <txfh2007 at aliyun.com>
> Cc: Ben Pfaff <blp at ovn.org>; ovs-discuss <ovs-discuss at openvswitch.org>
> Subject: Re: [ovs-discuss] Re:Re: [HELP] Question about icmp pkt marked Invalid by
> userspace conntrack
>
>
> Hi Timo
>
> On Mon, Nov 4, 2019 at 11:29 PM txfh2007 <txfh2007 at aliyun.com> wrote:
>
> Hi Darrell:
>     The meter rate limit is set to 1Gbps, but the actual rate is around
> 500Mbps. I have read the meter patch, but that patch only prevents delta_t
> from becoming 0. In my case, delta_t is around 35500ms.
>
>
> It might be good to just include all known related fixes anyways,
> including this other one
>
>
>
> https://github.com/openvswitch/ovs/commit/acc5df0e3cb036524d49891fdb9ba89b609dd26a
>
>
>
>
> For my case, the meter action is on openflow table 46, and the ct action
> is on table 44, the output action is on table 65, so I guess the order is
> right?
>
>
> Could you dump the 'relevant' datapath flows before adding the meter rule
> and after adding the meter rule ?
> ovs-appctl dpif/dump-flows <bridge>
>
>
>
> Thanks
>
> Timo
>
>
>
> ------------------------------------------------------------------
> From: Darrell Ball <dlu998 at gmail.com>
> Sent: Tuesday, November 5, 2019 06:56
> To: txfh2007 <txfh2007 at aliyun.com>
> Cc: Ben Pfaff <blp at ovn.org>; ovs-discuss <ovs-discuss at openvswitch.org>
> Subject: Re: [ovs-discuss] Re:Re: [HELP] Question about icmp pkt marked Invalid by
> userspace conntrack
>
>
> Hi Timo
>
> On Sun, Nov 3, 2019 at 5:12 PM txfh2007 <txfh2007 at aliyun.com> wrote:
>
> Hi Darrell:
>     Sorry for my late reply. Yes, the two VMs under test are on same
> compute node , and pkts rx/tx via vhost user type port.
>
> Got it
>
>
> First, if I don't configure the meter table, the iperf TCP bandwidth
> from VM1 to VM2 is around 5Gbps. After I set the meter entry to constrain
> the rate, the deviation is larger than I thought.
>
>
> IIUC, pre-meter you get 5 Gbps and post-meter 0.5 Gbps, which is less
> than you expected?
> What did you expect the metered rate to be?
> Note that Ben pointed you to a meter-related bug fix on the alias before.
>
>     I guess the recalculation of l4 checksum during conntrack would impact
> the actual rate?
>
>
> are you applying the meter rule at end of the complete pipeline ?
>
>
> Thank you
> Timo
>
>
>
>
> txfh2007 <txfh2007 at aliyun.com>
> Ben Pfaff <blp at ovn.org>; ovs-discuss <ovs-discuss at openvswitch.org>
> Re: [ovs-discuss] Re:Re: [HELP] Question about icmp pkt marked Invalid by
> userspace conntrack
>
>
> Hi Timo
>
>
> I read thru this thread to get more context on what you are doing; you
> have a base OVS-DPDK
> use case and are measuring VM to VM performance across 2 compute nodes.
> You are probably using
> vhost-user-client ports ? Pls correct me if I am wrong.
> In this case, "per direction" you have one rx virtual interface to handle
> in OVS; there will be a tradeoff b/w
> checksum validation security and performance.
> JTBC, in terms of your measurements, how did you arrive at the 5Gbps -
> instrumented code or otherwise ?.
> (I can verify that later when I have a setup).
>
>
> Darrell
>
>
>
>
>
>
>
>
>
>
> On Thu, Oct 31, 2019 at 9:23 AM Darrell Ball <dlu998 at gmail.com> wrote:
>
>
>
>
> On Thu, Oct 31, 2019 at 3:04 AM txfh2007 via discuss <
> ovs-discuss at openvswitch.org> wrote:
>
> Hi Ben && Darrell:
>      This patch works, but after merging it I have found that the iperf
> throughput decreased from 5Gbps+ to 500Mbps.
>
> what is the 5Gbps number ? Is that the number with marking all packets as
> invalid in initial sanity checks ?
>
>
> Typically one wants to offload checksum checks. The code checks whether
> that has been done and skips
> doing it in software; can you verify that you have the capability and are
> using it ?
>
>
> Skipping checksum checks reduces security, of course, but it can be added
> if there is a common case of
> not being able to offload checksumming.
>
>
>
>  I guess maybe we should add a switch to turn off layer 4 checksum
> validation when doing userspace conntrack? I have found that for kernel
> conntrack there is a related sysctl named "nf_conntrack_checksum".
>
> Any advice?
>
> Thank you !
>
> ------------------------------------------------------------------
>
> :Ben Pfaff <blp at ovn.org>
> :ovs-discuss <ovs-discuss at openvswitch.org>
> :Re:Re:[ovs-discuss] [HELP] Question about icmp pkt marked Invalid by
> userspace conntrack
>
>
> Hi Ben && Darrell:
>      Thanks, this patch works! Now the issue seems fixed
>
> Timo
>
>
> Re: Re:[ovs-discuss] [HELP] Question about icmp pkt marked Invalid by
> userspace conntrack
>
>
> I see.
>
> It sounds like Darrell pointed out the solution, but please let me know
> if it did not help.
>
> On Fri, Oct 11, 2019 at 08:57:58AM +0800, txfh2007 wrote:
> > Hi Ben:
> >
> >      I just found the GCC_UNALIGNED_ACCESSORS error during a gdb trace and am
> > not sure whether this is a misalignment error or something else. What I can
> > confirm is that during "extract_l4" of this icmp reply packet, when we do
> > "check_l4_icmp", the unaligned error is emitted and "extract_l4" returns
> > false, so this packet is marked as ct_state=invalid.
> >
> > Thank you for your help.
> >
> > Timo
> >
> > Topic:Re: [ovs-discuss] [HELP] Question about icmp pkt marked Invalid by
> userspace conntrack
> >
> >
> > It's very surprising.
> >
> > Are you using a RISC architecture that insists on aligned accesses?  On
> > the other hand, if you are using x86-64 or some other architecture that
> > ordinarily does not care, are you sure that this is about a misaligned
> > access (it is more likely to simply be a bad pointer)?
> >
> > On Thu, Oct 10, 2019 at 10:50:33PM +0800, txfh2007 via discuss wrote:
> > >
> > > Hi all:
> > >     I am using OVS-DPDK (version 2.10-1), and I have found that pinging
> > > between two VMs on different compute nodes fails. I checked my environment
> > > and found that one node's NIC cannot strip the CRC of a frame, while the
> > > other node's NIC is normal (it can strip the CRC). The ping fails because
> > > the icmp reply pkt (from the node whose NIC cannot strip the CRC) is marked
> > > as invalid. The icmp request from Node A is 64 bytes, but the icmp reply
> > > from Node B is 68 bytes (with 4 bytes of CRC). When doing "check_l4_icmp",
> > > which calls csum (in lib/csum.c), gcc emits an unaligned accessor error.
> > > The backtrace is as below:
> > >
> > >     I just want to confirm if this phenomenon is reasonable?
> > >
> > > Many thanks
> > >
> > > Timo
> > >
> > >
> > > get_unaligned_be16 (p=0x7f2ad0b1ed5c) at lib/unaligned.h:89
> > > 89 GCC_UNALIGNED_ACCESSORS(ovs_be16, be16);
> > > (gdb) bt
> > > #0  get_unaligned_be16 (p=0x7f2ad0b1ed5c) at lib/unaligned.h:89
> > > #1  0x000000000075a584 in csum_continue (partial=0, data_=0x7f2ad0b1ed5c, n=68) at lib/csum.c:46
> > > #2  0x000000000075a552 in csum (data=0x7f2ad0b1ed5c, n=68) at lib/csum.c:33
> > > #3  0x00000000008ddf18 in check_l4_icmp (data=0x7f2ad0b1ed5c, size=68, validate_checksum=true) at lib/conntrack.c:1638
> > > #4  0x00000000008de650 in extract_l4 (key=0x7f32a20df120, data=0x7f2ad0b1ed5c, size=68, related=0x7f32a20df15d, l3=0x7f2ad0b1ed48,
> > >     validate_checksum=true) at lib/conntrack.c:1888
> > > #5  0x00000000008de90d in conn_key_extract (ct=0x7f32b42a2d98, pkt=0x7f2ad0b1e9c0, dl_type=8, ctx=0x7f32a20df120, zone=4)
> > >     at lib/conntrack.c:1973
> > > #6  0x00000000008dd49c in conntrack_execute (ct=0x7f32b42a2d98, pkt_batch=0x7f32a20e08b0, dl_type=8, force=false, commit=false,
> > >     zone=4, setmark=0x0, setlabel=0x0, tp_src=0, tp_dst=0, helper=0x0, nat_action_info=0x0, now=5395897849) at lib/conntrack.c:1318
> > > #7  0x000000000076d651 in dp_execute_cb (aux_=0x7f32a20dfb00, packets_=0x7f32a20e08b0, a=0x7f32a20e0ac8, should_steal=false)
> > >     at lib/dpif-netdev.c:6711
> > > #8  0x00000000007b2d49 in odp_execute_actions (dp=0x7f32a20dfb00, batch=0x7f32a20e08b0, steal=true, actions=0x7f32a20e0ac8,
> > >     actions_len=20, dp_execute_action=0x76ca60 <dp_execute_cb>) at lib/odp-execute.c:726
> > > #9  0x000000000076d71b in dp_netdev_execute_actions (pmd=0x7f2a6e1ce010, packets=0x7f32a20e08b0, should_steal=true,
> > >     flow=0x7f32a20dfb60, actions=0x7f32a20e0ac8, actions_len=20) at lib/dpif-netdev.c:6754
> > > #10 0x000000000076b900 in handle_packet_upcall (pmd=0x7f2a6e1ce010, packet=0x7f2ad0b1e9c0, key=0x7f32a20e1100,
> > >     actions=0x7f32a20e0a40, put_actions=0x7f32a20e0a80) at lib/dpif-netdev.c:6056
> > > #11 0x000000000076bdf0 in fast_path_processing (pmd=0x7f2a6e1ce010, packets_=0x7f32a20e2b60, keys=0x7f32a20e10c0,
> > >     batches=0x7f32a20e0f90, n_batches=0x7f32a20e13c0, in_port=15) at lib/dpif-netdev.c:6153
> > > #12 0x000000000076c3df in dp_netdev_input__ (pmd=0x7f2a6e1ce010, packets=0x7f32a20e2b60, md_is_valid=true, port_no=0)
> > >     at lib/dpif-netdev.c:6230
> > > #13 0x000000000076c4d4 in dp_netdev_recirculate (pmd=0x7f2a6e1ce010, packets=0x7f32a20e2b60) at lib/dpif-netdev.c:6265
> > > #14 0x000000000076ceae in dp_execute_cb (aux_=0x7f32a20e1db0, packets_=0x7f32a20e2b60, a=0x7f32a20e2d78, should_steal=true)
> > >
> > >
> > > _______________________________________________
> > > discuss mailing list
> > > discuss at openvswitch.org
> > > https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
> >
>
>
>