[ovs-dev] [PATCH RFC v2 0/8] Introduce connection tracking tc offload

Marcelo Ricardo Leitner marcelo.leitner at gmail.com
Tue Jul 23 15:42:17 UTC 2019


On Sat, Jul 20, 2019 at 08:26:59AM +0000, Paul Blakey wrote:
> Hi Marcelo, thanks for reporting this, can you dump the datapath rules via ovs-appctl dpctl/dump-flows -m --names
> Also running "tc filter show dev ns2-veth-ab ingress" while it happened can show us more details.
> 
> I'll try and reproduce it on my end.
> How often does this happen? and what is the setup ?

Quite often.
Setup is:
ns1                                                             ns2
--------------.                                     .--------------
              |         v------ br0 ------v         |
 ns1-veth-ba--x----ns1-veth-ab       ns2-veth-ab----x-- ns2-veth-ba
              |                                     |
--------------'                                     '--------------

Seems related to the skb_ext patches.. with a kernel with more
debugging stuff enabled, I'm seeing:

[   17.467576] =============================
[   17.468717] WARNING: suspicious RCU usage
[   17.469687] 5.2.0.c1f3d.g83fb7bc8ff16+ #2 Not tainted
[   17.471496] -----------------------------
[   17.472922] net/sched/sch_ingress.c:52 suspicious rcu_dereference_protected() usage!
[   17.474715]
               other info that might help us debug this:

[   17.476219]
               rcu_scheduler_active = 2, debug_locks = 1
[   17.477408] no locks held by ovs-vswitchd/1063.
[   17.478267]
               stack backtrace:
[   17.479545] CPU: 10 PID: 1063 Comm: ovs-vswitchd Not tainted 5.2.0.c1f3d.g83fb7bc8ff16+ #2
[   17.481932] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.12.0-2.fc30 04/01/2014
[   17.483725] Call Trace:
[   17.484179]  dump_stack+0x85/0xc0
[   17.484782]  ingress_tcf_block+0x4c/0x50 [sch_ingress]
[   17.485717]  __tcf_block_find+0x28/0x80
[   17.486419]  tc_new_tfilter+0x1b6/0x960
[   17.487093]  ? tc_del_tfilter+0x720/0x720
[   17.487854]  rtnetlink_rcv_msg+0x389/0x4b0
[   17.488590]  ? netlink_deliver_tap+0x95/0x400
[   17.489343]  ? rtnl_dellink+0x2d0/0x2d0
[   17.490034]  netlink_rcv_skb+0x49/0x110
[   17.490726]  netlink_unicast+0x171/0x200
[   17.491409]  netlink_sendmsg+0x21e/0x3e0
[   17.492096]  sock_sendmsg+0x5e/0x60
[   17.492792]  ___sys_sendmsg+0x2ae/0x330
[   17.493725]  ? ___sys_recvmsg+0x159/0x1f0
[   17.494734]  ? up_write+0x1c/0xc0
[   17.495515]  ? ext4_file_write_iter+0xd1/0x3b0
[   17.496624]  ? find_held_lock+0x2b/0x80
[   17.497362]  ? ksys_write+0xc0/0xe0
[   17.498000]  __sys_sendmsg+0x59/0xa0
[   17.498653]  do_syscall_64+0x5c/0xa0
[   17.499307]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[   17.500225] RIP: 0033:0x7f6bd4ab2b45

And I could trigger a crash with this trace:
(crash was on a different run, while the RCU trace above and outputs below are
from the same run)

[  382.670318] BUG: kernel NULL pointer dereference, address: 0000000000000030
[  382.671927] #PF: supervisor read access in kernel mode
[  382.673070] #PF: error_code(0x0000) - not-present page
[  382.674170] PGD 0 P4D 0
[  382.674873] Oops: 0000 [#1] SMP NOPTI
[  382.675753] CPU: 1 PID: 6988 Comm: nc Kdump: loaded Not tainted 5.2.0.c1f3d.g83fb7bc8ff16+ #1
[  382.677350] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.12.0-2.fc30 04/01/2014
[  382.679255] RIP: 0010:tcf_classify+0x104/0x1e0
[  382.680942] Code: 5c 41 5d 41 5e 41 5f c3 48 8b 55 00 be 01 00 00 00 4c 89 ff 48 89 14 24 e8 79 e3 f9 ff 48 8b 14 24 48 85 c0 0f 84 cb 00 00
 00 <48> 8b 4a 30 8b 49 40 89 08 41 8d 45 01 41 83 fd 03 7e ac e8 84 37
[  382.684792] RSP: 0018:ffffbd19000f4dd0 EFLAGS: 00010282
[  382.685746] RAX: ffff9e04772b0008 RBX: ffff9e0479088360 RCX: 0000000000000001
[  382.686929] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff9e047ad9a140
[  382.688144] RBP: ffffbd19000f4e40 R08: ffff9e047b870b60 R09: 0000000000000002
[  382.689347] R10: ffff9e04550ccd68 R11: ffff9e0476050a00 R12: 0000000000000001
[  382.690534] R13: 0000000000000000 R14: ffff9e0479088360 R15: ffff9e0479229ce0
[  382.691730] FS:  00007efd4d75bb80(0000) GS:ffff9e047b840000(0000) knlGS:0000000000000000
[  382.693033] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  382.694061] CR2: 0000000000000030 CR3: 0000000163abc000 CR4: 00000000003406e0
[  382.695283] Call Trace:
[  382.695962]  <IRQ>
[  382.696573]  __netif_receive_skb_core+0x3c0/0xcf0
[  382.697492]  ? reweight_entity+0x15b/0x1a0
[  382.698340]  __netif_receive_skb_one_core+0x37/0x90
[  382.699274]  process_backlog+0x9c/0x150
[  382.700199]  net_rx_action+0xff/0x350
[  382.701093]  __do_softirq+0xee/0x2ff
[  382.701976]  do_softirq_own_stack+0x2a/0x40
[  382.702948]  </IRQ>
[  382.703626]  do_softirq.part.0+0x41/0x50
[  382.704556]  __local_bh_enable_ip+0x4b/0x50
[  382.705536]  ip_finish_output2+0x1a9/0x580
[  382.706518]  ip_output+0x71/0xf0
[  382.707392]  ? __ip_finish_output+0x1c0/0x1c0
[  382.708422]  __ip_queue_xmit+0x16c/0x420
[  382.709423]  ? __kmalloc_reserve.isra.0+0x2d/0x70
[  382.711184]  __tcp_transmit_skb+0x52c/0xab0
[  382.712582]  tcp_connect+0xaed/0xda0
[  382.713403]  ? kvm_clock_get_cycles+0xd/0x10
[  382.714315]  tcp_v4_connect+0x3f1/0x500
[  382.715156]  __inet_stream_connect+0xc5/0x360
[  382.716070]  inet_stream_connect+0x37/0x50
[  382.716947]  __sys_connect+0xd6/0x100
[  382.717766]  ? do_fcntl+0x1a5/0x650
[  382.718588]  ? __x64_sys_fcntl+0x84/0xb0
[  382.719450]  __x64_sys_connect+0x16/0x20
[  382.720310]  do_syscall_64+0x5b/0x180
[  382.721146]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

Then on the test:
+ ovs-appctl dpctl/dump-conntrack
tcp,orig=(src=192.168.0.2,dst=192.168.0.1,sport=35948,dport=5001),reply=(src=192.168.0.1,dst=233.185.30.138,sport=5001,dport=35948),zone=2,protoinfo=(state=SYN_SENT)
tcp,orig=(src=192.168.0.2,dst=192.168.0.1,sport=35946,dport=5001),reply=(src=192.168.0.1,dst=192.168.0.2,sport=5001,dport=35946),protoinfo=(state=TIME_WAIT)

+ ovs-appctl dpctl/dump-flows -m --names
ufid:a7998e58-919b-4d05-9cf5-4ab05527bf91, skb_priority(0/0),skb_mark(0/0),ct_state(0/0x20),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(ns2-veth-ab),packet_type(ns=0/0,id=0/0),eth(src=00:00:00:00:00:00/00:00:00:00:00:00,dst=00:00:00:00:00:00/00:00:00:00:00:00),eth_type(0x0800),ipv4(src=0.0.0.0/0.0.0.0,dst=0.0.0.0/0.0.0.0,proto=6,tos=0/0,ttl=0/0,frag=no),tcp(src=0/0,dst=0/0), packets:3, bytes:180, used:2.890s, dp:tc, actions:ct(zone=2),recirc(0x1)
ufid:89ec315b-9068-4c2c-a795-3ea78c406616, skb_priority(0/0),skb_mark(0/0),ct_state(0/0x23),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(ns2-veth-ab),packet_type(ns=0/0,id=0/0),eth(src=12:1d:05:89:52:0b,dst=33:33:00:00:00:02),eth_type(0x86dd),ipv6(src=::/::,dst=::/::,label=0/0,proto=0/0,tclass=0/0,hlimit=0/0,frag=no), packets:1, bytes:56, used:1.480s, dp:tc, actions:br0,ns1-veth-ab
ufid:7becab91-3e24-4ba5-abe0-39b7650243ff, skb_priority(0/0),skb_mark(0/0),ct_state(0/0x23),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(ns2-veth-ab),packet_type(ns=0/0,id=0/0),eth(src=12:1d:05:89:52:0b,dst=33:33:ff:89:52:0b),eth_type(0x86dd),ipv6(src=::/::,dst=::/::,label=0/0,proto=0/0,tclass=0/0,hlimit=0/0,frag=no), packets:0, bytes:0, used:6.010s, dp:tc, actions:br0,ns1-veth-ab
ufid:868ba402-078f-4220-9865-6c58ee6e02bb, skb_priority(0/0),skb_mark(0/0),ct_state(0/0x23),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(ns2-veth-ab),packet_type(ns=0/0,id=0/0),eth(src=12:1d:05:89:52:0b,dst=33:33:00:00:00:16),eth_type(0x86dd),ipv6(src=::/::,dst=::/::,label=0/0,proto=0/0,tclass=0/0,hlimit=0/0,frag=no), packets:0, bytes:0, used:6.010s, dp:tc, actions:br0,ns1-veth-ab
ufid:10757d31-40c8-4359-8735-b40e2e16b07c, skb_priority(0/0),skb_mark(0/0),ct_state(0x21/0x21),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0x1),dp_hash(0/0),in_port(ns2-veth-ab),packet_type(ns=0/0,id=0/0),eth(src=00:00:00:00:00:00/00:00:00:00:00:00,dst=00:00:00:00:00:00/00:00:00:00:00:00),eth_type(0x0800),ipv4(src=0.0.0.0/0.0.0.0,dst=0.0.0.0/0.0.0.0,proto=6,tos=0/0,ttl=0/0,frag=no),tcp(src=0/0,dst=0/0), packets:3, bytes:180, used:2.890s, dp:tc, actions:ct(commit,zone=2,nat(src=192.168.0.30)),ns1-veth-ab
ufid:9e322377-a849-45d5-8c9e-719bb60bf76a, skb_priority(0/0),skb_mark(0/0),ct_state(0/0x23),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(ns1-veth-ab),packet_type(ns=0/0,id=0/0),eth(src=0e:78:05:f3:ea:8c,dst=33:33:ff:f3:ea:8c),eth_type(0x86dd),ipv6(src=::/::,dst=::/::,label=0/0,proto=0/0,tclass=0/0,hlimit=0/0,frag=no), packets:0, bytes:0, used:6.010s, dp:tc, actions:br0,ns2-veth-ab
ufid:1c0c9b46-ceb1-4c6f-95ae-f44169f6395b, skb_priority(0/0),skb_mark(0/0),ct_state(0/0x23),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(ns1-veth-ab),packet_type(ns=0/0,id=0/0),eth(src=0e:78:05:f3:ea:8c,dst=33:33:00:00:00:02),eth_type(0x86dd),ipv6(src=::/::,dst=::/::,label=0/0,proto=0/0,tclass=0/0,hlimit=0/0,frag=no), packets:1, bytes:56, used:2.500s, dp:tc, actions:br0,ns2-veth-ab
ufid:6e12004f-a24f-42d5-b946-d213491f93c6, skb_priority(0/0),skb_mark(0/0),ct_state(0/0x23),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(ns1-veth-ab),packet_type(ns=0/0,id=0/0),eth(src=0e:78:05:f3:ea:8c,dst=33:33:00:00:00:16),eth_type(0x86dd),ipv6(src=::/::,dst=::/::,label=0/0,proto=0/0,tclass=0/0,hlimit=0/0,frag=no), packets:0, bytes:0, used:6.000s, dp:tc, actions:br0,ns2-veth-ab
+ tc filter show dev ns2-veth-ab ingress
filter protocol ip pref 3 flower chain 0 
filter protocol ip pref 3 flower chain 0 handle 0x1 
  eth_type ipv4
  ip_proto tcp
  ip_flags nofrag
  ct_state -trk
  not_in_hw
        action order 1: ct zone 2 pipe
         index 1 ref 1 bind 1
        cookie 588e99a7054d9b91b04af59c91bf2755

        action order 2: gact action goto chain 1
         random type none pass val 0
         index 1 ref 1 bind 1
        cookie 588e99a7054d9b91b04af59c91bf2755

filter protocol ipv6 pref 4 flower chain 0 
filter protocol ipv6 pref 4 flower chain 0 handle 0x1 
  dst_mac 33:33:00:00:00:02
  src_mac 12:1d:05:89:52:0b
  eth_type ipv6
  ip_flags nofrag
  ct_state -trk-new-est
  not_in_hw
        action order 1: skbedit  ptype host pipe
         index 1 ref 1 bind 1

        action order 2: mirred (Ingress Mirror to device br0) pipe
        index 2 ref 1 bind 1
        cookie 5b31ec892c4c6890a73e95a71666408c

        action order 3: mirred (Egress Redirect to device ns1-veth-ab) stolen
        index 3 ref 1 bind 1
        cookie 5b31ec892c4c6890a73e95a71666408c

filter protocol ipv6 pref 4 flower chain 0 handle 0x2 
  dst_mac 33:33:ff:89:52:0b
  src_mac 12:1d:05:89:52:0b
  eth_type ipv6
  ip_flags nofrag
  ct_state -trk-new-est
  not_in_hw
        action order 1: skbedit  ptype host pipe
         index 2 ref 1 bind 1

        action order 2: mirred (Ingress Mirror to device br0) pipe
        index 4 ref 1 bind 1
        cookie 91abec7ba54b243eb739e0abff430265

        action order 3: mirred (Egress Redirect to device ns1-veth-ab) stolen
        index 5 ref 1 bind 1
        cookie 91abec7ba54b243eb739e0abff430265

filter protocol ipv6 pref 4 flower chain 0 handle 0x3
  dst_mac 33:33:00:00:00:16
  src_mac 12:1d:05:89:52:0b
  eth_type ipv6
  ip_flags nofrag
  ct_state -trk-new-est
  not_in_hw
        action order 1: skbedit  ptype host pipe
         index 3 ref 1 bind 1

        action order 2: mirred (Ingress Mirror to device br0) pipe
        index 6 ref 1 bind 1
        cookie 02a48b8620428f07586c6598bb026eee

        action order 3: mirred (Egress Redirect to device ns1-veth-ab) stolen
        index 7 ref 1 bind 1
        cookie 02a48b8620428f07586c6598bb026eee

filter protocol ip pref 3 flower chain 1
filter protocol ip pref 3 flower chain 1 handle 0x1
  eth_type ipv4
  ip_proto tcp
  ip_flags nofrag
  ct_state +trk+new
  not_in_hw
        action order 1: ct commit zone 2 nat src addr 192.168.0.30-0.0.0.0 pipe
         index 2 ref 1 bind 1
        cookie 317d75105943c8400eb435877cb0162e

        action order 2: mirred (Egress Redirect to device ns1-veth-ab) stolen
        index 1 ref 1 bind 1
        cookie 317d75105943c8400eb435877cb0162e

Test script is attached.



More information about the dev mailing list