[ovs-discuss] Preventing soft lockups due to packet flood attacks

Favyen Bastani fbastani at perennate.com
Thu Dec 17 23:16:19 UTC 2015


Hi,

On OVS 2.4, I am getting soft lockups during medium-sized packet flood
attacks. It seems that during the flood, most incoming packets do not
match any existing rule, so a new one has to be created for every
incoming packet, which overloads the system.

To mitigate this problem, I have a daemon that checks the packets per
second (pps) on the physical interface every 500 ms. If the pps is too
high, it runs tcpdump, finds the destination address of the flood, and
blocks traffic to that address (e.g. 1.2.3.4) by adding a flow:

ovs-ofctl add-flow br-ex dl_type=0x0800,nw_dst=1.2.3.4,actions=drop
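
For illustration only, the daemon logic described above could be sketched
roughly as follows in Python. This is not the actual daemon code: the
sysfs counter path, the tcpdump sample size, the pps threshold, and all
function names are assumptions made for the sketch, and it would need to
run as root.

```python
import re
import subprocess
import time
from collections import Counter

# Matches the "IP src > dst:" part of `tcpdump -n` output and captures the
# destination IPv4 address, ignoring an optional trailing port number.
_DST_RE = re.compile(
    r"\bIP (?:\d{1,3}\.){3}\d{1,3}(?:\.\d+)? > "
    r"((?:\d{1,3}\.){3}\d{1,3})(?:\.\d+)?:"
)


def read_rx_packets(iface):
    """Read the received-packet counter for iface from sysfs (assumed path)."""
    with open(f"/sys/class/net/{iface}/statistics/rx_packets") as f:
        return int(f.read())


def packets_per_second(prev_count, curr_count, interval_s):
    """Average packet rate between two counter readings."""
    return (curr_count - prev_count) / interval_s


def top_destination(tcpdump_lines):
    """Return the most frequent destination IPv4 address, or None."""
    counts = Counter()
    for line in tcpdump_lines:
        match = _DST_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts.most_common(1)[0][0] if counts else None


def drop_flow_command(bridge, dst_ip):
    """Build the ovs-ofctl command that drops IPv4 traffic to dst_ip."""
    return ["ovs-ofctl", "add-flow", bridge,
            f"dl_type=0x0800,nw_dst={dst_ip},actions=drop"]


def monitor(iface, bridge, pps_limit, interval_s=0.5, sample=1000):
    """Poll the counter; on a flood, sample traffic and block the target."""
    prev = read_rx_packets(iface)
    while True:
        time.sleep(interval_s)
        curr = read_rx_packets(iface)
        if packets_per_second(prev, curr, interval_s) > pps_limit:
            # Capture a fixed number of IPv4 packets without name resolution.
            out = subprocess.run(
                ["tcpdump", "-n", "-i", iface, "-c", str(sample), "ip"],
                capture_output=True, text=True, check=True).stdout
            dst = top_destination(out.splitlines())
            if dst:
                subprocess.run(drop_flow_command(bridge, dst), check=True)
        # Re-read so time spent in tcpdump does not skew the next interval.
        prev = read_rx_packets(iface)
```

The parsing and command-building steps are kept as separate pure
functions so they can be checked without touching a live interface; only
monitor() reads counters or runs external commands.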

This usually works, but sometimes the flood is not detected quickly
enough and the entire system freezes, so the daemon can only add the
block several minutes later.

Is there a way to limit the rate of unmatched packets (so that some are
dropped) to prevent the overload that locks up the system? Then the
daemon would be able to block the targeted address immediately by adding
the drop flow. Alternatively, there may be a better way to do this
without the custom daemon code.

Here's the ovs-vswitchd log (the kernel trace is attached):

2015-12-17T19:55:41.186Z|00137|dpif_netlink(handler10)|WARN|system@ovs-system: lost packet on port channel 3 of handler 0
2015-12-17T19:55:43.112Z|00146|ovs_rcu(urcu6)|WARN|blocked 1927 ms waiting for handler10 to quiesce
2015-12-17T19:55:45.248Z|00147|ovs_rcu(urcu6)|WARN|blocked 4063 ms waiting for handler10 to quiesce
2015-12-17T19:55:45.248Z|00148|ovs_rcu(urcu6)|WARN|blocked 4063 ms waiting for handler10 to quiesce
2015-12-17T19:57:10.314Z|00149|ovs_rcu(urcu6)|WARN|blocked 89129 ms waiting for handler10 to quiesce
2015-12-17T19:57:10.314Z|00150|ovs_rcu(urcu6)|WARN|blocked 89129 ms waiting for handler10 to quiesce
2015-12-17T19:57:10.314Z|00151|ovs_rcu(urcu6)|WARN|blocked 89129 ms waiting for handler10 to quiesce
2015-12-17T19:57:10.314Z|00152|ovs_rcu(urcu6)|WARN|blocked 89129 ms waiting for handler10 to quiesce
2015-12-17T19:57:57.234Z|00153|ovs_rcu(urcu6)|WARN|blocked 136049 ms waiting for handler10 to quiesce
2015-12-17T19:59:58.593Z|00154|ovs_rcu(urcu6)|WARN|blocked 257408 ms waiting for handler10 to quiesce
2015-12-17T20:01:42.685Z|726399|dpif(revalidator15)|WARN|Dropped 20 log messages in last 362 seconds (most recently, 362 seconds ago) due to excessive rate
2015-12-17T20:04:16.689Z|00155|ovs_rcu(urcu6)|WARN|blocked 515504 ms waiting for handler10 to quiesce
2015-12-17T20:05:42.479Z|00156|timeval(urcu6)|WARN|Unreasonably long 75330ms poll interval (0ms user, 0ms system)
2015-12-17T20:05:42.479Z|00157|timeval(urcu6)|WARN|context switches: 1 voluntary, 0 involuntary
2015-12-17T20:05:42.479Z|00158|coverage(urcu6)|INFO|Dropped 4 log messages in last 3203 seconds (most recently, 3165 seconds ago) due to excessive rate
2015-12-17T20:05:42.479Z|00159|coverage(urcu6)|INFO|Skipping details of duplicate event coverage for hash=8410616e

Thanks!
Favyen


trace.txt

[20195712.090625] BUG: soft lockup - CPU#2 stuck for 22s! [ovs-vswitchd:8558]
[20195712.104800] CPU: 2 PID: 8558 Comm: ovs-vswitchd Tainted: G        W    3.14.39-031439-generic #201504211206
[20195712.104805] Hardware name: Supermicro X9SRH-7F/7TF/X9SRH-7F/7TF, BIOS 3.00 07/05/2013
[20195712.104810] task: ffff88017bd31910 ti: ffff8801b92a2000 task.ti: ffff8801b92a2000
[20195712.104814] RIP: 0010:[<ffffffffa04f7369>]  [<ffffffffa04f7369>] nf_conntrack_tuple_taken+0x99/0x1b0 [nf_conntrack]
[20195712.104834] RSP: 0018:ffff88207fc437d8  EFLAGS: 00000246
[20195712.104838] RAX: ffff8801167e6c70 RBX: 000000000770c44e RCX: 00000000b612c443
[20195712.104842] RDX: 0000000000000001 RSI: 00000000fa8f0415 RDI: ffff88207fc43810
[20195712.104845] RBP: ffff88207fc437f8 R08: 00000000fea3ad6e R09: 000000007361453d
[20195712.104849] R10: ffff88207fc43828 R11: ffff88207fc43978 R12: ffff88207fc43748
[20195712.104852] R13: ffffffff8176d81d R14: ffff88207fc437f8 R15: ffff88207fc43810
[20195712.104857] FS:  00007fa272770980(0000) GS:ffff88207fc40000(0000) knlGS:0000000000000000
[20195712.104861] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[20195712.104864] CR2: 000012b6d8042000 CR3: 0000001f38f8c000 CR4: 00000000001427e0
[20195712.104868] DR0: 0000000000000003 DR1: 00000000000000b0 DR2: 0000000000000001
[20195712.104871] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[20195712.104874] Stack:
[20195712.104877]  ffff88020de67998 ffff88207fc43a00 0000000000007c5b 000000000000fc00
[20195712.104887]  ffff88207fc43848 ffffffffa051c659 0000000000000000 000000009f66459e
[20195712.104895]  0000000000000000 0166459e00025000 0000000000000000 0106f7ef00000000
[20195712.104903] Call Trace:
[20195712.104907]  <IRQ>
[20195712.104913]  [<ffffffffa051c659>] nf_nat_used_tuple+0x29/0x30 [nf_nat]
[20195712.104934]  [<ffffffffa051d623>] nf_nat_l4proto_unique_tuple+0xf3/0x190 [nf_nat]
[20195712.104945]  [<ffffffffa051c659>] ? nf_nat_used_tuple+0x29/0x30 [nf_nat]
[20195712.104955]  [<ffffffff8101ec59>] ? sched_clock+0x9/0x10
[20195712.104965]  [<ffffffffa051d7f5>] tcp_unique_tuple+0x15/0x20 [nf_nat]
[20195712.104975]  [<ffffffffa051ce40>] get_unique_tuple+0x110/0x260 [nf_nat]
[20195712.104989]  [<ffffffffa051d017>] nf_nat_setup_info+0x87/0x360 [nf_nat]
[20195712.104998]  [<ffffffff81092215>] ? run_posix_cpu_timers+0x45/0x290
[20195712.105007]  [<ffffffffa0625104>] xt_snat_target_v0+0x34/0x40 [xt_nat]
[20195712.105017]  [<ffffffffa04de645>] ipt_do_table+0x335/0x550 [ip_tables]
[20195712.105031]  [<ffffffffa04f7459>] ? nf_conntrack_tuple_taken+0x189/0x1b0 [nf_conntrack]
[20195712.105042]  [<ffffffffa05290b8>] nf_nat_rule_find+0x28/0xc0 [iptable_nat]
[20195712.105051]  [<ffffffffa0529319>] nf_nat_ipv4_fn+0x1c9/0x280 [iptable_nat]
[20195712.105062]  [<ffffffff8168c5e0>] ? ip_finish_output.part.42+0x440/0x440
[20195712.105071]  [<ffffffffa0529504>] nf_nat_ipv4_out.part.6+0x14/0xd0 [iptable_nat]
[20195712.105079]  [<ffffffffa0529605>] nf_nat_ipv4_out+0x45/0x50 [iptable_nat]
[20195712.105089]  [<ffffffff816804ce>] nf_iterate+0x8e/0xd0
[20195712.105097]  [<ffffffff8168c5e0>] ? ip_finish_output.part.42+0x440/0x440
[20195712.105104]  [<ffffffff8168058d>] nf_hook_slow+0x7d/0x150
[20195712.105111]  [<ffffffff8168c5e0>] ? ip_finish_output.part.42+0x440/0x440
[20195712.105119]  [<ffffffff8168cf72>] ip_output+0x82/0x90
[20195712.105126]  [<ffffffff81688e02>] ip_forward_finish+0x102/0x130
[20195712.105132]  [<ffffffff816891b9>] ip_forward+0x389/0x440
[20195712.105139]  [<ffffffff81686e51>] ip_rcv_finish+0x121/0x380
[20195712.105145]  [<ffffffff81687726>] ip_rcv+0x286/0x380
[20195712.105168]  [<ffffffffa015683d>] ? ixgbe_alloc_rx_buffers+0x7d/0xd0 [ixgbe]
[20195712.105177]  [<ffffffff8164e002>] __netif_receive_skb_core+0x5e2/0x730
[20195712.105195]  [<ffffffffa0156a20>] ? ixgbe_clean_rx_irq+0x190/0x260 [ixgbe]
[20195712.105202]  [<ffffffff8164e171>] __netif_receive_skb+0x21/0x70
[20195712.105209]  [<ffffffff8164eb61>] process_backlog+0xb1/0x190
[20195712.105216]  [<ffffffff8164f559>] net_rx_action+0x139/0x250
[20195712.105224]  [<ffffffff8107000f>] __do_softirq+0xef/0x330
[20195712.105235]  [<ffffffff8176e4dc>] do_softirq_own_stack+0x1c/0x30
[20195712.105237]  <EOI>
[20195712.105240]  [<ffffffff81070305>] do_softirq+0x65/0x70
[20195712.105251]  [<ffffffff810703a1>] __local_bh_enable_ip+0x91/0xa0
[20195712.105260]  [<ffffffff817631b0>] _raw_spin_unlock_bh+0x20/0x40
[20195712.105268]  [<ffffffff8167bc0f>] netlink_poll+0x11f/0x1a0
[20195712.105276]  [<ffffffff81633da6>] sock_poll+0x116/0x130
[20195712.105285]  [<ffffffff811e6554>] do_poll.isra.7+0x144/0x380
[20195712.105293]  [<ffffffff811e76b9>] do_sys_poll+0x199/0x200
[20195712.105301]  [<ffffffff811e6280>] ? __pollwait+0xf0/0xf0
[20195712.105308]  [<ffffffff811e6280>] ? __pollwait+0xf0/0xf0
[20195712.105315]  [<ffffffff811e6280>] ? __pollwait+0xf0/0xf0
[20195712.105322]  [<ffffffff811e6280>] ? __pollwait+0xf0/0xf0
[20195712.105329]  [<ffffffff811e6280>] ? __pollwait+0xf0/0xf0
[20195712.105336]  [<ffffffff811e6280>] ? __pollwait+0xf0/0xf0
[20195712.105343]  [<ffffffff811e6280>] ? __pollwait+0xf0/0xf0
[20195712.105350]  [<ffffffff811e6280>] ? __pollwait+0xf0/0xf0
[20195712.105357]  [<ffffffff811e6280>] ? __pollwait+0xf0/0xf0
[20195712.105365]  [<ffffffff8101e4a9>] ? read_tsc+0x9/0x20
[20195712.105374]  [<ffffffff810d870c>] ? ktime_get_ts+0x4c/0xe0
[20195712.105382]  [<ffffffff811e6815>] ? poll_select_set_timeout+0x85/0xa0
[20195712.105388]  [<ffffffff8102468d>] ? syscall_trace_leave+0xdd/0x150
[20195712.105396]  [<ffffffff811e77fb>] SyS_poll+0x6b/0x100
[20195712.105403]  [<ffffffff8176ccdf>] tracesys+0xe1/0xe6
[20195712.105406] Code: 8b 00 a8 01 74 21 e9 ff 00 00 00 0f 1f 80 00 00 00 00 49 8b 95 e8 09 00 00 65 ff 02 48 8b 00 a8 01 0f 85 e3 00 00 00 0f b6 50 37 <48> 8d 0c d5 00 00 00 00 48 c1 e2 06 48 29 ca 48 89 c1 48 83 c2

