[ovs-dev] [PATCH] datapath: inet: frag: Always orphan skbs inside ip_defrag().

Joe Stringer joe at ovn.org
Fri Jan 29 19:01:56 UTC 2016


When the linux stack is an endpoint connected to OVS which is performing
IP fragmentation via conntrack actions, it's possible to hit a kernel
BUG. The following upstream commit fixes the issue inside ip_defrag().
For the backport, we provide this inside ip_defrag() for kernels that we
currently backport that function, and also provide just the bugfix for
newer kernels, so we can continue to use upstream functionality as much
as possible.

Upstream commit:
    Later parts of the stack (including fragmentation) expect that there is
    never a socket attached to frag in a frag_list, however this invariant
    was not enforced on all defrag paths. This could lead to the
    BUG_ON(skb->sk) during ip_do_fragment(), as per the call stack at the
    end of this commit message.

    While the call could be added to openvswitch to fix this particular
    error, the head and tail of the frags list are already orphaned
    indirectly inside ip_defrag(), so it seems like the remaining fragments
    should all be orphaned in all circumstances.

    kernel BUG at net/ipv4/ip_output.c:586!
    [...]
    Call Trace:
     <IRQ>
     [<ffffffffa0205270>] ? do_output.isra.29+0x1b0/0x1b0 [openvswitch]
     [<ffffffffa02167a7>] ovs_fragment+0xcc/0x214 [openvswitch]
     [<ffffffff81667830>] ? dst_discard_out+0x20/0x20
     [<ffffffff81667810>] ? dst_ifdown+0x80/0x80
     [<ffffffffa0212072>] ? find_bucket.isra.2+0x62/0x70 [openvswitch]
     [<ffffffff810e0ba5>] ? mod_timer_pending+0x65/0x210
     [<ffffffff810b732b>] ? __lock_acquire+0x3db/0x1b90
     [<ffffffffa03205a2>] ? nf_conntrack_in+0x252/0x500 [nf_conntrack]
     [<ffffffff810b63c4>] ? __lock_is_held+0x54/0x70
     [<ffffffffa02051a3>] do_output.isra.29+0xe3/0x1b0 [openvswitch]
     [<ffffffffa0206411>] do_execute_actions+0xe11/0x11f0 [openvswitch]
     [<ffffffff810b63c4>] ? __lock_is_held+0x54/0x70
     [<ffffffffa0206822>] ovs_execute_actions+0x32/0xd0 [openvswitch]
     [<ffffffffa020b505>] ovs_dp_process_packet+0x85/0x140 [openvswitch]
     [<ffffffff810b63c4>] ? __lock_is_held+0x54/0x70
     [<ffffffffa02068a2>] ovs_execute_actions+0xb2/0xd0 [openvswitch]
     [<ffffffffa020b505>] ovs_dp_process_packet+0x85/0x140 [openvswitch]
     [<ffffffffa0215019>] ? ovs_ct_get_labels+0x49/0x80 [openvswitch]
     [<ffffffffa0213a1d>] ovs_vport_receive+0x5d/0xa0 [openvswitch]
     [<ffffffff810b732b>] ? __lock_acquire+0x3db/0x1b90
     [<ffffffff810b732b>] ? __lock_acquire+0x3db/0x1b90
     [<ffffffff810b732b>] ? __lock_acquire+0x3db/0x1b90
     [<ffffffffa0214895>] ? internal_dev_xmit+0x5/0x140 [openvswitch]
     [<ffffffffa02148fc>] internal_dev_xmit+0x6c/0x140 [openvswitch]
     [<ffffffffa0214895>] ? internal_dev_xmit+0x5/0x140 [openvswitch]
     [<ffffffff81660299>] dev_hard_start_xmit+0x2b9/0x5e0
     [<ffffffff8165fc21>] ? netif_skb_features+0xd1/0x1f0
     [<ffffffff81660f20>] __dev_queue_xmit+0x800/0x930
     [<ffffffff81660770>] ? __dev_queue_xmit+0x50/0x930
     [<ffffffff810b53f1>] ? mark_held_locks+0x71/0x90
     [<ffffffff81669876>] ? neigh_resolve_output+0x106/0x220
     [<ffffffff81661060>] dev_queue_xmit+0x10/0x20
     [<ffffffff816698e8>] neigh_resolve_output+0x178/0x220
     [<ffffffff816a8e6f>] ? ip_finish_output2+0x1ff/0x590
     [<ffffffff816a8e6f>] ip_finish_output2+0x1ff/0x590
     [<ffffffff816a8cee>] ? ip_finish_output2+0x7e/0x590
     [<ffffffff816a9a31>] ip_do_fragment+0x831/0x8a0
     [<ffffffff816a8c70>] ? ip_copy_metadata+0x1b0/0x1b0
     [<ffffffff816a9ae3>] ip_fragment.constprop.49+0x43/0x80
     [<ffffffff816a9c9c>] ip_finish_output+0x17c/0x340
     [<ffffffff8169a6f4>] ? nf_hook_slow+0xe4/0x190
     [<ffffffff816ab4c0>] ip_output+0x70/0x110
     [<ffffffff816a9b20>] ? ip_fragment.constprop.49+0x80/0x80
     [<ffffffff816aa9f9>] ip_local_out+0x39/0x70
     [<ffffffff816abf89>] ip_send_skb+0x19/0x40
     [<ffffffff816abfe3>] ip_push_pending_frames+0x33/0x40
     [<ffffffff816df21a>] icmp_push_reply+0xea/0x120
     [<ffffffff816df93d>] icmp_reply.constprop.23+0x1ed/0x230
     [<ffffffff816df9ce>] icmp_echo.part.21+0x4e/0x50
     [<ffffffff810b63c4>] ? __lock_is_held+0x54/0x70
     [<ffffffff810d5f9e>] ? rcu_read_lock_held+0x5e/0x70
     [<ffffffff816dfa06>] icmp_echo+0x36/0x70
     [<ffffffff816e0d11>] icmp_rcv+0x271/0x450
     [<ffffffff816a4ca7>] ip_local_deliver_finish+0x127/0x3a0
     [<ffffffff816a4bc1>] ? ip_local_deliver_finish+0x41/0x3a0
     [<ffffffff816a5160>] ip_local_deliver+0x60/0xd0
     [<ffffffff816a4b80>] ? ip_rcv_finish+0x560/0x560
     [<ffffffff816a46fd>] ip_rcv_finish+0xdd/0x560
     [<ffffffff816a5453>] ip_rcv+0x283/0x3e0
     [<ffffffff810b6302>] ? match_held_lock+0x192/0x200
     [<ffffffff816a4620>] ? inet_del_offload+0x40/0x40
     [<ffffffff8165d062>] __netif_receive_skb_core+0x392/0xae0
     [<ffffffff8165e68e>] ? process_backlog+0x8e/0x230
     [<ffffffff810b53f1>] ? mark_held_locks+0x71/0x90
     [<ffffffff8165d7c8>] __netif_receive_skb+0x18/0x60
     [<ffffffff8165e678>] process_backlog+0x78/0x230
     [<ffffffff8165e6dd>] ? process_backlog+0xdd/0x230
     [<ffffffff8165e355>] net_rx_action+0x155/0x400
     [<ffffffff8106b48c>] __do_softirq+0xcc/0x420
     [<ffffffff816a8e87>] ? ip_finish_output2+0x217/0x590
     [<ffffffff8178e78c>] do_softirq_own_stack+0x1c/0x30
     <EOI>
     [<ffffffff8106b88e>] do_softirq+0x4e/0x60
     [<ffffffff8106b948>] __local_bh_enable_ip+0xa8/0xb0
     [<ffffffff816a8eb0>] ip_finish_output2+0x240/0x590
     [<ffffffff816a9a31>] ? ip_do_fragment+0x831/0x8a0
     [<ffffffff816a9a31>] ip_do_fragment+0x831/0x8a0
     [<ffffffff816a8c70>] ? ip_copy_metadata+0x1b0/0x1b0
     [<ffffffff816a9ae3>] ip_fragment.constprop.49+0x43/0x80
     [<ffffffff816a9c9c>] ip_finish_output+0x17c/0x340
     [<ffffffff8169a6f4>] ? nf_hook_slow+0xe4/0x190
     [<ffffffff816ab4c0>] ip_output+0x70/0x110
     [<ffffffff816a9b20>] ? ip_fragment.constprop.49+0x80/0x80
     [<ffffffff816aa9f9>] ip_local_out+0x39/0x70
     [<ffffffff816abf89>] ip_send_skb+0x19/0x40
     [<ffffffff816abfe3>] ip_push_pending_frames+0x33/0x40
     [<ffffffff816d55d3>] raw_sendmsg+0x7d3/0xc30
     [<ffffffff810b732b>] ? __lock_acquire+0x3db/0x1b90
     [<ffffffff816e7557>] ? inet_sendmsg+0xc7/0x1d0
     [<ffffffff810b63c4>] ? __lock_is_held+0x54/0x70
     [<ffffffff816e759a>] inet_sendmsg+0x10a/0x1d0
     [<ffffffff816e7495>] ? inet_sendmsg+0x5/0x1d0
     [<ffffffff8163e398>] sock_sendmsg+0x38/0x50
     [<ffffffff8163ec5f>] ___sys_sendmsg+0x25f/0x270
     [<ffffffff811aadad>] ? handle_mm_fault+0x8dd/0x1320
     [<ffffffff8178c147>] ? _raw_spin_unlock+0x27/0x40
     [<ffffffff810529b2>] ? __do_page_fault+0x1e2/0x460
     [<ffffffff81204886>] ? __fget_light+0x66/0x90
     [<ffffffff8163f8e2>] __sys_sendmsg+0x42/0x80
     [<ffffffff8163f932>] SyS_sendmsg+0x12/0x20
     [<ffffffff8178cb17>] entry_SYSCALL_64_fastpath+0x12/0x6f
    Code: 00 00 44 89 e0 e9 7c fb ff ff 4c 89 ff e8 e7 e7 ff ff 41 8b 9d 80 00 00 00 2b 5d d4 89 d8 c1 f8 03 0f b7 c0 e9 33 ff ff f
     66 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48
    RIP  [<ffffffff816a9a92>] ip_do_fragment+0x892/0x8a0
     RSP <ffff88006d603170>

    Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action")
    Signed-off-by: Joe Stringer <joe at ovn.org>
    Signed-off-by: David S. Miller <davem at davemloft.net>

Upstream: 8282f27449bf ("inet: frag: Always orphan skbs inside ip_defrag()")
Signed-off-by: Joe Stringer <joe at ovn.org>
---
 datapath/linux/compat/include/net/ip.h | 13 +++++++++++++
 datapath/linux/compat/ip_fragment.c    |  1 +
 2 files changed, 14 insertions(+)

diff --git a/datapath/linux/compat/include/net/ip.h b/datapath/linux/compat/include/net/ip.h
index 63a3f0924e9d..2e33664147a1 100644
--- a/datapath/linux/compat/include/net/ip.h
+++ b/datapath/linux/compat/include/net/ip.h
@@ -122,6 +122,19 @@ int rpl_ip_defrag(struct sk_buff *skb, u32 user);
 int __init rpl_ipfrag_init(void);
 void rpl_ipfrag_fini(void);
 #else /* OVS_FRAGMENT_BACKPORT */
+
+/* We have no good way to detect the presence of upstream commit 8282f27449bf
+ * ("inet: frag: Always orphan skbs inside ip_defrag()"), but it should be
+ * always included in kernels 4.5+. */
+#if LINUX_VERSION_CODE < KERNEL_VERSION(4,5,0)
+static inline int rpl_ip_defrag(struct sk_buff *skb, u32 user)
+{
+	skb_orphan(skb);
+	ip_defrag(skb, user);
+}
+#define ip_defrag rpl_ip_defrag
+#endif
+
 static inline int rpl_ipfrag_init(void) { return 0; }
 static inline void rpl_ipfrag_fini(void) { }
 #endif /* OVS_FRAGMENT_BACKPORT */
diff --git a/datapath/linux/compat/ip_fragment.c b/datapath/linux/compat/ip_fragment.c
index 418134928114..42514e42815f 100644
--- a/datapath/linux/compat/ip_fragment.c
+++ b/datapath/linux/compat/ip_fragment.c
@@ -682,6 +682,7 @@ int rpl_ip_defrag(struct sk_buff *skb, u32 user)
 	struct ipq *qp;
 
 	IP_INC_STATS_BH(net, IPSTATS_MIB_REASMREQDS);
+	skb_orphan(skb);
 
 	/* Lookup (or create) queue header */
 	qp = ip_find(net, ip_hdr(skb), user, vif);
-- 
2.1.4




More information about the dev mailing list