[ovs-discuss] conntrack: Another ct-clean thread crash bug
Darrell Ball
dball at vmware.com
Tue Sep 12 04:28:39 UTC 2017
On 9/8/17, 2:51 AM, "wangyunjian" <wangyunjian at huawei.com> wrote:
The operations on the buckets->connections hmap should be protected by a lock shared between
the ovs-vswitchd thread and the pmd thread (or the ct clean thread). But conn_clean() releases that lock.
In that window the hmap may be changed by another thread, and node->next may already have been removed from the hmap.
[Darrell] If I read your text correctly, then what you describe is intentional – there are multiple threads
and the buckets are protected by locks, but I don’t see a specific issue being pointed to.
Also, this is not consistent with the stack trace below, assuming the stack trace was generated from
non-instrumented code and is reliable.
Can you provide, initially:
1/ Your config: ovs-vsctl show, sudo ovs-appctl dpif/show, sudo ovs-ofctl dump-flows <bridge>,
ovs-appctl dpif/dump-flows br0
2/ I could not understand your simple diagram – can you relate the diagram to the
“ovs-vsctl show” output?
3/ Are you sending TCP traffic or UDP traffic?
4/ What streams are you sending?
By streams I mean: what range of src/dst IP addresses and src/dst layer 4 port numbers, and
what physical or virtual ports are you sending the streams from?
process_one() {                  // pmd thread
    ct_lock_lock(ctb.lock);
    conn_clean() {               // drops and retakes the bucket lock
        ct_lock_unlock(ctb.lock);
        ...
        ct_lock_lock(ctb.lock);
    }
    ct_lock_unlock(ctb.lock);
}

conntrack_flush() {              // main thread
    ct_lock_lock(ctb.lock);
    nat_clean() {                // drops and retakes the bucket lock
        ct_lock_unlock(ctb.lock);
        ...
        ct_lock_lock(ctb.lock);
    }
    ct_lock_unlock(ctb.lock);
}

clean_thread_main() {            // ct clean thread
    ct_lock_lock(ctb.lock);
    conn_clean() {               // drops and retakes the bucket lock
        ct_lock_unlock(ctb.lock);
        ...
        ct_lock_lock(ctb.lock);
    }
    ct_lock_unlock(ctb.lock);
}
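For context on why a conn freed while still linked causes the later sweep to crash: ctb->exp_lists[] is an intrusive doubly-linked list (OVS's ovs_list), which keeps the prev/next pointers inside the element itself, so freeing an element without unlinking it leaves its neighbors pointing at freed memory that the next sweep then walks. A minimal standalone sketch of the "unlink before free" invariant both patches below restore (illustrative names, not the real OVS API):

```c
#include <assert.h>
#include <stddef.h>

/* Minimal intrusive doubly-linked list in the style of OVS's ovs_list.
 * The node lives inside the element, so an element freed while still
 * linked leaves dangling pointers behind in the list. */
struct list_node {
    struct list_node *prev, *next;
};

static void list_init(struct list_node *head)
{
    head->prev = head->next = head;
}

static void list_insert_tail(struct list_node *head, struct list_node *n)
{
    n->next = head;
    n->prev = head->prev;
    head->prev->next = n;
    head->prev = n;
}

/* Unlink 'n' so later traversals never see it.  After this it is safe
 * to free the containing element; skipping this step is exactly the
 * use-after-free that a later sweep would trip over. */
static void list_remove(struct list_node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->prev = n->next = n;   /* self-link: removing twice is a no-op */
}

static size_t list_size(const struct list_node *head)
{
    size_t n = 0;
    for (const struct list_node *p = head->next; p != head; p = p->next) {
        n++;
    }
    return n;
}
```

With three nodes linked, removing the middle one relinks its neighbors directly and drops the size from 3 to 2; the self-link makes a second remove harmless, which is why a patch can unconditionally ovs_list_remove() before free() without tracking whether the conn is still queued.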
> -----Original Message-----
> From: ovs-discuss-bounces at openvswitch.org [mailto:ovs-discuss-bounces at openvswitch.org] On Behalf Of Darrell Ball
> Sent: Wednesday, September 06, 2017 11:50 PM
> To: Huanglili (lee) <huanglili.huang at huawei.com>; ovs-discuss at openvswitch.org
> Cc: blp at nicira.com; caihe <caihe at huawei.com>; liucheng (J) <liucheng11 at huawei.com>
> Subject: Re: [ovs-discuss] conntrack: Another ct-clean thread crash bug
>
> Hmm, that seems odd.
> Also, the code change you propose below does not make sense and would
> likely cause similar crashes itself.
>
> Maybe you can explain what you are trying to do in your testing?
> Can you say what traffic you are sending and from which ports?
>
> I’ll take another look at the related code.
>
> Darrell
>
>
> On 9/6/17, 6:14 AM, "Huanglili (lee)" <huanglili.huang at huawei.com> wrote:
>
> Hi,
> We hit another vswitchd crash when using ct(nat) (OVS+DPDK).
>
> Program terminated with signal 11, Segmentation fault.
> #0 0x0000000000574a0b in hmap_remove (node=0x7f150c6e60a8, hmap=0x7f1553c40780) at lib/hmap.h:270
>     while (*bucket != node) {
>
> (gdb) bt
> #0 0x0000000000574a0b in hmap_remove (node=0x7f150c6e60a8, hmap=0x7f1553c40780)
> #1 sweep_bucket (limit=1808, now=563303851, ctb=0x7f1553c40778, ct=0x7f1553c3f9a8)
> #2 conntrack_clean (now=563303851, ct=0x7f1553c3f9a8)
> #3 clean_thread_main (f_=0x7f1553c3f9a8)
>
> This crash can be triggered by the following flows. Maybe the flows are not
> reasonable, but they shouldn't trigger a crash:
> "table=0,priority=2,in_port=1 actions=resubmit(,2)
> table=0,priority=2,in_port=4 actions=resubmit(,2)
> table=0,priority=0 actions=drop
> table=0,priority=1 actions=resubmit(,10)
> table=1,priority=0 actions=resubmit(,14)
> table=2,priority=0 actions=resubmit(,4)
> table=4,priority=0 actions=resubmit(,14)
> table=10,priority=2,arp actions=resubmit(,12)
> table=10,priority=1,dl_src=90:E2:BA:69:CD:61 actions=resubmit(,1)
> table=10,priority=0 actions=drop
>
> table=12,priority=3,arp,dl_src=90:E2:BA:69:CD:61,arp_spa=194.168.100.1,arp_sha=90:E2:BA:69:CD:61 actions=resubmit(,1)
> table=12,priority=2,arp actions=drop
> table=14,priority=6,ip actions=ct(table=16,zone=1)
> table=14,priority=0 actions=resubmit(,20)
> table=14,priority=20,ip,ip_frag=yes,actions=resubmit(,18)
> table=16,priority=20,ct_state=+est+trk,ip actions=resubmit(,20)
> table=16,priority=15,ct_state=+rel+trk,ip actions=resubmit(,20)
> table=16,priority=10,ct_mark=0x80000000/0x80000000,udp actions=resubmit(,20)
> table=16,priority=5,ct_state=+new+trk,ip,in_port=3 actions=resubmit(,18)
> table=16,priority=5,ct_state=+new+trk,ip,in_port=4 actions=resubmit(,18)
> table=16,priority=5,ct_state=+new+trk,ip,in_port=2 actions=ct(commit,zone=1,exec(load:0x1->NXM_NX_CT_MARK[31])),output:4
> table=16,priority=5,ct_state=+new+trk,ip,in_port=1 actions=ct(commit,zone=1,exec(load:0x1->NXM_NX_CT_MARK[31])),output:3
> table=18,priority=0,in_port=3 actions=ct(zone=1,table=24)
> table=18,priority=0,in_port=2 actions=output:4
> table=18,priority=0,in_port=4,ip actions=ct(commit,zone=1,nat(dst=194.168.100.1)),2
> table=18,priority=0,in_port=1 actions=output:3
> table=20,priority=10,in_port=3,ip actions=ct(zone=1,table=22)
> table=20,priority=10,in_port=4,ip actions=ct(zone=1,table=23)
> table=20,priority=1 actions=ct(zone=1,table=18)
> table=22,priority=10,in_port=3 action=4
> table=23,priority=10,in_port=4 action=3
> table=24,priority=10,in_port=3 action=1"
>
> The networking:
> vm
> |
> br-ply - br-linux
> |
> br-int
>
> We find that rev_conn is sometimes still in the ctb->exp_lists[] list.
> The following change solves this problem, but we can't explain why:
>
> $ git diff
> diff --git a/lib/conntrack.c b/lib/conntrack.c
> index 419cb1d..d5141c4 100644
> --- a/lib/conntrack.c
> +++ b/lib/conntrack.c
> @@ nat_clean(struct conntrack *ct, struct conn *conn,
>          if (rev_conn && (!nat_conn_key_node ||
>                           conn_key_cmp(&nat_conn_key_node->value,
>                                        &rev_conn->rev_key))) {
> +            ovs_list_remove(&rev_conn->exp_node);
>              hmap_remove(&ct->buckets[bucket_rev_conn].connections,
>                          &rev_conn->node);
>              free(rev_conn);
> @@ create_un_nat_conn(struct conntrack *ct, struct conn *conn_for_un_nat_copy,
>      nat_conn_keys_lookup(&ct->nat_conn_keys, &nc->key, ct->hash_basis);
>      if (nat_conn_key_node && !conn_key_cmp(&nat_conn_key_node->value,
>                                             &nc->rev_key) && !rev_conn) {
> -
> +        ovs_list_init(&nc->exp_node);
>          hmap_insert(&ct->buckets[un_nat_conn_bucket].connections,
>                      &nc->node, un_nat_hash);
> Any idea?
>
> Thanks.
>
> ----------------------------------------------------------------------
> On 8/24/17, 3:36 AM, "ovs-dev-bounces at openvswitch.org on behalf of
> huanglili" <ovs-dev-bounces at openvswitch.org on behalf of
> huanglili.huang at huawei.com> wrote:
>
> From: Lili Huang <huanglili.huang at huawei.com>
>
> A conn should be removed from the list before being freed.
>
> This crash is triggered when an established flow does ct(nat)
> again, e.g.:
> "ip,actions=ct(table=1)
> table=1,in_port=1,ip,actions=ct(commit,nat(dst=5.5.5.5)),2
> table=1,in_port=2,ip,ct_state=+est,actions=1
> table=1,in_port=1,ip,ct_state=+est,actions=2"
>
> Signed-off-by: Lili Huang <huanglili.huang at huawei.com>
> ---
> lib/conntrack.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/lib/conntrack.c b/lib/conntrack.c
> index 1c0e023..dd73e1a 100644
> --- a/lib/conntrack.c
> +++ b/lib/conntrack.c
> @@ -779,6 +779,8 @@ conn_not_found(struct conntrack *ct, struct dp_packet *pkt,
>                                ct, nc, conn_for_un_nat_copy);
>
>          if (!nat_res) {
> +            ovs_list_remove(&nc->exp_node);
> +            ctx->conn = NULL;
>              goto nat_res_exhaustion;
>          }
>
>
> _______________________________________________
> discuss mailing list
> discuss at openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss