[ovs-discuss] tcp stream is broken with userspace conntrack after VM migration

Yangxiaoliang (Neo) david.yangxiaoliang at huawei.com
Sat Feb 24 10:08:54 UTC 2018


Hi,

Recently we have met a TCP STREAM broken problem with userspace conntrack feature.  After reading the ovs maillist, I found a similar problem with mine:  https://mail.openvswitch.org/pipermail/ovs-discuss/2018-January/045958.html
Our test case is not absolutely same with his, but the result is almost the same (The TCP stream is stopped, and this TCP stream will close finally with timeout reason. But ping is ok. ). Since these is no any conclusion about this TCP conntrack issue, so I describe my problem here again to check that if anyone have any idea.

Our test case is described as below:
We are testing VM migration based on ovs(2.7.3)+dpdk(16.11).  There an iperf3 TCP_STREAM test (speed >= 8Gbps) that running between two VMs.
 (VM_client ¨C host1)   <- iperf3 TCP STREAM test ->  (VM_server ¨C host2)

During the TCP_STREAM  test is running, we migrate the iperf3 client VM over and over again (migrating the iperf3 server VM doesn¡¯t reproduce this problem) .

I capture the packets on client and server, then found that  the client doesn¡¯t receive the TCP ACK packet from the server.  The client retransmit  TCP packets over and over again, and the server replies ACK packet but the ACK packet was dropped in server side.

We read the TCP conntrack code and add some debugging log. Then we found that below branchs in tcp_conn_update  will not be true.

--branch 1:
if (SEQ_GEQ(src->seqhi, end)
    /* Last octet inside other's window space */
    && SEQ_GEQ(seq, src->seqlo - (dst->max_win << dws))
    /* Retrans: not more than one window back */
    && (ackskew >= -MAXACKWINDOW)
    /* Acking not more than one reassembled fragment backwards */
    && (ackskew <= (MAXACKWINDOW << sws))
    /* Acking not more than one window forward */
    && ((tcp_flags & TCP_RST) == 0 || orig_seq == src->seqlo
        || (orig_seq == src->seqlo + 1) || (orig_seq + 1 == src->seqlo))) {
    /* Require an exact/+1 sequence match on resets when possible */

--branch 2:
} else if ((dst->state < CT_DPIF_TCPS_SYN_SENT
            || dst->state >= CT_DPIF_TCPS_FIN_WAIT_2
            || src->state >= CT_DPIF_TCPS_FIN_WAIT_2)
           && SEQ_GEQ(src->seqhi + MAXACKWINDOW, end)
           /* Within a window forward of the originating packet */
           && SEQ_GEQ(seq, src->seqlo - MAXACKWINDOW)) {
           /* Within a window backward of the originating packet */

--Our debugging log is added as below:
} else {
    // debugging log here
    return CT_UPDATE_INVALID;
}

--Debugging log sample
2018-02-11T16:15:24.902940+08:00|err|ovs-vswitchd[16113]|tcp_conn_update[399]|00056|conntrack_tcp(pmd8)|:  ======
2018-02-11T16:15:24.903168+08:00|err|ovs-vswitchd[16113]|tcp_conn_update[405]|00057|conntrack_tcp(pmd8)|: reply=0, (tcp header: tcp_src=59138, tcp_dst=5201, tcp_seq=557256579, tcp_ack=1919922981, tcp_ctl=0x00008010, tcp_winsz=229, tcp_csum=21303, tcp_urg=0), (CON SRC: src->state=2, src->seqlo=557324635, src->seqhi=557258028, src->max_win=229, src->wscale=64), (CON DST: dst->state=0, dst->seqlo=0, dst->seqhi=1923674917, dst->max_win=1, dst->wscale=64)
2018-02-11T16:15:24.903357+08:00|err|ovs-vswitchd[16113]|tcp_conn_update[410]|00058|conntrack_tcp(pmd8)|: +3 Check branch 1: SEQ_GEQ(src->seqhi, end)=** 1 **, SEQ_GEQ(seq, src->seqlo - (dst->max_win << dws))=** 0 **, (ackskew <= (MAXACKWINDOW << sws))=** 1 **, ((tcp_flags & TCP_RST) == 0 || orig_seq == src->seqlo|| (orig_seq == src->seqlo + 1) || (orig_seq + 1 == src->seqlo))=** 1 **
2018-02-11T16:15:24.903543+08:00|err|ovs-vswitchd[16113]|tcp_conn_update[414]|00059|conntrack_tcp(pmd8)|: +3 1st branch detail: (src->seqhi=557258028, end=557258027), (seq=557256579, src->seqlo - (dst->max_win << dws)=557308251), (ackskew=-1919922981, -MAXACKWINDOW=-67035), (ackskew=-1919922981, (MAXACKWINDOW << sws)=1098301440), (tcp_flags=0x00000010, orig_seq=557256579, src->seqlo=557324635), sws=14, dws=14, orig_seq=557256579
2018-02-11T16:15:24.903729+08:00|err|ovs-vswitchd[16113]|tcp_conn_update[419]|00060|conntrack_tcp(pmd8)|: +3 Check branch 2: (dst->state < CT_DPIF_TCPS_SYN_SENT || dst->state >= CT_DPIF_TCPS_FIN_WAIT_2 || src->state >= CT_DPIF_TCPS_FIN_WAIT_2)=** 1 **, SEQ_GEQ(src->seqhi + MAXACKWINDOW, end)=** 1 **, SEQ_GEQ(seq, src->seqlo - MAXACKWINDOW)=** 0 **
2018-02-11T16:15:24.903936+08:00|err|ovs-vswitchd[16113]|tcp_conn_update[423]|00061|conntrack_tcp(pmd8)|: +3 2nd branch detail: (dst->state=0, src->state=2, CT_DPIF_TCPS_SYN_SENT=2, CT_DPIF_TCPS_FIN_WAIT_2=9), (src->seqhi + MAXACKWINDOW=557325063, end=557258027), (seq=557256579, src->seqlo - MAXACKWINDOW=557257600)
2018-02-11T16:15:12.109421+08:00|err|ovs-vswitchd[16113]|tcp_conn_update[399]|00007|conntrack_tcp(handler12)|:  ======
2018-02-11T16:15:12.109627+08:00|err|ovs-vswitchd[16113]|tcp_conn_update[405]|00008|conntrack_tcp(handler12)|: reply=0, (tcp header: tcp_src=59138, tcp_dst=5201, tcp_seq=557326083, tcp_ack=1919922981, tcp_ctl=0x00008010, tcp_winsz=229, tcp_csum=2924, tcp_urg=0), (CON SRC: src->state=2, src->seqlo=557324635, src->seqhi=557258028, src->max_win=229, src->wscale=64), (CON DST: dst->state=0, dst->seqlo=0, dst->seqhi=1923674917, dst->max_win=1, dst->wscale=64)
2018-02-11T16:15:12.109818+08:00|err|ovs-vswitchd[16113]|tcp_conn_update[410]|00009|conntrack_tcp(handler12)|: +3 Check branch 1: SEQ_GEQ(src->seqhi, end)=** 0 **, SEQ_GEQ(seq, src->seqlo - (dst->max_win << dws))=** 1 **, (ackskew <= (MAXACKWINDOW << sws))=** 1 **, ((tcp_flags & TCP_RST) == 0 || orig_seq == src->seqlo|| (orig_seq == src->seqlo + 1) || (orig_seq + 1 == src->seqlo))=** 1 **
2018-02-11T16:15:12.110043+08:00|err|ovs-vswitchd[16113]|tcp_conn_update[414]|00010|conntrack_tcp(handler12)|: +3 1st branch detail: (src->seqhi=557258028, end=557327531), (seq=557326083, src->seqlo - (dst->max_win << dws)=557308251), (ackskew=-1919922981, -MAXACKWINDOW=-67035), (ackskew=-1919922981, (MAXACKWINDOW << sws)=1098301440), (tcp_flags=0x00000010, orig_seq=557326083, src->seqlo=557324635), sws=14, dws=14, orig_seq=557326083
2018-02-11T16:15:12.110234+08:00|err|ovs-vswitchd[16113]|tcp_conn_update[419]|00011|conntrack_tcp(handler12)|: +3 Check branch 2: (dst->state < CT_DPIF_TCPS_SYN_SENT || dst->state >= CT_DPIF_TCPS_FIN_WAIT_2 || src->state >= CT_DPIF_TCPS_FIN_WAIT_2)=** 1 **, SEQ_GEQ(src->seqhi + MAXACKWINDOW, end)=** 0 **, SEQ_GEQ(seq, src->seqlo - MAXACKWINDOW)=** 1 **
2018-02-11T16:15:12.110423+08:00|err|ovs-vswitchd[16113]|tcp_conn_update[423]|00012|conntrack_tcp(handler12)|: +3 2nd branch detail: (dst->state=0, src->state=2, CT_DPIF_TCPS_SYN_SENT=2, CT_DPIF_TCPS_FIN_WAIT_2=9), (src->seqhi + MAXACKWINDOW=557325063, end=557327531), (seq=557326083, src->seqlo - MAXACKWINDOW=557257600)

Checking the debugging log above, we can found one or more of below conditions may not be true:
SEQ_GEQ(seq, src->seqlo - MAXACKWINDOW)=** 0 **
SEQ_GEQ(seq, src->seqlo - (dst->max_win << dws))=** 0 **
SEQ_GEQ(src->seqhi, end)=** 0 **
SEQ_GEQ(src->seqhi + MAXACKWINDOW, end)=** 0 **

After the VM migration finished, there must be a big number of TCP packets will be lost.  The client retransmits the lost packets and the server sends the ACK reply packets. But it seems that the ACK reply packet number id is not in the expected range.
For example:
(tcp header: tcp_src=5201, tcp_dst=59138, tcp_seq=1919922981, tcp_ack=557387651, tcp_ctl=0x0000B010, tcp_winsz=6416, tcp_csum=2049, tcp_urg=0)
(Connection SRC: src->state=2, src->seqlo=1919922981, src->seqhi=1923674917, src->max_win=6416, src->wscale=64),
(Connection DST: dst->state=2, dst->seqlo=557256579, dst->seqhi=557258028, dst->max_win=229, dst->wscale=64)
tcp_ack=557387651 is not in range (557256579, 557258028), the connection tracking treats it as invalid. I think maybe the valid range for TCP packets is not big enough.

We modify the below checking branch conditions
£¨1£©&& (ackskew >= -MAXACKWINDOW)   =>   && (ackskew >= -MAXACKWINDOW*128)    // 128 is an arbitrary value to test
£¨2£©&& SEQ_GEQ(src->seqhi + MAXACKWINDOW, end)  => && SEQ_GEQ(src->seqhi + MAXACKWINDOW*128, end)
£¨3£©&& SEQ_GEQ(seq, src->seqlo ¨C MAXACKWINDOW)  => && SEQ_GEQ(seq, src->seqlo ¨C MAXACKWINDOW*128)

With above modifications in ovs, we run the same test case for more than 24 hours and the problem doesn¡¯t occur again.
I don¡¯t know why the server sent the ACK packet that number id is not in expected range. Maybe there is something wrong with connection tracking? Is it a good idea to enlarge the MAXACKWINDOW in above three conditions ?

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openvswitch.org/pipermail/ovs-discuss/attachments/20180224/2ae7e3ca/attachment-0001.html>


More information about the discuss mailing list