<div dir="ltr"><div dir="ltr">Hi Flavio,<div>Following your suggestion, I changed gso_size. Now skb_gso_validate_mtu(skb, mtu) returns true, and ip_finish_output2(sk, skb) and dst_neigh_output(dst, neigh, skb) are both called. However, the large packets are still dropped somewhere further down in the kernel, and I see retransmissions.</div>
<div><br></div>
<div>if (skb_gso_validate_mtu(skb, mtu))<br>
        return ip_finish_output2(sk, skb);</div>
<div><br></div>
<div>[ 1854.905733] vxlan_xmit:2262 skb-&gt;len:2776 packet_length:2762<br>
[ 1854.905744] skb_gso_size_check:4478 and seg_len:1500 and max_len:1500 and shinfo-&gt;gso_size:1398 and GSO_BY_FRAGS:65535</div>
<div><br></div>
<div>A gso_size of 1398 bytes is correct in my case: 1398 + 50 (VXLAN header) + 20 (IP) + 32 (TCP) + 14 (ETH) = 1514 bytes.</div>
<div>The code is simple:</div>
<div>vnet = buf;  // buf is an array of 64k bytes<br>
len = 0;<br>
if (csum) {<br>
        vnet-&gt;flags = VIRTIO_NET_HDR_F_NEEDS_CSUM;<br>
        vnet-&gt;csum_start = ETH_HLEN + sizeof(*iph);<br>
        vnet-&gt;csum_offset = __builtin_offsetof(struct tcphdr, check);<br>
}<br>
if (gso) {<br>
        vnet-&gt;hdr_len = ETH_HLEN + sizeof(*iph) + sizeof(*tcph);<br>
        vnet-&gt;gso_type = VIRTIO_NET_HDR_GSO_TCPV4;<br>
        vnet-&gt;gso_size = ETH_DATA_LEN - 50 - sizeof(struct iphdr) - sizeof(struct tcphdr);  // 50 is the vxlan header<br>
} else {<br>
        vnet-&gt;gso_type = VIRTIO_NET_HDR_GSO_NONE;<br>
        vnet-&gt;gso_size = 0;<br>
}<br>
len = sizeof(*vnet);<br>
// Now copy the entire L2 packet into buf starting at offset buf + len and send the packet.</div>
<div><br></div>
<div>Did I miss something? I am not sure how OVS handles this packet between receiving it and transmitting it over VXLAN.</div>
<div>How does checksum offloading work with af_packet in OVS? Does OVS play any role in it?</div>
<div><br></div>
<div>Please see the attached image for reference. The packet flow within the host is given below:</div>
<div><br></div>
<div>Ubuntu container (eth0, 1500 MTU) --routing lookup--&gt; Ubuntu container (veth0, 1450 MTU) -&gt; OVS (veth1, 1450 MTU) -&gt; vxlan (65K MTU) -&gt; eth0 (physical interface, 1500 MTU) -&gt; other machine.</div>
<div><br></div>
<div>Looking forward to your reply.</div>
<div>Regards,</div>
<div>Ramana</div>
</div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Nov 4, 2019 at 10:41 PM Ramana Reddy &lt;<a href="mailto:gtvrreddy@gmail.com" target="_blank">gtvrreddy@gmail.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">Thanks, Flavio. 
I will check it out tomorrow and let you know how it goes.<div><br></div><div>Regards,</div><div>Ramana<br><div><br></div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Nov 4, 2019 at 10:15 PM Flavio Leitner &lt;<a href="mailto:fbl@sysclose.org" target="_blank">fbl@sysclose.org</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On Mon, 4 Nov 2019 21:32:28 +0530<br>
Ramana Reddy &lt;<a href="mailto:gtvrreddy@gmail.com" target="_blank">gtvrreddy@gmail.com</a>&gt; wrote:<br>
<br>
&gt; Hi Flavio Leitner,<br>
&gt; Thank you very much for your reply. Here is the code snippet. The<br>
&gt; same code works if I send the packet without OVS.<br>
<br>
Could you provide more details on the OvS environment and the test?<br>
<br>
The Linux kernel propagates the header size dependencies when you stack<br>
the devices in net_device-&gt;hard_header_len, so in the case of vxlan dev<br>
it will be:<br>
<br>
needed_headroom = lowerdev-&gt;hard_header_len;<br>
needed_headroom += VXLAN_HEADROOM;<br>
dev-&gt;needed_headroom = needed_headroom;<br>
<br>
Sounds like that is helping when OvS is not being used.<br>
<br>
fbl<br>
<br>
<br>
&gt; bool csum = true;<br>
&gt; bool gso = true;<br>
&gt; struct virtio_net_hdr *vnet = buf;<br>
&gt; if (csum) {<br>
&gt;         vnet-&gt;flags = VIRTIO_NET_HDR_F_NEEDS_CSUM;<br>
&gt;         vnet-&gt;csum_start = ETH_HLEN + sizeof(*iph);<br>
&gt;         vnet-&gt;csum_offset = __builtin_offsetof(struct tcphdr, check);<br>
&gt; }<br>
&gt; <br>
&gt; if (gso) {<br>
&gt;         vnet-&gt;hdr_len = ETH_HLEN + sizeof(*iph) + sizeof(*tcph);<br>
&gt;         vnet-&gt;gso_type = VIRTIO_NET_HDR_GSO_TCPV4;<br>
&gt;         vnet-&gt;gso_size = ETH_DATA_LEN - sizeof(struct iphdr) -<br>
&gt;                          sizeof(struct tcphdr);<br>
&gt; } else {<br>
&gt;         vnet-&gt;gso_type = VIRTIO_NET_HDR_GSO_NONE;<br>
&gt; }<br>
&gt; Regards,<br>
&gt; Ramana<br>
&gt; <br>
&gt; <br>
&gt; On Mon, Nov 4, 2019 at 8:39 PM Flavio Leitner &lt;<a href="mailto:fbl@sysclose.org" target="_blank">fbl@sysclose.org</a>&gt;<br>
&gt; wrote:<br>
&gt; <br>
&gt; &gt;<br>
&gt; &gt; Hi,<br>
&gt; &gt;<br>
&gt; &gt; What&#39;s the value you&#39;re passing on gso_size in struct<br>
&gt; &gt; virtio_net_hdr? You need to leave room for the encapsulation<br>
&gt; &gt; header, e.g.:<br>
&gt; &gt;<br>
&gt; &gt; gso_size = iface_mtu - virtio_net_hdr-&gt;hdr_len<br>
&gt; &gt;<br>
&gt; &gt; fbl<br>
&gt; &gt;<br>
&gt; &gt; On Mon, 4 Nov 2019 01:11:36 +0530<br>
&gt; &gt; Ramana Reddy &lt;<a href="mailto:gtvrreddy@gmail.com" target="_blank">gtvrreddy@gmail.com</a>&gt; wrote:<br>
&gt; &gt;  <br>
&gt; &gt; &gt; Hi,<br>
&gt; &gt; &gt; I am wondering if anyone can help me with this. I am having<br>
&gt; &gt; &gt; trouble sending TSO/GSO packets through an af_packet socket<br>
&gt; &gt; &gt; with packet_vnet_hdr (using virtio_net_hdr) over a VXLAN<br>
&gt; &gt; &gt; tunnel in OVS.<br>
&gt; &gt; &gt;<br>
&gt; &gt; &gt; What I observed is that the following function<br>
&gt; &gt; &gt; (net/core/skbuff.c) is eventually hit and returns false, so the<br>
&gt; &gt; &gt; packet is dropped:<br>
&gt; &gt; &gt;<br>
&gt; &gt; &gt; static inline bool skb_gso_size_check(const struct sk_buff *skb,<br>
&gt; &gt; &gt;                                       unsigned int seg_len,<br>
&gt; &gt; &gt;                                       unsigned int max_len) {<br>
&gt; &gt; &gt;         const struct skb_shared_info *shinfo = skb_shinfo(skb);<br>
&gt; &gt; &gt;         const struct sk_buff *iter;<br>
&gt; &gt; &gt;         if (shinfo-&gt;gso_size != GSO_BY_FRAGS)<br>
&gt; &gt; &gt;                 return seg_len &lt;= max_len;<br>
&gt; &gt; &gt;         ..........<br>
&gt; &gt; &gt; }<br>
&gt; &gt; &gt; [  678.756673] ip_finish_output_gso:235 packet_length:2762 (here packet_length = skb-&gt;len - skb_inner_network_offset(skb))<br>
&gt; &gt; &gt; [  678.756678] ip_fragment:510 packet length:1500<br>
&gt; &gt; &gt; [  678.756715] ip_fragment:510 packet length:1314<br>
&gt; &gt; &gt; [  678.956889] skb_gso_size_check:4474 and seg_len:1550 and max_len:1500 and shinfo-&gt;gso_size:1448 and GSO_BY_FRAGS:65535<br>
&gt; &gt; &gt;<br>
&gt; &gt; &gt; Observation:<br>
&gt; &gt; &gt; When we send a large packet (packet_length:2762 in this<br>
&gt; &gt; &gt; example), seg_len (1550) is greater than max_len (1500), so the<br>
&gt; &gt; &gt; statement "return seg_len &lt;= max_len" returns false. As a<br>
&gt; &gt; &gt; result, ip_fragment calls<br>
&gt; &gt; &gt; icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED, htonl(mtu));<br>
&gt; &gt; &gt; instead of the code reaching ip_finish_output2(sk, skb). The<br>
&gt; &gt; &gt; relevant function in net/ipv4/ip_output.c is given below:<br>
&gt; &gt; &gt;<br>
&gt; &gt; &gt; static int ip_finish_output_gso(struct sock *sk, struct sk_buff *skb,<br>
&gt; &gt; &gt;                                 unsigned int mtu)<br>
&gt; &gt; &gt; {<br>
&gt; &gt; &gt;         netdev_features_t features;<br>
&gt; &gt; &gt;         struct sk_buff *segs;<br>
&gt; &gt; &gt;         int ret = 0;<br>
&gt; &gt; &gt;<br>
&gt; &gt; &gt;         /* common case: seglen is &lt;= mtu */<br>
&gt; &gt; &gt;         if (skb_gso_validate_mtu(skb, mtu))<br>
&gt; &gt; &gt;                 return ip_finish_output2(sk, skb);<br>
&gt; &gt; &gt;        ...........<br>
&gt; &gt; &gt;       err = ip_fragment(sk, segs, mtu, ip_finish_output2);<br>
&gt; &gt; &gt;       ...........<br>
&gt; &gt; &gt;  }<br>
&gt; &gt; &gt;<br>
&gt; &gt; &gt; But when we send normal iperf traffic (GSO/TSO traffic) over<br>
&gt; &gt; &gt; VXLAN, skb_gso_size_check returns true and ip_finish_output2<br>
&gt; &gt; &gt; is executed.<br>
&gt; &gt; &gt; Here are the values for normal iperf traffic over VXLAN:<br>
&gt; &gt; &gt;<br>
&gt; &gt; &gt; [ 1041.400537] skb_gso_size_check:4477 and seg_len:1500 and max_len:1500 and shinfo-&gt;gso_size:1398 and GSO_BY_FRAGS:65535<br>
&gt; &gt; &gt; [ 1041.400587] skb_gso_size_check:4477 and seg_len:1450 and max_len:1450 and shinfo-&gt;gso_size:1398 and GSO_BY_FRAGS:65535<br>
&gt; &gt; &gt; [ 1041.400594] skb_gso_size_check:4477 and seg_len:1500 and max_len:1500 and shinfo-&gt;gso_size:1398 and GSO_BY_FRAGS:65535<br>
&gt; &gt; &gt; [ 1041.400732] skb_gso_size_check:4477 and seg_len:1450 and max_len:1450 and shinfo-&gt;gso_size:1398 and GSO_BY_FRAGS:65535<br>
&gt; &gt; &gt; [ 1041.400741] skb_gso_size_check:4477 and seg_len:1450 and max_len:1450 and shinfo-&gt;gso_size:1398 and GSO_BY_FRAGS:65535<br>
&gt; &gt; &gt;<br>
&gt; &gt; &gt; Can someone help me find what is missing, and where I should<br>
&gt; &gt; &gt; modify the code, in OVS or outside of OVS, so that it works as<br>
&gt; &gt; &gt; expected?<br>
&gt; &gt; &gt;<br>
&gt; &gt; &gt; Thanks in advance.<br>
&gt; &gt; &gt;<br>
&gt; &gt; &gt; Some more info:<br>
&gt; &gt; &gt; [root@xx ~]# uname -r<br>
&gt; &gt; &gt; 3.10.0-1062.4.1.el7.x86_64<br>
&gt; &gt; &gt; [root@xx ~]# cat /etc/redhat-release<br>
&gt; &gt; &gt; Red Hat Enterprise Linux Server release 7.7 (Maipo)<br>
&gt; &gt; &gt;<br>
&gt; &gt; &gt; [root@xx]# ovs-vsctl --version<br>
&gt; &gt; &gt; ovs-vsctl (Open vSwitch) 2.9.0<br>
&gt; &gt; &gt; DB Schema 7.15.1<br>
&gt; &gt; &gt;<br>
&gt; &gt; &gt; And dump_stack output with af_packet:<br>
&gt; &gt; &gt; [ 4833.637460]  &lt;IRQ&gt;  [&lt;ffffffff81979612&gt;] dump_stack+0x19/0x1b<br>
&gt; &gt; &gt; [ 4833.637474]  [&lt;ffffffff8197c3ca&gt;] ip_fragment.constprop.55+0xc3/0x141<br>
&gt; &gt; &gt; [ 4833.637481]  [&lt;ffffffff8189dd84&gt;] ip_finish_output+0x314/0x350<br>
&gt; &gt; &gt; [ 4833.637484]  [&lt;ffffffff8189eb83&gt;] ip_output+0xb3/0x130<br>
&gt; &gt; &gt; [ 4833.637490]  [&lt;ffffffff8189da70&gt;] ? ip_do_fragment+0x910/0x910<br>
&gt; &gt; &gt; [ 4833.637493]  [&lt;ffffffff8189cac9&gt;] ip_local_out_sk+0xf9/0x180<br>
&gt; &gt; &gt; [ 4833.637497]  [&lt;ffffffff818e6f6c&gt;] iptunnel_xmit+0x18c/0x220<br>
&gt; &gt; &gt; [ 4833.637505]  [&lt;ffffffffc073b2e7&gt;] udp_tunnel_xmit_skb+0x117/0x130 [udp_tunnel]<br>
&gt; &gt; &gt; [ 4833.637538]  [&lt;ffffffffc074585a&gt;] vxlan_xmit_one+0xb6a/0xb70 [vxlan]<br>
&gt; &gt; &gt; [ 4833.637545]  [&lt;ffffffff8129dad9&gt;] ? vprintk_default+0x29/0x40<br>
&gt; &gt; &gt; [ 4833.637551]  [&lt;ffffffffc074765e&gt;] vxlan_xmit+0xc9e/0xef0 [vxlan]<br>
&gt; &gt; &gt; [ 4833.637555]  [&lt;ffffffff818356e7&gt;] ? kfree_skbmem+0x37/0x90<br>
&gt; &gt; &gt; [ 4833.637559]  [&lt;ffffffff81836c24&gt;] ? consume_skb+0x34/0x90<br>
&gt; &gt; &gt; [ 4833.637564]  [&lt;ffffffff819547bc&gt;] ? packet_rcv+0x4c/0x3e0<br>
&gt; &gt; &gt; [ 4833.637570]  [&lt;ffffffff8184d346&gt;] dev_hard_start_xmit+0x246/0x3b0<br>
&gt; &gt; &gt; [ 4833.637574]  [&lt;ffffffff81850339&gt;] __dev_queue_xmit+0x519/0x650<br>
&gt; &gt; &gt; [ 4833.637580]  [&lt;ffffffff812d9df0&gt;] ? try_to_wake_up+0x190/0x390<br>
&gt; &gt; &gt; [ 4833.637585]  [&lt;ffffffff81850480&gt;] dev_queue_xmit+0x10/0x20<br>
&gt; &gt; &gt; [ 4833.637592]  [&lt;ffffffffc0724316&gt;] ovs_vport_send+0xa6/0x180 [openvswitch]<br>
&gt; &gt; &gt; [ 4833.637599]  [&lt;ffffffffc07150fe&gt;] do_output+0x4e/0xd0 [openvswitch]<br>
&gt; &gt; &gt; [ 4833.637604]  [&lt;ffffffffc0716699&gt;] do_execute_actions+0xa29/0xa40 [openvswitch]<br>
&gt; &gt; &gt; [ 4833.637610]  [&lt;ffffffff812d24d2&gt;] ? __wake_up_common+0x82/0x120<br>
&gt; &gt; &gt; [ 4833.637615]  [&lt;ffffffffc0716aac&gt;] ovs_execute_actions+0x4c/0x140 [openvswitch]<br>
&gt; &gt; &gt; [ 4833.637621]  [&lt;ffffffffc071a824&gt;] ovs_dp_process_packet+0x84/0x120 [openvswitch]<br>
&gt; &gt; &gt; [ 4833.637627]  [&lt;ffffffffc0725404&gt;] ? ovs_ct_update_key+0xc4/0x150 [openvswitch]<br>
&gt; &gt; &gt; [ 4833.637633]  [&lt;ffffffffc0724213&gt;] ovs_vport_receive+0x73/0xd0 [openvswitch]<br>
&gt; &gt; &gt; [ 4833.637638]  [&lt;ffffffff812d666f&gt;] ? ttwu_do_activate+0x6f/0x80<br>
&gt; &gt; &gt; [ 4833.637642]  [&lt;ffffffff812d9df0&gt;] ? try_to_wake_up+0x190/0x390<br>
&gt; &gt; &gt; [ 4833.637646]  [&lt;ffffffff812da0c2&gt;] ? default_wake_function+0x12/0x20<br>
&gt; &gt; &gt; [ 4833.637651]  [&lt;ffffffff812c61eb&gt;] ? autoremove_wake_function+0x2b/0x40<br>
&gt; &gt; &gt; [ 4833.637657]  [&lt;ffffffff812d24d2&gt;] ? __wake_up_common+0x82/0x120<br>
&gt; &gt; &gt; [ 4833.637661]  [&lt;ffffffff812e3ae9&gt;] ? update_cfs_shares+0xa9/0xf0<br>
&gt; &gt; &gt; [ 4833.637665]  [&lt;ffffffff812e3696&gt;] ? update_curr+0x86/0x1e0<br>
&gt; &gt; &gt; [ 4833.637669]  [&lt;ffffffff812dee88&gt;] ? __enqueue_entity+0x78/0x80<br>
&gt; &gt; &gt; [ 4833.637677]  [&lt;ffffffffc0724cbe&gt;] netdev_frame_hook+0xde/0x180 [openvswitch]<br>
&gt; &gt; &gt; [ 4833.637682]  [&lt;ffffffff8184d6aa&gt;] __netif_receive_skb_core+0x1fa/0xa10<br>
&gt; &gt; &gt; [ 4833.637688]  [&lt;ffffffffc0724be0&gt;] ? vport_netdev_free+0x30/0x30 [openvswitch]<br>
&gt; &gt; &gt; [ 4833.637692]  [&lt;ffffffff812d6539&gt;] ? ttwu_do_wakeup+0x19/0xe0<br>
&gt; &gt; &gt; [ 4833.637697]  [&lt;ffffffff8184ded8&gt;] __netif_receive_skb+0x18/0x60<br>
&gt; &gt; &gt; [ 4833.637703]  [&lt;ffffffff8184ee9e&gt;] process_backlog+0xae/0x180<br>
&gt; &gt; &gt; [ 4833.637707]  [&lt;ffffffff8184e57f&gt;] net_rx_action+0x26f/0x390<br>
&gt; &gt; &gt; [ 4833.637713]  [&lt;ffffffff812a41e5&gt;] __do_softirq+0xf5/0x280<br>
&gt; &gt; &gt; [ 4833.637719]  [&lt;ffffffff8199042c&gt;] call_softirq+0x1c/0x30<br>
&gt; &gt; &gt; [ 4833.637723]  &lt;EOI&gt;  [&lt;ffffffff8122f675&gt;] do_softirq+0x65/0xa0<br>
&gt; &gt; &gt; [ 4833.637730]  [&lt;ffffffff812a363b&gt;] __local_bh_enable_ip+0x9b/0xb0<br>
&gt; &gt; &gt; [ 4833.637735]  [&lt;ffffffff812a3667&gt;] local_bh_enable+0x17/0x20<br>
&gt; &gt; &gt; [ 4833.637741]  [&lt;ffffffff81850065&gt;] __dev_queue_xmit+0x245/0x650<br>
&gt; &gt; &gt; [ 4833.637746]  [&lt;ffffffff81972e28&gt;] ? printk+0x60/0x77<br>
&gt; &gt; &gt; [ 4833.637752]  [&lt;ffffffff81850480&gt;] dev_queue_xmit+0x10/0x20<br>
&gt; &gt; &gt; [ 4833.637757]  [&lt;ffffffff81957a75&gt;] packet_sendmsg+0xf65/0x1210<br>
&gt; &gt; &gt; [ 4833.637761]  [&lt;ffffffff813d7524&gt;] ? shmem_fault+0x84/0x1f0<br>
&gt; &gt; &gt; [ 4833.637768]  [&lt;ffffffff8182d3a6&gt;] sock_sendmsg+0xb6/0xf0<br>
&gt; &gt; &gt; [ 4833.637772]  [&lt;ffffffff812e3696&gt;] ? update_curr+0x86/0x1e0<br>
&gt; &gt; &gt; [ 4833.637777]  [&lt;ffffffff812e3ae9&gt;] ? update_cfs_shares+0xa9/0xf0<br>
&gt; &gt; &gt; [ 4833.637781]  [&lt;ffffffff8122b621&gt;] ? __switch_to+0x151/0x580<br>
&gt; &gt; &gt; [ 4833.637786]  [&lt;ffffffff8182dad1&gt;] SYSC_sendto+0x121/0x1c0<br>
&gt; &gt; &gt; [ 4833.637793]  [&lt;ffffffff812c8d10&gt;] ? hrtimer_get_res+0x50/0x50<br>
&gt; &gt; &gt; [ 4833.637797]  [&lt;ffffffff8197e54b&gt;] ? do_nanosleep+0x5b/0x100<br>
&gt; &gt; &gt; [ 4833.637802]  [&lt;ffffffff8182f5ee&gt;] SyS_sendto+0xe/0x10<br>
&gt; &gt; &gt; [ 4833.637806]  [&lt;ffffffff8198cede&gt;] system_call_fastpath+0x25/0x2a<br>
&gt; &gt; &gt;<br>
&gt; &gt; &gt; Looking forward to your reply.<br>
&gt; &gt; &gt;<br>
&gt; &gt; &gt; Regards,<br>
&gt; &gt; &gt; Ramana  <br>
&gt; &gt;<br>
&gt; &gt;  <br>
<br>
</blockquote></div>
</blockquote></div></div>
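<div>The segment-length arithmetic discussed in this thread can be sketched in C as follows. This is only an illustration, not code from the kernel or OVS: the helper names encap_seg_len and max_gso_size are hypothetical, and the header sizes are assumptions taken from the figures quoted in the thread (50 bytes of VXLAN encapsulation over an IPv4 underlay, a 20-byte inner IP header, and a 32-byte inner TCP header including options).</div>
<pre>
```c
/* Header sizes in bytes. All values are assumptions based on the
 * arithmetic quoted in this thread, not values read from a live packet. */
enum {
    OUTER_IP_LEN   = 20,  /* outer IPv4 header, no options */
    OUTER_UDP_LEN  = 8,
    VXLAN_HDR_LEN  = 8,
    INNER_ETH_LEN  = 14,
    INNER_IP_LEN   = 20,  /* inner IPv4 header, no options */
    INNER_TCP_LEN  = 32,  /* 20-byte TCP header plus 12 bytes of options */
    /* 20 + 8 + 8 + 14 = 50 bytes added in front of the inner IP header */
    VXLAN_OVERHEAD = OUTER_IP_LEN + OUTER_UDP_LEN + VXLAN_HDR_LEN + INNER_ETH_LEN
};

/* Hypothetical helper: length of one GSO segment measured from the outer
 * IP header, which is the value that gets compared against the MTU. */
static unsigned int encap_seg_len(unsigned int gso_size)
{
    return gso_size + VXLAN_OVERHEAD + INNER_IP_LEN + INNER_TCP_LEN;
}

/* Hypothetical helper: largest gso_size whose encapsulated segment still
 * fits in underlay_mtu. Assumes underlay_mtu is larger than the total
 * header overhead (102 bytes with the sizes above). */
static unsigned int max_gso_size(unsigned int underlay_mtu)
{
    return underlay_mtu - VXLAN_OVERHEAD - INNER_IP_LEN - INNER_TCP_LEN;
}
```
</pre>
<div>Under these assumptions, a gso_size of 1448 gives a 1550-byte segment, which exceeds the 1500-byte MTU, while a gso_size of 1398 gives exactly 1500 bytes. Both results agree with the seg_len values printed by skb_gso_size_check in the logs quoted in this thread.</div>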