<div dir="ltr"><div dir="ltr">Hi Flavio,<div>As per your inputs, I modified the gso_size, and now skb_gso_validate_mtu(skb, mtu) is returning true, and</div><div>ip_finish_output2(sk, skb) and dst_neigh_output(dst, neigh, skb); are getting called. But still, I am seeing the large packets getting dropped somewhere in the kernel</div><div>down the line and retransmission happening. </div><div><div><span style="color:rgb(80,0,80)"><br></span></div><div><span style="color:rgb(80,0,80)">if (skb_gso_validate_mtu(skb, mtu))</span><br></div><span style="color:rgb(80,0,80)"> return ip_finish_output2(sk, skb);</span> </div><div> <br></div><div><div>[ 1854.905733] vxlan_xmit:2262 skb->len:2776 packet_length:2762<br>[ 1854.905744] skb_gso_size_check:4478 and seg_len:1500 and max_len:1500 and shinfo->gso_size:1398 and GSO_BY_FRAGS:65535<br></div><div></div></div><div><br></div><div>The gso_size 1398 bytes is correct in my case ( 1398 + 50 (vxlan header) + 20(IP) + TCP(32) + 14(ETH) = 1514 bytes)</div><div>The code is simple:</div><div> vnet = buf; // buf is an array of 64k bytes</div><div> len = 0;</div><div> if (csum) {</div><div> vnet->flags = (VIRTIO_NET_HDR_F_NEEDS_CSUM); </div><div> vnet->csum_start = (ETH_HLEN + sizeof(*iph));<br> vnet->csum_offset = (__builtin_offsetof(struct tcphdr, check)); </div><div> } if (gso) {<br> vnet->hdr_len = (ETH_HLEN + sizeof(*iph) + sizeof(*tcph));<br> vnet->gso_type = VIRTIO_NET_HDR_GSO_TCPV4; </div><div> vnet->gso_size = (
ETH_DATA_LEN - 50 - sizeof(struct iphdr) -<br> sizeof(struct tcphdr)); // 50 is the vxlan header<br> } else {<br> vnet->gso_type = VIRTIO_NET_HDR_GSO_NONE;</div><div>
vnet->gso_size = 0;
</div><div> } </div><div> len = sizeof(*vnet); <br> // Now copying the entire L2 packet into the buf starting at an offset buf + len and sending the packet.</div><div><br></div><div> Did I miss something? And I am not sure how OVS behaves after receiving this packet and before transmitting to vxlan.</div><div>How is checksum offloading happening with af_packet in OVS? Does OVS have any role in this? </div><div> </div><div>Please see the attached image for reference. The packet flow within the host is given below:<br></div><div><br></div><div>Ubuntu container (eth0 (1500MTU))----------routing lookup-------------->Ubuntu container(veth0(1450 MTU)) ->OVS(veth1(1450MTU))->vxlan(65K MTU)->eth0(physical interface(1500MTU))->other machine. <br></div><div><br></div><div>Looking forward to your reply. </div><div>Regards,</div><div>Ramana</div><div></div><div><span style="color:rgb(80,0,80)"> </span></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Nov 4, 2019 at 10:41 PM Ramana Reddy <<a href="mailto:gtvrreddy@gmail.com" target="_blank">gtvrreddy@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">Thanks, Flavio. I will check it out tomorrow and let you know how it goes.<div><br></div><div>Regards,</div><div>Ramana<br><div><br></div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Nov 4, 2019 at 10:15 PM Flavio Leitner <<a href="mailto:fbl@sysclose.org" target="_blank">fbl@sysclose.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On Mon, 4 Nov 2019 21:32:28 +0530<br>
Ramana Reddy <<a href="mailto:gtvrreddy@gmail.com" target="_blank">gtvrreddy@gmail.com</a>> wrote:<br>
<br>
> Hi Flavio Leitner,<br>
> Thank you very much for your reply. Here is the code snippet. The<br>
> same code works if I send the packet without OVS.<br>
<br>
Could you provide more details on the OvS environment and the test?<br>
<br>
The linux kernel propagates the header size dependencies when you stack<br>
the devices in net_device->hard_header_len, so in the case of vxlan dev<br>
it will be:<br>
<br>
needed_headroom = lowerdev->hard_header_len;<br>
needed_headroom += VXLAN_HEADROOM;<br>
dev->needed_headroom = needed_headroom;<br>
<br>
Sounds like that is helping when OvS is not being used.<br>
<br>
fbl<br>
<br>
<br>
> bool csum = true;<br>
> bool gso = true;<br>
> struct virtio_net_hdr *vnet = buf;<br>
> if (csum) {<br>
> vnet->flags = (VIRTIO_NET_HDR_F_NEEDS_CSUM);<br>
> vnet->csum_start = ETH_HLEN + sizeof(*iph);<br>
> vnet->csum_offset = __builtin_offsetof(struct<br>
> tcphdr, check);<br>
> }<br>
> <br>
> if (gso) {<br>
> vnet->hdr_len = ETH_HLEN + sizeof(*iph) +<br>
> sizeof(*tcph);<br>
> vnet->gso_type = VIRTIO_NET_HDR_GSO_TCPV4;<br>
> vnet->gso_size = ETH_DATA_LEN - sizeof(struct<br>
> iphdr) -<br>
> sizeof(struct<br>
> tcphdr);<br>
> } else {<br>
> vnet->gso_type = VIRTIO_NET_HDR_GSO_NONE;<br>
> }<br>
> Regards,<br>
> Ramana<br>
> <br>
> <br>
> On Mon, Nov 4, 2019 at 8:39 PM Flavio Leitner <<a href="mailto:fbl@sysclose.org" target="_blank">fbl@sysclose.org</a>><br>
> wrote:<br>
> <br>
> ><br>
> > Hi,<br>
> ><br>
> > What's the value you're passing on gso_size in struct<br>
> > virtio_net_hdr? You need to leave room for the encapsulation<br>
> > header, e.g.:<br>
> ><br>
> > gso_size = iface_mtu - virtio_net_hdr->hdr_len<br>
> ><br>
> > fbl<br>
> ><br>
> > On Mon, 4 Nov 2019 01:11:36 +0530<br>
> > Ramana Reddy <<a href="mailto:gtvrreddy@gmail.com" target="_blank">gtvrreddy@gmail.com</a>> wrote:<br>
> > <br>
> > > Hi,<br>
> > > I am wondering if anyone can help me with this. I am having<br>
> > > trouble sending tso/gso packets over an af_packet socket<br>
> > > with packet_vnet_hdr (through virtio_net_hdr) over a vxlan<br>
> > > tunnel in OVS.<br>
> > ><br>
> > > I observed that the following function (net/core/skbuff.c) is<br>
> > > eventually hit and returns false, hence the packet is<br>
> > > dropped:<br>
> > ><br>
> > > static inline bool skb_gso_size_check(const struct sk_buff *skb,<br>
> > > unsigned int seg_len,<br>
> > > unsigned int max_len) {<br>
> > > const struct skb_shared_info *shinfo = skb_shinfo(skb);<br>
> > > const struct sk_buff *iter;<br>
> > > if (shinfo->gso_size != GSO_BY_FRAGS)<br>
> > > return seg_len <= max_len;<br>
> > > ..........<br>
> > > }<br>
> > > [ 678.756673] ip_finish_output_gso:235 packet_length:2762 (here<br>
> > > packet_length = skb->len - skb_inner_network_offset(skb))<br>
> > > [ 678.756678] ip_fragment:510 packet length:1500<br>
> > > [ 678.756715] ip_fragment:510 packet length:1314<br>
> > > [ 678.956889] skb_gso_size_check:4474 and seg_len:1550 and<br>
> > > max_len:1500 and shinfo->gso_size:1448 and GSO_BY_FRAGS:65535<br>
> > ><br>
> > > Observation:<br>
> > > When we send a large packet (here packet_length:2762),<br>
> > > seg_len (1550) is greater than max_len (1500), so the statement<br>
> > > "return seg_len <= max_len" returns false. Because of this,<br>
> > > ip_fragment calls<br>
> > > icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED, htonl(mtu));<br>
> > > instead of the code reaching ip_finish_output2(sk, skb).<br>
> > > The relevant function in net/ipv4/ip_output.c is given below:<br>
> > ><br>
> > > static int ip_finish_output_gso(struct sock *sk, struct sk_buff<br>
> > > *skb, unsigned int mtu)<br>
> > > {<br>
> > > netdev_features_t features;<br>
> > > struct sk_buff *segs;<br>
> > > int ret = 0;<br>
> > ><br>
> > > /* common case: seglen is <= mtu */<br>
> > > if (skb_gso_validate_mtu(skb, mtu))<br>
> > > return ip_finish_output2(sk, skb);<br>
> > > ...........<br>
> > > err = ip_fragment(sk, segs, mtu, ip_finish_output2);<br>
> > > ...........<br>
> > > }<br>
> > ><br>
> > > But when we send normal iperf traffic (gso/tso traffic) over<br>
> > > vxlan, skb_gso_size_check returns true and<br>
> > > ip_finish_output2 is executed.<br>
> > > Here are the values for normal iperf traffic over vxlan:<br>
> > ><br>
> > > [ 1041.400537] skb_gso_size_check:4477 and seg_len:1500 and<br>
> > > max_len:1500 and shinfo->gso_size:1398 and GSO_BY_FRAGS:65535<br>
> > > [ 1041.400587] skb_gso_size_check:4477 and seg_len:1450 and<br>
> > > max_len:1450 and shinfo->gso_size:1398 and GSO_BY_FRAGS:65535<br>
> > > [ 1041.400594] skb_gso_size_check:4477 and seg_len:1500 and<br>
> > > max_len:1500 and shinfo->gso_size:1398 and GSO_BY_FRAGS:65535<br>
> > > [ 1041.400732] skb_gso_size_check:4477 and seg_len:1450 and<br>
> > > max_len:1450 and shinfo->gso_size:1398 and GSO_BY_FRAGS:65535<br>
> > > [ 1041.400741] skb_gso_size_check:4477 and seg_len:1450 and<br>
> > > max_len:1450 and shinfo->gso_size:1398 and GSO_BY_FRAGS:65535<br>
> > ><br>
> > > Can someone help me figure out what is missing, and where I<br>
> > > should modify the code, in OVS or outside of OVS, so that it<br>
> > > works as expected?<br>
> > ><br>
> > > Thanks in advance.<br>
> > ><br>
> > > Some more info:<br>
> > > [root@xx ~]# uname -r<br>
> > > 3.10.0-1062.4.1.el7.x86_64<br>
> > > [root@xx ~]# cat /etc/redhat-release<br>
> > > Red Hat Enterprise Linux Server release 7.7 (Maipo)<br>
> > ><br>
> > > [root@xx]# ovs-vsctl --version<br>
> > > ovs-vsctl (Open vSwitch) 2.9.0<br>
> > > DB Schema 7.15.1<br>
> > ><br>
> > > And dump_stack output with af_packet:<br>
> > > [ 4833.637460] <IRQ> [<ffffffff81979612>] dump_stack+0x19/0x1b<br>
> > > [ 4833.637474] [<ffffffff8197c3ca>] ip_fragment.constprop.55+0xc3/0x141<br>
> > > [ 4833.637481] [<ffffffff8189dd84>] ip_finish_output+0x314/0x350<br>
> > > [ 4833.637484] [<ffffffff8189eb83>] ip_output+0xb3/0x130<br>
> > > [ 4833.637490] [<ffffffff8189da70>] ? ip_do_fragment+0x910/0x910<br>
> > > [ 4833.637493] [<ffffffff8189cac9>] ip_local_out_sk+0xf9/0x180<br>
> > > [ 4833.637497] [<ffffffff818e6f6c>] iptunnel_xmit+0x18c/0x220<br>
> > > [ 4833.637505] [<ffffffffc073b2e7>] udp_tunnel_xmit_skb+0x117/0x130 [udp_tunnel]<br>
> > > [ 4833.637538] [<ffffffffc074585a>] vxlan_xmit_one+0xb6a/0xb70 [vxlan]<br>
> > > [ 4833.637545] [<ffffffff8129dad9>] ? vprintk_default+0x29/0x40<br>
> > > [ 4833.637551] [<ffffffffc074765e>] vxlan_xmit+0xc9e/0xef0 [vxlan]<br>
> > > [ 4833.637555] [<ffffffff818356e7>] ? kfree_skbmem+0x37/0x90<br>
> > > [ 4833.637559] [<ffffffff81836c24>] ? consume_skb+0x34/0x90<br>
> > > [ 4833.637564] [<ffffffff819547bc>] ? packet_rcv+0x4c/0x3e0<br>
> > > [ 4833.637570] [<ffffffff8184d346>] dev_hard_start_xmit+0x246/0x3b0<br>
> > > [ 4833.637574] [<ffffffff81850339>] __dev_queue_xmit+0x519/0x650<br>
> > > [ 4833.637580] [<ffffffff812d9df0>] ? try_to_wake_up+0x190/0x390<br>
> > > [ 4833.637585] [<ffffffff81850480>] dev_queue_xmit+0x10/0x20<br>
> > > [ 4833.637592] [<ffffffffc0724316>] ovs_vport_send+0xa6/0x180 [openvswitch]<br>
> > > [ 4833.637599] [<ffffffffc07150fe>] do_output+0x4e/0xd0 [openvswitch]<br>
> > > [ 4833.637604] [<ffffffffc0716699>] do_execute_actions+0xa29/0xa40 [openvswitch]<br>
> > > [ 4833.637610] [<ffffffff812d24d2>] ? __wake_up_common+0x82/0x120<br>
> > > [ 4833.637615] [<ffffffffc0716aac>] ovs_execute_actions+0x4c/0x140 [openvswitch]<br>
> > > [ 4833.637621] [<ffffffffc071a824>] ovs_dp_process_packet+0x84/0x120 [openvswitch]<br>
> > > [ 4833.637627] [<ffffffffc0725404>] ? ovs_ct_update_key+0xc4/0x150 [openvswitch]<br>
> > > [ 4833.637633] [<ffffffffc0724213>] ovs_vport_receive+0x73/0xd0 [openvswitch]<br>
> > > [ 4833.637638] [<ffffffff812d666f>] ? ttwu_do_activate+0x6f/0x80<br>
> > > [ 4833.637642] [<ffffffff812d9df0>] ? try_to_wake_up+0x190/0x390<br>
> > > [ 4833.637646] [<ffffffff812da0c2>] ? default_wake_function+0x12/0x20<br>
> > > [ 4833.637651] [<ffffffff812c61eb>] ? autoremove_wake_function+0x2b/0x40<br>
> > > [ 4833.637657] [<ffffffff812d24d2>] ? __wake_up_common+0x82/0x120<br>
> > > [ 4833.637661] [<ffffffff812e3ae9>] ? update_cfs_shares+0xa9/0xf0<br>
> > > [ 4833.637665] [<ffffffff812e3696>] ? update_curr+0x86/0x1e0<br>
> > > [ 4833.637669] [<ffffffff812dee88>] ? __enqueue_entity+0x78/0x80<br>
> > > [ 4833.637677] [<ffffffffc0724cbe>] netdev_frame_hook+0xde/0x180 [openvswitch]<br>
> > > [ 4833.637682] [<ffffffff8184d6aa>] __netif_receive_skb_core+0x1fa/0xa10<br>
> > > [ 4833.637688] [<ffffffffc0724be0>] ? vport_netdev_free+0x30/0x30 [openvswitch]<br>
> > > [ 4833.637692] [<ffffffff812d6539>] ? ttwu_do_wakeup+0x19/0xe0<br>
> > > [ 4833.637697] [<ffffffff8184ded8>] __netif_receive_skb+0x18/0x60<br>
> > > [ 4833.637703] [<ffffffff8184ee9e>] process_backlog+0xae/0x180<br>
> > > [ 4833.637707] [<ffffffff8184e57f>] net_rx_action+0x26f/0x390<br>
> > > [ 4833.637713] [<ffffffff812a41e5>] __do_softirq+0xf5/0x280<br>
> > > [ 4833.637719] [<ffffffff8199042c>] call_softirq+0x1c/0x30<br>
> > > [ 4833.637723] <EOI> [<ffffffff8122f675>] do_softirq+0x65/0xa0<br>
> > > [ 4833.637730] [<ffffffff812a363b>] __local_bh_enable_ip+0x9b/0xb0<br>
> > > [ 4833.637735] [<ffffffff812a3667>] local_bh_enable+0x17/0x20<br>
> > > [ 4833.637741] [<ffffffff81850065>] __dev_queue_xmit+0x245/0x650<br>
> > > [ 4833.637746] [<ffffffff81972e28>] ? printk+0x60/0x77<br>
> > > [ 4833.637752] [<ffffffff81850480>] dev_queue_xmit+0x10/0x20<br>
> > > [ 4833.637757] [<ffffffff81957a75>] packet_sendmsg+0xf65/0x1210<br>
> > > [ 4833.637761] [<ffffffff813d7524>] ? shmem_fault+0x84/0x1f0<br>
> > > [ 4833.637768] [<ffffffff8182d3a6>] sock_sendmsg+0xb6/0xf0<br>
> > > [ 4833.637772] [<ffffffff812e3696>] ? update_curr+0x86/0x1e0<br>
> > > [ 4833.637777] [<ffffffff812e3ae9>] ? update_cfs_shares+0xa9/0xf0<br>
> > > [ 4833.637781] [<ffffffff8122b621>] ? __switch_to+0x151/0x580<br>
> > > [ 4833.637786] [<ffffffff8182dad1>] SYSC_sendto+0x121/0x1c0<br>
> > > [ 4833.637793] [<ffffffff812c8d10>] ? hrtimer_get_res+0x50/0x50<br>
> > > [ 4833.637797] [<ffffffff8197e54b>] ? do_nanosleep+0x5b/0x100<br>
> > > [ 4833.637802] [<ffffffff8182f5ee>] SyS_sendto+0xe/0x10<br>
> > > [ 4833.637806] [<ffffffff8198cede>] system_call_fastpath+0x25/0x2a<br>
> > ><br>
> > > Looking forward to your reply.<br>
> > ><br>
> > > Regards,<br>
> > > Ramana <br>
> ><br>
> > <br>
<br>
</blockquote></div>
</blockquote></div></div>
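The seg_len values in the kernel logs quoted in this thread can be reproduced with simple arithmetic. The sketch below is a simplified userspace model of the comparison in skb_gso_size_check for the case where gso_size is not GSO_BY_FRAGS; it is not the kernel code itself. The function names and the 50-byte overhead constant are assumptions made for illustration.<br>

```c
#include <stdbool.h>

/*
 * Assumed bytes added in front of the inner IPv4 header for VXLAN over IPv4:
 * outer IPv4 (20) + UDP (8) + VXLAN (8) + inner Ethernet (14).
 */
#define SKETCH_ENCAP_OVERHEAD 50u

/* Length of one encapsulated segment as seen at the outer network layer. */
static inline unsigned int sketch_seg_len(unsigned int gso_size,
                                          unsigned int inner_ip_hlen,
                                          unsigned int inner_tcp_hlen)
{
    return SKETCH_ENCAP_OVERHEAD + inner_ip_hlen + inner_tcp_hlen + gso_size;
}

/* Same comparison as the statement quoted in the thread. */
static inline bool sketch_gso_fits(unsigned int seg_len, unsigned int max_len)
{
    return seg_len <= max_len;
}
```

With gso_size 1448 this gives 50 + 20 + 32 + 1448 = 1550, which exceeds the 1500-byte limit; with gso_size 1398 it gives exactly 1500, which matches the two skb_gso_size_check log lines.<br>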