[ovs-dev] [ovs-discuss] gso packet is failing with af_packet socket with packet_vnet_hdr

Ramana Reddy gtvrreddy at gmail.com
Tue Nov 5 20:29:41 UTC 2019


Hi Flavio,
As per your inputs, I modified the gso_size, and now
skb_gso_validate_mtu(skb, mtu) returns true, and ip_finish_output2(sk, skb)
and dst_neigh_output(dst, neigh, skb) are getting called. But I am still
seeing the large packets being dropped somewhere further down in the
kernel, and retransmissions happening.

        if (skb_gso_validate_mtu(skb, mtu))
                return ip_finish_output2(sk, skb);

[ 1854.905733] vxlan_xmit:2262 skb->len:2776 packet_length:2762
[ 1854.905744] skb_gso_size_check:4478 and seg_len:1500 and max_len:1500
and shinfo->gso_size:1398 and GSO_BY_FRAGS:65535

The gso_size of 1398 bytes is correct in my case: 1398 + 50 (vxlan
overhead) + 20 (IP) + 32 (TCP) + 14 (ETH) = 1514 bytes.
The code is simple:

        vnet = buf;  /* buf is an array of 64k bytes */
        len = 0;

        if (csum) {
                vnet->flags = VIRTIO_NET_HDR_F_NEEDS_CSUM;
                vnet->csum_start = ETH_HLEN + sizeof(*iph);
                vnet->csum_offset =
                        __builtin_offsetof(struct tcphdr, check);
        }

        if (gso) {
                vnet->hdr_len = ETH_HLEN + sizeof(*iph) + sizeof(*tcph);
                vnet->gso_type = VIRTIO_NET_HDR_GSO_TCPV4;
                /* 50 is the vxlan header overhead */
                vnet->gso_size = ETH_DATA_LEN - 50 - sizeof(struct iphdr) -
                                 sizeof(struct tcphdr);
        } else {
                vnet->gso_type = VIRTIO_NET_HDR_GSO_NONE;
                vnet->gso_size = 0;
        }
        len = sizeof(*vnet);

        /* Now the entire L2 packet is copied into buf starting at
         * offset buf + len, and the packet is sent. */

 Did I miss something? I am also not sure how OVS handles this packet
after receiving it and before transmitting it over vxlan.
How does checksum offloading happen with af_packet in OVS? Does OVS have
any role in this?


Please see the attached image for reference. The packet flow within the
host is given below:

Ubuntu container (eth0, 1500 MTU) -> routing lookup ->
Ubuntu container (veth0, 1450 MTU) -> OVS (veth1, 1450 MTU) ->
vxlan (65K MTU) -> eth0 (physical interface, 1500 MTU) -> other machine.

Looking forward to your reply.
Regards,
Ramana


On Mon, Nov 4, 2019 at 10:41 PM Ramana Reddy <gtvrreddy at gmail.com> wrote:

> Thanks, Flavio. I will check it out tomorrow and let you know how it goes.
>
> Regards,
> Ramana
>
>
> On Mon, Nov 4, 2019 at 10:15 PM Flavio Leitner <fbl at sysclose.org> wrote:
>
>> On Mon, 4 Nov 2019 21:32:28 +0530
>> Ramana Reddy <gtvrreddy at gmail.com> wrote:
>>
>> > Hi Flavio Leitner,
>> > Thank you very much for your reply. Here is the code snippet. But the
>> > same code works if I send the packet without OVS.
>>
>> Could you provide more details on the OvS environment and the test?
>>
>> The linux kernel propagates the header size dependencies when you stack
>> the devices in net_device->hard_header_len, so in the case of vxlan dev
>> it will be:
>>
>> needed_headroom = lowerdev->hard_header_len;
>> needed_headroom += VXLAN_HEADROOM;
>> dev->needed_headroom = needed_headroom;
>>
>> Sounds like that is helping when OvS is not being used.
>>
>> fbl
>>
>>
>> > bool csum = true;
>> > bool gso = true;
>> > struct virtio_net_hdr *vnet = buf;
>> >
>> >                 if (csum) {
>> >                         vnet->flags = VIRTIO_NET_HDR_F_NEEDS_CSUM;
>> >                         vnet->csum_start = ETH_HLEN + sizeof(*iph);
>> >                         vnet->csum_offset =
>> >                                 __builtin_offsetof(struct tcphdr, check);
>> >                 }
>> >
>> >                 if (gso) {
>> >                         vnet->hdr_len = ETH_HLEN + sizeof(*iph) +
>> >                                         sizeof(*tcph);
>> >                         vnet->gso_type = VIRTIO_NET_HDR_GSO_TCPV4;
>> >                         vnet->gso_size = ETH_DATA_LEN -
>> >                                          sizeof(struct iphdr) -
>> >                                          sizeof(struct tcphdr);
>> >                 } else {
>> >                         vnet->gso_type = VIRTIO_NET_HDR_GSO_NONE;
>> >                 }
>> > Regards,
>> > Ramana
>> >
>> >
>> > On Mon, Nov 4, 2019 at 8:39 PM Flavio Leitner <fbl at sysclose.org>
>> > wrote:
>> >
>> > >
>> > > Hi,
>> > >
>> > > What's the value you're passing on gso_size in struct
>> > > virtio_net_hdr? You need to leave room for the encapsulation
>> > > header, e.g.:
>> > >
>> > > gso_size = iface_mtu - virtio_net_hdr->hdr_len
>> > >
>> > > fbl
>> > >
>> > > On Mon, 4 Nov 2019 01:11:36 +0530
>> > > Ramana Reddy <gtvrreddy at gmail.com> wrote:
>> > >
>> > > > Hi,
>> > > > I am wondering if anyone can help me with this. I am having
>> > > > trouble sending tso/gso packets with an af_packet socket with
>> > > > PACKET_VNET_HDR (through virtio_net_hdr) over a vxlan tunnel
>> > > > in OVS.
>> > > >
>> > > > What I observed is that the following function (net/core/skbuff.c)
>> > > > is eventually hit and returns false, hence the packet is dropped:
>> > > >
>> > > > static inline bool skb_gso_size_check(const struct sk_buff *skb,
>> > > >                                       unsigned int seg_len,
>> > > >                                       unsigned int max_len)
>> > > > {
>> > > >         const struct skb_shared_info *shinfo = skb_shinfo(skb);
>> > > >         const struct sk_buff *iter;
>> > > >
>> > > >         if (shinfo->gso_size != GSO_BY_FRAGS)
>> > > >                 return seg_len <= max_len;
>> > > >         ..........
>> > > > }
>> > > > [  678.756673] ip_finish_output_gso:235 packet_length:2762 (here
>> > > > packet_length = skb->len - skb_inner_network_offset(skb))
>> > > > [  678.756678] ip_fragment:510 packet length:1500
>> > > > [  678.756715] ip_fragment:510 packet length:1314
>> > > > [  678.956889] skb_gso_size_check:4474 and seg_len:1550 and
>> > > > max_len:1500 and shinfo->gso_size:1448 and GSO_BY_FRAGS:65535
>> > > >
>> > > > Observation:
>> > > > When we send a large packet (packet_length:2762 in this example),
>> > > > seg_len (1550) > max_len (1500), so the "return seg_len <= max_len"
>> > > > statement returns false. Because of this, ip_fragment calls
>> > > > icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED, htonl(mtu))
>> > > > instead of the code reaching ip_finish_output2(sk, skb) in
>> > > > net/ipv4/ip_output.c, given below:
>> > > >
>> > > > static int ip_finish_output_gso(struct sock *sk, struct sk_buff
>> > > > *skb, unsigned int mtu)
>> > > > {
>> > > >         netdev_features_t features;
>> > > >         struct sk_buff *segs;
>> > > >         int ret = 0;
>> > > >
>> > > >         /* common case: seglen is <= mtu */
>> > > >         if (skb_gso_validate_mtu(skb, mtu))
>> > > >                 return ip_finish_output2(sk, skb);
>> > > >        ...........
>> > > >       err = ip_fragment(sk, segs, mtu, ip_finish_output2);
>> > > >       ...........
>> > > >  }
>> > > >
>> > > > But when we send normal iperf traffic (gso/tso traffic) over
>> > > > vxlan, skb_gso_size_check returns true and ip_finish_output2
>> > > > gets executed. Here are the values for normal iperf traffic
>> > > > over vxlan:
>> > > >
>> > > > [ 1041.400537] skb_gso_size_check:4477 and seg_len:1500 and
>> > > > max_len:1500 and shinfo->gso_size:1398 and GSO_BY_FRAGS:65535
>> > > > [ 1041.400587] skb_gso_size_check:4477 and seg_len:1450 and
>> > > > max_len:1450 and shinfo->gso_size:1398 and GSO_BY_FRAGS:65535
>> > > > [ 1041.400594] skb_gso_size_check:4477 and seg_len:1500 and
>> > > > max_len:1500 and shinfo->gso_size:1398 and GSO_BY_FRAGS:65535
>> > > > [ 1041.400732] skb_gso_size_check:4477 and seg_len:1450 and
>> > > > max_len:1450 and shinfo->gso_size:1398 and GSO_BY_FRAGS:65535
>> > > > [ 1041.400741] skb_gso_size_check:4477 and seg_len:1450 and
>> > > > max_len:1450 and shinfo->gso_size:1398 and GSO_BY_FRAGS:65535
>> > > >
>> > > > Can someone help me figure out what is missing, and where I
>> > > > should modify the code (in OVS or outside of OVS) so that it
>> > > > works as expected?
>> > > >
>> > > > Thanks in advance.
>> > > >
>> > > > Some more info:
>> > > > [root at xx ~]# uname -r
>> > > > 3.10.0-1062.4.1.el7.x86_64
>> > > > [root at xx ~]# cat /etc/redhat-release
>> > > > Red Hat Enterprise Linux Server release 7.7 (Maipo)
>> > > >
>> > > > [root at xx]# ovs-vsctl --version
>> > > > ovs-vsctl (Open vSwitch) 2.9.0
>> > > > DB Schema 7.15.1
>> > > >
>> > > > And dump_stack output with af_packet:
>> > > > [ 4833.637460]  <IRQ>  [<ffffffff81979612>] dump_stack+0x19/0x1b
>> > > > [ 4833.637474]  [<ffffffff8197c3ca>]
>> > > > ip_fragment.constprop.55+0xc3/0x141 [ 4833.637481]
>> > > > [<ffffffff8189dd84>] ip_finish_output+0x314/0x350 [ 4833.637484]
>> > > > [<ffffffff8189eb83>] ip_output+0xb3/0x130 [ 4833.637490]
>> > > > [<ffffffff8189da70>] ? ip_do_fragment+0x910/0x910 [ 4833.637493]
>> > > > [<ffffffff8189cac9>] ip_local_out_sk+0xf9/0x180 [ 4833.637497]
>> > > > [<ffffffff818e6f6c>] iptunnel_xmit+0x18c/0x220 [ 4833.637505]
>> > > > [<ffffffffc073b2e7>] udp_tunnel_xmit_skb+0x117/0x130 [udp_tunnel]
>> > > > [ 4833.637538]  [<ffffffffc074585a>] vxlan_xmit_one+0xb6a/0xb70
>> > > > [vxlan] [ 4833.637545]  [<ffffffff8129dad9>] ?
>> > > > vprintk_default+0x29/0x40 [ 4833.637551]  [<ffffffffc074765e>]
>> > > > vxlan_xmit+0xc9e/0xef0 [vxlan] [ 4833.637555]
>> > > > [<ffffffff818356e7>] ? kfree_skbmem+0x37/0x90 [ 4833.637559]
>> > > > [<ffffffff81836c24>] ? consume_skb+0x34/0x90 [ 4833.637564]
>> > > > [<ffffffff819547bc>] ? packet_rcv+0x4c/0x3e0 [ 4833.637570]
>> > > > [<ffffffff8184d346>] dev_hard_start_xmit+0x246/0x3b0 [
>> > > > 4833.637574]  [<ffffffff81850339>] __dev_queue_xmit+0x519/0x650 [
>> > > > 4833.637580]  [<ffffffff812d9df0>] ? try_to_wake_up+0x190/0x390 [
>> > > > 4833.637585]  [<ffffffff81850480>] dev_queue_xmit+0x10/0x20 [
>> > > > 4833.637592]  [<ffffffffc0724316>] ovs_vport_send+0xa6/0x180
>> > > > [openvswitch] [ 4833.637599] [<ffffffffc07150fe>]
>> > > > do_output+0x4e/0xd0 [openvswitch] [ 4833.637604]
>> > > > [<ffffffffc0716699>] do_execute_actions+0xa29/0xa40 [openvswitch]
>> > > > [ 4833.637610]  [<ffffffff812d24d2>] ?
>> > > > __wake_up_common+0x82/0x120 [ 4833.637615]  [<ffffffffc0716aac>]
>> > > > ovs_execute_actions+0x4c/0x140 [openvswitch] [ 4833.637621]
>> > > > [<ffffffffc071a824>] ovs_dp_process_packet+0x84/0x120
>> > > > [openvswitch] [ 4833.637627]  [<ffffffffc0725404>] ?
>> > > > ovs_ct_update_key+0xc4/0x150 [openvswitch]
>> > > > [ 4833.637633]  [<ffffffffc0724213>] ovs_vport_receive+0x73/0xd0
>> > > > [openvswitch]
>> > > > [ 4833.637638]  [<ffffffff812d666f>] ? ttwu_do_activate+0x6f/0x80
>> > > > [ 4833.637642]  [<ffffffff812d9df0>] ? try_to_wake_up+0x190/0x390
>> > > > [ 4833.637646]  [<ffffffff812da0c2>] ?
>> > > > default_wake_function+0x12/0x20 [ 4833.637651]
>> > > > [<ffffffff812c61eb>] ? autoremove_wake_function+0x2b/0x40 [
>> > > > 4833.637657] [<ffffffff812d24d2>] ? __wake_up_common+0x82/0x120 [
>> > > > 4833.637661] [<ffffffff812e3ae9>] ? update_cfs_shares+0xa9/0xf0 [
>> > > > 4833.637665] [<ffffffff812e3696>] ? update_curr+0x86/0x1e0 [
>> > > > 4833.637669] [<ffffffff812dee88>] ? __enqueue_entity+0x78/0x80 [
>> > > > 4833.637677] [<ffffffffc0724cbe>] netdev_frame_hook+0xde/0x180
>> > > > [openvswitch] [ 4833.637682]  [<ffffffff8184d6aa>]
>> > > > __netif_receive_skb_core+0x1fa/0xa10 [ 4833.637688]
>> > > > [<ffffffffc0724be0>] ? vport_netdev_free+0x30/0x30 [openvswitch]
>> > > > [ 4833.637692]  [<ffffffff812d6539>] ? ttwu_do_wakeup+0x19/0xe0
>> > > > [ 4833.637697]  [<ffffffff8184ded8>] __netif_receive_skb+0x18/0x60
>> > > > [ 4833.637703]  [<ffffffff8184ee9e>] process_backlog+0xae/0x180
>> > > > [ 4833.637707]  [<ffffffff8184e57f>] net_rx_action+0x26f/0x390
>> > > > [ 4833.637713]  [<ffffffff812a41e5>] __do_softirq+0xf5/0x280
>> > > > [ 4833.637719]  [<ffffffff8199042c>] call_softirq+0x1c/0x30
>> > > > [ 4833.637723]  <EOI>  [<ffffffff8122f675>] do_softirq+0x65/0xa0
>> > > > [ 4833.637730]  [<ffffffff812a363b>]
>> > > > __local_bh_enable_ip+0x9b/0xb0 [ 4833.637735]
>> > > > [<ffffffff812a3667>] local_bh_enable+0x17/0x20 [ 4833.637741]
>> > > > [<ffffffff81850065>] __dev_queue_xmit+0x245/0x650 [ 4833.637746]
>> > > > [<ffffffff81972e28>] ? printk+0x60/0x77 [ 4833.637752]
>> > > > [<ffffffff81850480>] dev_queue_xmit+0x10/0x20 [ 4833.637757]
>> > > > [<ffffffff81957a75>] packet_sendmsg+0xf65/0x1210 [ 4833.637761]
>> > > > [<ffffffff813d7524>] ? shmem_fault+0x84/0x1f0 [ 4833.637768]
>> > > > [<ffffffff8182d3a6>] sock_sendmsg+0xb6/0xf0 [ 4833.637772]
>> > > > [<ffffffff812e3696>] ? update_curr+0x86/0x1e0 [ 4833.637777]
>> > > > [<ffffffff812e3ae9>] ? update_cfs_shares+0xa9/0xf0 [ 4833.637781]
>> > > >  [<ffffffff8122b621>] ? __switch_to+0x151/0x580 [ 4833.637786]
>> > > > [<ffffffff8182dad1>] SYSC_sendto+0x121/0x1c0 [ 4833.637793]
>> > > > [<ffffffff812c8d10>] ? hrtimer_get_res+0x50/0x50 [ 4833.637797]
>> > > > [<ffffffff8197e54b>] ? do_nanosleep+0x5b/0x100 [ 4833.637802]
>> > > > [<ffffffff8182f5ee>] SyS_sendto+0xe/0x10 [ 4833.637806]
>> > > > [<ffffffff8198cede>] system_call_fastpath+0x25/0x2a
>> > > >
>> > > > Looking forward to your reply.
>> > > >
>> > > > Regards,
>> > > > Ramana
>> > >
>> > >
>>
>>
