[ovs-discuss] KVS/OVS/NIC bridging issue on kernel 5.x not seen on 4.19?

Tyler Stachecki stachecki.tyler at gmail.com
Sat Feb 13 16:01:59 UTC 2021


I've fixed the issue in such a way that it works for me (TM), but would
appreciate confirmation from an OVS expert that I'm not overlooking
something here:

Based on my last post, we need:
--- a/net/openvswitch/vport.c
+++ b/net/openvswitch/vport.c
@@ -503,6 +503,7 @@ void ovs_vport_send(struct vport *vport, struct sk_buff *skb, u8 mac_proto)
        }

        skb->dev = vport->dev;
+       skb->tstamp = 0;
        vport->ops->send(skb);
        return;

The reason is that the timestamp must be cleared when forwarding packets to
a different namespace; see:
https://patchwork.ozlabs.org/project/netdev/patch/20180307011230.24001-3-jesus.sanchez-palencia@intel.com/#1871003
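
To illustrate why a stale timestamp can matter, here is a minimal userspace
sketch. It is not kernel code and not part of the patch. It assumes that the
receive path stamps the packet with wall-clock time, and that a pacing qdisc
(such as fq) on the egress device reads a nonzero skb->tstamp as an earliest
departure time on CLOCK_MONOTONIC. The names clock_ns and pacing_delay_ns are
hypothetical and exist only for this illustration.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <time.h>

/* Read a POSIX clock in nanoseconds (userspace stand-in for ktime_t). */
static int64_t clock_ns(clockid_t id)
{
    struct timespec ts;

    clock_gettime(id, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/*
 * Hypothetical model of a pacing qdisc (assumption, not the real fq code):
 * a nonzero tstamp is treated as the earliest departure time on the
 * monotonic clock; zero means "no constraint, send now".
 * Returns how long the packet would be held back, in nanoseconds.
 */
static int64_t pacing_delay_ns(int64_t tstamp, int64_t now_mono)
{
    if (tstamp == 0 || tstamp <= now_mono)
        return 0;
    return tstamp - now_mono;
}
```

Calling pacing_delay_ns(clock_ns(CLOCK_REALTIME), clock_ns(CLOCK_MONOTONIC))
models a forwarded packet that still carries its receive timestamp: on a
typical system wall-clock time is decades ahead of time since boot, so the
computed delay is enormous. Passing 0 as the timestamp, which is what the
one-line patch arranges, yields no delay.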

Cheers,
Tyler

On Sat, Feb 13, 2021 at 12:04 AM Tyler Stachecki <stachecki.tyler at gmail.com>
wrote:

> Here's the offender:
>
> commit fb420d5d91c1274d5966917725e71f27ed092a85 (refs/bisect/bad)
> Author: Eric Dumazet <edumazet at ...gle.com>
> Date:   Fri Sep 28 10:28:44 2018 -0700
>
>     tcp/fq: move back to CLOCK_MONOTONIC
>
> Without this, I wasn't able to make it past the 4.20 series.  I
> forward-ported a reversion to 5.4 LTS for fun and things still work great.
> Though it sounds like simply reverting this is not the right fix -- there
> is some interesting discussion of other impacts of this commit here:
> https://lists.openwall.net/netdev/2019/01/10/36
>
> > Then, we probably need to clear skb->tstamp in more paths (you are
> > mentioning bridge ...)
>
> I will try to take a peek sometime this weekend to see if I can spot where
> in OVS the timestamp needs to be cleared, assuming the problem is there.
>
> On Tue, Feb 9, 2021 at 4:22 PM Gregory Rose <gvrose8192 at gmail.com> wrote:
>
>>
>>
>> On 2/8/2021 4:19 PM, Tyler Stachecki wrote:
>> > Thanks for the reply.  This is a router, so it is using conntrack; I am
>> > unsure if there is additional connection tracking in OVS.  `ovs-ofctl
>> > dump-flows br-util` shows exactly one flow: the default one.
>> >
>> > Here's my approximate /etc/network/interfaces.  I just attach VMs to this
>> > with libvirt and have nothing else added at this point:
>> > allow-ovs br-util
>> > iface br-util inet manual
>> >          ovs_type OVSBridge
>> >          ovs_ports enp0s20f1.102 vrf-util
>> >
>> > allow-br-util enp0s20f1.102
>> > auto enp0s20f1.102
>> > iface enp0s20f1.102 inet manual
>> >          ovs_bridge br-util
>> >          ovs_type OVSPort
>> >          mtu 9000
>> >
>> > allow-br-util vrf-util
>> > iface vrf-util inet static
>> >          ovs_bridge br-util
>> >          ovs_type OVSIntPort
>> >          address 10.10.2.1/24
>> >          mtu 9000
>> >
>> > I roughly transcribed what I was doing into a Linux bridge, and it works
>> > as expected in 5.10... e.g. this in my /etc/network/interfaces:
>> > auto enp0s20f1.102
>> > iface enp0s20f1.102 inet manual
>> >          mtu 9000
>> >
>> > auto vrf-util
>> > iface vrf-util inet static
>> >          bridge_ports enp0s20f1.102
>> >          bridge-vlan-aware no
>> >          address 10.10.2.1/24
>> >          mtu 9000
>> >
>> > I'm having a bit of a tough time following the dataflow code, and the ~1
>> > commit or so I was missing from the kernel staging tree does not seem to
>> > have fixed the issue.
>>
>> Hi Tyler,
>>
>> this does not sound like the previous issue I mentioned because that one
>> was caused by flow programming for dropping packets.
>>
>> I hate to say it but you're probably going to have to resort to a
>> bisect to find this one.
>>
>> - Greg
>>
>> >
>> > On Mon, Feb 8, 2021 at 6:21 PM Gregory Rose <gvrose8192 at gmail.com> wrote:
>> >
>> >>
>> >>
>> >> On 2/6/2021 9:50 AM, Tyler Stachecki wrote:
>> >>> I have simple forwarding issues when running the Debian stable backports
>> >>> kernel (5.9) that I don't see with the stable, non-backported 4.19 kernel.
>> >>> Big fat disclaimer: I compiled my OVS (2.14.1) from source, but given it
>> >>> works with the 4.19 kernel I doubt it has anything to do with it.  For good
>> >>> measure, I also compiled 5.10.8 from source and see the same issue I do in
>> >>> 5.9.
>> >>>
>> >>> The issue I see on 5.x (config snippets below):
>> >>> My VM (vnet0 - 10.10.0.16/24) can ARP/ping for other physical hosts on its
>> >>> subnet (e.g. 00:07:32:4d:2f:71 = 10.10.0.23/24 below), but only the first
>> >>> echo request in a sequence is seen by the destination host.  I then have to
>> >>> wait about 10 seconds before pinging the destination host from the VM
>> >>> again, but again only the first echo in a sequence gets a reply.
>> >>>
>> >>> I've tried tcpdump'ing enp0s20f1.102 (the external interface on the
>> >>> hypervisor) and see the pings going out that interface at the rate I would
>> >>> expect.  OTOH, when I tcpdump on the destination host, I only see the first
>> >>> of the ICMP echo requests in a sequence (for which an echo reply is sent).
>> >>>
>> >>> I then added an OVS internal port on the hypervisor (i.e., on br-util) and
>> >>> gave it an IP address (10.10.2.1/24).  It is able to ping that same
>> >>> external host just fine.  Likewise, I am able to ping between the VM and
>> >>> the OVS internal port just fine.
>> >>>
>> >>> When I roll back to 4.19, this weirdness about traffic going out of
>> >>> enp0s20f1.102 *for the VM* goes away and everything just works.  Any clues
>> >>> while I start ripping into code?
>> >>
>> >> Are you using any of the connection tracking capabilities? I vaguely
>> >> recall some issue that sounds a lot like what you're seeing but do not
>> >> see anything in the git log to stir my memory.  IIRC though it was a
>> >> similar problem.
>> >>
>> >> Maybe provide a dump of your flows.
>> >>
>> >> - Greg
>> >>
>> >
>>
>