[ovs-dev] [PATCH 2/6] compat: Simplify inet_fragment backports.

Joe Stringer joe at ovn.org
Mon Aug 1 21:11:50 UTC 2016


On 1 August 2016 at 13:07, pravin shelar <pshelar at ovn.org> wrote:
> On Mon, Aug 1, 2016 at 11:46 AM, Joe Stringer <joe at ovn.org> wrote:
>> On 1 August 2016 at 10:21, pravin shelar <pshelar at ovn.org> wrote:
>>> On Tue, Jul 12, 2016 at 3:26 PM, Joe Stringer <joe at ovn.org> wrote:
>>>> The core fragmentation handling logic is exported on all supported
>>>> kernels, so it's not necessary to backport the latest version of this.
>>>> This greatly simplifies the code due to inconsistencies between the old
>>>> per-lookup garbage collection and the newer workqueue based garbage
>>>> collection.
>>>>
>>>> As a result of simplifying and removing unnecessary backport code, a few
>>>> bugs are fixed for corner cases such as when some fragments remain in
>>>> the fragment cache when openvswitch is unloaded.
>>>>
>>>> Some backported ip functions need a little extra logic than what is seen
>>>> on the latest code due to this, for instance on kernels <3.17:
>>>> * Call inet_frag_evictor() before defrag
>>>> * Limit hashsize in ip{,6}_fragment logic
>>>>
>>>> The pernet init/exit logic also differs a little from upstream. Upstream
>>>> ipv[46]_defrag logic initializes the various pernet fragment parameters
>>>> and its own global fragments cache. In the OVS backport, the pernet
>>>> parameters are shared while the fragments cache is separate. The
>>>> backport relies upon upstream pernet initialization to perform the
>>>> shared setup, and performs no pernet initialization of its own. When it
>>>> comes to pernet exit however, the backport must ensure that all
>>>> OVS-specific fragment state is cleared, while the shared state remains
>>>> untouched so that the regular ipv[46] logic may do its own cleanup. In
>>>> practice this means that OVS must have its own divergent implementation
>>>> of inet_frags_exit_net().
>>>>
>>>> Fixes the following crash:
>>>>
>>>> Call Trace:
>>>>  <IRQ>
>>>>  [<ffffffff810744f6>] ? call_timer_fn+0x36/0x100
>>>>  [<ffffffff8107548f>] run_timer_softirq+0x1ef/0x2f0
>>>>  [<ffffffff8106cccc>] __do_softirq+0xec/0x2c0
>>>>  [<ffffffff8106d215>] irq_exit+0x105/0x110
>>>>  [<ffffffff81737095>] smp_apic_timer_interrupt+0x45/0x60
>>>>  [<ffffffff81735a1d>] apic_timer_interrupt+0x6d/0x80
>>>>  <EOI>
>>>>  [<ffffffff8104f596>] ? native_safe_halt+0x6/0x10
>>>>  [<ffffffff8101cb2f>] default_idle+0x1f/0xc0
>>>>  [<ffffffff8101d406>] arch_cpu_idle+0x26/0x30
>>>>  [<ffffffff810bf3a5>] cpu_startup_entry+0xc5/0x290
>>>>  [<ffffffff810415ed>] start_secondary+0x21d/0x2d0
>>>> Code:  Bad RIP value.
>>>> RIP  [<ffffffffa0177480>] 0xffffffffa0177480
>>>>  RSP <ffff88003f703e78>
>>>> CR2: ffffffffa0177480
>>>> ---[ end trace eb98ca80ba07bd9c ]---
>>>> Kernel panic - not syncing: Fatal exception in interrupt
>>>>
>>>> Signed-off-by: Joe Stringer <joe at ovn.org>
>>>> ---
>>>> I've tested this on CentOS kernel 3.10.0-327 and Ubuntu kernels
>>>> 3.13.0-68, 3.16.0-70, 3.19.0-58, and 4.2.0-35. Some earlier kernel
>>>> versions may still trigger fragmentation-related crashes due to upstream
>>>> bugs, for instance on Ubuntu's 3.13.0-24.
>>>> ---
>>>>  acinclude.m4                                  |   1 +
>>>>  datapath/linux/compat/include/net/inet_frag.h |  58 ++-
>>>>  datapath/linux/compat/inet_fragment.c         | 486 ++------------------------
>>>>  datapath/linux/compat/ip_fragment.c           |  36 +-
>>>>  datapath/linux/compat/nf_conntrack_reasm.c    |  17 +
>>>>  5 files changed, 91 insertions(+), 507 deletions(-)
>>>>
>>> ...
>>>
>>>> -void inet_frag_maybe_warn_overflow(struct inet_frag_queue *q,
>>>> -                                  const char *prefix)
>>>> -{
>>>> -       static const char msg[] = "inet_frag_find: Fragment hash bucket"
>>>> -               " list length grew over limit " __stringify(INETFRAGS_MAXDEPTH)
>>>> -               ". Dropping fragment.\n";
>>>> +       local_bh_disable();
>>>> +       inet_frag_evictor(nf, f, true);
>>>> +       local_bh_enable();
>>>>
>>>> -       if (PTR_ERR(q) == -ENOBUFS)
>>>> -               net_dbg_ratelimited("%s%s", prefix, msg);
>>>> +       nf->low_thresh = thresh;
>>>>  }
>>>
>>> Upstream inet_frags_exit_net() and back ported looks the same, we can
>>> just use the exported symbol.
>>
>> We share nf->mem with the upstream frag code, so we shouldn't
>> percpu_counter_destroy(&nf->mem) it when we exit the namespace; the
>> upstream frag code will handle that.
>>
>> My other concern was that we use the shared nf->low_thresh to clear
>> out our fragments cache, but if we are exiting our module and the
>> upstream frag code remains, we should restore the nf->low_thresh
>> afterwards so that we don't cause flushing of the upstream frag cache.
>>
> I think we can restore the both objects in compat code after calling
> default exit function.
>
> But the exit code implementation itself is not that complicated so I
> am fine with the code.

I think it's more obvious how we're overriding the upstream exit code
with it here rather than hidden in our compat exit code, so I'd rather
leave it as is.



More information about the dev mailing list