[ovs-dev] kernel panic under heavy receive load

Fri Feb 27 23:14:50 UTC 2015

On Fri, Feb 27, 2015 at 3:06 PM, Xu (Simon) Chen <xchenum at gmail.com> wrote:
>
>
> On Friday, February 27, 2015, Chris Dunlop <chris at onthe.net.au> wrote:
>>
>> On Fri, Feb 27, 2015 at 11:08:21AM -0800, Pravin Shelar wrote:
>> > On Thu, Feb 26, 2015 at 9:13 PM, Chris Dunlop <chris at onthe.net.au>
>> > wrote:
>> > > Hi,
>> > >
>> > > "Me too" on Simon's BUG() described below (apologies for the top
>> > > post).
>> > > Basically:
>> > >
>> > > [ 7318.409796] kernel BUG at net/core/skbuff.c:1041!
>> > > ...
>> > > [ 7318.591562] RIP: 0010:[<ffffffff813eb634>]  [<ffffffff813eb634>] pskb_expand_head+0x234/0x270
>> > > ...
>> > > [ 7318.705710]  [<ffffffff813eb6fc>] __pskb_pull_tail+0x4c/0x330
>> > > [ 7318.711571]  [<ffffffff813f8ca7>] skb_checksum_help+0x147/0x1a0
>> > > [ 7318.717599]  [<ffffffffa07de8b0>] queue_userspace_packet+0x3f0/0x440 [openvswitch]
>> > >
>> > > I've hit this BUG() several times within hours or days of running on
>> > > v3.14.27 and v3.14.33, whereas the box previously ran for months on
>> > > v3.10.33 without an issue.
>> >
>> > Can you reproduce this bug in a hypervisor-to-hypervisor test without
>> > any VMs?
>>
>> I don't know.
>>
>> Sorry, I've only seen the problem after many hours of running in
>> a production environment, and I don't have the spare facilities
>> to run the hypervisor to hypervisor testing and risk the crash.
>>
>> Simon, are you able to try your test running direct hypervisor
>> to hypervisor?
>
>
> Nope...  I have only seen this between VMs. After repeated tests, it has
> gotten worse, to the point that an iperf run almost immediately triggers a
> crash.

So it looks like vhost is generating a shared skb. Can you try the same
test on the latest upstream kernel?
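
For anyone following the thread, here is a rough userspace sketch of why a
shared skb would trip that BUG(). It assumes the check hit at
net/core/skbuff.c:1041 is the "skb must not be shared" precondition in
pskb_expand_head(), which is what the trace suggests but which has not been
confirmed against the exact source of the kernels in this report. The
model_* names and struct model_skb are hypothetical stand-ins, not kernel
APIs; they only mimic the reference-count test that skb_shared() performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sk_buff: only the reference count. */
struct model_skb {
    int users; /* number of holders of this buffer header */
};

/* Same idea as skb_shared(): shared when the count is not exactly 1. */
static bool model_skb_shared(const struct model_skb *skb)
{
    return skb->users != 1;
}

/* Same idea as skb_get(): a second holder takes an extra reference. */
static struct model_skb *model_skb_get(struct model_skb *skb)
{
    assert(skb != NULL);
    skb->users++;
    return skb;
}

/*
 * Model of the precondition only: returns 0 when the head could be
 * reallocated, -1 at the point where the kernel would call BUG().
 * No actual reallocation is modelled here.
 */
static int model_expand_head(struct model_skb *skb)
{
    if (model_skb_shared(skb))
        return -1;
    return 0;
}
```

With users == 1 the model returns 0. After one model_skb_get() it returns
-1, which corresponds to the path skb_checksum_help() -> __pskb_pull_tail()
-> pskb_expand_head() in the trace being entered with a buffer that some
other holder still references.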