[ovs-discuss] Kernel oops running Open vSwitch on 3.3 Kernel (ARM)

Michele Bozier mbozier at Airspan.com
Fri Oct 4 13:31:19 UTC 2013


Jesse

Thanks for your reply.  This is a new platform for us so it is possible there is some problem with the environment.  A colleague is reporting a couple of unexplained issues with our own in-house code (distinct from Open vSwitch) on this platform so we will investigate these first and then come back to looking at Open vSwitch.

Many thanks for your help
Regards
Michele Bozier

-----Original Message-----
From: Jesse Gross [mailto:jesse at nicira.com] 
Sent: 03 October 2013 22:01
To: Michele Bozier
Cc: discuss at openvswitch.org
Subject: Re: [ovs-discuss] Kernel oops running Open vSwitch on 3.3 Kernel (ARM)

Both of these functions are pretty innocuous, don't work with shared data, and shouldn't be architecture-specific. Furthermore, the fact that the problem remains essentially the same but moves around between versions indicates to me that the issue isn't with the code itself.

It sounds to me like there is a larger issue with memory corruption, caused either by something else writing to that memory, by running off the end of the stack, etc.
That is obviously difficult to track down, but it might explain why the problem appears to be specific to your environment (I've never heard a report of this before).
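
As a rough check on the "running off the end of the stack" possibility, the figures printed in the 3.3.0 #1 oops quoted in this thread can be turned into stack usage and remaining headroom. The sketch below is only an illustration: the helper name stack_usage is made up here, and it assumes the kernel stack grows downward from the upper address in the "Stack:" line towards the printed "stack limit". A comfortable headroom at the moment of the oops does not rule out an earlier overflow or corruption from another source.

```python
import re

# Register/stack lines copied from the 3.3.0 #1 oops quoted in this thread.
OOPS = """\
sp : de277c50  ip : 6c6c6c6c  fp : 00000000
Process ovs-vswitchd (pid: 454, stack limit = 0xde2762e8)
Stack: (0xde277c50 to 0xde278000)
"""


def stack_usage(oops_text):
    """Return (bytes_used, bytes_of_headroom) for the faulting task's kernel stack.

    Assumes a downward-growing stack: 'used' is measured from the top address
    in the "Stack:" line, 'headroom' is the distance from sp to the stack limit.
    """
    sp = int(re.search(r"\bsp : ([0-9a-f]{8})", oops_text).group(1), 16)
    limit = int(re.search(r"stack limit = 0x([0-9a-f]+)", oops_text).group(1), 16)
    top = int(
        re.search(r"Stack: \(0x[0-9a-f]+ to 0x([0-9a-f]+)\)", oops_text).group(1), 16
    )
    return top - sp, sp - limit


used, headroom = stack_usage(OOPS)
print(f"used={used} bytes, headroom={headroom} bytes")
# prints: used=944 bytes, headroom=6504 bytes
```

The same arithmetic on the module oops (sp de273c30, limit 0xde2722e8, top 0xde274000) gives 976 bytes used and 6472 bytes of headroom.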

On Thu, Oct 3, 2013 at 1:38 AM, Michele Bozier <mbozier at airspan.com> wrote:
> Jesse,
>
> Many thanks for your suggestions.
> For the openvswitch.ko module built from the Open vSwitch git repository, the line of code causing the kernel oops appears to be the following, in the function ovs_flow_to_nlattrs():
>         if (nla_put_u32(skb, OVS_KEY_ATTR_PRIORITY, output->phy.priority))
>                 goto nla_put_failure;
> This is totally repeatable - happens every time.
>
> For the openvswitch.ko module built from the kernel 3.3 sources, the problem is different, but again totally repeatable.
> The Kernel oops is as follows:
>
> Unable to handle kernel NULL pointer dereference at virtual address 00000000
> pgd = de3e8000
> [00000000] *pgd=9e3c7831, *pte=00000000, *ppte=00000000
> Internal error: Oops: 817 [#1] PREEMPT
> Modules linked in:
> CPU: 0    Not tainted  (3.3.0 #1)
> PC is at ovs_flow_tbl_alloc+0x4e/0x94
> LR is at ovs_flow_tbl_alloc+0x4b/0x94
> pc : [<c027f5d6>]    lr : [<c027f5d3>]    psr: 80000033
> sp : de277c50  ip : 6c6c6c6c  fp : 00000000
> r10: 00000004  r9 : de33bf10  r8 : de32f280
> r7 : 00000000  r6 : de342000  r5 : 00000400  r4 : 00000002
> r3 : de342000  r2 : 00000000  r1 : 00000002  r0 : 00000000
> Flags: Nzcv  IRQs on  FIQs on  Mode SVC_32  ISA Thumb  Segment user
> Control: 50c5387d  Table: 9e3e8019  DAC: 00000015
> Process ovs-vswitchd (pid: 454, stack limit = 0xde2762e8)
> Stack: (0xde277c50 to 0xde278000)
> [<c027f5d6>] (ovs_flow_tbl_alloc+0x4e/0x94) from [<c027eda5>] (ovs_dp_cmd_new+0x51/0x130)
> [<c027eda5>] (ovs_dp_cmd_new+0x51/0x130) from [<c01c9347>] (genl_rcv_msg+0x15f/0x17c)
> [<c01c9347>] (genl_rcv_msg+0x15f/0x17c) from [<c01c8c39>] (netlink_rcv_skb+0x65/0x70)
> [<c01c8c39>] (netlink_rcv_skb+0x65/0x70) from [<c01c91df>] (genl_rcv+0x17/0x20)
> [<c01c91df>] (genl_rcv+0x17/0x20) from [<c01c888f>] (netlink_unicast+0x117/0x150)
> [<c01c888f>] (netlink_unicast+0x117/0x150) from [<c01c8ab1>] (netlink_sendmsg+0x185/0x1cc)
> [<c01c8ab1>] (netlink_sendmsg+0x185/0x1cc) from [<c018f08b>] (sock_sendmsg+0x5f/0x74)
>
> In this case, the line of code causing the problem in ovs_flow_tbl_alloc() is
>     table->buckets = alloc_buckets(new_size);
>
> When I added a printk to dump the value of new_size in this second scenario, the problem moved again.
> What else can I try?
> Regards
> Michele Bozier
>
>
> -----Original Message-----
> From: Jesse Gross [mailto:jesse at nicira.com]
> Sent: 01 October 2013 20:35
> To: Michele Bozier
> Cc: discuss at openvswitch.org
> Subject: Re: [ovs-discuss] Kernel oops running Open vSwitch on 3.3 Kernel (ARM)
>
> On Tue, Oct 1, 2013 at 2:25 AM, Michele Bozier <mbozier at airspan.com> wrote:
>> I am having trouble running Open vSwitch on the ARM platform after 
>> cross-compiling on an i686 platform.  I am using the latest code from 
>> master from the Open vSwitch git repository - commit Sept 26th
>> (6a8a8528acb05d6d0a520e09ad1ec67e62b99e5e) and the Arago Kernel 3.3.
>>
>>
>>
>> The problem I am seeing when running on the target and trying to 
>> create a switch is as follows:
>>
>>
>>
>> insmod ./openvswitch.ko
>>
>> The module seems to load fine; on the console I get:
>>
>> openvswitch: Open vSwitch switching datapath 2.0.90, built Sep 30 2013 11:33:05
>>
>>
>>
>> ./ovsdb-tool create /usr/local/etc/openvswitch/conf.db ./vswitch.ovsschema
>> ./ovsdb-server --remote=ptcp:6634 --remote=db:Open_vSwitch,Open_vSwitch,manager_options --pidfile=/home/opf/server.pid --detach
>> ./ovs-vsctl --db=tcp:127.0.0.1:6634 --no-wait init
>> ./ovs-vswitchd tcp:127.0.0.1:6634 --pidfile=/home/opf/switch.pid --log-file=/home/opf/switch.log --detach
>>
>>
>>
>> On the console I see the following:
>>
>> 1970-01-01T00:01:15Z|00001|vlog|INFO|opened log file /home/opf/switch.log
>> 1970-01-01T00:01:15Z|00002|reconnect|INFO|tcp:127.0.0.1:6634: connecting...
>> 1970-01-01T00:01:15Z|00003|reconnect|INFO|tcp:127.0.0.1:6634: connected
>>
>>
>>
>> I then enter the command to create a switch:
>>
>> ./ovs-vsctl --db=tcp:127.0.0.1:6634 add-br opfbr
>>
>>
>>
>> I get the following output to the console
>>
>> device: 'ovs-system': device_add
>>
>> device ovs-system entered promiscuous mode
>>
>> device: 'opfbr0': device_add
>>
>> device opfbr0 entered promiscuous mode
>>
>>
>>
>> Followed shortly afterwards by a kernel oops.
>>
>>
>>
>> [root@synergy opf]# Unable to handle kernel paging request at virtual address 8d10051d
>> pgd = dd840000
>> [8d10051d] *pgd=00000000
>> Internal error: Oops: 5 [#1] PREEMPT
>> Modules linked in: openvswitch(O)
>> CPU: 0    Tainted: G           O  (3.3.0 #7)
>> PC is at ovs_flow_to_nlattrs+0x5/0x430 [openvswitch]
>> LR is at ovs_flow_cmd_fill_info+0x114/0x208 [openvswitch]
>> pc : [<bf80524e>]    lr : [<bf801669>]    psr: 80000033
>> sp : de273c30  ip : 00000058  fp : 00000018
>> r10: de36e540  r9 : 0001fffb  r8 : dd8b8000
>> r7 : 00000013  r6 : 000001cd  r5 : dd8b8088  r4 : 00000070
>> r3 : 00000000  r2 : de36e540  r1 : 8d100505  r0 : 0002001b
>> Flags: Nzcv  IRQs on  FIQs on  Mode SVC_32  ISA Thumb  Segment user
>> Control: 50c5387d  Table: 9d840019  DAC: 00000015
>> Process ovs-vswitchd (pid: 461, stack limit = 0xde2722e8)
>> Stack: (0xde273c30 to 0xde274000)
>> ...
>> [<bf80524e>] (ovs_flow_to_nlattrs+0x5/0x430 [openvswitch]) from [<bf801669>] (ovs_flow_cmd_fill_info+0x114/0x208 [openvswitch])
>> [<bf801669>] (ovs_flow_cmd_fill_info+0x114/0x208 [openvswitch]) from [<bf80179f>] (ovs_flow_cmd_dump+0x42/0x7c [openvswitch])
>> [<bf80179f>] (ovs_flow_cmd_dump+0x42/0x7c [openvswitch]) from [<c01c90fb>] (netlink_dump+0x3b/0x130)
>> [<c01c90fb>] (netlink_dump+0x3b/0x130) from [<c01c9983>] (netlink_dump_start+0xc7/0x108)
>> [<c01c9983>] (netlink_dump_start+0xc7/0x108) from [<c01cb069>] (genl_rcv_msg+0xc1/0x17c)
>> [<c01cb069>] (genl_rcv_msg+0xc1/0x17c) from [<c01ca9f9>] (netlink_rcv_skb+0x65/0x70)
>> [<c01ca9f9>] (netlink_rcv_skb+0x65/0x70) from [<c01caf9f>] (genl_rcv+0x17/0x20)
>> [<c01caf9f>] (genl_rcv+0x17/0x20) from [<c01ca64f>] (netlink_unicast+0x117/0x150)
>> [<c01ca64f>] (netlink_unicast+0x117/0x150) from [<c01ca871>] (netlink_sendmsg+0x185/0x1cc)
>> [<c01ca871>] (netlink_sendmsg+0x185/0x1cc) from [<c0190e4b>] (sock_sendmsg+0x5f/0x74)
>> [<c0190e4b>] (sock_sendmsg+0x5f/0x74) from [<c01921c1>] (sys_sendto+0x6d/0x80)
>> [<c01921c1>] (sys_sendto+0x6d/0x80) from [<c01921e3>] (sys_send+0xf/0x14)
>> [<c01921e3>] (sys_send+0xf/0x14) from [<c000c521>] (ret_fast_syscall+0x1/0x46)
>> Code: bf00 e92d 47f0 b086 (698f) ab06
>> ---[ end trace c6309ab77c3d706d ]---
>>
>>
>>
>> The process I followed to cross-compile the code base is as follows:
>>
>>
>>
>> ./boot.sh
>>
>>
>>
>> ./configure CC=arm-none-linux-gnueabi-gcc --host=arm-none-linux-gnueabi --target=arm-none-linux-gnueabi --build=i686-linux --with-linux=/home/mbozier/synergy/kernel/ti KARCH=arm --disable-ssl CPPFLAGS=-I/home/mbozier/tirootfs/usr/inc-L/home/mbozier/tirootfs/usr/lib
>>
>> make CROSS_COMPILE="arm-none-linux-gnueabi-" ARCH="arm" KCC="arm-none-linux-gnueabi-gcc" GCC="arm-none-linux-gnueabi-gcc"
>>
>>
>>
>> The kernel used on the target is built without Open vSwitch support 
>> and the 802.1d bridging support is configured to be loaded as a module.
>>
>>
>>
>> I also tried running the Open vSwitch kernel module built from the 
>> sources distributed with the 3.3 kernel, but with no success either.
>
> Is it the exact same problem on this kernel or is it a different one?
>
> Probably the place to start is to use GDB to find exactly where it is faulting, based on the address in the stack trace. Is the problem reproducible?
> _______________________________________________
> discuss mailing list
> discuss at openvswitch.org
> http://openvswitch.org/mailman/listinfo/discuss
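
The suggestion in this thread to use GDB to find the faulting line from the address in the stack trace can be sketched as a small helper. This is only an illustration under stated assumptions: the names PC_RE and gdb_list_command are made up here, the object files (vmlinux, or the module's .ko) are assumed to have been built with debug info, and a cross GDB matching the ARM toolchain is assumed to be available. The helper turns the "PC is at" entry of an oops into the object file to load and a list *(symbol+offset) command, which GDB resolves to a source line.

```python
import re

# Matches entries such as "PC is at ovs_flow_tbl_alloc+0x4e/0x94" and
# "PC is at ovs_flow_to_nlattrs+0x5/0x430 [openvswitch]".
PC_RE = re.compile(
    r"PC is at (\w+)\+(0x[0-9a-f]+)/(0x[0-9a-f]+)(?: \[(\w+)\])?"
)


def gdb_list_command(pc_line):
    """Return (object_file, gdb_command) for the faulting instruction.

    A trailing "[module]" means the symbol lives in that module's .ko;
    otherwise the symbol is assumed to be in vmlinux.
    """
    match = PC_RE.search(pc_line)
    if match is None:
        raise ValueError("no 'PC is at' entry found")
    symbol, offset, _size, module = match.groups()
    object_file = f"{module}.ko" if module else "vmlinux"
    return object_file, f"list *({symbol}+{offset})"


for line in (
    "PC is at ovs_flow_tbl_alloc+0x4e/0x94",
    "PC is at ovs_flow_to_nlattrs+0x5/0x430 [openvswitch]",
):
    object_file, command = gdb_list_command(line)
    print(f"{object_file}: {command}")
```

For the two oopses in this thread, this gives vmlinux with list *(ovs_flow_tbl_alloc+0x4e) and openvswitch.ko with list *(ovs_flow_to_nlattrs+0x5); each command is then entered at the GDB prompt after loading the corresponding object file.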

