[ovs-dev] [RFC v3 0/4] netdev-offload: Prerequisites of vport offloading via DPDK.

Eli Britstein elibr at mellanox.com
Sun Dec 8 12:29:20 UTC 2019


On 12/6/2019 4:34 PM, Ilya Maximets wrote:
> On 06.12.2019 14:09, Ophir Munk wrote:
>> Hi Ilya,
>> I applied this series on master ("dpdk: Update to use DPDK 19.11.") and PINGed between two "dpdk" ports (hw-offload=true).
>> It worked fine.
>> Then I watched flow statistics (dpif based) by executing the command in (1):
>> (1) watch ovs-appctl dpif/dump-flows -m <bridge name>
>> It worked fine.
>> Then I watched flow statistics (dpctl based) by executing the command in (2):
>> (2) watch ovs-appctl dpctl/dump-flows
>> In this case OVS crashed after a few seconds.
>>
>> When inspecting the calls trace I notice that pmd->dp->dpif->dpif_class->type is a corrupted memory address.
>>
>> (gdb) bt
>> #0  0x0000000000c26fda in dpif_normalize_type (type=0x7372 <Address 0x7372 out of bounds>) at lib/dpif.c:517
>> #1  0x0000000000c19864 in dp_netdev_flow_offload_put (offload=offload at entry=0x7f11fc00b790) at lib/dpif-netdev.c:2378
>> #2  0x0000000000c19ca8 in dp_netdev_flow_offload_main (data=<optimized out>) at lib/dpif-netdev.c:2467
>> #3  0x0000000000ca3c9d in ovsthread_wrapper (aux_=<optimized out>) at lib/ovs-thread.c:383
>> #4  0x00007f12a2a08e25 in start_thread () from /lib64/libpthread.so.0
>> #5  0x00007f12a222c34d in clone () from /lib64/libc.so.6
>>
>> This scenario repeats whenever running the command in (2) and when the code calls dpif_normalize_type().
>> It seems to me that the memory corruption was there for a while but only now it is exposed due to your rfc series.
>>
>> In order to observe this memory corruption I added the following printouts.
>> I tested it with hw-offload=false:
>> ovs-vsctl set Open_vSwitch . other_config:hw-offload=false
>> I tested it on latest master without your rfc series.
>>
>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>> index 1e54936..18a804a 100644
>> --- a/lib/dpif-netdev.c
>> +++ b/lib/dpif-netdev.c
>> @@ -3625,6 +3625,10 @@ dpif_netdev_flow_dump_next(struct dpif_flow_dump_thread *thread_,
>>               }
>>           }
>>   
>> +        VLOG_ERR("pmd->dp=%p pmd->dp->dpif=%p pmd->dp->dpif->dpif_class=%p \
>> +                 pmd->dp->dpif->full_name=%s",
>> +                 pmd->dp, pmd->dp->dpif, pmd->dp->dpif->dpif_class,
>> +                 pmd->dp->dpif->full_name);
>>           do {
>>               for (n_flows = 0; n_flows < flow_limit; n_flows++) {
>>                   struct cmap_node *node;
>>
>>
>> Looking at the printouts the memory corruption is observed.
>>
>> Initially all printouts are identical and correct as expected.
>>
>> dpif_netdev(revalidator82)|ERR|pmd->dp=0x7fdfea3ea010 pmd->dp->dpif=0x2525e70 pmd->dp->dpif->dpif_class=0xef3e80 pmd->dp->dpif->full_name=netdev at ovs-netdev
>> dpif_netdev(revalidator82)|ERR|pmd->dp=0x7fdfea3ea010 pmd->dp->dpif=0x2525e70 pmd->dp->dpif->dpif_class=0xef3e80 pmd->dp->dpif->full_name=netdev at ovs-netdev
>> dpif_netdev(revalidator82)|ERR|pmd->dp=0x7fdfea3ea010 pmd->dp->dpif=0x2525e70 pmd->dp->dpif->dpif_class=0xef3e80 pmd->dp->dpif->full_name=netdev at ovs-netdev
>>
>> Around this point I executed the dpctl command in (2) and you can notice that the pointers pmd->dp->dp_dpif and beyond became modified.
>>
>> dpif_netdev(revalidator82)|ERR|pmd->dp=0x7fdfea3ea010 pmd->dp->dpif=0x25c3280 pmd->dp->dpif->dpif_class=0x251c790 pmd->dp->dpif->full_name=(null)
>> dpif_netdev(revalidator82)|ERR|pmd->dp=0x7fdfea3ea010 pmd->dp->dpif=0x25c3280 pmd->dp->dpif->dpif_class=0x33c03608 pmd->dp->dpif->full_name=
>> dpif_netdev(revalidator82)|ERR|pmd->dp=0x7fdfea3ea010 pmd->dp->dpif=0x25c3280 pmd->dp->dpif->dpif_class=0x2605700 pmd->dp->dpif->full_name=
>> dpif_netdev(revalidator82)|ERR|pmd->dp=0x7fdfea3ea010 pmd->dp->dpif=0x25c3280 pmd->dp->dpif->dpif_class=0xe784250b pmd->dp->dpif->full_name=memseg-2048k-0-3
>> dpif_netdev|ERR|pmd->dp=0x7fdfea3ea010 pmd->dp->dpif=0x2602430 pmd->dp->dpif->dpif_class=0xef3e80 pmd->dp->dpif->full_name=netdev at ovs-netdev
>> dpif_netdev(revalidator82)|ERR|pmd->dp=0x7fdfea3ea010 pmd->dp->dpif=0x2602430 pmd->dp->dpif->dpif_class=0x2606200 pmd->dp->dpif->full_name=(null)
>>
>> The same happens if I test it with hw-offload=true.
>>
>> I hope this scenario can be recreated on other setups as well.
> This is interesting.  I'll try to reproduce this issue on my setup.

Hi Ilya,

I see that this bug is not only with this new patch-set, but now also 
came into effect in master, with acceptance of

30115809da2e ("dpif-netdev: Use netdev-offload API for port lookup while 
offloading.")

-        port = dp_netdev_lookup_port(pmd->dp, in_port);
+        port = netdev_ports_get(in_port, pmd->dp->dpif->dpif_class);
As of the corruption, the "port" is not retrieved by netdev_ports_get, 
and the offloading is not done. I have reverted this commits in my tests.

Though this commit is correct (but should not have any effect before 
vport offloading), maybe we should revert it until this corruption bug 
is fixed.

Thanks,

Eli

>
>> I will look more into it.
> Thanks!
>
> Best regards, Ilya Maximets.


More information about the dev mailing list