[ovs-dev] [PATCH v4 2/2] netdev-dpdk: Add new DPDK RFC 4115 egress policer

Stokes, Ian ian.stokes at intel.com
Tue Jan 14 16:50:31 UTC 2020



On 1/14/2020 4:21 PM, Eelco Chaudron wrote:
> 
> 
> On 14 Jan 2020, at 16:21, Stokes, Ian wrote:
> 
>> On 1/14/2020 2:13 PM, Eelco Chaudron wrote:
>>>
>>>
>>> On 14 Jan 2020, at 12:23, Stokes, Ian wrote:
>>>
>>>> On 1/13/2020 8:32 PM, Stokes, Ian wrote:
>>>
>>> <SNIP>
>>>
>>>> Hi Eelco, I'm seeing a crash in OVS while running this with just a 
>>>> port and a default queue 0 (phy to phy setup). It seems related to 
>>>> the call to rte_meter_trtcm_rfc4115_color_blind_check. I've provided 
>>>> more detail below in the trtcm_policer_run_single_packet function, 
>>>> just wondering if you've come across it?
>>>>
>>>> Here's the output for the QoS configuration I'm using:
>>>>
>>>> -bash-4.4$ovs-appctl -t ovs-vswitchd qos/show dpdk1
>>>> QoS: dpdk1 trtcm-policer
>>>> eir: 52000
>>>> cbs: 2048
>>>> ebs: 2048
>>>> cir: 52000
>>>>
>>>> Default:
>>>>   eir: 52000
>>>>   cbs: 2048
>>>>   ebs: 2048
>>>>   cir: 52000
>>>>   tx_packets: 672150
>>>>   tx_bytes: 30918900
>>>>   tx_errors: 489562233
>>>>
>>>> I'll try to investigate further with DPDK and GDB also.
>>>
>>> I tried to replicate this, but I’m not able to do so. How did you 
>>> test? Reconfiguring it and then starting, etc.?
>>
>> Starting a fresh instance of OVS (cleared previous OVSDB etc.).
>>
>> dpdk-socket-mem="1024,0"
>> dpdk-lcore-mask="0x2"
>> pmd-cpu-mask="0xC"
>>
>> 2 phy ports only, 1 rxq per phy port.
>>
>> Flow rules are basic (in port 1 out port 2)
>>
>> Traffic profile is IPv4 UDP 64 byte packets at line rate (10G)
>>
>> QoS Setup with the following
>>
>> sudo $OVS_DIR/utilities/ovs-vsctl --timeout=5 set port dpdk1 
>> qos=@myqos -- \
>> --id=@myqos create qos type=trtcm-policer \
>> other-config:cir=52000 other-config:cbs=2048 \
>> other-config:eir=52000 other-config:ebs=2048
>>
>> From there it's a case of letting traffic run (between 10 and 15 mins) 
>> before the crash occurs.
> 
> Tried a couple of runs, but no luck…

We've tried on a second system as well but were not able to reproduce 
it, so it may be specific to the first test board I used. It shouldn't 
be a blocker. I'll look at the v5, but I think we're close to 
merging.

Regards
Ian

> 
>>>
>>> <SNIP>
>>>
>>>>
>>>> A few times during testing I have seen OVS crash with the following
>>>>
>>>> ./launch_vswitch.sh: line 66: 11694 Floating point exception sudo 
>>>> $OVS_DIR/vswitchd/ovs-vswitchd unix:$DB_SOCK --pidfile
>>>>
>>>> Looking into it with GDB, it seems related to the 
>>>> rte_meter_trtcm_rfc4115_color_blind_check above. See GDB output below.
>>>>
>>>> Thread 12 "pmd-c03/id:9" received signal SIGFPE, Arithmetic exception.
>>>> [Switching to Thread 0x7f3dce734700 (LWP 26465)]
>>>> 0x0000000000d4b92d in rte_meter_trtcm_rfc4115_color_blind_check 
>>>> (m=0x328e178, p=0x328e148, time=29107058565136113, pkt_len=46) at 
>>>> /opt/istokes/dpdk-19.11//x86_64-native-linuxapp-gcc/include/rte_meter.h:599 
>>>>
>>>> 599             n_periods_te = time_diff_te / p->eir_period;
>>>> (gdb) bt
>>>> #0  0x0000000000d4b92d in rte_meter_trtcm_rfc4115_color_blind_check 
>>>> (m=0x328e178, p=0x328e148, time=29107058565136113, pkt_len=46) at 
>>>> /opt/istokes/dpdk-19.11//x86_64-native-linuxapp-gcc/include/rte_meter.h:599 
>>>>
>>>> #1  0x0000000000d4abc1 in trtcm_policer_run_single_packet 
>>>> (policer=0x2774200, pkt=0x1508fbd40, time=29107058565136113) at 
>>>> lib/netdev-dpdk.c:4649
>>>> #2  0x0000000000d4ad24 in trtcm_policer_run (conf=0x2774200, 
>>>> pkts=0x7f3db8005100, pkt_cnt=32, should_steal=true) at 
>>>> lib/netdev-dpdk.c:4691
>>>> #3  0x0000000000d45299 in netdev_dpdk_qos_run (dev=0x17fd68840, 
>>>> pkts=0x7f3db8005100, cnt=32, should_steal=true) at 
>>>> lib/netdev-dpdk.c:2421
>>>> #4  0x0000000000d45db0 in netdev_dpdk_send__ (dev=0x17fd68840, 
>>>> qid=1, batch=0x7f3db80050f0, concurrent_txq=false) at 
>>>> lib/netdev-dpdk.c:2683
>>>> #5  0x0000000000d45ee9 in netdev_dpdk_eth_send (netdev=0x17fd688c0, 
>>>> qid=1, batch=0x7f3db80050f0, concurrent_txq=false) at 
>>>> lib/netdev-dpdk.c:2710
>>>> #6  0x0000000000c342ba in netdev_send (netdev=0x17fd688c0, qid=1, 
>>>> batch=0x7f3db80050f0, concurrent_txq=false) at lib/netdev.c:814
>>>> #7  0x0000000000beb3de in dp_netdev_pmd_flush_output_on_port 
>>>> (pmd=0x7f3dce735010, p=0x7f3db80050c0) at lib/dpif-netdev.c:4224
>>>> #8  0x0000000000beb5c4 in dp_netdev_pmd_flush_output_packets 
>>>> (pmd=0x7f3dce735010, force=false) at lib/dpif-netdev.c:4264
>>>> #9  0x0000000000beb814 in dp_netdev_process_rxq_port 
>>>> (pmd=0x7f3dce735010, rxq=0x328d930, port_no=2) at 
>>>> lib/dpif-netdev.c:4319
>>>> #10 0x0000000000bef432 in pmd_thread_main (f_=0x7f3dce735010) at 
>>>> lib/dpif-netdev.c:5556
>>>> #11 0x0000000000cb24e5 in ovsthread_wrapper (aux_=0x326d220) at 
>>>> lib/ovs-thread.c:383
>>>> #12 0x00007f3de11d236d in start_thread (arg=0x7f3dce734700) at 
>>>> pthread_create.c:456
>>>> #13 0x00007f3de06bab4f in clone () at 
>>>> ../sysdeps/unix/sysv/linux/x86_64/clone.S:97
>>>
>>> Looks like a divide by zero, but that was fixed, and the fix seems to 
>>> be in DPDK v19.11: ebe3a769911071450acb808153ec2a2496726906
>>
>> I've confirmed I'm testing against 19.11.0 and that commit is present.
>>
>>
>>>
>>> So for some reason rte_meter_get_tb_params() might return 0 in 
>>> eir_period. Looking at the code, I would say this could only really 
>>> happen if rte_get_tsc_hz() returns 0, which seems odd… Could this 
>>> happen in your system for some reason?
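
To make the reasoning above concrete, here is a minimal sketch of how a 
token-bucket period in TSC cycles can be derived from a rate in bytes per 
second. It is a simplified model and not the DPDK source: the name 
cycles_per_byte is hypothetical, and the handling of a zero rate is an 
assumption made for illustration.

```c
#include <stdint.h>

/* Hypothetical helper, not part of DPDK: number of timestamp-counter
 * cycles between two one-byte refills of a bucket that fills at `rate`
 * bytes per second, with the counter running at `hz` cycles per second.
 * A result of 0 means "no usable period"; dividing by it would trap. */
static uint64_t
cycles_per_byte(uint64_t hz, uint64_t rate)
{
    if (rate == 0) {
        return 0;   /* assumption: caller treats 0 as invalid */
    }
    /* Truncating conversion, so hz == 0 yields a period of 0. */
    return (uint64_t) ((double) hz / (double) rate);
}
```

Assuming a 2.3 GHz TSC (not stated in the thread), cycles_per_byte(2300000000, 52000) 
gives 44230, which matches the cir_period and eir_period in the GDB dump 
further down. With hz equal to 0 the result is 0, which is the case 
described above.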
>>
>> I don't think rte_get_tsc_hz is returning 0, at least the values for 
>> the function call in GDB don't seem to suggest this. Snippet below 
>> from rte_meter_trtcm_rfc4115_color_blind_check that I'm checking with 
>> GDB.
>>
>> rte_meter_trtcm_rfc4115_color_blind_check(
>>     struct rte_meter_trtcm_rfc4115 *m,
>>     struct rte_meter_trtcm_rfc4115_profile *p,
>>     uint64_t time, uint32_t pkt_len)
>> {
>>     uint64_t time_diff_tc, time_diff_te, n_periods_tc, n_periods_te,
>>     tc, te;
>>
>>     /* Bucket update */
>>     time_diff_tc = time - m->time_tc;
>>     time_diff_te = time - m->time_te;
>>
>>     n_periods_tc = time_diff_tc / p->cir_period;
>>     n_periods_te = time_diff_te / p->eir_period;
>> Looking at values with GDB gives the following
>>
>> (gdb) p *p
>> $1 = {cbs = 2048, ebs = 2048, cir_period = 44230, cir_bytes_per_period 
>> = 1, eir_period = 44230, eir_bytes_per_period = 1}
>> (gdb) p time_diff_tc
>> $2 = 29137292739849292
>> (gdb) p n_periods_tc
>> $3 = 13937399
>> (gdb) p time
>> $4 = 29137292739849292
>> (gdb) p m->time_tc
>> $5 = 29137292739858117
>> (gdb) p *m
>> $6 = {time_tc = 29137292739858117, time_te = 29137292739858117, tc = 
>> 2048, te = 2048}
>> (gdb) p time_diff_te
>> $7 = 29137292739849289
>> (gdb) p p->eir_period
>> $8 = 44230
>> (gdb) p n_periods_te
>> $9 = 140260075883992
>>
>> I don't have another board to test on at the moment but will try.
> 
> Odd, I do not see how this would cause a SIGFPE…
> 
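
For reference, a small sketch of the division in question with an explicit 
guard. With unsigned 64-bit operands, the subtraction wraps rather than 
traps, so the usual way for this kind of expression to raise SIGFPE on 
x86-64 Linux is a zero divisor. The helper name elapsed_periods is 
hypothetical and is not part of DPDK or OVS; it only illustrates the check.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: compute how many whole periods elapsed between
 * `last` and `now`. Returns false instead of dividing when `period` is
 * zero, which is the case that would otherwise trap. */
static bool
elapsed_periods(uint64_t now, uint64_t last, uint64_t period,
                uint64_t *n_periods)
{
    if (period == 0) {
        return false;
    }
    /* Unsigned arithmetic: if last > now the difference wraps around,
     * but the division itself is still well defined. */
    *n_periods = (now - last) / period;
    return true;
}
```

With the eir_period of 44230 shown in the dump above, the guard passes and 
the division completes normally; only a period of 0 takes the early return.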

