[ovs-dev] [PATCH v2] Detailed packet drop statistics per dpdk and vhostuser ports

Kevin Traynor ktraynor at redhat.com
Tue Jul 2 20:06:36 UTC 2019


On 02/07/2019 13:30, Sriram Vatala wrote:
> Hi Kevin,
> 
> Please consider this as a gentle remainder regarding below mail discussion and 
> let me know your thoughts if any.
> 

Hi Sriram,

I think your suggested approach is the best available at present i.e.
use a more generic name and document the most likely cause.

I thought about something with rte_vhost_avail_entries() but i'm not
sure it's sufficient. You would need to re-write the OVS vhost send
code, and it seems to return 0 for multiple reasons.

thanks,
Kevin.

PS, I am on leave with limited email access, so best to discuss further
with Ian and Ilya if required.

> Thanks & Regards,
> Sriram.
> 
> -----Original Message-----
> From: Sriram Vatala <sriram.v at altencalsoftlabs.com>
> Sent: 26 June 2019 11:01
> To: 'Venkatesan Pradeep' <venkatesan.pradeep at ericsson.com>; 'Kevin Traynor' 
> <ktraynor at redhat.com>; 'ovs-dev at openvswitch.org' <ovs-dev at openvswitch.org>
> Cc: 'Ilya Maximets' <i.maximets at samsung.com>
> Subject: RE: [ovs-dev] [PATCH v2] Detailed packet drop statistics per dpdk and 
> vhostuser ports
> 
> Hi,
> 
> Agreed. As per suggestions, I would go a head with generic name for packet
> drops due to transmission failure (i.e packets left un-transmitted in output
> packet buffer argument after call to transmit API ) and since the most likely
> reason for this to happen is when transmit queue  is full or has been filled
> up . So,  i will highlight this reason in the documentation.
> 
> Thanks & Regards,
> Sriram.
> 
> -----Original Message-----
> From: Venkatesan Pradeep <venkatesan.pradeep at ericsson.com>
> Sent: 24 June 2019 12:30
> To: Sriram Vatala <sriram.v at altencalsoftlabs.com>; 'Kevin Traynor'
> <ktraynor at redhat.com>
> Cc: 'Ilya Maximets' <i.maximets at samsung.com>
> Subject: RE: [ovs-dev] [PATCH v2] Detailed packet drop statistics per dpdk and
> vhostuser ports
> 
> Hi,
> 
> Since the reason for introducing a new counter to track drops due to tx queue
> being full was to quickly isolate that the problem was on the VM side rather
> than on OVS, having a generic counter and name wouldn't help. One would still
> need to have some knowledge of the code to understand that the likely reason
> for tx failure was the queue being full. Short of having a new API that can
> return the failure code we won't be able to say for sure the drop reason.  If
> we want to use a generic name perhaps we can update the documentation to
> highlight the likely reason.
> 
> Thanks,
> 
> Pradeep
> -----Original Message-----
> From: ovs-dev-bounces at openvswitch.org <ovs-dev-bounces at openvswitch.org> On
> Behalf Of Sriram Vatala via dev
> Sent: Wednesday, June 19, 2019 8:44 PM
> To: 'Kevin Traynor' <ktraynor at redhat.com>; ovs-dev at openvswitch.org
> Cc: 'Ilya Maximets' <i.maximets at samsung.com>
> Subject: Re: [ovs-dev] [PATCH v2] Detailed packet drop statistics per dpdk and
> vhostuser ports
> 
> Hi kevin,
> 
> Thanks for your inputs. I will  send the updated patch ASAP.
> 
> Thanks & Regards,
> Sriram.
> 
> -----Original Message-----
> From: Kevin Traynor <ktraynor at redhat.com>
> Sent: 19 June 2019 19:07
> To: Sriram Vatala <sriram.v at altencalsoftlabs.com>; ovs-dev at openvswitch.org
> Cc: 'Ilya Maximets' <i.maximets at samsung.com>
> Subject: Re: [ovs-dev] [PATCH v2] Detailed packet drop statistics per dpdk and
> vhostuser ports
> 
> On 19/06/2019 11:07, Sriram Vatala wrote:
>>
>> Hi Kevin,
>>
>> Thanks for reviewing the patch. Please find my answers to your
>> comments below.
>>
>> Comment-1
>> I'm not sure about this name, you would need to know that it was the
>> only reason rte_eth_tx_burst() and rte_vhost_enqueue_burst() will not
>> send pkts.
>> ---
>> Agree that queue full is not the only reason for which the above DPDK
>> apis will fail to send packets. But queue full is the most common reason.
>> Instead of overlooking other reasons like invalid queue-id, i can
>> change the name of the counter to a generic name something like
>> "tx_failed_drops".
>> what do you think?
>>
> 
> Sounds good to me.
> 
>> Comment-2
>> There can be other drops earlier in this function
>> ("__netdev_dpdk_vhost_send"), should they be logged also?
>> ---
>> These are the drops for invalid queue-id or vhost device id and if
>> device is not up.
>> Again these are very unlikely to happen, but i can add a rate limiting
>> warn log.
>>
> 
> Well I guess it doesn't invalidate your patches which provides drop stats for
> mtu/qos/qfull(or new name), so maybe it's ok to not log those other drops in
> the context of this patchset.
> 
>> Comment-3
>>>      rte_spinlock_lock(&dev->stats_lock);
>>>      netdev_dpdk_vhost_update_tx_counters(&dev->stats, pkts,
>>> total_pkts,
>>> +                                         dropped);
>>> +    dev->stats.tx_mtu_drops += mtu_drops;
>>
>> There is a function just above for updating vhost tx stats. Now the
>> updates are split with some updated in the function and some updated
>> here, it would be better to update them all in the same place.
>> ---
>> Agree. will change the implementation here.
>>
>> Comment-4
>>>
>>> +    dropped += mtu_drops + qos_drops + qfull_drops;
>>
>> = is enough, you don't need +=
>> ---
>> Actually in the code above to this part in "dpdk_do_tx_copy" function,
>> dropped will updated if mbuf alloc fails.
>> Here is the code snippet:
>>
>>
>>         pkts[txcnt] = rte_pktmbuf_alloc(dev->dpdk_mp->mp);
>>         if (OVS_UNLIKELY(!pkts[txcnt])) {
>>             dropped += cnt - i;
>>             break;
>>         }
>>
>> To take care not to miss this in total drops, i am using dropped+=,
>> Even if the above part of the code doesn't hit, as dropped variable is
>> initialized to Zero that expression should not result in incorrect
>> value for drops.
>>
> 
> Yes, you're right, I missed it, thanks.
> 
>>
>> Comment-5
>>
>>> +    <command label="ovs-vsctl-get-interface-statistics"
>>> +
>>> filters="ovs">/usr/share/openvswitch/scripts/ovs-bugtool-get-iface-st
>>> ats
>>> +    </command>
>>
>> Probably better to be consistent and not wrap this line
>> ---
>> Ok. I wrapped this line as utilities/checkpatch script is giving
>> warning as the line length is exceeding 79 characters. I will unwrap this.
>>
>>
>>
>> Comment-6
>>
>>> I think you also need to update vswitchd.xml with some docs for these
>>> stats
>>>
>>
>> Actually, it doesn't seem needed there, perhaps something in the dpdk
>> docs
>>
>> ---
>> Bit unclear on where to document this new counters. As we have not
>> done any modifications to dpdk APIs, can i document these new counters
>> in man page of ovs-vsctl.
>> what do you think?
>>
> 
> I'm exactly sure where either, I was thinking about somewhere in one of the
> .rst's in the /Documentation directory. Maybe it's not strictly necessary for
> all stats, but when there's small explanations in the commit message for these
> ones, seems like they might be helpful in the docs.
> 
> thanks,
> Kevin.
> 
>> Thanks & Regards,
>> Sriram.
>>
>> -----Original Message-----
>> From: Kevin Traynor <ktraynor at redhat.com>
>> Sent: 19 June 2019 01:09
>> To: Sriram Vatala <sriram.v at altencalsoftlabs.com>;
>> ovs-dev at openvswitch.org
>> Cc: Ilya Maximets <i.maximets at samsung.com>
>> Subject: Re: [ovs-dev] [PATCH v2] Detailed packet drop statistics per
>> dpdk and vhostuser ports
>>
>> On 18/06/2019 15:05, Kevin Traynor wrote:
>>> Hi Sriram,
>>>
>>> On 14/06/2019 14:38, Sriram Vatala via dev wrote:
>>>> OVS may be unable to transmit packets for multiple reasons and today
>>>> there is a single counter to track packets dropped due to any of
>>>> those reasons. The most common reason is that a VM is unable to read
>>>> packets fast enough causing the vhostuser port transmit queue on the
>>>> OVS side to become full. This manifests as a problem with VNFs not
>>>> receiving all packets. Having a separate drop counter to track
>>>> packets dropped because the transmit queue is full will clearly
>>>> indicate that the problem is on the VM side and not in OVS.
>>>> Similarly maintaining separate counters for all possible drops helps
>>>> in indicating sensible cause for packet drops.
>>>>
>>>> This patch adds counters to track packets dropped due to all
>>>> possible reasons and these counters are displayed along with other
>>>> stats in "ovs-vsctl get interface <iface> statistics"
>>>> command. The detailed stats will be available for both dpdk and
>>>> vhostuser ports.
>>>>
>>>> Following are the details of the new counters :
>>>>
>>>> tx_qfull_drops : These are the packets dropped due to transmit queue
>>>> overrun.
>>>>
>>>
>>> I'm not sure about this name, you would need to know that it was the
>>> only reason rte_eth_tx_burst() and rte_vhost_enqueue_burst() will not
>>> send pkts
>>>
>>>> tx_mtu_exceeded_drops : These are the packets dropped due to MTU
>>>> mismatch (i.e Pkt len > Max Dev MTU).
>>>>
>>>> tx_qos_drops/rx_qos_drops : These are the packets dropped due to
>>>> transmission/reception rate exceeding the configured Egress/Ingress
>>>> policer rate on the interface.
>>>>
>>>
>>> I think you also need to update vswitchd.xml with some docs for these
>>> stats
>>>
>>
>> Actually, it doesn't seem needed there, perhaps something in the dpdk
>> docs
>>
>> <snip>
>>
>>>> diff --git a/vswitchd/bridge.c b/vswitchd/bridge.c index
>>>> 0702cc6..2cb5558 100644
>>>> --- a/vswitchd/bridge.c
>>>> +++ b/vswitchd/bridge.c
>>>> @@ -2407,7 +2407,11 @@ iface_refresh_stats(struct iface *iface)
>>>>      IFACE_STAT(rx_undersized_errors,    "rx_undersized_errors")     \
>>>>      IFACE_STAT(rx_oversize_errors,      "rx_oversize_errors")       \
>>>>      IFACE_STAT(rx_fragmented_errors,    "rx_fragmented_errors")     \
>>>> -    IFACE_STAT(rx_jabber_errors,        "rx_jabber_errors")
>>>> +    IFACE_STAT(rx_jabber_errors,        "rx_jabber_errors")         \
>>>> +    IFACE_STAT(rx_qos_drops,            "rx_qos_drops")             \
>>>> +    IFACE_STAT(tx_qfull_drops,          "tx_qfull_drops")           \
>>>> +    IFACE_STAT(tx_qos_drops,            "tx_qos_drops")             \
>>>> +    IFACE_STAT(tx_mtu_drops,            "tx_mtu_exceeded_drops")
>>>>
>>>>  #define IFACE_STAT(MEMBER, NAME) + 1
>>>>      enum { N_IFACE_STATS = IFACE_STATS };
>>>>
>>>
>>> What about updating show_dpif() to print with ovs-appctl dpctl/show -s ?
>>>
>>
>> scratch that comment, I since saw the discussion on v1
>>
>>> _______________________________________________
>>> dev mailing list
>>> dev at openvswitch.org
>>> https://protect2.fireeye.com/url?k=94a18803-c82b5d25-94a1c898-86aae65
>>> 64056-0e42d3a83a0fa697&q=1&u=https%3A%2F%2Fmail.openvswitch.org%2Fmai
>>> lman%2Flistinfo%2Fovs-dev
>>>



More information about the dev mailing list