[ovs-dev] [PATCH v2] Detailed packet drop statistics per dpdk and vhostuser ports

Sriram Vatala sriram.v at altencalsoftlabs.com
Tue Jul 2 12:30:15 UTC 2019


Hi Kevin,

Please consider this a gentle reminder regarding the mail discussion below, and
let me know your thoughts, if any.

Thanks & Regards,
Sriram.

-----Original Message-----
From: Sriram Vatala <sriram.v at altencalsoftlabs.com>
Sent: 26 June 2019 11:01
To: 'Venkatesan Pradeep' <venkatesan.pradeep at ericsson.com>; 'Kevin Traynor' 
<ktraynor at redhat.com>; 'ovs-dev at openvswitch.org' <ovs-dev at openvswitch.org>
Cc: 'Ilya Maximets' <i.maximets at samsung.com>
Subject: RE: [ovs-dev] [PATCH v2] Detailed packet drop statistics per dpdk and 
vhostuser ports

Hi,

Agreed. As per the suggestions, I will go ahead with a generic name for packet
drops due to transmission failure (i.e. packets left un-transmitted in the
output packet buffer argument after the call to the transmit API). Since the
most likely reason for this to happen is that the transmit queue is full, I
will highlight this reason in the documentation.

Thanks & Regards,
Sriram.

-----Original Message-----
From: Venkatesan Pradeep <venkatesan.pradeep at ericsson.com>
Sent: 24 June 2019 12:30
To: Sriram Vatala <sriram.v at altencalsoftlabs.com>; 'Kevin Traynor'
<ktraynor at redhat.com>
Cc: 'Ilya Maximets' <i.maximets at samsung.com>
Subject: RE: [ovs-dev] [PATCH v2] Detailed packet drop statistics per dpdk and
vhostuser ports

Hi,

Since the reason for introducing a new counter to track drops due to the tx
queue being full was to quickly isolate whether the problem is on the VM side
rather than in OVS, having a generic counter and name wouldn't help. One would
still need some knowledge of the code to understand that the likely reason for
a tx failure was the queue being full. Short of having a new API that can
return the failure code, we won't be able to say the drop reason for sure. If
we want to use a generic name, perhaps we can update the documentation to
highlight the likely reason.

Thanks,

Pradeep
-----Original Message-----
From: ovs-dev-bounces at openvswitch.org <ovs-dev-bounces at openvswitch.org> On
Behalf Of Sriram Vatala via dev
Sent: Wednesday, June 19, 2019 8:44 PM
To: 'Kevin Traynor' <ktraynor at redhat.com>; ovs-dev at openvswitch.org
Cc: 'Ilya Maximets' <i.maximets at samsung.com>
Subject: Re: [ovs-dev] [PATCH v2] Detailed packet drop statistics per dpdk and
vhostuser ports

Hi Kevin,

Thanks for your inputs. I will send the updated patch ASAP.

Thanks & Regards,
Sriram.

-----Original Message-----
From: Kevin Traynor <ktraynor at redhat.com>
Sent: 19 June 2019 19:07
To: Sriram Vatala <sriram.v at altencalsoftlabs.com>; ovs-dev at openvswitch.org
Cc: 'Ilya Maximets' <i.maximets at samsung.com>
Subject: Re: [ovs-dev] [PATCH v2] Detailed packet drop statistics per dpdk and
vhostuser ports

On 19/06/2019 11:07, Sriram Vatala wrote:
>
> Hi Kevin,
>
> Thanks for reviewing the patch. Please find my answers to your
> comments below.
>
> Comment-1
> I'm not sure about this name, you would need to know that it was the
> only reason rte_eth_tx_burst() and rte_vhost_enqueue_burst() will not
> send pkts.
> ---
> Agree that queue full is not the only reason for which the above DPDK
> APIs will fail to send packets, but it is the most common one.
> Rather than overlooking other reasons, like an invalid queue-id, I can
> change the name of the counter to a generic name, something like
> "tx_failed_drops".
> What do you think?
>

Sounds good to me.

> Comment-2
> There can be other drops earlier in this function
> ("__netdev_dpdk_vhost_send"), should they be logged also?
> ---
> These are the drops for an invalid queue-id or vhost device id, and for
> when the device is not up.
> Again, these are very unlikely to happen, but I can add a rate-limited
> warning log.
>

Well, I guess it doesn't invalidate your patch, which provides drop stats for
mtu/qos/qfull (or the new name), so maybe it's OK not to log those other drops
in the context of this patchset.

> Comment-3
>>      rte_spinlock_lock(&dev->stats_lock);
>>      netdev_dpdk_vhost_update_tx_counters(&dev->stats, pkts,
>> total_pkts,
>> +                                         dropped);
>> +    dev->stats.tx_mtu_drops += mtu_drops;
>
> There is a function just above for updating vhost tx stats. Now the
> updates are split with some updated in the function and some updated
> here, it would be better to update them all in the same place.
> ---
> Agree. will change the implementation here.
>
> Comment-4
>>
>> +    dropped += mtu_drops + qos_drops + qfull_drops;
>
> = is enough, you don't need +=
> ---
> Actually, in the code above this part in the "dpdk_do_tx_copy" function,
> dropped is updated if the mbuf alloc fails.
> Here is the code snippet:
>
>
>         pkts[txcnt] = rte_pktmbuf_alloc(dev->dpdk_mp->mp);
>         if (OVS_UNLIKELY(!pkts[txcnt])) {
>             dropped += cnt - i;
>             break;
>         }
>
> To take care not to miss these in the total drops, I am using dropped +=.
> Even if the above part of the code isn't hit, since the dropped variable
> is initialized to zero, that expression will not result in an incorrect
> value for the drops.
>

Yes, you're right, I missed it, thanks.

>
> Comment-5
>
>> +    <command label="ovs-vsctl-get-interface-statistics"
>> +
>> filters="ovs">/usr/share/openvswitch/scripts/ovs-bugtool-get-iface-st
>> ats
>> +    </command>
>
> Probably better to be consistent and not wrap this line
> ---
> OK. I wrapped this line because the utilities/checkpatch script gives a
> warning when the line length exceeds 79 characters. I will unwrap it.
>
>
>
> Comment-6
>
>> I think you also need to update vswitchd.xml with some docs for these
>> stats
>>
>
> Actually, it doesn't seem needed there, perhaps something in the dpdk
> docs
>
> ---
> I am a bit unclear on where to document these new counters. As we have
> not made any modifications to the DPDK APIs, can I document these new
> counters in the man page of ovs-vsctl?
> What do you think?
>

I'm not exactly sure where either; I was thinking about somewhere in one of the
.rst's in the /Documentation directory. Maybe it's not strictly necessary for
all stats, but since there are small explanations in the commit message for
these ones, it seems like they might be helpful in the docs.

thanks,
Kevin.

> Thanks & Regards,
> Sriram.
>
> -----Original Message-----
> From: Kevin Traynor <ktraynor at redhat.com>
> Sent: 19 June 2019 01:09
> To: Sriram Vatala <sriram.v at altencalsoftlabs.com>;
> ovs-dev at openvswitch.org
> Cc: Ilya Maximets <i.maximets at samsung.com>
> Subject: Re: [ovs-dev] [PATCH v2] Detailed packet drop statistics per
> dpdk and vhostuser ports
>
> On 18/06/2019 15:05, Kevin Traynor wrote:
>> Hi Sriram,
>>
>> On 14/06/2019 14:38, Sriram Vatala via dev wrote:
>>> OVS may be unable to transmit packets for multiple reasons and today
>>> there is a single counter to track packets dropped due to any of
>>> those reasons. The most common reason is that a VM is unable to read
>>> packets fast enough causing the vhostuser port transmit queue on the
>>> OVS side to become full. This manifests as a problem with VNFs not
>>> receiving all packets. Having a separate drop counter to track
>>> packets dropped because the transmit queue is full will clearly
>>> indicate that the problem is on the VM side and not in OVS.
>>> Similarly, maintaining separate counters for all possible drop
>>> reasons helps in pinpointing the cause of packet drops.
>>>
>>> This patch adds counters to track packets dropped due to all
>>> possible reasons and these counters are displayed along with other
>>> stats in "ovs-vsctl get interface <iface> statistics"
>>> command. The detailed stats will be available for both dpdk and
>>> vhostuser ports.
>>>
>>> Following are the details of the new counters :
>>>
>>> tx_qfull_drops : These are the packets dropped due to transmit queue
>>> overrun.
>>>
>>
>> I'm not sure about this name, you would need to know that it was the
>> only reason rte_eth_tx_burst() and rte_vhost_enqueue_burst() will not
>> send pkts
>>
>>> tx_mtu_exceeded_drops : These are the packets dropped due to MTU
>>> mismatch (i.e Pkt len > Max Dev MTU).
>>>
>>> tx_qos_drops/rx_qos_drops : These are the packets dropped due to
>>> transmission/reception rate exceeding the configured Egress/Ingress
>>> policer rate on the interface.
>>>
>>
>> I think you also need to update vswitchd.xml with some docs for these
>> stats
>>
>
> Actually, it doesn't seem needed there, perhaps something in the dpdk
> docs
>
> <snip>
>
>>> diff --git a/vswitchd/bridge.c b/vswitchd/bridge.c index
>>> 0702cc6..2cb5558 100644
>>> --- a/vswitchd/bridge.c
>>> +++ b/vswitchd/bridge.c
>>> @@ -2407,7 +2407,11 @@ iface_refresh_stats(struct iface *iface)
>>>      IFACE_STAT(rx_undersized_errors,    "rx_undersized_errors")     \
>>>      IFACE_STAT(rx_oversize_errors,      "rx_oversize_errors")       \
>>>      IFACE_STAT(rx_fragmented_errors,    "rx_fragmented_errors")     \
>>> -    IFACE_STAT(rx_jabber_errors,        "rx_jabber_errors")
>>> +    IFACE_STAT(rx_jabber_errors,        "rx_jabber_errors")         \
>>> +    IFACE_STAT(rx_qos_drops,            "rx_qos_drops")             \
>>> +    IFACE_STAT(tx_qfull_drops,          "tx_qfull_drops")           \
>>> +    IFACE_STAT(tx_qos_drops,            "tx_qos_drops")             \
>>> +    IFACE_STAT(tx_mtu_drops,            "tx_mtu_exceeded_drops")
>>>
>>>  #define IFACE_STAT(MEMBER, NAME) + 1
>>>      enum { N_IFACE_STATS = IFACE_STATS };
>>>
>>
>> What about updating show_dpif() to print with ovs-appctl dpctl/show -s ?
>>
>
> scratch that comment, I since saw the discussion on v1
>
>> _______________________________________________
>> dev mailing list
>> dev at openvswitch.org
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>
>>