[ovs-dev] [PATCH] datapath: Fix double counting of packet stats for Linux devices.

Jesse Gross jesse at nicira.com
Wed Jan 5 22:12:19 UTC 2011


On Wed, Jan 5, 2011 at 3:53 PM, Ben Pfaff <blp at nicira.com> wrote:
> On Wed, Jan 05, 2011 at 03:38:52PM -0500, Jesse Gross wrote:
>> On Wed, Jan 5, 2011 at 12:48 PM, Ben Pfaff <blp at nicira.com> wrote:
>> > On Wed, Jan 05, 2011 at 08:33:07AM -0800, Jesse Gross wrote:
>> >> The kernel augments stats for Linux devices that only provide 32-bit stats
>> >> with its own internal 64-bit counters.  When doing this it takes the error
>> >> stats from the device but uses the packet and byte values from its local
>> >> counters.  However, we were also taking the packet and byte counts from
>> >> the device, leading to double counting.
>> >>
>> >> Problem introduced by commit ec61a01cd8ed73b13ffe042ddff4baf41f6b63e7
>> >> 'datapath: Use "struct rtnl_link_stats64" instead of "struct odp_vport_stats".'.
>> >>
>> >> Bug #4327
>> >>
>> >> Reported-by: Krishna Miriyala <krishna at nicira.com>
>> >> Signed-off-by: Jesse Gross <jesse at nicira.com>
>> >
>> > I see the problem but is this the correct solution?  Arguably
>> > netdev_get_stats() should zero those counters in the 32-bit case before
>> > returns.
>>
>> I think this is the right place to do it.  I'd like the vport
>> implementation get_stats() function to return the correct stats to the
>> best of it's ability.  How the generic layer chooses to combine the
>> various pieces of information available to it is a separate matter
>> that the individual vport implementation doesn't have much insight
>> into.
>
> It's hard for me to generalize on the basis of a single example.  I'll
> take your word for it.
>
>> One thing that I've been thinking about is whether it is better to use
>> the generic stats in all cases.  This would remove any inconsistencies
>> about between different vport or device types or when running on
>> 64-bit vs 32-bit machines and would always provide an accounting of
>> the packets that actually flowed through the switch.  It would have a
>> couple of side effects though:
>> * Linux device stats would start from the time that they were attached
>> to the switch, instead of the total since reboot.
>> * We wouldn't get hardware error stats.
>
> I like this idea.  It would be much cleaner.  We would also gain
> consistent counting of bytes, instead of driver-dependent
> interpretations.  My guess is that it would be more acceptable to
> upstream, too, because it would less obviously overlap what network
> drivers already count.
>
> I don't know whether it's worthwhile but we could also reduce the size
> of the stats structure by switching back from rtnl_link_stats64 to
> something smaller.
>
> Userspace could always compensate for the two side effects that you
> mention, by separately querying the underlying network device, if we
> decide that those effects are undesirable.

OK, I'll do this when I have a chance.  I'm going to push this patch
for the moment though as a temporary fix.

The other thing that we really need to get rid of is the offset stats.
 Do you know if XenServer is even still using  the fake bond devices?




More information about the dev mailing list