[ovs-dev] [PATCH RFC v3 4/4] dpif-netdev: Time based output batching.

Wed Aug 30 16:12:54 UTC 2017

Yes, we are; sorry for the delay!

We are actively testing the performance of the following variants, with iperf simulating a kernel application and pktgen as the DPDK application in the guest:

a) OVS master
b) Ilya's patches w/o time-based batching
c) Ilya's time-based batching using CLOCK_MONOTONIC
d) Our TSC-based time-based batching alternative.

We will publish our detailed results to the mailing list tomorrow.

Regards, Jan

> -----Original Message-----
> From: Darrell Ball [mailto:dball at vmware.com]
> Sent: Wednesday, 30 August, 2017 00:46
> To: Jan Scheurich <jan.scheurich at ericsson.com>; Ilya Maximets
> <i.maximets at samsung.com>; ovs-dev at openvswitch.org; Bhanuprakash
> Bodireddy <bhanuprakash.bodireddy at intel.com>
> Cc: Heetae Ahn <heetae82.ahn at samsung.com>
> Subject: Re: [ovs-dev] [PATCH RFC v3 4/4] dpif-netdev: Time based output
> batching.
> 
> Hi Jan/Ilya/Bhanu
> 
> Just wondering if we are still pursuing this patch series ?
> 
> Thanks Darrell
> 
> 
> 
> 
> On 8/14/17, 8:33 AM, "ovs-dev-bounces at openvswitch.org on behalf of Jan
> Scheurich" <ovs-dev-bounces at openvswitch.org on behalf of
> jan.scheurich at ericsson.com> wrote:
> 
>     > > We have tested the effect of turbo mode on TSC and there is none. The
>     > > TSC frequency remains at the nominal clock speed, no matter if the core
>     > > is clocked down or up. So, I believe for PMD threads (where performance
>     > > matters) TSC would be an adequate and efficient clock.
>     >
>     > It's highly platform dependent and testing on a few systems doesn't
>     > guarantee anything.
>     > On the other hand, POSIX guarantees the monotonic characteristics of
>     > CLOCK_MONOTONIC.
> 
>     TSC is also monotonic on a given core. Does CLOCK_MONOTONIC guarantee any
>     better accuracy than TSC for PMD threads?
> 
>     > > On PMDs I am a bit concerned about the overhead/latency introduced
>     > > with the clock_gettime() system call, but I haven't done any
>     > > measurements to check the actual impact. Have you?
>     >
>     > Have you seen my incremental patches?
>     > There is no overhead, because we're just replacing 'time_msec' with
>     > 'time_usec'.
>     > No difference except converting timespec to usec instead of msec.
> 
>     I did look at your incremental patches and we will test their performance.
>     I was already concerned about the system call cost on master.
>     Perhaps I'm paranoid, but I would like to double check by testing.
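
For illustration only (this is not code from either patch): a minimal sketch of the timespec-to-microsecond conversion discussed above, plus a hypothetical helper, clock_read_cost_ns(), that averages the cost of back-to-back CLOCK_MONOTONIC reads. All function names here are assumptions. On Linux x86-64, clock_gettime(CLOCK_MONOTONIC) is usually served from the vDSO without entering the kernel, but that depends on the clock source, so a measurement on the target system is still needed.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Convert a timespec to whole microseconds (sub-microsecond part truncated). */
static inline int64_t
timespec_to_usec(const struct timespec *ts)
{
    return (int64_t) ts->tv_sec * 1000000 + ts->tv_nsec / 1000;
}

/* Read CLOCK_MONOTONIC and return it in microseconds. */
static inline int64_t
monotonic_usec(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return timespec_to_usec(&ts);
}

/* Hypothetical helper: average cost in nanoseconds of one clock read,
 * measured over 'n' back-to-back reads. */
static double
clock_read_cost_ns(int n)
{
    struct timespec start, end;
    volatile int64_t sink = 0;   /* Consumes the results of each read. */

    assert(n > 0);
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < n; i++) {
        sink += monotonic_usec();
    }
    clock_gettime(CLOCK_MONOTONIC, &end);
    (void) sink;

    double elapsed_ns = (double) (end.tv_sec - start.tv_sec) * 1e9
                        + (double) (end.tv_nsec - start.tv_nsec);
    return elapsed_ns / n;
}
```

Calling clock_read_cost_ns() with a large iteration count on an isolated core gives a rough per-read cost to set against the per-iteration budget of a PMD thread.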
> 
>     > > If we go for CLOCK_MONOTONIC in microsecond resolution, we should
>     > > make sure that the clock is read not more than once every iteration
>     > > (and cache the us value as now in the pmd ctx struct as suggested in
>     > > your other patch). But then for consistency also the XPS feature should
>     > > use the PMD time in us resolution.
>     >
>     > Again, please, look at my incremental patches.
> 
>     As far as I could see you did, for example, not consistently adapt
>     tx_port->last_used to microsecond resolution.
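
For illustration only: a minimal sketch of the "read the clock once per iteration and cache it" idea, with every timestamp kept in microseconds. The struct and function names are hypothetical; the thread itself only names the pmd ctx struct and tx_port->last_used. The clock value is passed in as a parameter so the sketch does not assume a particular clock source.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-PMD context holding the time cached for this iteration. */
struct pmd_ctx_sketch {
    int64_t now_us;
};

/* Hypothetical tx port state; the timestamp is in microseconds, matching
 * the cached PMD time. */
struct tx_port_sketch {
    int64_t last_used_us;
};

/* Called once at the top of each polling iteration with a fresh clock read. */
static inline void
pmd_ctx_update_time(struct pmd_ctx_sketch *ctx, int64_t clock_us)
{
    ctx->now_us = clock_us;
}

/* Record port usage from the cached time instead of reading the clock. */
static inline void
tx_port_mark_used(struct tx_port_sketch *port,
                  const struct pmd_ctx_sketch *ctx)
{
    port->last_used_us = ctx->now_us;
}

/* True if the port has not been used for at least 'timeout_us'. */
static inline bool
tx_port_is_idle(const struct tx_port_sketch *port,
                const struct pmd_ctx_sketch *ctx, int64_t timeout_us)
{
    return ctx->now_us - port->last_used_us >= timeout_us;
}
```

Mixing a millisecond timeout with a microsecond timestamp in a comparison like tx_port_is_idle() would shrink the timeout by a factor of 1000, which is why every user of the cached time has to be converted together.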
> 
>     > > For the non-PMD thread we could actually skip time-based output
>     > > batching completely. The packet rates and the frequency of calls to
>     > > dpif_netdev_run() in the main ovs-vswitchd thread are so low that
>     > > time-based flushing doesn't seem to make much sense.
> 
>     Have you considered this option?
> 
>     > >
>     > > Below you can find an alternative incremental patch on top of your RFC
>     > > 4/4 that uses TSC on PMD. We will be comparing the two alternatives for
>     > > performance both with non-PMD guests (iperf3) as well as PMD guests
>     > > (DPDK testpmd).
>     >
>     > In your version you need to move all the output_batching related code
>     > under #ifdef DPDK_NETDEV because it will break userspace networking if
>     > compiled without dpdk and output-max-latency != 0.
> 
>     Not sure. Batching should implicitly be disabled because cycles_counter()
>     and cycles_per_microsecond() would both return zero. But I agree that
>     would be a fairly cryptic design. If we used TSC in PMDs we should
>     explicitly not do time-based tx batching on the non-PMD thread.
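
For illustration only: a sketch of how a cycle-based flush deadline can degenerate to "flush immediately" when both the cycle counter and the cycles-per-microsecond value are zero, next to the explicit alternative of disabling time-based batching outside PMD threads. Both functions and their parameters are hypothetical; the actual patch may compute the deadline differently.

```c
#include <stdbool.h>
#include <stdint.h>

/* Implicit variant: a batch is due for flushing once 'max_latency_us' worth
 * of cycles has passed since it was started.  If the cycle counter and
 * 'cycles_per_us' are both zero (as assumed for a build without DPDK), the
 * deadline equals the start time and every batch is flushed at once. */
static inline bool
output_flush_due(uint64_t now_cycles, uint64_t batch_start_cycles,
                 uint32_t max_latency_us, uint64_t cycles_per_us)
{
    uint64_t deadline = batch_start_cycles
                        + (uint64_t) max_latency_us * cycles_per_us;

    return now_cycles >= deadline;
}

/* Explicit variant: time-based batching applies only on PMD threads and
 * only when a non-zero maximum latency is configured. */
static inline bool
time_based_batching_enabled(bool is_pmd_thread, uint32_t max_latency_us)
{
    return is_pmd_thread && max_latency_us != 0;
}
```

The implicit variant yields the same behavior but hides the reason in the arithmetic, whereas the explicit check states the intent at the call site.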
> 
>     Anyway, if the cost of the clock_gettime() system call proves
>     insignificant and our performance tests comparing our TSC-based with your
>     CLOCK_MONOTONIC-based implementation show equivalent results, we can go
>     for your approach.
> 
>     BR, Jan