[ovs-dev] [v15 06/10] dpif-netdev: Add a partial HWOL PMD statistic.

Flavio Leitner fbl at sysclose.org
Thu Jul 15 18:58:28 UTC 2021


On Thu, Jul 15, 2021 at 01:39:04PM +0000, Ferriter, Cian wrote:
> 
> 
> > -----Original Message-----
> > From: Flavio Leitner <fbl at sysclose.org>
> > Sent: Friday 9 July 2021 18:54
> > To: Ferriter, Cian <cian.ferriter at intel.com>
> > Cc: ovs-dev at openvswitch.org; i.maximets at ovn.org
> > Subject: Re: [ovs-dev] [v15 06/10] dpif-netdev: Add a partial HWOL PMD statistic.
> > 
> > 
> > 
> > Hi,
> > 
> > After rebasing, the performance of branch master improved in my env
> > from 12Mpps to 13Mpps. However, this specific patch brings it back
> > down to 12Mpps. I am using dpif_scalar and generic lookup (no AVX512).
> > 
> 
> Thanks for the investigation. Always great seeing perf numbers and details!
> 
> I just want to check my understanding here with what you're seeing:
> 
> Performance before DPIF patchset
> 12Mpps
> 
> Performance at this patch
> 12Mpps
> 
> Performance after DPIF patchset
> 13Mpps
> 
> So the performance recovers somewhere else in the patchset?


Interesting, which flags are you passing to build OVS?

Thanks for following up!
fbl


> 
> I've checked the performance behaviour in my case. I'm going to report relative performance numbers. They are relative to master branch before AVX512 DPIF was applied (c36c8e3).
> I tried to run a similar testcase. From the memcmp in your perf top output, I can see you are using EMC. I am also using the scalar DPIF in all the testcases below.
> 
> Master before AVX512 DPIF (c36c8e3)
> 1.000x (0.0%)
> DPIF patch 3 - dpif-avx512: Add ISA implementation of dpif.
> 1.010x (1.0%)
> DPIF patch 4 - dpif-netdev: Add command to switch dpif implementation.
> 1.042x (4.2%)
> DPIF patch 5 - dpif-netdev: Add command to get dpif implementations.
> 1.063x (6.3%)
> DPIF patch 6 - dpif-netdev: Add a partial HWOL PMD statistic.
> 1.069x (6.9%)
> Latest master which has AVX512 DPIF patches (d2e9703)
> 1.075x (7.5%)
> Master before AVX512 DPIF (c36c8e3), with prefetch change
> 0.983x (-1.7%)
> Latest master which has AVX512 DPIF patches (d2e9703), with prefetch change
> 1.080x (8.0%)
> 
> > (I don't think this report should block the patch because the
> > counters are interesting and the analysis below doesn't point
> > directly to the proposed changes.)
> > 
> > This is a diff using all patches applied versus this patch reverted:
> >     21.44%     +6.08%  ovs-vswitchd        [.] miniflow_extract
> >      8.94%     -1.92%  libc-2.28.so        [.] __memcmp_avx2_movbe
> >     14.62%     +1.44%  ovs-vswitchd        [.] dp_netdev_input__
> >      2.80%     -1.08%  ovs-vswitchd        [.] dp_netdev_pmd_flush_output_on_port
> >      3.44%     -0.91%  ovs-vswitchd        [.] netdev_send
> > 
> > This is the code side by side, patch applied on the right side:
> > (sorry, long lines)
> > 
> 
> My mail client has wrapped the below lines, sorry for mangling the output!
> 
> <snip mangled perf diff output>
> Please find it here:
> https://mail.openvswitch.org/pipermail/ovs-dev/2021-July/385448.html
> 
> > 
> > 
> > I don't see any relevant optimization difference in the code
> > above, but the "mov %r15w,-0x2(%r13)" on the right side accounts
> > for almost all the difference, though on the left side it seems
> > a bit more spread.
> > 
> > I applied the patch below and it helped to get to 12.7Mpps, so
> > almost at the same levels. I wonder if you see the same result.
> > 
> 
> Since I don't see the drop that you see with this patch, when I apply the below patch to the latest master, I see a smaller benefit.
> The relative performance after adding the below prefetch compared to before (latest master):
> 1.005x (0.5%)
> 
> In other words, on latest master the prefetch moves my result from 1.075x to 1.080x relative to c36c8e3, an overall difference of about 0.5%.
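
For anyone following the relative numbers, here is a minimal sketch of the
arithmetic behind the ~0.5% figure. The helper name relative_gain_percent is
hypothetical and not part of OVS; the inputs are the ratios quoted in the
table above (1.075x and 1.080x, both relative to c36c8e3).

```c
#include <assert.h>

/* Hypothetical helper, not part of OVS: percentage change going from
 * 'before' to 'after', where both are throughput figures expressed
 * against the same baseline (here, master at c36c8e3). */
static double
relative_gain_percent(double before, double after)
{
    assert(before > 0.0);
    return (after / before - 1.0) * 100.0;
}
```

Called as relative_gain_percent(1.075, 1.080), this gives a value just under
0.5, which matches the 1.005x quoted above after rounding. The same helper
with (1.000, 0.983) gives the -1.7% reported for c36c8e3 with the prefetch.
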
> 
> > diff --git a/lib/flow.c b/lib/flow.c
> > index 729d59b1b..4572e356b 100644
> > --- a/lib/flow.c
> > +++ b/lib/flow.c
> > @@ -746,6 +746,9 @@ miniflow_extract(struct dp_packet *packet, struct miniflow *dst)
> >      uint8_t *ct_nw_proto_p = NULL;
> >      ovs_be16 ct_tp_src = 0, ct_tp_dst = 0;
> > 
> > +    /* dltype will be updated later. */
> > +    OVS_PREFETCH_WRITE(miniflow_pointer(mf, dl_type));
> > +
> >      /* Metadata. */
> >      if (flow_tnl_dst_is_set(&md->tunnel)) {
> >          miniflow_push_words(mf, tunnel, &md->tunnel,
> > 
> > 
> > fbl
> > 
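
To illustrate the idea behind the patch above outside of OVS, here is a
minimal sketch of a write prefetch issued ahead of a late store. It assumes
GCC or Clang, where __builtin_prefetch(addr, 1) hints a prefetch for write.
DEMO_PREFETCH_WRITE, struct demo_flow and demo_extract are hypothetical
stand-ins and do not reflect the real miniflow layout or the
OVS_PREFETCH_WRITE definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a write-prefetch macro.  The second argument
 * of __builtin_prefetch (1) asks for the line in preparation for a write.
 * On other compilers this degrades to a no-op. */
#if defined(__GNUC__)
#define DEMO_PREFETCH_WRITE(addr) __builtin_prefetch((addr), 1)
#else
#define DEMO_PREFETCH_WRITE(addr) ((void) 0)
#endif

#define DEMO_WORDS 16

/* Hypothetical layout: a block of words filled first, followed by a
 * field that is only written at the end of extraction. */
struct demo_flow {
    uint64_t words[DEMO_WORDS];
    uint16_t dl_type;
};

static void
demo_extract(struct demo_flow *dst, uint16_t eth_type)
{
    assert(dst != NULL);

    /* dl_type is stored last, so request its cache line early. */
    DEMO_PREFETCH_WRITE(&dst->dl_type);

    /* Unrelated work that gives the prefetch time to complete. */
    for (size_t i = 0; i < DEMO_WORDS; i++) {
        dst->words[i] = i;
    }

    dst->dl_type = eth_type;
}
```

The prefetch is only a hint and does not change the result of
demo_extract(); whether it helps depends on the CPU and on how much work
sits between the hint and the store, which may be why the gain differs
between the two environments in this thread.
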
> 
> <snip actual patch away>
> 
> Thanks,
> Cian

-- 
fbl

