[ovs-dev] [PATCH 01/12] dpcls: Use 32 packet batches for lookups.

Pravin Shelar pshelar at ovn.org
Wed Oct 12 21:09:31 UTC 2016


On Tue, Oct 11, 2016 at 8:33 AM, Fischetti, Antonio
<antonio.fischetti at intel.com> wrote:
> Comments inline.
>
> Thanks,
> Antonio
>
>> -----Original Message-----
>> From: dev [mailto:dev-bounces at openvswitch.org] On Behalf Of Jarno
>> Rajahalme
>> Sent: Friday, October 7, 2016 10:08 PM
>> To: Bodireddy, Bhanuprakash <bhanuprakash.bodireddy at intel.com>
>> Cc: dev at openvswitch.org
>> Subject: Re: [ovs-dev] [PATCH 01/12] dpcls: Use 32 packet batches for
>> lookups.
>>
>>
>> > On Oct 7, 2016, at 9:17 AM, Bhanuprakash Bodireddy
>> <bhanuprakash.bodireddy at intel.com> wrote:
>> >
>> > This patch increases the number of packets processed in a batch during a
>> > lookup from 16 to 32. Processing batches of 32 packets improves
>> > performance, and one of the internal loops can be avoided.
>> >
>>
>> Can you provide some qualification of the performance test conditions? Do
>> you believe the performance difference applies universally?
>>
> [Antonio F] We saw a performance improvement in the PHY2PHY loopback test case with EMC disabled, 2 physical ports and a few tens of active IXIA streams. With 1 subtable - flow: from port1 to port2 - we got a throughput increase of +150 kpps. With 4 subtables the increase was +80 kpps.
> Also we spent some time profiling OVS with VTune. Comparing stock OVS with the patched one, we observed for dpcls_lookup() a significant reduction in overall retired instructions (down by 729,800,000); the CPI rate improved from 0.471 to 0.427 and Front-end bound cycles dropped from 19.1% to 8.5%.
>
>
>> > Signed-off-by: Antonio Fischetti <antonio.fischetti at intel.com>
>> > Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy at intel.com>
>> > ---
>> > lib/dpif-netdev.c | 109 +++++++++++++++++++++++-------------------------------
>> > 1 file changed, 46 insertions(+), 63 deletions(-)
>> >
>> > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>> > index 6e09e44..c002dd3 100644
>> > --- a/lib/dpif-netdev.c
>> > +++ b/lib/dpif-netdev.c
>> > @@ -4975,23 +4975,20 @@ dpcls_lookup(struct dpcls *cls, const struct netdev_flow_key keys[],
>> >              int *num_lookups_p)
>> > {
>> >     /* The received 'cnt' miniflows are the search-keys that will be processed
>> > -     * in batches of 16 elements.  N_MAPS will contain the number of these
>> > -     * 16-elements batches.  i.e. for 'cnt' = 32, N_MAPS will be 2.  The batch
>> > -     * size 16 was experimentally found faster than 8 or 32. */
>> > -    typedef uint16_t map_type;
>> > +     * to find a matching entry into the available subtables.
>> > +     * The number of bits in map_type is equal to NETDEV_MAX_BURST. */
>>
>> This needs a build time assertion to catch any future change in
>> NETDEV_MAX_BURST.
>>
>> Preferably, if you can verify that the compiler is able to remove the
>> unnecessary loop in this case, you should consider not removing the extra
>> loop that would kick in only if NETDEV_MAX_BURST becomes larger than 32.
>>
> [Antonio F] Is it OK if we add the following line after the MAP_BITS define?
> BUILD_ASSERT_DECL(MAP_BITS >= NETDEV_MAX_BURST);
>
> It seems NETDEV_MAX_BURST was reduced from 256 to 32 a while ago, one of the reasons
> being slow VMs and packet drops at the VMs. Can we safely assume that NETDEV_MAX_BURST is
> unlikely to be increased, as it has wider consequences for system stability?
>
>
I think it is safe to assume it for now. Looking at the code, I think
it can easily be increased to 64. So it is fine to add the assertion.
