[ovs-dev] [RFC v2] docs: Describe output packet batching in DPDK guide.

Ilya Maximets i.maximets at samsung.com
Tue Dec 19 12:25:24 UTC 2017


On 19.12.2017 14:56, Jan Scheurich wrote:
> I am OK with Ilya's proposal. 
> 
> The only correction I need to make is that for us 50 us showed good improvements in a "PVP" scenario with iperf3 as a kernel application in the guest (not VM-VM).

OK. Thanks. I'll send this version with the above correction as a proper [PATCH] soon.
Jan, I would also like to add you as a co-author, if you don't mind.

Best regards, Ilya Maximets.

>
> BR, Jan
> 
>> -----Original Message-----
>> From: Stokes, Ian [mailto:ian.stokes at intel.com]
>> Sent: Tuesday, 19 December, 2017 12:40
>> To: Ilya Maximets <i.maximets at samsung.com>; Jan Scheurich <jan.scheurich at ericsson.com>; ovs-dev at openvswitch.org; Bodireddy,
>> Bhanuprakash <bhanuprakash.bodireddy at intel.com>
>> Cc: Heetae Ahn <heetae82.ahn at samsung.com>; Fischetti, Antonio <antonio.fischetti at intel.com>; Eelco Chaudron
>> <echaudro at redhat.com>; Loftus, Ciara <ciara.loftus at intel.com>; Kevin Traynor <ktraynor at redhat.com>
>> Subject: RE: [RFC v2] docs: Describe output packet batching in DPDK guide.
>>
>>> On 19.12.2017 13:39, Stokes, Ian wrote:
>>>>> Hi Ilya,
>>>>>
>>>>> Some more suggestions below to expand a bit on the use cases for
>>>>> tx-flush- interval.
>>>>>
>>>>> BR, Jan
>>>>
>>>> Hi Ilya,
>>>>
>>>> I agree with Jan's input here. I've finished validating the output
>>>> batching patchset today but would like to include this documentation also.
>>>>
>>>> Are you planning to re-spin a new version of this patch with the
>>>> required changes?
>>>
>>> I have concerns about suggesting an exact value of 50 microseconds, or
>>> even saying that increasing 'tx-flush-interval' will increase performance,
>>> without stating the exact testing scenario and environment, including
>>> hardware.
>>
>> Sure, this could vary from case to case.
>>>
>>> Testing shows that the optimal value of 'tx-flush-interval' depends
>>> heavily on the scenario and the possible traffic patterns. For example,
>>> tx-flush-interval=50 significantly degrades performance of a PVP scenario
>>> with bonded HW NICs on x86.
>>> I haven't finished the full testing, but it also degrades performance of a
>>> VM-VM scenario with Linux kernel guests (interrupt based) on my ARMv8
>>> system.
>>>
>>> So, I prefer to avoid saying that this value will increase performance,
>>> at least without a full description of the testing scenario.
>>
>> Sure, I think we can add to this over time as more testing of different scenarios is completed.
>>
>> It probably makes sense to give just a general comment for people getting started, with the caveat that users need to experiment to tune
>> for their own needs in specific deployments.
>>
>>>
>>> I'll try to modify Jan's comments according to above concerns.
>>>
>>> What about something like this:
>>> ----------------------
>>> To take advantage of batched transmit functions, OVS collects packets in
>>> intermediate queues before sending when processing a batch of received
>>> packets.
>>> Even if packets are matched by different flows, OVS uses a single send
>>> operation for all packets destined to the same output port.
>>>
>>> Furthermore, OVS is able to buffer packets in these intermediate queues
>>> for a configurable amount of time to reduce the frequency of send bursts
>>> at medium load levels when the packet receive rate is high, but the
>>> receive batch size is still very small. This is particularly beneficial for
>>> packets transmitted to VMs using an interrupt-driven virtio driver, where
>>> the interrupt overhead is significant for the OVS PMD, the host operating
>>> system and the guest driver.
>>>
>>> The ``tx-flush-interval`` parameter can be used to specify the time in
>>> microseconds OVS should wait between two send bursts to a given port
>>> (default is ``0``). When the intermediate queue fills up before that time
>>> is over, the buffered packet batch is sent immediately::
>>>
>>>     $ ovs-vsctl set Open_vSwitch . other_config:tx-flush-interval=50
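>>>
>>> The current value can be read back, or the default restored, with the
>>> standard ``ovs-vsctl`` database commands::
>>>
>>>     $ ovs-vsctl get Open_vSwitch . other_config:tx-flush-interval
>>>     $ ovs-vsctl remove Open_vSwitch . other_config tx-flush-interval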
>>>
>>> This parameter influences both throughput and latency, depending on the
>>> traffic load on the port. In general, lower values decrease latency,
>>> while higher values may be useful to achieve higher throughput.
>>>
>>> Low traffic (``packet rate < 1 / tx-flush-interval``) should not
>>> experience any significant latency increase or throughput change, as
>>> packets are forwarded immediately.
>>>
>>> At intermediate load levels
>>> (``1 / tx-flush-interval < packet rate < 32 / tx-flush-interval``) traffic
>>> should experience an average latency increase of up to
>>> ``1 / 2 * tx-flush-interval`` and a possible throughput improvement.
>>>
>>> Very high traffic (``packet rate >> 32 / tx-flush-interval``) should
>>> experience an average latency increase of ``32 / (2 * packet rate)``.
>>> Most send batches in this case will contain the maximum number of
>>> packets (``32``).
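>>>
>>> As a worked example (illustrative numbers, not measurements): with
>>> ``tx-flush-interval=50`` the boundaries above become
>>> ``1 / 50 us = 20 kpps`` and ``32 / 50 us = 640 kpps``. Between these
>>> rates the average added latency is at most ``50 / 2 = 25`` microseconds,
>>> while at, say, ``10 Mpps`` it shrinks to roughly
>>> ``32 / (2 * 10^7) s = 1.6`` microseconds.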
>>>
>>> A ``tx-flush-interval`` value of ``50`` microseconds has been shown to
>>> provide a good performance increase in a ``VM-VM`` scenario on an x86
>>> system for interrupt-driven guests while keeping the latency increase at a
>>> reasonable level.
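>>>
>>> Whether buffering actually produces larger send batches can be verified
>>> at runtime from the PMD statistics, which (with the output batching
>>> patchset applied) include the average number of packets per output
>>> batch::
>>>
>>>     $ ovs-appctl dpif-netdev/pmd-stats-show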
>>>
>>> .. note::
>>>   The throughput impact of this option depends significantly on the
>>>   scenario and the traffic patterns. For example, a ``tx-flush-interval``
>>>   value of ``50`` microseconds shows performance degradation in a PVP
>>>   scenario with bonded PHY ports while testing with ``256 - 1024`` packet
>>>   flows:
>>>
>>>     https://mail.openvswitch.org/pipermail/ovs-dev/2017-December/341700.html
>>>
>>
>> Overall I think the above looks good. We'll never be able to detail every test scenario, but this gets the general latency/throughput
>> trade-off in relation to the timing parameter across to users.
>>
>> I'd be happy to take something like the above for the initial merge, and it could be expanded upon as people test going forward.
>>
>> Ian
>>
>>> ----------------------
>>>
>>>
>>>>
>>>> Thanks
>>>> Ian
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Ilya Maximets [mailto:i.maximets at samsung.com]
>>>>>> Sent: Tuesday, 12 December, 2017 14:07
>>>>>> To: ovs-dev at openvswitch.org; Bhanuprakash Bodireddy
>>>>>> <bhanuprakash.bodireddy at intel.com>
>>>>>> Cc: Heetae Ahn <heetae82.ahn at samsung.com>; Antonio Fischetti
>>>>>> <antonio.fischetti at intel.com>; Eelco Chaudron <echaudro at redhat.com>;
>>>>>> Ciara Loftus <ciara.loftus at intel.com>; Kevin Traynor
>>>>>> <ktraynor at redhat.com>; Jan Scheurich <jan.scheurich at ericsson.com>;
>>>>>> Ian Stokes <ian.stokes at intel.com>; Ilya Maximets
>>>>>> <i.maximets at samsung.com>
>>>>>> Subject: [RFC v2] docs: Describe output packet batching in DPDK guide.
>>>>>>
>>>>>> Added information about output packet batching and a way to
>>>>>> configure 'tx-flush-interval'.
>>>>>>
>>>>>> Signed-off-by: Ilya Maximets <i.maximets at samsung.com>
>>>>>> ---
>>>>>>
>>>>>> Version 2:
>>>>>> 	* Some grammar/wording corrections. (Eelco Chaudron)
>>>>>>
>>>>>>  Documentation/intro/install/dpdk.rst | 24 ++++++++++++++++++++++++
>>>>>>  1 file changed, 24 insertions(+)
>>>>>>
>>>>>> diff --git a/Documentation/intro/install/dpdk.rst
>>>>>> b/Documentation/intro/install/dpdk.rst
>>>>>> index 3fecb5c..5485dbc 100644
>>>>>> --- a/Documentation/intro/install/dpdk.rst
>>>>>> +++ b/Documentation/intro/install/dpdk.rst
>>>>>> @@ -568,6 +568,30 @@ not needed i.e. jumbo frames are not needed, it
>>>>>> can be forced off by adding  chains of descriptors it will make more
>>>>>> individual virtio descriptors available  for rx to the guest using
>>>>>> dpdkvhost ports and this can improve performance.
>>>>>>
>>>>>> +Output Packet Batching
>>>>>> +~~~~~~~~~~~~~~~~~~~~~~
>>>>>> +
>>>>>> +To get advantages of the batched send functions OVS collects
>>>>>> +packets in intermediate queues before sending. This allows using a
>>>>>> +single send for packets matched by different flows but having the
>>>>>> +same output action.
>>>>>> +Furthermore, OVS is able to collect packets for some reasonable
>>>>>> +amount of time before batch sending them which might help when
>>>>>> +input batches are small.
>>>>>
>>>>> To take advantage of batched transmit functions, OVS collects packets
>>>>> in intermediate queues before sending when processing a batch of
>>>>> received packets. Even if packets are matched by different flows, OVS
>>>>> uses a single send operation for all packets destined to the same
>>>>> output port.
>>>>>
>>>>> Furthermore, OVS is able to buffer packets in these intermediate
>>>>> queues for a configurable amount of time to reduce the frequency of
>>>>> send bursts at medium load levels when the packet receive rate is
>>>>> high, but the receive batch size is still very small. This is
>>>>> particularly beneficial for packets transmitted to VMs using an
>>>>> interrupt-driven virtio driver, where the interrupt overhead is
>>>>> significant for the OVS PMD, the host operating system and the guest
>>>>> driver.
>>>>>
>>>>>> +
>>>>>> +``tx-flush-interval`` config could be used to specify the time in
>>>>>> +microseconds that a packet can wait in an output queue for sending
>>>>>> +(default is ``0``)::
>>>>>
>>>>> The ``tx-flush-interval`` parameter can be used to specify the time
>>>>> in microseconds OVS should wait between two send bursts to a given
>>>>> port (default is ``0``). When the intermediate queue fills up before
>>>>> that time is over, the buffered packet batch is sent immediately::
>>>>>
>>>>>> +
>>>>>> +    $ ovs-vsctl set Open_vSwitch . other_config:tx-flush-interval=50
>>>>>> +
>>>>>> +Lower values decrease latency while higher values may be useful to
>>>>>> +achieve higher performance. For example, increasing of
>>>>>> +``tx-flush-interval`` can be used to decrease the number of
>>>>>> +interrupts for interrupt based guest drivers.
>>>>>> +This may significantly affect the performance. Zero value means
>>>>>> +immediate send at the end of processing a single input batch.
>>>>>
>>>>> This parameter influences both throughput and latency, depending on
>>>>> the traffic load on the port. In general lower values decrease
>>>>> latency while higher values may be useful to achieve higher throughput.
>>>>>
>>>>> Low traffic (packet rate < 1/tx-flush-interval) should not experience
>>>>> any significant latency or throughput increase as packets are
>>>>> forwarded immediately.
>>>>>
>>>>> At intermediate load levels (1/tx-flush-interval < packet rate <
>>>>> 32/tx-flush-interval) traffic should experience an average latency
>>>>> increase of up to 1/2 * tx-flush-interval and a throughput improvement
>>>>> that depends on the average size of send bursts and grows with the
>>>>> traffic rate.
>>>>>
>>>>> Very high traffic (packet rate >> 32/tx-flush-interval) should
>>>>> experience improved throughput as most send batches contain the
>>>>> maximum number of packets (32). The average latency increase should
>>>>> equal 32/(2 * packet rate).
>>>>>
>>>>> A tx-flush-interval value of 50 microseconds has been shown to provide
>>>>> a good performance increase for interrupt-driven guests while keeping
>>>>> the latency increase at a reasonable level.
>>>>>
>>>>>> +
>>>>>> +Average number of packets per output batch could be checked in PMD
>>>>>> +stats::
>>>>>
>>>>> The average number of packets per output batch can be checked in PMD
>>>>> stats::
>>>>>
>>>>>> +
>>>>>> +    $ ovs-appctl dpif-netdev/pmd-stats-show
>>>>>> +
>>>>>>  Limitations
>>>>>>  ------------
>>>>>>
>>>>>> --
>>>>>> 2.7.4
>>>>


More information about the dev mailing list