[ovs-dev] [PATCH] netdev-dpdk: Use intermediate queue during packet transmission.
Bodireddy, Bhanuprakash
bhanuprakash.bodireddy at intel.com
Tue Dec 20 12:19:53 UTC 2016
>-----Original Message-----
>From: Ilya Maximets [mailto:i.maximets at samsung.com]
>Sent: Tuesday, December 20, 2016 8:09 AM
>To: Bodireddy, Bhanuprakash <bhanuprakash.bodireddy at intel.com>; Aaron
>Conole <aconole at redhat.com>
>Cc: dev at openvswitch.org; Daniele Di Proietto <diproiettod at vmware.com>;
>Thadeu Lima de Souza Cascardo <cascardo at redhat.com>; Fischetti, Antonio
><antonio.fischetti at intel.com>; Markus Magnusson
><markus.magnusson at ericsson.com>
>Subject: Re: [ovs-dev] [PATCH] netdev-dpdk: Use intermediate queue during
>packet transmission.
>
>On 19.12.2016 21:05, Bodireddy, Bhanuprakash wrote:
>> Thanks Ilya and Aaron for reviewing this patch and providing your
>> comments; my replies are inline.
>>
>>> -----Original Message-----
>>> From: Ilya Maximets [mailto:i.maximets at samsung.com]
>>> Sent: Monday, December 19, 2016 8:41 AM
>>> To: Aaron Conole <aconole at redhat.com>; Bodireddy, Bhanuprakash
>>> <bhanuprakash.bodireddy at intel.com>
>>> Cc: dev at openvswitch.org; Daniele Di Proietto
>>> <diproiettod at vmware.com>; Thadeu Lima de Souza Cascardo
>>> <cascardo at redhat.com>; Fischetti, Antonio
>>> <antonio.fischetti at intel.com>; Markus Magnusson
>>> <markus.magnusson at ericsson.com>
>>> Subject: Re: [ovs-dev] [PATCH] netdev-dpdk: Use intermediate queue
>>> during packet transmission.
>>>
>>> Hi,
>>>
>>> I didn't test this patch yet. Bhanu, could you please describe your
>>> test scenario and performance results in more details.
>>
>> During the recent performance analysis for the classifier improvements,
>> we found that a bottleneck was also observed at flow batching.
>> This was due to fewer packets per batch. To reproduce it, a simple P2P
>> test case can be used with 30 IXIA streams and matching IP flow rules.
>> E.g., for an IXIA stream with src IP 2.2.2.1, dst IP 5.5.5.1:
>> ovs-ofctl add-flow br0 dl_type=0x0800,nw_src=2.2.2.1,actions=output:2
>>
>> For an IXIA stream with src IP 4.4.4.1, dst IP 15.15.15.1:
>> ovs-ofctl add-flow br0 dl_type=0x0800,nw_src=4.4.4.1,actions=output:2
>>
>> This leaves fewer packets in each batch, and packet_batch_per_flow_execute()
>> is invoked for every batch.
>> With 30 flows, I see the throughput drop to ~7.0 Mpps in the PHY2PHY case
>> for 64-byte UDP packets.
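To put rough numbers on the batching effect above: with a 32-packet rx burst spread evenly over N flows, the old code ends up issuing up to N separate rte_eth_tx_burst() calls (each an MMIO write) per poll. A toy sketch of that arithmetic; tx_bursts_per_poll is just an illustrative helper, not OVS code:

```c
#include <assert.h>

#define RX_BURST 32   /* one NETDEV_MAX_BURST-sized rx poll */

/* Upper bound on per-flow tx bursts for one rx poll whose packets are
 * spread over n_flows flows: at most one burst per non-empty batch. */
static int
tx_bursts_per_poll(int n_flows)
{
    return n_flows < RX_BURST ? n_flows : RX_BURST;
}
```

With 30 flows that is up to 30 MMIO writes per poll instead of 1, which lines up with the throughput drop we measured.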
>>
>>> It'll be nice if you provide throughput and latency measurement
>>> results for different scenarios and packet sizes. Latency is important here.
>> We are yet to do latency measurements in this case. With 30 IXIA
>> streams comprising 64-byte UDP packets, there was a throughput
>> improvement of 30% in the P2P case and 13-15% in the PVP case (single
>> queue). We will try to get the latency stats with and without this patch.
>>
>>>
>>> About the patch itself:
>>>
>>> 1. 'drain' called only for PMD threads. This will lead to
>>> broken connection with non-PMD ports.
>> I see that non-PMD ports are handled by the vswitchd thread.
>> I tested the PVP loopback case with tap ports and found it to work as
>> expected. Can you let me know the specific case you are referring to
>> here, so that I can verify whether the patch breaks it.
>
>I meant something like this:
>
>
> *-----------HOST-1(br-int)-----*      *---------HOST-2(br-int)------*
> |                              |      |                             |
> | internal_port <--> dpdk0 <-----------> dpdk0 <--> internal_port   |
> |      192.168.0.1/24          |      |     192.168.0.2/24          |
> *------------------------------*      *-----------------------------*
>
> (HOST-1)# ping 192.168.0.2
>
>In this case I'm expecting that the first (NETDEV_MAX_BURST - 1) icmp packets
>will be stuck in the TX queue. The next packet will cause sending of the whole
>batch of NETDEV_MAX_BURST icmp packets. That is not good behaviour.
Thanks for the test case. I tested this and found it to work with the patch.
The reason is that PMD threads use the intermediate queue implementation by invoking netdev_dpdk_eth_tx_queue(),
whereas the non-PMD thread (vswitchd) continues to call netdev_dpdk_eth_tx_burst() in the patch, so there is no change to the previous behavior here.
I set up a case similar to the one in the diagram above, using 2 hosts and assigning IP addresses to the respective bridges.
When I ping Host 2 (192.168.0.2) from Host 1 (192.168.0.1), below is the call path.
(HOST-1)# ping 192.168.0.2 -c 1 (send only one packet)
When sending the ICMP echo request packet:

  vswitchd thread: dp_execute_cb()
                     netdev_send()          [pkt sent on dpdk0, qid: 0]
                       netdev_dpdk_send__()
                         dpdk_do_tx_copy()
                           netdev_dpdk_eth_tx_burst()

On receiving the ICMP echo reply packet:

  PMD thread:      dp_execute_cb()
                     netdev_send()          [pkt sent on br0, qid: 1]
                       netdev_linux_send()
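For clarity, the split between the two send paths can be sketched roughly like this; tx_queue, queue_pkt, and burst_pkt are illustrative stand-ins, not the actual netdev-dpdk structures:

```c
#include <assert.h>

#define NETDEV_MAX_BURST 32

/* Toy model of one tx queue; 'bursts' counts MMIO-triggering sends. */
struct tx_queue {
    int count;                     /* packets buffered, not yet sent */
    void *pkts[NETDEV_MAX_BURST];
    int bursts;
};

/* PMD path (netdev_dpdk_eth_tx_queue in the patch): buffer the packet
 * and burst only once NETDEV_MAX_BURST packets have accumulated. */
static void
queue_pkt(struct tx_queue *txq, void *pkt)
{
    txq->pkts[txq->count++] = pkt;
    if (txq->count == NETDEV_MAX_BURST) {
        txq->bursts++;             /* rte_eth_tx_burst() would go here */
        txq->count = 0;
    }
}

/* Non-PMD path (netdev_dpdk_eth_tx_burst, unchanged): every call
 * sends immediately, so a lone ping is never held back. */
static void
burst_pkt(struct tx_queue *txq, void *pkt)
{
    txq->pkts[0] = pkt;
    txq->bursts++;
}
```

This is why the ping in your diagram goes out immediately: the vswitchd send takes the second path.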
Regards,
Bhanu Prakash.
>
>>> 2. 'xps_get_tx_qid()' called twice. First time on send and
>>> the second time on drain. This may lead to different
>>> returned 'tx_qid's, and packets will be stuck forever in
>>> the tx buffer.
>>
>> You are right, and xps_get_tx_qid() can return different tx_qids.
>> I was always testing with XPS disabled and completely overlooked this
>> case.
>> This could be a potential problem for us, and any suggestions would be
>> very helpful.
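One option we are considering is to record the tx qid at enqueue time and have the drain path reuse it instead of calling xps_get_tx_qid() a second time. A rough sketch with made-up names; txq_buf, buf_enqueue, and buf_drain are not the actual OVS structures:

```c
#include <assert.h>

/* Toy per-port tx buffer that remembers the qid used at enqueue time. */
struct txq_buf {
    int queued;     /* packets waiting to be flushed               */
    int last_qid;   /* qid used when they were queued, -1 if empty */
};

static void
buf_enqueue(struct txq_buf *buf, int qid)
{
    buf->last_qid = qid;    /* remember the qid actually used for send */
    buf->queued++;
}

/* Drain targets the recorded qid, so even if xps_get_tx_qid() would
 * now return a different qid, no packets are stranded on the old one. */
static int
buf_drain(struct txq_buf *buf)
{
    int qid = buf->last_qid;
    buf->queued = 0;
    buf->last_qid = -1;
    return qid;             /* the qid the flush must be issued on */
}
```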
>>
>>>
>>> 3. 'txq_drain()' must take the 'tx_lock' for queue in case
>>> of dynamic tx queues.
>>
>> Agree, will handle this in next version.
>>
>>>
>>> 4. Waiting for 1024 polling cycles of PMD thread may cause
>>> a huge latency if we have few packets per second on one
>>> port and intensive traffic on others.
>>
>> I agree with you. We discussed this here and thought invoking the
>> drain logic once every 1024 cycles was an optimal solution. But if the
>> community thinks otherwise, we can move this into the main 'for' loop
>> so that it can be invoked more often, provided that 'DRAIN_TSC' cycles
>> have elapsed.
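Roughly, the main-loop variant would look like the sketch below; drain_due and the DRAIN_TSC value are illustrative only:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DRAIN_TSC 200000ULL   /* illustrative threshold, not the real value */

/* Returns true when at least DRAIN_TSC cycles have passed since the
 * last drain; on true, the caller would walk its ports and drain the
 * tx queues, and the interval restarts from 'now'. */
static bool
drain_due(uint64_t now, uint64_t *last_drain)
{
    if (now - *last_drain >= DRAIN_TSC) {
        *last_drain = now;
        return true;
    }
    return false;
}
```

Gating on elapsed cycles rather than a fixed iteration count bounds the latency of a lightly loaded port even when other ports keep the PMD thread busy.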
>>
>>>
>>> 5. This patch breaks the counting of 'tx_dropped' packets
>>> in netdev-dpdk.
>>
>> I presume you are referring to the line below in the code.
>>
>> - dropped += netdev_dpdk_eth_tx_burst()
>> + netdev_dpdk_eth_tx_queue()
>>
>> Will handle this in v2 of this patch.
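For v2, the intent is to keep the drop accounting with the queue and fold it into the stats at flush time, mirroring the old dropped += netdev_dpdk_eth_tx_burst() pattern. A rough sketch; txq and txq_flush are illustrative names, not the patch code:

```c
#include <assert.h>

/* Toy tx queue that accumulates drops across flushes. */
struct txq {
    int count;     /* packets currently buffered               */
    int dropped;   /* running total of packets the NIC refused */
};

/* Flush the buffer; 'accepted' stands in for the return value of
 * rte_eth_tx_burst(), i.e. how many packets the NIC actually took. */
static void
txq_flush(struct txq *q, int accepted)
{
    if (accepted < q->count) {
        q->dropped += q->count - accepted;   /* keep tx_dropped accurate */
    }
    q->count = 0;
}
```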
>>
>>>
>>> 6. Comments in netdev-provider.h should be fixed to reflect
>>> all the changes.
>>
>> Will make necessary changes in netdev-provider.h
>>
>>>
>>> 7. At last, I agree with Aaron that explicitly allowing only
>>> 'dpdk' ports is not good style. Also, mentioning the name of
>>> an exact netdev inside the common code is bad style too.
>>
>> Will handle this appropriately.
>>
>> Regards,
>> Bhanu Prakash.
>>
>>> Best regards, Ilya Maximets.
>>>
>>> On 16.12.2016 22:24, Aaron Conole wrote:
>>>> Hi Bhanu,
>>>>
>>>> Bhanuprakash Bodireddy <bhanuprakash.bodireddy at intel.com> writes:
>>>>
>>>>> In exact match cache processing, on an EMC hit, packets are queued
>>>>> into batches matching the flow. Thereafter, packets are processed
>>>>> in batches for faster packet processing. This is particularly
>>>>> inefficient if there are fewer packets in a batch, as
>>>>> rte_eth_tx_burst() incurs an expensive MMIO write.
>>>>>
>>>>> This commit adds back the intermediate queue implementation. Packets
>>>>> are queued and burst when the packet count >= NETDEV_MAX_BURST.
>>>>> Also, the drain logic is refactored to handle fewer packets in the
>>>>> tx queues. Testing shows significant performance gains with queueing.
>>>>>
>>>>> Fixes: b59cc14e032d ("netdev-dpdk: Use instant sending instead of
>>>>> queueing of packets.")
>>>>> Signed-off-by: Bhanuprakash Bodireddy
>>>>> <bhanuprakash.bodireddy at intel.com>
>>>>> Signed-off-by: Antonio Fischetti <antonio.fischetti at intel.com>
>>>>> Co-authored-by: Antonio Fischetti <antonio.fischetti at intel.com>
>>>>> Signed-off-by: Markus Magnusson <markus.magnusson at ericsson.com>
>>>>> Co-authored-by: Markus Magnusson <markus.magnusson at ericsson.com>
>>>>> ---
>>>>
>>>> I've Cc'd Ilya just in hopes that the patch gets a better review
>>>> than I could give. As a general comment, I like the direction -
>>>> batched operations are usually a better way to go.
>>>>
>>>> Just minor below.
>>>>
>>>> ... snip ...
>>>>>
>>>>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>>>>> index 3509493..65dff83 100644
>>>>> --- a/lib/dpif-netdev.c
>>>>> +++ b/lib/dpif-netdev.c
>>>>> @@ -622,6 +622,9 @@ static int dpif_netdev_xps_get_tx_qid(const struct dp_netdev_pmd_thread *pmd,
>>>>>  static inline bool emc_entry_alive(struct emc_entry *ce);
>>>>>  static void emc_clear_entry(struct emc_entry *ce);
>>>>>
>>>>> +static struct tx_port *pmd_send_port_cache_lookup
>>>>> +    (const struct dp_netdev_pmd_thread *pmd, odp_port_t port_no);
>>>>> +
>>>>>  static void
>>>>>  emc_cache_init(struct emc_cache *flow_cache)
>>>>>  {
>>>>> @@ -2877,6 +2880,31 @@ cycles_count_end(struct dp_netdev_pmd_thread *pmd,
>>>>>  }
>>>>>
>>>>>  static void
>>>>> +dp_netdev_drain_txq_port(struct dp_netdev_pmd_thread *pmd,
>>>>> +                         struct dp_netdev_port *port,
>>>>> +                         uint64_t now)
>>>>> +{
>>>>> +    int tx_qid;
>>>>> +
>>>>> +    if (!strcmp(port->type, "dpdk")) {
>>>>
>>>> Any reason to restrict this only to dpdk ports? It looks like
>>>> you've added a new netdev operation, so why not just call the
>>>> netdev_txq_drain unconditionally?
>>>>
>>>> Also, bit of a nit, but tx_qid can be reduced in scope down to the
>>>> if block below.
>>>>
>>>>> +        struct tx_port *tx = pmd_send_port_cache_lookup(pmd,
>>>>> +                                 u32_to_odp(port->port_no));
>>>>> +
>>>>> +        if (OVS_LIKELY(tx)) {
>>>>> +            bool dynamic_txqs = tx->port->dynamic_txqs;
>>>>> +
>>>>> +            if (dynamic_txqs) {
>>>>> +                tx_qid = dpif_netdev_xps_get_tx_qid(pmd, tx, now);
>>>>> +            } else {
>>>>> +                tx_qid = pmd->static_tx_qid;
>>>>> +            }
>>>>> +
>>>>> +            netdev_txq_drain(port->netdev, tx_qid);
>>>>> +        }
>>>>> +    }
>>>>> +}
>>>>> +
>>>>
>>>>
>>>>
>>
>>
>>