[ovs-dev] [PATCH] netdev-dpdk: Use intermediate queue during packet transmission.

Bodireddy, Bhanuprakash bhanuprakash.bodireddy at intel.com
Tue Dec 20 12:19:53 UTC 2016


>-----Original Message-----
>From: Ilya Maximets [mailto:i.maximets at samsung.com]
>Sent: Tuesday, December 20, 2016 8:09 AM
>To: Bodireddy, Bhanuprakash <bhanuprakash.bodireddy at intel.com>; Aaron
>Conole <aconole at redhat.com>
>Cc: dev at openvswitch.org; Daniele Di Proietto <diproiettod at vmware.com>;
>Thadeu Lima de Souza Cascardo <cascardo at redhat.com>; Fischetti, Antonio
><antonio.fischetti at intel.com>; Markus Magnusson
><markus.magnusson at ericsson.com>
>Subject: Re: [ovs-dev] [PATCH] netdev-dpdk: Use intermediate queue during
>packet transmission.
>
>On 19.12.2016 21:05, Bodireddy, Bhanuprakash wrote:
>> Thanks Ilya and Aaron for reviewing this patch and providing your
>comments, my reply inline.
>>
>>> -----Original Message-----
>>> From: Ilya Maximets [mailto:i.maximets at samsung.com]
>>> Sent: Monday, December 19, 2016 8:41 AM
>>> To: Aaron Conole <aconole at redhat.com>; Bodireddy, Bhanuprakash
>>> <bhanuprakash.bodireddy at intel.com>
>>> Cc: dev at openvswitch.org; Daniele Di Proietto
>>> <diproiettod at vmware.com>; Thadeu Lima de Souza Cascardo
>>> <cascardo at redhat.com>; Fischetti, Antonio
>>> <antonio.fischetti at intel.com>; Markus Magnusson
>>> <markus.magnusson at ericsson.com>
>>> Subject: Re: [ovs-dev] [PATCH] netdev-dpdk: Use intermediate queue
>>> during packet transmission.
>>>
>>> Hi,
>>>
>>> I didn't test this patch yet. Bhanu, could you please describe your
>>> test scenario and performance results in more details.
>>
>> During the recent performance analysis improvements for the classifier, we
>> found that a bottleneck was also observed at flow batching.
>> This was due to fewer packets per batch. To reproduce this, a simple P2P test
>> case can be used with 30 IXIA streams and matching IP flow rules.
>> E.g., for an IXIA stream with src IP 2.2.2.1, dst IP 5.5.5.1:
>>         ovs-ofctl add-flow br0 dl_type=0x0800,nw_src=2.2.2.1,actions=output:2
>>
>>         For an IXIA stream with src IP 4.4.4.1, dst IP 15.15.15.1:
>>         ovs-ofctl add-flow br0 dl_type=0x0800,nw_src=4.4.4.1,actions=output:2
>>
>> This leaves fewer packets per batch, and packet_batch_per_flow_execute()
>> is invoked for every batch.
>> With 30 flows, I see the throughput drop to ~7.0 Mpps in the PHY2PHY case for
>> 64 byte UDP packets.
>>
>>> It'll be nice if you provide throughput and latency measurement
>>> results for different scenarios and packet sizes. Latency is important here.
>> We have yet to do latency measurements for this case. With 30 IXIA
>> streams comprising 64 byte UDP packets, there was a throughput
>> improvement of 30% in the P2P case and 13-15% in the PVP case (single queue). We
>> will try to get the latency stats with and without this patch.
>>
>>>
>>> About the patch itself:
>>>
>>>  1. 'drain' called only for PMD threads. This will lead to
>>>     broken connection with non-PMD ports.
>> I see that non-PMD ports are handled by the vswitchd thread.
>> I tested the PVP loopback case with tap ports and found it to be working as
>> expected. Can you let me know the specific case you are referring to here, so
>> that I can verify whether the patch breaks it.
>
>I meant something like this:
>
>
> *-----------HOST-1(br-int)-----*               *---------HOST-2(br-int)------*
> |                              |               |                             |
> |   internal_port <--> dpdk0 <-------------------> dpdk0 <--> internal_port  |
> |   192.168.0.1/24             |               |              192.168.0.2/24 |
> *------------------------------*               *-----------------------------*
>
> (HOST-1)# ping 192.168.0.2
>
>In this case I'm expecting that the first (NETDEV_MAX_BURST - 1) ICMP packets
>will be stuck in the TX queue. The next packet will cause the whole batch of
>NETDEV_MAX_BURST ICMP packets to be sent. That is not good behaviour.

Thanks for the test case. I tested this and found it to be working with the patch.
The reason is that PMD threads use the intermediate queue implementation by invoking 'netdev_dpdk_eth_tx_queue()',
whereas the non-PMD thread (vswitchd) continues to call netdev_dpdk_eth_tx_burst() with the patch, so there is no change to the previous behavior here.
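
For clarity, this is roughly the shape of the queueing path I am describing. It is a sketch only; the queue field names ('txq->pkts', 'txq->count') are illustrative placeholders rather than the exact identifiers in the patch, and dropped-packet accounting is omitted (see point 5 below).

/* PMD path: buffer packets per tx queue and only call
 * netdev_dpdk_eth_tx_burst() (and hence rte_eth_tx_burst() with its
 * MMIO doorbell write) once a full batch has accumulated or on drain. */
static inline void
netdev_dpdk_eth_tx_queue(struct netdev_dpdk *dev, int qid,
                         struct rte_mbuf **pkts, int cnt)
{
    struct dpdk_tx_queue *txq = &dev->tx_q[qid];
    int i;

    for (i = 0; i < cnt; i++) {
        txq->pkts[txq->count++] = pkts[i];

        if (txq->count >= NETDEV_MAX_BURST) {
            /* Full batch accumulated: flush with a single burst. */
            netdev_dpdk_eth_tx_burst(dev, qid, txq->pkts, txq->count);
            txq->count = 0;
        }
    }
}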

I set up a case similar to the one in the diagram above, using two hosts and assigning IP addresses to the respective bridges.
When I ping Host 2 (192.168.0.2) from Host 1 (192.168.0.1), the call paths are as follows.

(HOST-1)# ping 192.168.0.2 -c 1  (send only one packet)

When sending ICMP echo request packet:
Vswitchd thread:
    dp_execute_cb()
        netdev_send()               [pkt sent on dpdk0, qid: 0]
            netdev_dpdk_send__()
                dpdk_do_tx_copy()
                    netdev_dpdk_eth_tx_burst()

On receiving ICMP echo reply packet:
PMD thread:
    dp_execute_cb()
        netdev_send()               [pkt sent on br0, qid: 1]
            netdev_linux_send()

Regards,
Bhanu Prakash. 

>
>>>  2. 'xps_get_tx_qid()' called twice. First time on send and
>>>     the second time on drain. This may lead to different
>>>     returned 'tx_qid's, and packets will be stuck forever in
>>>     the tx buffer.
>>
>> You are right, and xps_get_tx_qid() can return different tx_qids.
>> I was always testing XPS-disabled cases and completely overlooked this
>> case.
>> This could be a potential problem for us, and any suggestions would be very
>> helpful.
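
One possible direction, purely as a sketch: remember the qid that was actually used when packets were queued and drain that queue, instead of calling dpif_netdev_xps_get_tx_qid() a second time at drain time. 'last_used_qid' below is an invented field name, not something in the posted patch; it would be set wherever packets are queued and reset after the drain.

/* Sketch only: 'last_used_qid' is a hypothetical new member of struct
 * tx_port, recorded at enqueue time and reset to -1 once drained. */
static void
dp_netdev_drain_txq_port(struct dp_netdev_pmd_thread *pmd OVS_UNUSED,
                         struct tx_port *tx, uint64_t now OVS_UNUSED)
{
    if (tx->last_used_qid != -1) {
        /* Drain the queue the packets were actually queued on. */
        netdev_txq_drain(tx->port->netdev, tx->last_used_qid);
        tx->last_used_qid = -1;
    }
}
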
>>
>>>
>>>  3. 'txq_drain()' must take the 'tx_lock' for queue in case
>>>     of dynamic tx queues.
>>
>> Agree, will handle this in next version.
>>
>>>
>>>  4. Waiting for 1024 polling cycles of PMD thread may cause
>>>     a huge latency if we have few packets per second on one
>>>     port and intensive traffic on others.
>>
>> I agree with you. We discussed this here and thought invoking the
>> drain logic once every 1024 cycles was an optimal solution. But if the
>> community thinks otherwise, we can move this into the main 'for' loop
>> so that it can be invoked more often, provided that 'DRAIN_TSC' cycles
>> have elapsed.
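
For illustration, the alternative described above would look roughly like this inside the PMD main loop. This is only a sketch: 'DRAIN_TSC' is the threshold already mentioned in this thread, while 'last_drain_tsc' is an illustrative name for a per-PMD timestamp that would need to be added.

/* Sketch: drain the cached send ports only when DRAIN_TSC cycles have
 * elapsed since the last drain, checked on every main-loop iteration. */
uint64_t now = rte_get_tsc_cycles();

if (now - pmd->last_drain_tsc > DRAIN_TSC) {
    struct tx_port *tx;

    HMAP_FOR_EACH (tx, node, &pmd->send_port_cache) {
        dp_netdev_drain_txq_port(pmd, tx->port, now);
    }
    pmd->last_drain_tsc = now;
}
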
>>
>>>
>>>  5. This patch breaks the counting of 'tx_dropped' packets
>>>     in netdev-dpdk.
>>
>> I presume you are referring to the line below in the code.
>>
>> - dropped += netdev_dpdk_eth_tx_burst()
>> + netdev_dpdk_eth_tx_queue()
>>
>> Will handle this in v2 of this patch.
>>
>>>
>>>  6. Comments in netdev-provider.h should be fixed to reflect
>>>     all the changes.
>>
>> Will make necessary changes in netdev-provider.h
>>
>>>
>>>  7. Lastly, I agree with Aaron that explicitly allowing only
>>>     'dpdk' ports is not good style. Also, mentioning the name of
>>>     an exact netdev inside the common code is bad style too.
>>
>> Will handle this appropriately.
>>
>> Regards,
>> Bhanu Prakash.
>>
>>> Best regards, Ilya Maximets.
>>>
>>> On 16.12.2016 22:24, Aaron Conole wrote:
>>>> Hi Bhanu,
>>>>
>>>> Bhanuprakash Bodireddy <bhanuprakash.bodireddy at intel.com> writes:
>>>>
>>>>> In exact match cache processing, on an EMC hit, packets are queued
>>>>> into batches matching the flow. Thereafter, packets are processed
>>>>> in batches for faster packet processing. This is particularly
>>>>> inefficient if there are fewer packets in a batch, as
>>>>> rte_eth_tx_burst() incurs an expensive MMIO write.
>>>>>
>>>>> This commit adds back the intermediate queue implementation. Packets
>>>>> are queued and burst when the packet count >= NETDEV_MAX_BURST.
>>>>> Also, the drain logic is refactored to handle fewer packets in the tx queues.
>>>>> Testing shows significant performance gains with queueing.
>>>>>
>>>>> Fixes: b59cc14e032d ("netdev-dpdk: Use instant sending instead of
>>>>> queueing of packets.")
>>>>> Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy at intel.com>
>>>>> Signed-off-by: Antonio Fischetti <antonio.fischetti at intel.com>
>>>>> Co-authored-by: Antonio Fischetti <antonio.fischetti at intel.com>
>>>>> Signed-off-by: Markus Magnusson <markus.magnusson at ericsson.com>
>>>>> Co-authored-by: Markus Magnusson <markus.magnusson at ericsson.com>
>>>>> ---
>>>>
>>>> I've Cc'd Ilya just in hopes that the patch gets a better review
>>>> than I could give.  As a general comment, I like the direction -
>>>> batched operations are usually a better way to go.
>>>>
>>>> Just minor below.
>>>>
>>>> ... snip ...
>>>>>
>>>>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index
>>>>> 3509493..65dff83 100644
>>>>> --- a/lib/dpif-netdev.c
>>>>> +++ b/lib/dpif-netdev.c
>>>>> @@ -622,6 +622,9 @@ static int dpif_netdev_xps_get_tx_qid(const
>>>>> struct dp_netdev_pmd_thread *pmd,  static inline bool
>>>>> emc_entry_alive(struct emc_entry *ce);  static void
>>>>> emc_clear_entry(struct emc_entry *ce);
>>>>>
>>>>> +static struct tx_port *pmd_send_port_cache_lookup
>>>>> +    (const struct dp_netdev_pmd_thread *pmd, odp_port_t port_no);
>>>>> +
>>>>>  static void
>>>>>  emc_cache_init(struct emc_cache *flow_cache)  { @@ -2877,6
>>>>> +2880,31 @@ cycles_count_end(struct dp_netdev_pmd_thread *pmd,
>}
>>>>>
>>>>>  static void
>>>>> +dp_netdev_drain_txq_port(struct dp_netdev_pmd_thread *pmd,
>>>>> +                         struct dp_netdev_port *port,
>>>>> +                         uint64_t now) {
>>>>> +    int tx_qid;
>>>>> +
>>>>> +    if (!strcmp(port->type, "dpdk")) {
>>>>
>>>> Any reason to restrict this only to dpdk ports?  It looks like
>>>> you've added a new netdev operation, so why not just call the
>>>> netdev_txq_drain unconditionally?
>>>>
>>>> Also, a bit of a nit, but tx_qid can be reduced in scope down to the
>>>> if block below.
>>>>
>>>>> +        struct tx_port *tx = pmd_send_port_cache_lookup(pmd,
>>>>> +                 u32_to_odp(port->port_no));
>>>>> +
>>>>> +        if (OVS_LIKELY(tx)) {
>>>>> +            bool dynamic_txqs = tx->port->dynamic_txqs;
>>>>> +
>>>>> +            if (dynamic_txqs) {
>>>>> +                tx_qid = dpif_netdev_xps_get_tx_qid(pmd, tx, now);
>>>>> +            } else {
>>>>> +                tx_qid = pmd->static_tx_qid;
>>>>> +            }
>>>>> +
>>>>> +            netdev_txq_drain(port->netdev, tx_qid);
>>>>> +        }
>>>>> +    }
>>>>> +}
>>>>> +
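
For reference, applying Aaron's two suggestions to the quoted hunk above would give roughly the following. This is a sketch, assuming every netdev class provides a txq_drain operation (a no-op for non-DPDK netdevs), not the exact code that will go into v2.

static void
dp_netdev_drain_txq_port(struct dp_netdev_pmd_thread *pmd,
                         struct dp_netdev_port *port,
                         uint64_t now)
{
    struct tx_port *tx = pmd_send_port_cache_lookup(pmd,
                             u32_to_odp(port->port_no));

    if (OVS_LIKELY(tx)) {
        /* tx_qid scope reduced to the block that uses it. */
        int tx_qid;

        if (tx->port->dynamic_txqs) {
            tx_qid = dpif_netdev_xps_get_tx_qid(pmd, tx, now);
        } else {
            tx_qid = pmd->static_tx_qid;
        }

        /* Called unconditionally for every port type; netdev classes
         * without an intermediate queue would provide a no-op drain op. */
        netdev_txq_drain(port->netdev, tx_qid);
    }
}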
>>>>
>>>>
>>>>
>>
>>
>>

