[ovs-discuss] [PATCH-RFC 1/2] Improve ARP latency

Thu Oct 8 00:02:27 UTC 2015

Quoting Jesse Gross <jesse at nicira.com>:

> On Thu, Oct 1, 2015 at 4:19 PM,  <dwilder at us.ibm.com> wrote:
>>
>> Quoting Jesse Gross <jesse at nicira.com>:
>>
>>> On Tue, Sep 29, 2015 at 10:50 PM,  <dwilder at us.ibm.com> wrote:
>>>>
>>>> Hi-
>>>>
>>>> I have been conducting scaling tests with OVS and docker. My tests
>>>> revealed that the latency of ARP packets can become very large,
>>>> resulting in many ARP re-transmissions and time-outs. I found the
>>>> source of the poor latency to be the handling of ARP packets in
>>>> ovs_vport_find_upcall_portid().  Each packet is hashed in
>>>> ovs_vport_find_upcall_portid() by calling skb_get_hash().  This hash
>>>> is used to select the netlink socket on which to send the packet to
>>>> userspace.  However, skb_get_hash() does not support ARP packets and
>>>> returns 0 (invalid hash) for every ARP.  This results in a single
>>>> ovs-vswitchd handler thread processing every ARP packet, severely
>>>> impacting the average latency of ARPs. I am proposing a change to
>>>> ovs_vport_find_upcall_portid() that spreads the ARP packets evenly
>>>> between all the handler threads (patch to follow).  Please let me
>>>> know if you have suggestions/comments.
>>>
>>>
>>> This is definitely an interesting analysis but I'm a little surprised
>>> at the basic scenario. First, I guess it seems to me that the L2
>>> domain is too large if there are this many ARPs.
>>
>>
>> I can imagine running a couple of thousand docker containers, so I think
>> this is a reasonable size test.
>
> Having thousands of nodes (regardless of whether they are containers
> or VMs) on a single L2 segment is really not a good idea. I would
> expect them to be segmented into smaller groups with L3 boundaries in
> the middle.

There is something I am not clear on. Creating smaller L2 segments
would reduce the impact of a broadcast packet, since there are fewer
ports to flood the packet to. But would it affect the performance of
the upcall datapath?  Compare a single 512-port switch to two 256-port
switches (on a single host). Won't they have the same number of ports,
queues, threads and upcalls?

>
>> On a related issue, I am looking into the memory consumed by the netlink
>> sockets; OVS on Linux can create many of these sockets. Do you have any
>> thoughts as to why the current model was picked?
>
> Independent queues are the easiest way to provide lockless access to
> incoming packets on different cores and, in some cases, give higher
> priority to certain types of packets.
>
>>> The speed also
>>> generally seems slower than I would expect but in any case I don't
>>> disagree that it is better to spread the load among all the cores.
>>>
>>> On the patch itself, can't we just make skb_get_hash() be able to
>>> decode ARP? It seems like that is cleaner and more generic.
>>
>>
>> My first thought was to make a change in skb_get_hash().  However, the
>> comment in __skb_get_hash() states that the hash is generated from the
>> 4-tuple (address and ports).  ARPs have no ports so a return value of 0
>> looked correct.
>>
>> /*
>>   * __skb_get_hash: calculate a flow hash based on src/dst addresses
>>   * and src/dst port numbers.  Sets hash in skb to non-zero hash value
>>   * on success, zero indicates no valid hash.  Also, sets l4_hash in skb
>>   * if hash is a canonical 4-tuple hash over transport ports.
>>   */
>>
>> What do you think?
>
> I don't think that this is really a strict definition. In particular,
> IP packets that aren't TCP or UDP will still return a hash based on
> the IP addresses.
>
> However, I believe that you are looking at an old version of this
> function. Any changes would need to be made to the upstream Linux
> tree, not purely in OVS.