[ovs-discuss] [PATCH-RFC 1/2] Improve ARP latency
dwilder at us.ibm.com
Thu Oct 8 00:02:27 UTC 2015
Quoting Jesse Gross <jesse at nicira.com>:
> On Thu, Oct 1, 2015 at 4:19 PM, <dwilder at us.ibm.com> wrote:
>> Quoting Jesse Gross <jesse at nicira.com>:
>>> On Tue, Sep 29, 2015 at 10:50 PM, <dwilder at us.ibm.com> wrote:
>>>> I have been conducting scaling tests with OVS and Docker. My tests
>>>> show that the latency of ARP packets can become very large, resulting in
>>>> many re-transmissions and time-outs. I found the source of the poor
>>>> latency to be the handling of ARP packets in
>>>> ovs_vport_find_upcall_portid(). Each packet is hashed in
>>>> ovs_vport_find_upcall_portid() by calling skb_get_hash(). This hash is
>>>> used to select a netlink socket on which to send the packet to
>>>> userspace. However, skb_get_hash() does not support ARP packets,
>>>> returning 0 (invalid hash) for every ARP. This results in a single
>>>> ovs-vswitchd handler thread processing every ARP packet, which hurts
>>>> the average latency of ARPs. I am proposing a change to
>>>> ovs_vport_find_upcall_portid() that spreads the ARP packets evenly
>>>> across all the handler threads (patch to follow). Please let me know
>>>> what you think.
>>> This is definitely an interesting analysis but I'm a little surprised
>>> at the basic scenario. First, I guess it seems to me that the L2
>>> domain is too large if there are this many ARPs.
>> I can imagine running a couple of thousand docker containers, so I think
>> this is a reasonable size test.
> Having thousands of nodes (regardless of whether they are containers
> or VMs) on a single L2 segment is really not a good idea. I would
> expect them to be segmented into smaller groups with L3 boundaries in
> the middle.
Something I am not clear on: creating smaller L2 segments would reduce
the impact of a broadcast packet, since there are fewer ports to flood
the packet to. But would it affect the performance of the upcall
datapath? Compare a single 512-port switch to two 256-port switches (on
a single host). Won't they have the same number of ports, queues,
threads, and upcalls?
>> On a related issue, I am looking into the memory consumed by the netlink
>> sockets, OVS on linux can create many of these sockets. Do you have any
>> thought as to why the current model was picked?
> Independent queues are the easiest way to provide lockless access to
> incoming packets on different cores and, in some case, give higher
> priority to certain types of packets.
>>> The speed also
>>> generally seems slower than I would expect but in any case I don't
>>> disagree that it is better to spread the load among all the cores.
>>> On the patch itself, can't we just make skb_get_hash() be able to
>>> decode ARP? It seems like that is cleaner and more generic.
>> My first thought was to make a change in skb_get_hash(). However, the
>> comment on __skb_get_hash() states that the hash is generated from the
>> 4-tuple (addresses and ports). ARPs have no ports, so a return value of 0
>> looked correct.
>> * __skb_get_hash: calculate a flow hash based on src/dst addresses
>> * and src/dst port numbers. Sets hash in skb to non-zero hash value
>> * on success, zero indicates no valid hash. Also, sets l4_hash in skb
>> * if hash is a canonical 4-tuple hash over transport ports.
>> What do you think?
> I don't think that this is really a strict definition. In particular,
> IP packets that aren't TCP or UDP will still return a hash based on
> the IP addresses.
> However, I believe that you are looking at an old version of this
> function. Any changes would need to be made to the upstream Linux
> tree, not purely in OVS.