[ovs-discuss] [PATCH-RFC 1/2] Improve ARP latency

Jesse Gross jesse at nicira.com
Thu Oct 8 16:36:41 UTC 2015


On Wed, Oct 7, 2015 at 5:02 PM,  <dwilder at us.ibm.com> wrote:
>
> Quoting Jesse Gross <jesse at nicira.com>:
>
>> On Thu, Oct 1, 2015 at 4:19 PM,  <dwilder at us.ibm.com> wrote:
>>>
>>>
>>> Quoting Jesse Gross <jesse at nicira.com>:
>>>
>>>> On Tue, Sep 29, 2015 at 10:50 PM,  <dwilder at us.ibm.com> wrote:
>>>>>
>>>>>
>>>>> Hi-
>>>>>
>>>>> I have been conducting scaling tests with OVS and Docker. My tests
>>>>> revealed that the latency of ARP packets can become very large,
>>>>> resulting in many ARP retransmissions and timeouts. I traced the
>>>>> source of the poor latency to the handling of ARP packets in
>>>>> ovs_vport_find_upcall_portid().  Each packet is hashed in
>>>>> ovs_vport_find_upcall_portid() by calling skb_get_hash().  This
>>>>> hash is used to select the netlink socket on which to send the
>>>>> packet to userspace.  However, skb_get_hash() does not support ARP
>>>>> packets and returns 0 (an invalid hash) for every ARP.  As a
>>>>> result, a single ovs-vswitchd handler thread processes every ARP
>>>>> packet, severely impacting the average latency of ARPs.  I am
>>>>> proposing a change to ovs_vport_find_upcall_portid() that spreads
>>>>> the ARP packets evenly between all the handler threads (patch to
>>>>> follow).  Please let me know if you have suggestions/comments.
>>>>
>>>>
>>>>
>>>> This is definitely an interesting analysis but I'm a little surprised
>>>> at the basic scenario. First, I guess it seems to me that the L2
>>>> domain is too large if there are this many ARPs.
>>>
>>>
>>>
>>> I can imagine running a couple of thousand docker containers, so I think
>>> this is a reasonable size test.
>>
>>
>> Having thousands of nodes (regardless of whether they are containers
>> or VMs) on a single L2 segment is really not a good idea. I would
>> expect them to be segmented into smaller groups with L3 boundaries in
>> the middle.
>
>
>
> Something I am not clear on: creating smaller L2 segments would reduce
> the impact of a broadcast packet (fewer ports to flood it to), but
> would it affect the performance of the upcall datapath?  Comparing a
> single 512-port switch to two 256-port switches on the same host,
> won't they have the same number of ports, queues, threads and upcalls?

If it's all on the same host then the impact on that OVS instance would
be roughly the same. However, OVS instances on other hosts wouldn't have
to see these packets, and it's generally better design to keep L2
domains small.
