[ovs-discuss] [PATCH-RFC 1/2] Improve ARP latency

Jesse Gross jesse at nicira.com
Thu Oct 1 00:06:18 UTC 2015


On Tue, Sep 29, 2015 at 10:50 PM, <dwilder at us.ibm.com> wrote:
> Hi-
>
> I have been conducting scaling tests with OVS and Docker. My tests revealed
> that the latency of ARP packets can become very large, resulting in many ARP
> retransmissions and timeouts. I traced the poor latency to the handling of
> ARP packets in ovs_vport_find_upcall_portid(). Each packet is hashed in
> ovs_vport_find_upcall_portid() by calling skb_get_hash(), and this hash is
> used to select the netlink socket on which the packet is sent to userspace.
> However, skb_get_hash() does not support ARP packets and returns 0 (an
> invalid hash) for every ARP. As a result, a single ovs-vswitchd handler
> thread processes every ARP packet, which severely impacts the average ARP
> latency. I am proposing a change to ovs_vport_find_upcall_portid() that
> spreads ARP packets evenly across all the handler threads (patch to
> follow). Please let me know if you have any suggestions or comments.
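To make the failure mode above concrete, here is a minimal user-space sketch. It assumes the upcall socket is chosen as the packet hash modulo the number of handler sockets; the names select_handler() and tally_handlers() are hypothetical and are not the datapath's own code. With every ARP hashing to 0, all ARPs land on handler 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: map a packet hash onto one of n_handlers upcall
 * sockets with a modulo (assumed to match the effect of the index
 * computation in ovs_vport_find_upcall_portid()). */
size_t select_handler(uint32_t hash, size_t n_handlers)
{
	assert(n_handlers > 0);
	return hash % n_handlers;
}

/* Count how many of n_pkts packets land on each handler.
 * hashes[i] is the skb hash of packet i; counts must point to
 * n_handlers entries that the caller has zeroed. */
void tally_handlers(const uint32_t *hashes, size_t n_pkts,
		    size_t *counts, size_t n_handlers)
{
	for (size_t i = 0; i < n_pkts; i++)
		counts[select_handler(hashes[i], n_handlers)]++;
}
```

For example, eight ARPs that all hash to 0 spread over four handlers give counts {8, 0, 0, 0}, while eight packets with hashes 0..7 give {2, 2, 2, 2}.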

This is definitely an interesting analysis, but I'm a little surprised
by the basic scenario. First, it seems to me that the L2 domain is too
large if there are this many ARPs. ARP handling also generally seems
slower than I would expect. In any case, I don't disagree that it is
better to spread the load among all the cores.

On the patch itself: can't we just teach skb_get_hash() to decode ARP?
That seems cleaner and more generic.
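As a rough illustration of what "decoding ARP" for hashing could mean, the sketch below hashes an Ethernet/IPv4 ARP payload on its sender and target protocol addresses (offsets 14 and 24 of the standard 28-byte ARP body). This is a user-space sketch under stated assumptions: arp_payload_hash() is a hypothetical name, FNV-1a is swapped in purely for illustration, and a real kernel change would go through the flow dissector and its own hash function rather than this code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ARP_ETH_IPV4_LEN 28 /* htype..tpa for Ethernet/IPv4 ARP */

/* Hypothetical: hash an ARP payload (starting at the ARP header, after
 * the Ethernet header) on its sender and target protocol addresses.
 * Returns 0, the "no hash" value, if this is not an Ethernet/IPv4 ARP. */
uint32_t arp_payload_hash(const uint8_t *arp, size_t len)
{
	static const size_t offs[2] = {14, 24}; /* SPA, TPA offsets */
	uint32_t h = 2166136261u;               /* FNV-1a offset basis */

	assert(arp != NULL);
	if (len < ARP_ETH_IPV4_LEN)
		return 0;
	/* htype == 1 (Ethernet), ptype == 0x0800 (IPv4), hlen 6, plen 4 */
	if (arp[0] != 0x00 || arp[1] != 0x01 || arp[2] != 0x08 ||
	    arp[3] != 0x00 || arp[4] != 6 || arp[5] != 4)
		return 0;
	for (size_t k = 0; k < 2; k++) {
		for (size_t i = 0; i < 4; i++) {
			h ^= arp[offs[k] + i];
			h *= 16777619u;                 /* FNV prime */
		}
	}
	return h ? h : 1; /* keep 0 reserved for "no hash" */
}
```

Because the hash covers both protocol addresses, ARPs between different host pairs would generally map to different handler sockets instead of all collapsing onto the socket selected by hash 0.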


