[ovs-discuss] ovswitch performance and stability problem

Chris contact at progbau.de
Wed Nov 19 11:21:42 UTC 2014


Hello,

What version of OVS are you using? I think the default in Havana is 1.11.
Since we upgraded to the latest version, 2.3, we have seen huge
performance improvements, especially in new TCP connections per
second (TCP_CRR).

Also, why are you using the logical network router from Neutron? As you
saw for yourself, a single instance can max it out; it doesn't scale, even
with a 10 Gb interface.
Have a look at "Neutron flat provider network"; this connects the
instance traffic directly to the physical Layer 3 device.

Regarding monitoring, have a look at:
http://openvswitch.org/support/config-cookbooks/sflow/
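
Besides sFlow, the hit/missed counters from "ovs-dpctl show" (quoted
below) can be watched directly. Here is a minimal Python sketch that
parses the "lookups:" line and computes the miss ratio, both overall and
between two samples. The function names are made up for illustration, and
the regular expression assumes the output format shown in the quoted
message; other OVS versions may print it differently. The counters are
cumulative, so the difference between two samples says more about a
sudden flood of flow misses than the totals do.

```python
import re

# Assumption: the "lookups:" line looks like the one in the quoted
# `ovs-dpctl show` output, e.g. "lookups: hit:10256807 missed:241170 lost:0".
LOOKUPS_RE = re.compile(r"lookups:\s*hit:(\d+)\s+missed:(\d+)\s+lost:(\d+)")


def parse_lookups(show_output):
    """Extract the datapath lookup counters from `ovs-dpctl show` text."""
    match = LOOKUPS_RE.search(show_output)
    if match is None:
        raise ValueError("no 'lookups:' line found")
    hit, missed, lost = (int(group) for group in match.groups())
    return {"hit": hit, "missed": missed, "lost": lost}


def miss_ratio(counters):
    """Fraction of lookups that missed the kernel flow table."""
    total = counters["hit"] + counters["missed"]
    return counters["missed"] / total if total else 0.0


def delta(previous, current):
    """Counter increase between two samples (counters are cumulative)."""
    return {key: current[key] - previous[key] for key in current}


# Sample taken from the quoted message (healthy state after a reboot).
SAMPLE = """system@ovs-system:
 lookups: hit:10256807 missed:241170 lost:0
 flows: 32
"""

if __name__ == "__main__":
    counters = parse_lookups(SAMPLE)
    print(counters, "miss ratio: %.4f" % miss_ratio(counters))
```

To use it for monitoring, feed it the output of "ovs-dpctl show" at
regular intervals and alert when miss_ratio(delta(previous, current))
stays high. Which threshold makes sense depends on your traffic; I have
no tested value to suggest.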

Cheers
Chris

On 2014-11-19 17:14, Krist van Besien wrote:
> Hello,
> 
>  This is my first post here. I come here hoping that someone can give
> me some help/pointers with a problem I'm having.
> 
>  We are using ovswitch as part of an OpenStack Havana install on Red
> Hat Linux.
>  We have a dedicated networking node, and this is quite a powerful
> machine. 2x6 cores, 32 GB RAM.
> 
>  The machine has a 1Gb uplink to the internet. Under normal
> circumstances it has no problem coping with the traffic.
> 
>  I also did a test where I fired up a few instances in our openstack
> cloud, and started a bittorrent client in there. I monitored the
> network bandwidth consumption, and saw it go up to about 1Gb and stay
> there. So I can saturate our link. During this test the CPU load on
> the networking node was about 1, which is not an issue on a 12 core
> machine.
> 
>  However, one of the VMs in our cloud got compromised. And this
> machine then started to very aggressively scan the network, and
> initiate lots of connections to different hosts, from different ports.
> 
>  And this managed to bring our networking node to its knees. Not
> through traffic, but, it seems, by overwhelming the userspace
> component. The result was loss of connectivity for all other
> instances.
> 
>  If I understand how openvswitch works correctly then packets get
> matched against flows in the kernel. If no flow is matched the packet
> gets passed to userspace, and then a flow gets created. I get the
> impression that the way the compromised host behaved caused a lot of
> packets to miss flows.
> 
>  Looking with ovs-dpctl when everything is well I see something like
> this:
> 
>  [root@lupin-neutron-r72012014-8ds1202 ~]# ovs-dpctl show
>  system@ovs-system:
>  lookups: hit:10256807 missed:241170 lost:0
>  flows: 32
> 
>  This is shortly after a reboot. I see that most packets seem to be
> hit by an existing flow. There are a few flows defined. Flow numbers
> sometimes increase, up to a few hundred, but never much more.
> 
>  However, during the episode with the compromised host the readings (I
> don't have a screenshot) were very different. Running ovs-dpctl showed
> "missed" was a lot higher than "hit", and increasing rapidly. There
> were thousands of flows, and they were changing all the time.
> 
>  The questions for me now are:
>  - How can I better tune ovswitch so that a compromised host does not
> bring down our network? The instances are started by customers, and I
> cannot guarantee that they all will behave. I need to assume that this
> will happen again.
>  - Is there a way to somehow contain network traffic for misbehaving
> hosts?
> 
>  Thanks,
> 
>  Krist