[ovs-discuss] ovswitch performance and stability problem
Chris
contact at progbau.de
Wed Nov 19 11:21:42 UTC 2014
Hello,
which version of OVS are you using? I think the default in Havana is 1.11.
Since we upgraded to the latest version, 2.3, we have seen huge
performance improvements, especially in new TCP connections per
second (TCP_CRR).
Anyway, why do you use the logical network router from Neutron? As you
saw in your own test, a single instance can max it out; it doesn't scale,
even with a 10 Gb interface.
Have a look at "Neutron flat provider network"; this connects the
instance traffic directly to the physical Layer 3 device.
Regarding monitoring, have a look at:
http://openvswitch.org/support/config-cookbooks/sflow/
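In addition to sFlow, the hit/missed counters from the "ovs-dpctl show" output quoted below can be watched directly. The following is only a minimal sketch in Python: the function names parse_lookups and miss_ratio are made up for illustration, and the sample text is simply the output from the original message.

```python
import re


def parse_lookups(show_output):
    """Extract the hit/missed/lost counters from `ovs-dpctl show` text."""
    match = re.search(
        r"lookups:\s*hit:(\d+)\s+missed:(\d+)\s+lost:(\d+)", show_output
    )
    if match is None:
        raise ValueError("no 'lookups:' line found in the given text")
    hit, missed, lost = (int(group) for group in match.groups())
    return {"hit": hit, "missed": missed, "lost": lost}


def miss_ratio(counters):
    """Fraction of datapath lookups that missed the kernel flow table."""
    total = counters["hit"] + counters["missed"]
    return counters["missed"] / total if total else 0.0


# Sample text taken from the original message (healthy state after a reboot).
sample = """system@ovs-system:
lookups: hit:10256807 missed:241170 lost:0
flows: 32
"""

counters = parse_lookups(sample)
print(counters)
print(miss_ratio(counters))
```

The counters are cumulative, so to spot an episode like the one described it is probably more useful to take two samples some seconds apart and compute the ratio on the differences. A miss ratio that climbs quickly would match the "missed a lot higher than hit" symptom.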
Cheers
Chris
On 2014-11-19 17:14, Krist van Besien wrote:
> Hello,
>
> This is my first post here. I come here hoping that someone can give
> me some help/pointers with a problem I'm having.
>
> We are using ovswitch as part of an OpenStack Havana install on Red
> Hat Linux.
> We have a dedicated networking node, and this is quite a powerful
> machine: 2x6 cores, 32 GB RAM.
>
> The machine has a 1Gb uplink to the internet. Under normal
> circumstances it has no problem coping with the traffic.
>
> I also did a test where I fired up a few instances in our openstack
> cloud, and started a bittorrent client in there. I monitored the
> network bandwidth consumption, and saw it go up to about 1Gb and stay
> there. So I can saturate our link. During this test the CPU load on
> the networking node was about 1, which is not an issue on a 12 core
> machine.
>
> However, one of the VMs in our cloud got compromised. And this
> machine then started to very aggressively scan the network, and
> initiate lots of connections to different hosts, from different ports.
>
> And this managed to bring our networking node to its knees. Not
> through traffic, but, it seems, by overwhelming the userspace
> component. The result was loss of connectivity for all other
> instances.
>
> If I understand correctly how openvswitch works, packets get
> matched against flows in the kernel. If no flow matches, the packet
> gets passed to userspace, and then a flow gets created. I get the
> impression that the behaviour of the compromised host caused a lot of
> packets to miss flows.
>
> Looking with ovs-dpctl when everything is well I see something like
> this:
>
> [root@lupin-neutron-r72012014-8ds1202 ~]# ovs-dpctl show
> system@ovs-system:
> lookups: hit:10256807 missed:241170 lost:0
> flows: 32
>
> This is shortly after a reboot. I see that most packets hit an
> existing flow. There are a few flows defined. The number of flows
> sometimes increases, up to a few hundred, but never much more.
>
> However, during the episode with the compromised host the readings (I
> don't have a screenshot) were very different. Running ovs-dpctl showed
> that "missed" was a lot higher than "hit", and increasing rapidly. There
> were thousands of flows, and they were changing all the time.
>
> The questions for me now are:
> - How can I better tune ovswitch so that a compromised host does not
> bring down our network? The instances are started by customers, and I
> cannot guarantee that they will all behave. I need to assume that this
> will happen again.
> - Is there a way to somehow contain network traffic for misbehaving
> hosts?
>
> Thanks,
>
> Krist