[ovs-discuss] OVS 2.3 udp flood - vswitchd OOM

Adam Mazur adam.mazur at tiktalik.com
Fri Dec 5 10:03:09 UTC 2014


Hi Alex,

Comparing version b6a3dd9cca (Nov 22) with 64bb477f05 (6 Oct): memory
still grows, but much more slowly. In the production environment it was
400MB/hour, and it is now (64bb477f05) 100MB/hour.

The Python flooding script is not a reliable way to reproduce the
problem: it behaves differently in the production and testing
environments. When run in production, memory grows an order of magnitude
faster. However, we still see growth even without flooding, as shown
below.

Example growth in increments of exactly 264KB or 2x264KB every few
seconds, from our production environment, which had about 1k pps at the
time (normal production traffic, without flooding):

# while true; do echo "`date '+%T'`: `ps -Ao 'rsz,cmd' --sort rsz | tail -n 1 | cut -c -20;`"; sleep 1; done
10:50:51: 216788 ovs-vswitchd
10:50:52: 216788 ovs-vswitchd
10:50:53: 216788 ovs-vswitchd
10:50:55: 216788 ovs-vswitchd
10:50:56: 216788 ovs-vswitchd
10:50:57: 217052 ovs-vswitchd
10:50:58: 217052 ovs-vswitchd
10:50:59: 217052 ovs-vswitchd
10:51:00: 217052 ovs-vswitchd
10:51:01: 217052 ovs-vswitchd
10:51:02: 217052 ovs-vswitchd
10:51:03: 217052 ovs-vswitchd
10:51:04: 217052 ovs-vswitchd
10:51:05: 217052 ovs-vswitchd
10:51:06: 217580 ovs-vswitchd
10:51:07: 217580 ovs-vswitchd
10:51:09: 217580 ovs-vswitchd
10:51:10: 217580 ovs-vswitchd
10:51:11: 217580 ovs-vswitchd
10:51:12: 217844 ovs-vswitchd
10:51:13: 217844 ovs-vswitchd
10:51:14: 217844 ovs-vswitchd
10:51:15: 217844 ovs-vswitchd
10:51:16: 217844 ovs-vswitchd
10:51:17: 217844 ovs-vswitchd
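
The increments in the listing above can be checked mechanically. Below
is a minimal Python sketch, not part of our tooling: it embeds the first
sample, the last sample, and the samples around each jump from the
output above. The helper names (parse, increments, growth_kb_per_hour)
are made up for illustration, and it assumes the rsz column is in KB, as
ps documents.

```python
import re

# Subset of the ps samples above: "HH:MM:SS: <rsz in KB> <cmd>".
SAMPLES = """\
10:50:51: 216788 ovs-vswitchd
10:50:56: 216788 ovs-vswitchd
10:50:57: 217052 ovs-vswitchd
10:51:05: 217052 ovs-vswitchd
10:51:06: 217580 ovs-vswitchd
10:51:11: 217580 ovs-vswitchd
10:51:12: 217844 ovs-vswitchd
10:51:17: 217844 ovs-vswitchd
"""

LINE_RE = re.compile(r"^(\d\d):(\d\d):(\d\d): (\d+) ")


def parse(text):
    """Return a list of (seconds_since_midnight, rss_kb) tuples."""
    out = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            h, mi, s, rss = map(int, m.groups())
            out.append((h * 3600 + mi * 60 + s, rss))
    return out


def increments(samples):
    """Return the non-zero RSS changes between consecutive samples, in KB."""
    return [b[1] - a[1] for a, b in zip(samples, samples[1:]) if b[1] != a[1]]


def growth_kb_per_hour(samples):
    """Average growth rate between the first and the last sample."""
    (t0, r0), (t1, r1) = samples[0], samples[-1]
    return (r1 - r0) * 3600.0 / (t1 - t0)


samples = parse(SAMPLES)
print(increments(samples))                      # RSS jumps in KB
print(growth_kb_per_hour(samples) / 1024.0)     # rough MB/hour over the window
```

Every jump comes out as a multiple of 264KB. The window is only 26
seconds long, which is far too short to extrapolate an hourly rate
reliably, so the second figure is only a rough cross-check.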


Other specifics of our setup:
We use only an OpenFlow 1.0 controller.
Running `ovs-vsctl list Flow_Table` gives empty output.

Best,
Adam


On 03.12.2014 at 12:14, Adam Mazur wrote:
> I will try the current head version.
> Meanwhile, my answers are below.
>
>
> On 02.12.2014 at 23:24, Alex Wang wrote:
>> Hey Adam,
>>
>> Besides the questions just asked,
>>
>> On Tue, Dec 2, 2014 at 1:11 PM, Alex Wang <alexw at nicira.com> wrote:
>>
>>     Hey Adam,
>>
>>     Did you use any trick to avoid the arp resolution?
>>
>>     Running your script on my setup causes only ARP packets to be sent.
>>
>>     Also, there is no change in the memory utilization of ovs.
>>
>
> There is no trick with ARP.
> The gateway for the VM acts as a "normal" router, with old ovs 1.7.
> The router IS a bottleneck, as it consumes 100% of CPU. But at the
> same time ovs 2.3 on the hypervisor consumes 400% of CPU and grows in RSS.
>
>
>>     One more thing: did you see the issue without a tunnel?
>>     This very recent commit fixes an issue with tunneling.
>>     Could you try again with it?
>>
>
> I will try. These problems were seen on b6a3dd9cca (Nov 22); I will
> try the head version.
>
>>     commit b772066ffd066d59d9ebce092f6665150723d2ad
>>     Author: Pravin B Shelar <pshelar at nicira.com>
>>     Date:   Wed Nov 26 11:27:05 2014 -0800
>>
>>         route-table: Remove Unregister.
>>         Since dpif registering for routing table at initialization
>>         there is no need to unregister it. Following patch removes
>>         support for turning routing table notifications on and off.
>>         Due to this change OVS always listens for these
>>         notifications.
>>         Reported-by: YAMAMOTO Takashi <yamamoto at valinux.co.jp>
>>         Signed-off-by: Pravin B Shelar <pshelar at nicira.com>
>>         Acked-by: YAMAMOTO Takashi <yamamoto at valinux.co.jp>
>>
>>
>>
>>
>>  Want to ask more questions to help debug:
>>
>> 1. Could you post the 'ovs-vsctl show' output on the xenserver?
>
> http://pastebin.com/pe8YpRwr
>
>> 2. Could you post the 'ovs-dpctl dump-flows' output while the script
>> is running?
>
> Partial output - head: http://pastebin.com/fUkbfeUN and tail:
> http://pastebin.com/P1QgyH02
> The full output was more than 100MB of text when flooding at 400K pps.
> Would you like it gzipped, sent privately? (less than 1MB)
>
>> 3. If OOM is activated, you should see the OOM log in syslog or dmesg
>> output. Could you provide it?
>
> I don't have one: the production logs have been rotated, remote
> logging was unavailable during the OOM (the network was dead while
> vswitch was starting), and the testing environment is too slow to
> trigger an OOM quickly. First (and much faster), I will try the head
> version, as you said it contains fixes for such a case.
>
>> 4. Could you provide the route output on the hypervisor?
>
> # route -n
> Kernel IP routing table
> Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
> 0.0.0.0         10.2.7.1        0.0.0.0         UG    0      0        0 xenbr0
> 10.2.7.0        0.0.0.0         255.255.255.0   U     0      0        0 xenbr0
> 10.30.7.0       0.0.0.0         255.255.255.0   U     0      0        0 ib0
> 37.233.99.0     0.0.0.0         255.255.255.0   U     0      0        0 xapi4
>
>
>>
>> Thanks,
>> Alex Wang,
>>
>>
>>
>>     Thanks,
>>     Alex Wang,
>>
>>     On Mon, Dec 1, 2014 at 2:43 AM, Adam Mazur
>>     <adam.mazur at tiktalik.com> wrote:
>>
>>         Hi,
>>
>>         We are testing on kernel 3.18, ovs current master, gre
>>         tunnels / xen server. The following Python script leads to
>>         fast ovs-vswitchd memory growth (1GB / minute) and finally
>>         an OOM kill:
>>
>>
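>>         import random, socket, struct, time
>>         sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
>>         while True:
>>             # Pick a random IPv4 destination for every packet.
>>             ip_raw = struct.pack('>I', random.randint(1, 0xffffffff))
>>             ip = socket.inet_ntoa(ip_raw)
>>             try:
>>                 # Bytes payload, so this also works on Python 3.
>>                 sock.sendto(b"123", (ip, 12345))
>>             except socket.error:
>>                 pass  # ignore destinations the kernel refuses
>>             #time.sleep(0.001)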
>>
>>
>>         During this test ovs did not show a growing number of
>>         flows, but memory still grew.
>>
>>         If packets are sent too slowly, memory never grows:
>>         uncomment the time.sleep line above.
>>
>>         Best,
>>         Adam
>>         _______________________________________________
>>         discuss mailing list
>>         discuss at openvswitch.org
>>         http://openvswitch.org/mailman/listinfo/discuss
>>
>>
>>
>
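
The quoted flooding script has only two speeds: unthrottled, or with a
fixed time.sleep(0.001) after each packet. To look for the packet rate
at which memory starts to grow, a paced sender may be more useful. Below
is a minimal Python sketch under stated assumptions: paced_flood and
random_ipv4 are hypothetical helper names, the send callable is injected
so the pacing logic can be exercised without touching the network, and
nothing here has been run against ovs.

```python
import random
import socket
import struct
import time


def random_ipv4(rng=random):
    """Return a random dotted-quad address, as the quoted script does."""
    return socket.inet_ntoa(struct.pack(">I", rng.randint(1, 0xFFFFFFFF)))


def paced_flood(send, pps, count, clock=time.monotonic, sleep=time.sleep):
    """Call send(ip) `count` times, aiming for about `pps` calls per second.

    Send number i is scheduled at start + i / pps, so a late send does not
    push back all the following ones. Returns the number of send() calls
    that did not raise OSError.
    """
    interval = 1.0 / pps
    start = clock()
    sent = 0
    for i in range(count):
        delay = start + i * interval - clock()
        if delay > 0:
            sleep(delay)
        try:
            send(random_ipv4())
            sent += 1
        except OSError:
            pass  # destination rejected by the kernel; skip it
    return sent


# Example use (port and payload taken from the quoted script; the rate
# and count are arbitrary):
#
#   sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
#   paced_flood(lambda ip: sock.sendto(b"123", (ip, 12345)),
#               pps=1000, count=60000)
```

The achievable rate is still bounded by how fast the Python loop and the
kernel can send, so the requested pps is a target, not a guarantee.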
