[ovs-discuss] Vlan handling problem in Linux datapath

Thomas F Herbert thomasfherbert at gmail.com
Tue Mar 10 00:02:57 UTC 2015


All,

First, apologies in advance for the lengthy message.

I have seen some strange VLAN behavior when a packet's VLAN TCI does not 
match a flow rule. It can be reproduced on every commit I have tried since 
version 2.3, but it cannot be reproduced in version 2.2.

The example below is from commit 
7cc398cb8561a16ae3be5ffc687be5620981d619 in the current master branch.

To reproduce, add a rule that matches on a particular TCI. Every 
incoming packet with a mismatched VLAN tag then triggers the following 
error from the Linux datapath function ovs_nla_get_match():

[449232.750362] openvswitch: netlink: VLAN tag present bit must have an exact match (tci_mask=100).
[449233.767975] openvswitch: netlink: VLAN tag present bit must have an exact match (tci_mask=100).
[449234.765656] openvswitch: netlink: VLAN tag present bit must have an exact match (tci_mask=100).

This is the flow rule to reproduce the problem:

ovs-ofctl --protocols=OpenFlow13 add-flow br0 in_port=2,vlan_tci=0x1e36,priority=20000,actions=pop_vlan,output:1

I believe the mismatched packet is upcalled to user space, but when the 
resulting flow comes back to the kernel, the mask in its netlink 
attributes is not set correctly, so the kernel fails to parse it. The 
flow is never installed in the kernel, so every subsequent packet also 
misses.
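
The kernel message can be read as a check on the mask that accompanies the VLAN TCI. Below is a minimal Python sketch of such a check, not the kernel code itself. It assumes the "tag present" flag is the CFI bit 0x1000; the name check_vlan_mask is hypothetical and only used for illustration.

```python
# Simplified model of the mask check behind the kernel error message.
# Assumption: the "VLAN tag present" flag is the CFI bit (0x1000).
VLAN_TAG_PRESENT = 0x1000


def check_vlan_mask(tci_mask):
    """Return None if the mask is acceptable, otherwise an error string."""
    if not (tci_mask & VLAN_TAG_PRESENT):
        # The present bit is wildcarded, so the flow is rejected.
        return ("VLAN tag present bit must have an exact match "
                "(tci_mask=%x)." % tci_mask)
    return None


# Mask reported in the kernel log above: only bit 0x100 is set.
print(check_vlan_mask(0x0100))
# A mask that also covers the present bit passes the check.
print(check_vlan_mask(0x1100))
```

With the logged mask of 0x100 the present bit is wildcarded, so the check fails for every such flow.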

=====Workaround=============
The problem goes away if a lower-priority flow is added that explicitly 
drops all other incoming packets whose VLAN TCIs don't match, as 
follows:

ovs-ofctl --protocols=OpenFlow13 add-flow br0 in_port=2,vlan_tci=0x1000/0x1000,priority=20,actions=drop
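
To make the two rules concrete, here is a small Python sketch of value/mask matching on the TCI. It is an illustration only: tci_matches is a hypothetical helper, and the packet TCI is built by assuming the CFI bit 0x1000 is set for a tagged packet with vid=999 and pcp=0, as in the log below.

```python
VLAN_CFI = 0x1000  # assumed "tag present" bit in the TCI


def tci_matches(packet_tci, value, mask=0xFFFF):
    """Masked comparison: only bits set in mask are compared."""
    return (packet_tci & mask) == (value & mask)


# Tagged packet with vid=999, pcp=0 (assumed encoding: CFI | vid).
packet_tci = VLAN_CFI | 999

# High-priority rule: vlan_tci=0x1e36 with no mask, i.e. an exact match.
hits_specific_rule = tci_matches(packet_tci, 0x1E36)

# Workaround rule: vlan_tci=0x1000/0x1000, i.e. any tagged packet.
hits_drop_rule = tci_matches(packet_tci, 0x1000, 0x1000)

print(hex(packet_tci), hits_specific_rule, hits_drop_rule)
```

The packet misses the specific rule and hits the drop rule. If the datapath mask is built from the bits that the consulted rules examine (an assumption, not something confirmed in this message), the drop rule would bring bit 0x1000 into the mask, which is consistent with the error disappearing.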

I am pretty sure this is a bug, but I am looking for feedback before I 
fix it, because some may argue that the above is correct behavior. I 
have my doubts. Also, having every incoming mismatched packet bounce to 
user space and back causes performance to drop by two orders of 
magnitude.

=============log with debug set===============

Mar  9 19:43:47 lubuntu1310 ovs-vswitchd: ovs|07894|poll_loop|DBG|wakeup 
due to [POLLIN] on fd 14 (FIFO pipe:[3806013]) at 
ofproto/ofproto-dpif.c:1633 (0% CPU usage)
Mar  9 19:43:47 lubuntu1310 ovs-vswitchd: 
ovs|09287|poll_loop(handler7)|DBG|wakeup due to [POLLIN] on fd 19 
(unknown anon_inode:[eventpoll]) at lib/dpif-netlink.c:2201 (0% CPU usage)
Mar  9 19:43:47 lubuntu1310 ovs-vswitchd: 
ovs|09288|dpif(handler7)|DBG|system at ovs-system: miss upcall:
Mar  9 19:43:47 lubuntu1310 ovs-vswitchd: 
recirc_id(0),dp_hash(0),skb_priority(0),in_port(3),skb_mark(0),eth(src=26:58:26:16:3c:38,dst=ff:ff:ff:ff:ff:ff),eth_type(0x8100),vlan(vid=999,pcp=0),encap(eth_type(0x0806),arp(sip=192.168.1.3,tip=192.168.1.2,op=1,sha=26:58:26:16:3c:38,tha=00:00:00:00:00:00))
Mar  9 19:43:47 lubuntu1310 ovs-vswitchd: 
arp,in_port=0,dl_vlan=999,dl_vlan_pcp=0,dl_src=26:58:26:16:3c:38,dl_dst=ff:ff:ff:ff:ff:ff,arp_spa=192.168.1.3,arp_tpa=192.168.1.2,arp_op=1,arp_sha=26:58:26:16:3c:38,arp_tha=00:00:00:00:00:00
Mar  9 19:43:47 lubuntu1310 ovs-vswitchd: 
ovs|09289|dpif(handler7)|WARN|system at ovs-system: failed to put[create] 
(Invalid argument) ufid:09fbc634c913835be0aed9d5f728f5a1 
recirc_id(0),dp_hash(0/0),skb_priority(0/0),in_port(3),skb_mark(0/0),eth(src=26:58:26:16:3c:38/00:00:00:00:00:00,dst=ff:ff:ff:ff:ff:ff/00:00:00:00:00:00),eth_type(0x8100),vlan(vid=999/0x100,pcp=0/0x0),encap(eth_type(0x0806),arp(sip=192.168.1.3/0.0.0.0,tip=192.168.1.2/0.0.0.0,op=1/0,sha=26:58:26:16:3c:38/00:00:00:00:00:00,tha=00:00:00:00:00:00/00:00:00:00:00:00))
Mar  9 19:43:47 lubuntu1310 ovs-vswitchd: 
ovs|09290|poll_loop(handler7)|DBG|wakeup due to 0-ms timeout at 
ofproto/ofproto-dpif-upcall.c:622 (0% CPU usage)
Mar  9 19:43:47 lubuntu1310 kernel: [450932.727558] openvswitch: 
netlink: VLAN tag present bit must have an exact match (tci_mask=100).
Mar  9 19:43:48 lubuntu1310 ovs-vswitchd: 
ovs|15966|poll_loop(revalidator6)|DBG|wakeup due to 499-ms timeout at 
ofproto/ofproto-dpif-upcall.c:802 (0% CPU usage)
Mar  9 19:43:48 lubuntu1310 ovs-vswitchd: 
ovs|15967|netlink_socket(revalidator6)|DBG|Dropped 21 log messages in 
last 1 seconds (most recently, 1 seconds ago) due to excessive rate
Mar  9 19:43:48 lubuntu1310 ovs-vswitchd: 
ovs|15968|netlink_socket(revalidator6)|DBG|nl_sock_transact_multiple__ 
(Success): nl(len:24, type=128(ovs_datapath), flags=9[REQUEST][ECHO], 
seq=2a4e, pid=4294962693,genl(cmd=3,version=2)
Mar  9 19:43:48 lubuntu1310 ovs-vswitchd: 
ovs|15969|dpif(revalidator6)|DBG|system at ovs-system: get_stats success
Mar  9 19:43:48 lubuntu1310 ovs-vswitchd: 
ovs|15970|dpif(revalidator6)|DBG|system at ovs-system: dumped all flows
Mar  9 19:43:48 lubuntu1310 ovs-vswitchd: 
ovs|05115|poll_loop(urcu3)|DBG|wakeup due to [POLLIN] on fd 25 (FIFO 
pipe:[3806112]) at lib/ovs-rcu.c:273 (0% CPU usage)
Mar  9 19:43:48 lubuntu1310 ovs-vswitchd: 
ovs|05116|poll_loop(urcu3)|DBG|wakeup due to [POLLIN] on fd 25 (FIFO 
pipe:[3806112]) at lib/ovs-rcu.c:205 (0% CPU usage)
Mar  9 19:43:48 lubuntu1310 ovs-vswitchd: 
ovs|05117|poll_loop(urcu3)|DBG|wakeup due to [POLLIN] on fd 25 (FIFO 
pipe:[3806112]) at lib/ovs-rcu.c:205 (0% CPU usage)
Mar  9 19:43:48 lubuntu1310 ovs-vswitchd: 
ovs|15971|dpif(revalidator6)|DBG|system at ovs-system: flow_dump_destroy 
success
Mar  9 19:43:48 lubuntu1310 ovs-vswitchd: ovs|07895|poll_loop|DBG|wakeup 
due to [POLLIN] on fd 14 (FIFO pipe:[3806013]) at 
ofproto/ofproto-dpif.c:1633 (0% CPU usage)
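
The vlan() field in the failed put above can be tied back to the tci_mask in the kernel message. The following Python sketch rebuilds the 16-bit TCI mask from the logged vid and pcp masks. It assumes the usual TCI layout (pcp in the top three bits, vid in the low twelve) and assumes that the absence of a cfi field in the log means that bit is wildcarded; vlan_tci_mask_from_log is a hypothetical helper.

```python
import re


def vlan_tci_mask_from_log(flow):
    """Rebuild the TCI mask from a logged vlan(vid=V/M,pcp=P/M) field."""
    m = re.search(
        r"vlan\(vid=(\d+)/(0x[0-9a-f]+),pcp=(\d+)/(0x[0-9a-f]+)\)", flow)
    if m is None:
        return None
    vid_mask = int(m.group(2), 16)
    pcp_mask = int(m.group(4), 16)
    # Assumed layout: pcp occupies bits 13-15, vid bits 0-11.
    return (pcp_mask << 13) | vid_mask


# Fragment copied from the "failed to put[create]" line in the log.
snippet = "eth_type(0x8100),vlan(vid=999/0x100,pcp=0/0x0),encap(eth_type(0x0806)"
mask = vlan_tci_mask_from_log(snippet)
print(hex(mask))
```

The result is 0x100, which agrees with the tci_mask=100 in the kernel message and leaves bit 0x1000 unset.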

Thanks in advance,

--Tom



-- 
Thomas F. Herbert



