<div dir="ltr"><div>We did some more investigation. This issue is seen only when OVN native DHCP is used with a kernel datapath that doesn't support OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV4. The failure occurs because ovs-vswitchd includes the attribute OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV4 when it sends the packet back to the datapath after the DHCP reply packet is resumed.</div><div><br></div><div>When the DHCP packet is sent to ovn-controller, the ct_state value is set to 0x21 and dl_type is set to 0 in the flow metadata. When the packet is resumed, the function nxt_resume() calls pkt_metadata_from_flow(), which neither sets 'md->ct_orig_tuple' nor zeroes it with memset() [1], because is_ct_valid() returns true and dl_type is 0. As a result, the function odp_key_from_dp_packet() adds OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV4 [2].</div><div><br></div><div>This issue is not seen in master because of this commit - "f6fabcc624 ofproto-dpif: Mark packets as "untracked" after call to ct()" [3]</div><div><br></div><div>That commit clears the conntrack variables after the ct() action.</div><div><br></div><div>I suppose we cannot apply that commit to the OVS 2.8 branch because it was reverted [4] due to some issues. 
</div><div><br></div><div>I think we can solve this problem either with the fix below or by setting dl_type to the proper value when the packet is sent to the controller.</div><div><br></div><div>***********************************</div><div>diff --git a/lib/flow.h b/lib/flow.h</div><div>index 6ae5a674d..076ce36f1 100644</div><div>--- a/lib/flow.h</div><div>+++ b/lib/flow.h</div><div>@@ -947,6 +947,8 @@ pkt_metadata_from_flow(struct pkt_metadata *md, const struct flow *flow)</div><div> flow->ct_tp_dst,</div><div> flow->ct_nw_proto,</div><div> };</div><div>+ } else {</div><div>+ memset(&md->ct_orig_tuple, 0, sizeof md->ct_orig_tuple);</div><div> }</div><div> } else {</div><div> memset(&md->ct_orig_tuple, 0, sizeof md->ct_orig_tuple);</div><div>**********************************</div><div><br></div><div>Please let me know whether this fix makes sense, or whether there is a better way to solve it.</div><div><br></div><div><br></div><div><br></div><div>[1] - <a href="https://github.com/openvswitch/ovs/blob/master/lib/flow.h#L933">https://github.com/openvswitch/ovs/blob/master/lib/flow.h#L933</a></div><div>[2] - <a href="https://github.com/openvswitch/ovs/blob/master/lib/odp-util.c#L5131">https://github.com/openvswitch/ovs/blob/master/lib/odp-util.c#L5131</a></div><div>[3] - <a href="https://github.com/openvswitch/ovs/commit/f6fabcc62458d656046c9852ee80fcff3e516e6e#diff-8bbf76f24a1b6618ed3aaaccad70a0a5">https://github.com/openvswitch/ovs/commit/f6fabcc62458d656046c9852ee80fcff3e516e6e#diff-8bbf76f24a1b6618ed3aaaccad70a0a5</a></div><div>[4] - <a href="https://patchwork.ozlabs.org/patch/808397/">https://patchwork.ozlabs.org/patch/808397/</a></div><div><br></div><div>Thanks</div><div>Numan</div><div><br></div><div><br></div><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Oct 19, 2017 at 6:07 PM, Daniel Alvarez Sanchez <span dir="ltr"><<a href="mailto:dalvarez@redhat.com" target="_blank">dalvarez@redhat.com</a>></span> wrote:<br><blockquote class="gmail_quote" 
style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">System information:<br>
===============<br>
<br>
OS: CentOS Linux release 7.3.1611 (Core)<br>
Kernel version: 3.10.0-693.2.2.el7.x86_64 #1 SMP<br>
OVS version: v2.8.1 (git tag)<br>
#ovs-vswitchd --version<br>
ovs-vswitchd (Open vSwitch) 2.8.1<br>
<br>
Bug description:<br>
============<br>
<br>
Right now, OVN doesn't work with OVS 2.8.1 on CentOS 7.3 when conntrack is used.<br>
Numan Siddique and I have been doing some research on this and we have come<br>
up with the following conclusions:<br>
<br>
When doing a DHCP request on the system described above, the kernel logs<br>
the following error (see the Reproducer section below):<br>
<br>
netlink: Key 26 has unexpected len 16 expected 0<br>
<br>
Apparently, commit [0] introduced that key (26,<br>
OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV4), and it looks like the OVS module in the above<br>
kernel doesn't have that key. When ovs-vswitchd sends those extra bytes, the<br>
kernel module can't find the key and fails with the netlink error above:<br>
<br>
2017-10-18T08:00:18Z|00444|netlink_socket|DBG|nl_sock_transact_multiple__<br>
(Success): nl(len:496, type=30(ovs_packet), flags=1[REQUEST], seq=6d,<br>
pid=5939,genl(cmd=3,version=1)<br>
<br>
However, if we run OVS master, everything works ok and ovs-vswitchd sends<br>
20 bytes less (4 bytes of the header + 16 bytes of data) so it looks like<br>
it's adapting to the kernel datapath in some way:<br>
<br>
2017-10-18T07:59:03Z|00391|netlink_socket|DBG|nl_sock_transact_multiple__<br>
(Success): nl(len:476, type=30(ovs_packet), flags=1[REQUEST], seq=32,<br>
pid=4294962064,genl(cmd=3,version=1)<br>
<br>
Note lengths in both cases: 496 vs 476 (working case). In the first case<br>
(496) the kernel throws the netlink error ("netlink: Key 26 has unexpected<br>
len 16 expected 0").<br>
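<br>
As a rough cross-check of the 20-byte difference, the sketch below mirrors the<br>
layout of struct ovs_key_ct_tuple_ipv4 from the kernel's openvswitch uapi header<br>
using fixed-width types. The mirror struct, the macro and the helper are<br>
illustrative names, not OVS or kernel identifiers, and the 16-byte size assumes<br>
a common ABI where the struct is 4-byte aligned:<br>
<br>
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative mirror of struct ovs_key_ct_tuple_ipv4: 13 bytes of fields,
 * padded by the compiler to 16 bytes on ABIs with 4-byte alignment. */
struct ct_tuple_ipv4_mirror {
    uint32_t ipv4_src;
    uint32_t ipv4_dst;
    uint16_t src_port;
    uint16_t dst_port;
    uint8_t  ipv4_proto;
};

/* A netlink attribute header is a 16-bit length plus a 16-bit type. */
#define NLA_HDRLEN_MIRROR 4

/* Bytes the attribute adds to the message: header plus payload.
 * A 16-byte payload is already 4-byte aligned, so no extra padding. */
static size_t
ct_orig_tuple_ipv4_attr_len(void)
{
    return NLA_HDRLEN_MIRROR + sizeof(struct ct_tuple_ipv4_mirror);
}
```
<br>
Under those assumptions the helper returns 20, which matches both the<br>
"len 16" payload in the kernel message and the 496 vs 476 message lengths.<br>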
<br>
I've checked that running an OVS version up to [1] fixes it but can't find<br>
the exact commit which fixes the current bug.<br>
<br>
<br>
[0]<br>
<a href="https://github.com/openvswitch/ovs/commit/c30b4ceafa235d11a1a9ded5fed11fec86182ee0" rel="noreferrer" target="_blank">https://github.com/openvswitch/ovs/commit/c30b4ceafa235d11a1a9ded5fed11fec86182ee0</a><br>
[1]<br>
<a href="https://github.com/openvswitch/ovs/commit/80cee1163e6301dd1c0bd01c5f0323fb1a45adf4" rel="noreferrer" target="_blank">https://github.com/openvswitch/ovs/commit/80cee1163e6301dd1c0bd01c5f0323fb1a45adf4</a><br>
<br>
<br>
Reproducer:<br>
=========<br>
<br>
ovn-nbctl ls-add sw0<br>
ovn-nbctl lsp-add sw0 sw0-port1<br>
ovn-nbctl lsp-set-addresses sw0-port1 "50:54:00:00:00:01 192.168.0.2"<br>
<br>
ovn-nbctl --wait=hv acl-add sw0 from-lport 1001 'inport == "sw0-port1" && ip' allow-related<br>
ovn-nbctl --wait=hv acl-add sw0 to-lport 1001 'outport == "sw0-port1" && ip' drop<br>
ovn-nbctl acl-list sw0<br>
<br>
<br>
add_phys_port() {<br>
name=$1<br>
mac=$2<br>
ip=$3<br>
mask=$4<br>
gw=$5<br>
iface_id=$6<br>
ip netns add $name<br>
ovs-vsctl add-port br-int $name -- set interface $name type=internal<br>
ip link set $name netns $name<br>
ip netns exec $name ip link set $name address $mac<br>
ip netns exec $name ip addr add $ip/$mask dev $name<br>
ip netns exec $name ip link set $name up<br>
ip netns exec $name ip route add default via $gw<br>
ovs-vsctl set Interface $name external_ids:iface-id=$iface_id<br>
}<br>
<br>
d1="$(ovn-nbctl create DHCP_Options cidr=192.168.0.0/24 \<br>
options="\"server_id\"=\"192.168.0.1\" \"server_mac\"=\"ff:10:00:00:00:01\" \<br>
\"lease_time\"=\"3600\" \"router\"=\"192.168.0.1\"")"<br>
<br>
ovn-nbctl lsp-set-dhcpv4-options sw0-port1 ${d1}<br>
<br>
# when you run the below command it should list the dhcp options just added<br>
ovn-nbctl list dhcp_options<br>
<br>
add_phys_port vm1 50:54:00:00:00:01 192.168.0.2 24 192.168.0.1 sw0-port1<br>
<br>
# the command below should obtain the IP address from OVN<br>
ip netns exec vm1 dhclient -d vm1<br>
<br>
At this point, the DHCP request won't succeed and the error can be seen<br>
using 'dmesg'.<br>
</blockquote></div><br></div></div>
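<div>To illustrate the pkt_metadata_from_flow() behaviour described at the top of this mail, here is a minimal standalone model. It is only a sketch: every type and function name in it is hypothetical, the IPv6 branch is left out, byte order is ignored, and the rule "the attribute is emitted when the stored protocol is non-zero" is an assumption about odp_key_from_dp_packet(), not a copy of the OVS code.</div>
<div><br></div>

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MODEL_ETH_TYPE_IP 0x0800  /* EtherType for IPv4 (host order here) */

/* Hypothetical, simplified stand-ins for the OVS structures. */
struct model_tuple {
    uint32_t src, dst;
    uint16_t sport, dport;
    uint8_t proto;
};

struct model_flow {
    uint16_t dl_type;        /* 0 in the failing case described above */
    bool ct_valid;           /* stands in for is_ct_valid() */
    struct model_tuple ct;
};

struct model_md {
    struct model_tuple ct_orig_tuple;
};

/* Model of the metadata update, with and without the proposed else branch. */
static void
model_md_from_flow(struct model_md *md, const struct model_flow *flow,
                   bool with_fix)
{
    if (flow->ct_valid) {
        if (flow->dl_type == MODEL_ETH_TYPE_IP) {
            md->ct_orig_tuple = flow->ct;
        } else if (with_fix) {
            memset(&md->ct_orig_tuple, 0, sizeof md->ct_orig_tuple);
        }
        /* Without the fix, any other dl_type leaves the old contents. */
    } else {
        memset(&md->ct_orig_tuple, 0, sizeof md->ct_orig_tuple);
    }
}

/* Assumption: the attribute is emitted when the stored protocol is set. */
static bool
model_emits_ct_orig_tuple(const struct model_md *md)
{
    return md->ct_orig_tuple.proto != 0;
}
```

<div><br></div>
<div>In this model, a metadata struct that already holds a non-zero tuple keeps it when ct_valid is true and dl_type is 0, so the attribute would still be emitted. With the extra else branch the tuple is zeroed and the attribute is dropped.</div>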