[ovs-dev] [PATCH v9 0/7] OVS-DPDK flow offload with rte_flow

Shahaf Shuler shahafs at mellanox.com
Fri Jun 22 07:06:48 UTC 2018


Finn,

It is great that we are aligned w/ the finding.

I agree w/ your approach of taking the relevant fields in parse_tcp_flags() and limiting the rules offloaded to HW based on validated actions.

There was one more issue, from Flavio, about single-core performance w/ HWOL.
He saw a performance drop when enabling HWOL in a simple phy2phy loopback test.
I wasn’t able to reproduce it; in fact I saw a big improvement.
Have you tried it? I would really like to address this issue before the v11.
________________________________
From: Finn Christensen <fc at napatech.com>
Sent: Thursday, June 21, 2018 10:23:22 PM
To: Shahaf Shuler; Koujalagi, MalleshX; yliu at fridaylinux.org
Cc: dev at openvswitch.org; Ergin, Mesut A; Tsai, James; Olga Shern
Subject: RE: [ovs-dev] [PATCH v9 0/7] OVS-DPDK flow offload with rte_flow


Hi Shahaf,



These are exactly the same bugs I found today.

I added the calculation of the l3_ofs and l4_ofs offsets to parse_tcp_flags(); there they can be calculated with more or less no additional performance penalty.
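
As an illustration of the offset calculation described above, here is a minimal sketch in C. It is not the patch code: the struct, the function name sketch_parse_offsets() and the constants are hypothetical, and it assumes an Ethernet frame with optional 802.1Q tags carrying IPv4.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_ETH_HDR_LEN   14
#define SKETCH_VLAN_HDR_LEN  4
#define SKETCH_IPV4_MIN_LEN  20
#define SKETCH_ETH_TYPE_IP   0x0800
#define SKETCH_ETH_TYPE_VLAN 0x8100
#define SKETCH_OFS_INVALID   UINT16_MAX

/* Hypothetical stand-in for the per-packet offsets the decap path needs. */
struct sketch_offsets {
    uint16_t l3_ofs;   /* Byte offset of the IPv4 header. */
    uint16_t l4_ofs;   /* Byte offset of the L4 (TCP/UDP) header. */
};

/* Reads a big-endian 16-bit value from an unaligned buffer. */
static uint16_t
sketch_read_be16(const uint8_t *p)
{
    return (uint16_t) ((p[0] << 8) | p[1]);
}

/* Fills 'out' for an Ethernet/IPv4 frame.  Returns 0 on success, -1 if the
 * frame is too short or is not IPv4 (offsets are then left invalid). */
static int
sketch_parse_offsets(const uint8_t *data, size_t len,
                     struct sketch_offsets *out)
{
    size_t ofs = SKETCH_ETH_HDR_LEN;
    uint16_t eth_type;
    uint8_t ihl;

    assert(data != NULL && out != NULL);
    out->l3_ofs = out->l4_ofs = SKETCH_OFS_INVALID;

    if (len < SKETCH_ETH_HDR_LEN) {
        return -1;
    }
    eth_type = sketch_read_be16(data + 12);

    /* Skip any 802.1Q tags: 2 bytes of TCI, then the next ethertype. */
    while (eth_type == SKETCH_ETH_TYPE_VLAN) {
        if (len < ofs + SKETCH_VLAN_HDR_LEN) {
            return -1;
        }
        eth_type = sketch_read_be16(data + ofs + 2);
        ofs += SKETCH_VLAN_HDR_LEN;
    }
    if (eth_type != SKETCH_ETH_TYPE_IP || len < ofs + SKETCH_IPV4_MIN_LEN) {
        return -1;
    }

    /* IHL is the low nibble of the first IPv4 byte, in 32-bit words. */
    ihl = data[ofs] & 0x0f;
    if (ihl < 5 || len < ofs + (size_t) ihl * 4) {
        return -1;
    }
    out->l3_ofs = (uint16_t) ofs;
    out->l4_ofs = (uint16_t) (ofs + (size_t) ihl * 4);
    return 0;
}
```

Since a TCP-flags parser already has to walk the Ethernet and IP headers to reach the TCP header, recording the two offsets along the way costs little, which is presumably why the penalty is negligible.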

I also made emc_processing not use the MARK if the recirculation depth is > 0. With that, VxLAN with partial hw offload worked. However, I think the MARK could also be cleared in the tunnel pop function, before recirculation. Using the md_is_valid flag is a bit more confusing to me.
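
The two options mentioned above (ignore the MARK on recirculated packets, or clear it at tunnel pop) can be sketched as follows. This is only an illustration: struct sketch_pkt and both helpers are hypothetical names, not the OVS or DPDK data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packet metadata: whether the NIC attached a flow mark. */
struct sketch_pkt {
    bool has_mark;
    uint32_t mark;
};

/* Option 1: trust the mark only on the first pass through the datapath.
 * After recirculation the mark still describes the outer headers, so it
 * must not be used to pick the flow for the inner packet. */
static bool
sketch_use_hw_mark(const struct sketch_pkt *pkt, unsigned int recirc_depth)
{
    assert(pkt != NULL);
    return pkt->has_mark && recirc_depth == 0;
}

/* Option 2: drop the mark when the tunnel header is popped, so that the
 * recirculated packet no longer carries it at all. */
static void
sketch_clear_mark_on_tnl_pop(struct sketch_pkt *pkt)
{
    assert(pkt != NULL);
    pkt->has_mark = false;
    pkt->mark = 0;
}
```

Either way the recirculated inner packet falls through to the normal software lookup instead of hitting the outer flow's pop_tunnel action a second time.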



I very much share your concerns; however, I do believe we must bypass miniflow_extract, because otherwise nothing is gained. Then only full offload makes sense (or some other partly full offload, not defined yet).



To make sure we do not hit another issue like this that we have not thought of, we could change the flow offload check so that it also checks the flow action list before offloading into HW, and only accepts the actions we have tested and know to work.

We can then extend the list as we continue to improve partial hw offload to cover more and more flows. To me this would also make sense when extending to full offload, where all actions will have to be validated and accepted anyway.
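
A rough sketch of such an action check, in C. The enum values and function names are hypothetical, and which actions belong on the accepted list is an assumption made for the example (only plain output is accepted here; tunnel pop is rejected because it is the case that broke in this thread).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical action kinds; a real datapath has many more. */
enum sketch_action_type {
    SKETCH_ACT_OUTPUT,
    SKETCH_ACT_TNL_POP,
    SKETCH_ACT_SET_FIELD,
};

/* Accept only actions that have been tested with partial offload.
 * Assumption for this sketch: only OUTPUT has been validated. */
static bool
sketch_action_is_validated(enum sketch_action_type type)
{
    switch (type) {
    case SKETCH_ACT_OUTPUT:
        return true;
    case SKETCH_ACT_TNL_POP:
    case SKETCH_ACT_SET_FIELD:
    default:
        return false;
    }
}

/* Returns true only if every action in the list is on the accepted list.
 * An empty list is conservatively treated as not offloadable. */
static bool
sketch_actions_offloadable(const enum sketch_action_type *acts, size_t n)
{
    size_t i;

    assert(acts != NULL || n == 0);
    if (n == 0) {
        return false;
    }
    for (i = 0; i < n; i++) {
        if (!sketch_action_is_validated(acts[i])) {
            return false;
        }
    }
    return true;
}
```

A flow that fails this check would simply not be offloaded and would stay on the regular software path; extending coverage then means adding a case to the switch once the action has been tested.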



Regarding inner match, I think it may be better handled by offloading VTEP/tunnel encap/decap in HW, which would also make it possible to match on the inner packet while still in HW. But those are just my thoughts.



I’m all for starting limited and then improving later.



/Finn



From: Shahaf Shuler <shahafs at mellanox.com>
Sent: 21. juni 2018 19:26
To: Finn Christensen <fc at napatech.com>; Koujalagi, MalleshX <malleshx.koujalagi at intel.com>; yliu at fridaylinux.org
Cc: dev at openvswitch.org; Ergin, Mesut A <mesut.a.ergin at intel.com>; Tsai, James <james.tsai at intel.com>; Olga Shern <olgas at mellanox.com>
Subject: RE: [ovs-dev] [PATCH v9 0/7] OVS-DPDK flow offload with rte_flow



Hi Finn,



I was finally able to reproduce the erroneous behavior w/ VXLAN traffic.



I found 2 issues related to the current design:

  1.  The VXLAN decap requires more fields than just the TCP header, such as l3_ofs and l4_ofs, which are missing because the miniflow extract is bypassed.
  2.  Tunnel packets are recirculated after the outer headers are decapsulated (dp_netdev_recirculate). In the second processing stage the packet still carries the flow mark, so the action applied to the inner headers is also pop_tunnel.



I haven’t yet settled on the final way to solve these issues. My current thoughts:

  1.  Attractive as it is, we cannot bypass the miniflow extract. VXLAN is just one case; other cases will probably need different fields from the flow.
  2.  Use the md_is_valid flag in dp_netdev_input__ to recognize the non-first round and skip the mark-to-flow search. I don’t really like this approach because

     *   It is a bit hackish to rely on md_is_valid. Alternatively we can clear the mbuf mark flag, but this is basically the same.
     *   w/ tunnel packets the bulk of the flow rules is on the inner headers. The outer headers don’t change much. w/ the current design this means we offload the less costly part.

Having said that, the above approach looks like the fastest w/o changing the current design. We can further improve this part later.



I’d be happy to hear more suggestions.





From: Finn Christensen [mailto:fc at napatech.com]
Sent: Thursday, June 21, 2018 5:46 PM
To: Koujalagi, MalleshX <malleshx.koujalagi at intel.com<mailto:malleshx.koujalagi at intel.com>>; Shahaf Shuler <shahafs at mellanox.com<mailto:shahafs at mellanox.com>>; yliu at fridaylinux.org<mailto:yliu at fridaylinux.org>
Cc: dev at openvswitch.org<mailto:dev at openvswitch.org>; Ergin, Mesut A <mesut.a.ergin at intel.com<mailto:mesut.a.ergin at intel.com>>; Tsai, James <james.tsai at intel.com<mailto:james.tsai at intel.com>>
Subject: RE: [ovs-dev] [PATCH v9 0/7] OVS-DPDK flow offload with rte_flow



Hi Mallesh and Shahaf,



I have tried to reproduce the issue Mallesh is reporting, using a Napatech NIC. I am able to reproduce the error: the decap functionality does not work when using VxLAN tunneling together with partial hw-offload. Specifically, the VxLAN POP is not performed in OVS in my setup.

The VxLAN setup I’m using is straightforward, with 2 hosts and br-int and br-phy bridges, like the typical examples.



I don’t know how far you have got in debugging this, but what I see is that the vxlan pop functionality needs packet information originally gathered in the miniflow extract function, which is skipped with hw-offload. Therefore, it does not work.



I’m still investigating how this may be solved, but I wanted to hear where you are in debugging this issue, and whether you have already arrived at a solution.



Regards,

Finn



From: Koujalagi, MalleshX <malleshx.koujalagi at intel.com<mailto:malleshx.koujalagi at intel.com>>
Sent: 20. juni 2018 00:59
To: Shahaf Shuler <shahafs at mellanox.com<mailto:shahafs at mellanox.com>>; yliu at fridaylinux.org<mailto:yliu at fridaylinux.org>; Finn Christensen <fc at napatech.com<mailto:fc at napatech.com>>
Cc: dev at openvswitch.org<mailto:dev at openvswitch.org>; Ergin, Mesut A <mesut.a.ergin at intel.com<mailto:mesut.a.ergin at intel.com>>; Tsai, James <james.tsai at intel.com<mailto:james.tsai at intel.com>>
Subject: RE: [ovs-dev] [PATCH v9 0/7] OVS-DPDK flow offload with rte_flow



Hi Shahaf,



Thanks for setting up the VXLAN env. and trying to reproduce, much appreciated!



As you suggested, I moved to DPDK 17.11.3 stable and OvS 2.9.0, and still observed the same issue.



See the attached Vxlan_diagram.jpg for a clearer picture. We see the issue when traffic is injected in the direction Traffic Generator-->HOST B[eth1-->Vxlan-->eth0]-->Vxlan Tunnel Packet-->HOST A[eth0-->Vxlan-->eth1]-->Traffic Generator. At Host A (eth0-->Vxlan-->eth1) we see the Vxlan DECAP issue when HW-offload is enabled: Vxlan tunnel packets are not decapsulated from eth0 to eth1, and no packets are received @ eth1.



We injected random destination IP addresses (196.0.0.(0/1)).

  1.   Find tcpdump @ Host A (eth0).

07:24:12.013848 68:05:ca:33:ff:99 > 68:05:ca:33:fe:f9, ethertype IPv4 (0x0800), length 110: 172.168.1.2.46816 > 172.168.1.1.4789: VXLAN, flags [I] (0x08), vni 0

00:10:94:00:00:12 > 00:00:01:00:00:01, ethertype IPv4 (0x0800), length 60: 197.0.0.0.1024 > 196.0.0.1.1024: UDP, length 18

07:24:12.013860 68:05:ca:33:ff:99 > 68:05:ca:33:fe:f9, ethertype IPv4 (0x0800), length 110: 172.168.1.2.50892 > 172.168.1.1.4789: VXLAN, flags [I] (0x08), vni 0

00:10:94:00:00:12 > 00:00:01:00:00:01, ethertype IPv4 (0x0800), length 60: 197.0.0.0.1024 > 196.0.0.0.1024: UDP, length 18



  2.  Ovs logs.
  3.  Sure, we can debug this issue further in a Skype/webex session. Please let me know a good time.





Best regards

-/Mallesh





From: Shahaf Shuler [mailto:shahafs at mellanox.com]
Sent: Monday, June 18, 2018 4:05 AM
To: Koujalagi, MalleshX <malleshx.koujalagi at intel.com<mailto:malleshx.koujalagi at intel.com>>; yliu at fridaylinux.org<mailto:yliu at fridaylinux.org>; fc at napatech.com<mailto:fc at napatech.com>
Cc: dev at openvswitch.org<mailto:dev at openvswitch.org>; Ergin, Mesut A <mesut.a.ergin at intel.com<mailto:mesut.a.ergin at intel.com>>; Tsai, James <james.tsai at intel.com<mailto:james.tsai at intel.com>>
Subject: RE: [ovs-dev] [PATCH v9 0/7] OVS-DPDK flow offload with rte_flow



Mallesh,



I was finally able to set up the vxlan testing w/ OVS. Instead of using OVS on both sides, I used a vxlan i/f to inject the traffic on one host and ran ovs with the vxlan tunnel configuration you specified on the other.



I was not able to reproduce your case. Actually I see the rules are created successfully (see attached ovs log “vxlan_tep”)



A few observations:

  1.  The rule created for the VXLAN is for the outer pattern up to the UDP, hence supported by the device

     *   2018-06-18T08:11:17.466Z|00057|dpif_netdev(pmd53)|DBG|flow match: recirc_id=0,eth,udp,vlan_tci=0x0000,dl_src=e4:1d:2d:ca:ca:6a,dl_dst=e4:1d:2d:a3:aa:74,nw_dst=2.2.2.1,nw_frag=no,tp_dst=4789

  2.  I do see a segfault in the mlx5 PMD on 17.11.0; this was resolved in 17.11.3 stable.
  3.  I used MLNX_OFED_LINUX-4.3-2.0.1.0, however I don’t expect it to make any difference.



The packets sent to the ovs w/ vxlan tunnel logic are shown in [1]. The tcpdump after the ovs logic is in [2]; the packet outer headers were removed as expected.



On top of all that, I tried forcing the validate function to fail. Indeed the flow was not created, but the decap still happens (see attached ovs log “vxlan_tep_force_err”)



Can you please:

  1.  Tell me the type of packets you inject and don’t see being decapsulated.
  2.  Try w/ 17.11.3 stable to avoid any issues from the DPDK side.
  3.  If you still see the issue, can we set up a webex/skype meeting for joint debugging? Currently I don’t see the error you reported.





[1]

###[ Ethernet ]###

  dst= e4:1d:2d:a3:aa:74

  src= e4:1d:2d:ca:ca:6a

  type= IPv4

###[ IP ]###

     version= 4

     ihl= None

     tos= 0x0

     len= None

     id= 1

     flags=

     frag= 0

     ttl= 64

     proto= udp

     chksum= None

     src= 2.2.2.2

     dst= 2.2.2.1

     \options\

###[ UDP ]###

        sport= 34550

        dport= 4789

        len= None

        chksum= None

###[ VXLAN ]###

           flags= I

           reserved1= 0

           vni= 1000

           reserved2= 0x0

###[ Ethernet ]###

              dst= 00:00:5e:00:01:01

              src= 18:66:da:f5:37:64

              type= IPv4

###[ IP ]###

                 version= 4

                 ihl= None

                 tos= 0x0

                 len= None

                 id= 1

                 flags=

                 frag= 0

                 ttl= 64

                 proto= tcp

                 chksum= None

                 src= 10.7.12.62

                 dst= 1.1.1.1

                 \options\

###[ TCP ]###

                    sport= ftp_data

                    dport= http

                    seq= 0

                    ack= 0

                    dataofs= None

                    reserved= 0

                    flags= S

                    window= 8192

                    chksum= None

                    urgptr= 0

                    options= {}



[2]

11:13:59.477667 18:66:da:f5:37:64 > 00:00:5e:00:01:01, ethertype IPv4 (0x0800), length 60: (tos 0x0, ttl 64, id 1, offset 0, flags [none], proto TCP (6), length 40)

    10.7.12.62.20 > 1.1.1.1.80: Flags [S], cksum 0x7738 (correct), seq 0, win 8192, length 0



--Shahaf



From: Koujalagi, MalleshX <malleshx.koujalagi at intel.com<mailto:malleshx.koujalagi at intel.com>>
Sent: Friday, June 15, 2018 9:29 PM
To: Shahaf Shuler <shahafs at mellanox.com<mailto:shahafs at mellanox.com>>; yliu at fridaylinux.org<mailto:yliu at fridaylinux.org>; fc at napatech.com<mailto:fc at napatech.com>
Cc: dev at openvswitch.org<mailto:dev at openvswitch.org>; Ergin, Mesut A <mesut.a.ergin at intel.com<mailto:mesut.a.ergin at intel.com>>; Tsai, James <james.tsai at intel.com<mailto:james.tsai at intel.com>>
Subject: RE: [ovs-dev] [PATCH v9 0/7] OVS-DPDK flow offload with rte_flow



Hi Shahaf,



Thanks for pointing out the NIC/protocol support.



Please find my comments inline:



>>When you say DECAP functionality is broken you mean the flow is not actually inserted to the HW right?

 [Mallesh]: Yes, DECAP flows are not inserted to HW.



>>The datapath should still DECAP the flow correctly.

 [Mallesh]: Agreed. However, the datapath failed to DECAP.

 If the VXLAN protocol support check in netdev_dpdk_validate_flow fails (returns -1), then the datapath should fall back to its normal/default behavior, right?



>>Have you tried w/o VXLAN tunnel rules?

[Mallesh]: Yes, tried, but same behavior.



Best regards,

-/Mallesh

Disclaimer: This email and any files transmitted with it may contain confidential information intended for the addressee(s) only. The information is not to be surrendered or copied to unauthorized persons. If you have received this communication in error, please notify the sender immediately and delete this e-mail from your system.


