[ovs-dev] [RFC] [PATCH 00/11] Data Path Classifier Offloading

Finn Christensen fc at napatech.com
Thu Jul 6 10:04:51 UTC 2017


It seems that this patch implements a partial HW offload using the DPDK RTE flow API: a flow tag is used to circumvent the EMC, and the flow is looked up from the NIC-delivered flow tag on input.

In general I think RTE flow is capable of doing more, such as full flow offload, so we should aim at a fuller use of RTE flow in OVS. We may not always be in the output datapath in OVS, but OVS should still be fully updated and in control; at the very least we will need that.
The RTE flow call rte_flow_query() can be used to retrieve NIC flow statistics, and that is what we need here.

Furthermore, we would also like the possibility of offloading virtual ports (at least at some point later), so that the in_port need not be the native PMD port_id. This is also available in RTE flow via the item type RTE_FLOW_ITEM_TYPE_PORT, which is more or less what is needed for it.
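To sketch what I mean about statistics: something like the following could fold NIC counters back into the megaflow stats OVS keeps in software. The types and the stub below only mimic the DPDK 17.05 rte_flow_query()/struct rte_flow_query_count shapes so the sketch compiles on its own; a real version would call the PMD.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-ins for the DPDK types so this sketch is self-contained. */
struct rte_flow;                     /* opaque handle from rte_flow_create() */
struct rte_flow_error { const char *message; };

struct rte_flow_query_count {
    uint32_t reset:1;                /* read-and-clear the HW counters */
    uint32_t hits_set:1;             /* hits field is valid */
    uint32_t bytes_set:1;            /* bytes field is valid */
    uint64_t hits;
    uint64_t bytes;
};

/* Stub standing in for rte_flow_query(); returns fixed fake counters. */
static int
rte_flow_query_stub(uint8_t port_id, struct rte_flow *flow,
                    struct rte_flow_query_count *count,
                    struct rte_flow_error *error)
{
    (void)port_id; (void)flow; (void)error;
    count->hits_set = 1;
    count->bytes_set = 1;
    count->hits = 42;                /* pretend the NIC matched 42 packets */
    count->bytes = 2688;
    return 0;
}

/* Fold NIC-side counters into the SW flow statistics. */
static int
hw_flow_stats_refresh(uint8_t port_id, struct rte_flow *flow,
                      uint64_t *n_packets, uint64_t *n_bytes)
{
    struct rte_flow_query_count count;
    struct rte_flow_error error;

    memset(&count, 0, sizeof count);
    count.reset = 1;                 /* clearing on read keeps the sums additive */
    if (rte_flow_query_stub(port_id, flow, &count, &error)) {
        return -1;
    }
    if (count.hits_set) {
        *n_packets += count.hits;
    }
    if (count.bytes_set) {
        *n_bytes += count.bytes;
    }
    return 0;
}
```

Calling this periodically (e.g. from the flow-dump path) would keep the OVS flow stats in sync even when packets never reach the SW datapath.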

Besides that, I have some questions about this implementation. I have probably just not read the code carefully enough, but maybe you could clarify them for me:

You build an item array out of the flow wildcard mask:
...
    if (memcmp(&mask->dl_dst, &eth_mac, sizeof(struct eth_addr)) != 0
        || memcmp(&mask->dl_src, &eth_mac, sizeof(struct eth_addr)) != 0) {
        VLOG_INFO("rte_item_eth\n");
        item_any_flow[ii++].set = rte_item_set_eth;
        *buf_size += sizeof(struct rte_flow_item_eth);
        *buf_size += sizeof(struct rte_flow_item_eth);
        if (mask->nw_src != 0 || mask->nw_dst != 0) {
            VLOG_INFO("rte_item_ip\n");
            item_any_flow[ii++].set = rte_item_set_ip;
            *buf_size += sizeof(struct rte_flow_item_ipv4);
            *buf_size += sizeof(struct rte_flow_item_ipv4);
            if (mask->tp_dst != 0 || mask->tp_src != 0) {
                item_any_flow[ii++].set = rte_item_set_udp;
                *buf_size += sizeof(struct rte_flow_item_udp);
                *buf_size += sizeof(struct rte_flow_item_udp);
            }
        }
    }
Taken from the function hw_pipeline_item_array_build().

I would like to know how you make sure that all important bits in the flow/mask match the flow that you are about to program into the NIC.
Theoretically, would it be possible to program a UDP RTE flow filter into the NIC using a TCP flow?
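To make the concern concrete: some guard on nw_proto seems necessary before emitting the L4 item, otherwise a TCP megaflow whose mask sets tp_src/tp_dst would be programmed as a UDP filter. A minimal self-contained sketch of such a guard (the trimmed struct below stands in for the relevant struct flow fields):

```c
#include <assert.h>
#include <stdint.h>
#include <netinet/in.h>   /* IPPROTO_UDP, IPPROTO_TCP */

/* Trimmed stand-in for the relevant struct flow / mask fields. */
struct flow_l4 {
    uint8_t  nw_proto;
    uint16_t tp_src;
    uint16_t tp_dst;
};

enum l4_item { L4_ITEM_NONE, L4_ITEM_UDP, L4_ITEM_TCP };

/* Emit an L4 item only when the mask matches the protocol exactly,
 * so the port match is unambiguous; otherwise emit no L4 item. */
static enum l4_item
choose_l4_item(const struct flow_l4 *flow, const struct flow_l4 *mask)
{
    if (!mask->tp_src && !mask->tp_dst) {
        return L4_ITEM_NONE;        /* L4 ports not matched at all */
    }
    if (mask->nw_proto != UINT8_MAX) {
        return L4_ITEM_NONE;        /* proto wildcarded: ports are ambiguous */
    }
    switch (flow->nw_proto) {
    case IPPROTO_UDP: return L4_ITEM_UDP;
    case IPPROTO_TCP: return L4_ITEM_TCP;
    default:          return L4_ITEM_NONE;
    }
}
```

With such a check, a TCP flow would at least never be installed behind a UDP item; whether the offload should then be skipped entirely is a separate policy question.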


It seems as if the flows are never programmed into the NIC:

From hw_pipeline_thread():
...
    while (1) {
        // listen to read_socket :
        // call the rte_flow_create ( flow , wildcard mask)
        ret = hw_pipeline_msg_queue_dequeue(msgq,&ptr_rule);
        if (ret != 0) {
            continue;
        }
        if (ptr_rule.mode == HW_PIPELINE_REMOVE_RULE) {
            ret = hw_pipeline_remove_flow(dp, &ptr_rule.data.rm_flow);
            if (OVS_UNLIKELY(ret)) {
                VLOG_ERR("hw_pipeline_remove_flow failed to remove flow\n");
            }
        }
        ptr_rule.mode = HW_PIPELINE_NO_RULE;
    }
Shouldn't the hw_pipeline_insert_flow() function also be called here?
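What I would have expected in that loop is a dispatch over both message types, roughly as below. This is only a self-contained sketch: HW_PIPELINE_INSERT_RULE is my guess at the missing mode's name, and the stub handlers and simplified message struct stand in for the patch's own.

```c
#include <assert.h>

/* Message modes as in the patch; HW_PIPELINE_INSERT_RULE is my guess
 * at the name of the missing counterpart. */
enum hw_pipeline_mode {
    HW_PIPELINE_NO_RULE,
    HW_PIPELINE_INSERT_RULE,
    HW_PIPELINE_REMOVE_RULE,
};

struct hw_pipeline_msg {
    enum hw_pipeline_mode mode;
    int rule_id;                  /* placeholder for the real rule payload */
};

static int inserts, removes;      /* counters standing in for the real handlers */
static int hw_pipeline_insert_flow_stub(int id) { (void) id; return ++inserts, 0; }
static int hw_pipeline_remove_flow_stub(int id) { (void) id; return ++removes, 0; }

/* The dispatch I would expect inside hw_pipeline_thread():
 * both insertion and removal handled, not removal only. */
static void
hw_pipeline_dispatch(struct hw_pipeline_msg *msg)
{
    switch (msg->mode) {
    case HW_PIPELINE_INSERT_RULE:
        hw_pipeline_insert_flow_stub(msg->rule_id);
        break;
    case HW_PIPELINE_REMOVE_RULE:
        hw_pipeline_remove_flow_stub(msg->rule_id);
        break;
    case HW_PIPELINE_NO_RULE:
        break;
    }
    msg->mode = HW_PIPELINE_NO_RULE;
}
```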


Finn Christensen


-----Original Message-----
From: ovs-dev-bounces at openvswitch.org [mailto:ovs-dev-bounces at openvswitch.org] On Behalf Of Shachar Beiser
Sent: 6. juli 2017 06:33
To: ovs-dev at openvswitch.org
Subject: Re: [ovs-dev] [RFC] [PATCH 00/11] Data Path Classifier Offloading

Hi ,

     I would like to clarify that all the patches I sent were sent as an RFC, since I added the [RFC] label only to the cover letter and not to the subject of each patch:

[RFC] [PATCH 00/11] Data Path Classifier Offloading
[RFC] [PATCH 1/11] ovs/dp-cls ....
[RFC] [PATCH 2/11] ovs/dp-cls ....
[RFC] [PATCH 3/11] ovs/dp-cls ....
[RFC] [PATCH 4/11] ovs/dp-cls ....
[RFC] [PATCH 5/11] ovs/dp-cls ....
[RFC] [PATCH 6/11] ovs/dp-cls ....
[RFC] [PATCH 7/11] ovs/dp-cls ....
[RFC] [PATCH 8/11] ovs/dp-cls ....
[RFC] [PATCH 9/11] ovs/dp-cls ....
[RFC] [PATCH 10/11] ovs/dp-cls ....
[RFC] [PATCH 11/11] ovs/dp-cls ....

                -Shachar Beiser


-----Original Message-----
From: Shachar Beiser [mailto:shacharbe at mellanox.com]
Sent: Wednesday, July 5, 2017 3:27 PM
To: ovs-dev at openvswitch.org
Cc: Shachar Beiser <shacharbe at mellanox.com>; Mark Bloch <markb at mellanox.com>; Olga Shern <olgas at mellanox.com>
Subject: [PATCH 00/11] Data Path Classifier Offloading

Request for comments: OVS data-path classifier offload

Versions:
  OVS master.
  DPDK 17.05.1.

Purpose & Scope
This RFC describes hardware offloading of flows over DPDK.
The motivation for hardware flow offloading is to accelerate the OVS-DPDK data plane: the classification is done by the hardware.
A flow tag that represents the matched rule in the hardware is received by OVS and saves the lookup time.
The OVS data-path classifier has to support additional functionality.
If the hardware supports flow table offloading and the user activates the feature, the classifier rules are offloaded to the hardware classifier in addition to the data-path classifier.
When flows are removed from the classifier, they also have to be removed from the hardware flow table.
The OVS classification data path has 4 stages:

1. Read packets from the DPDK PMD into OVS.
2. Find matching flows.
3. Group packets by flow.
4. Execute actions.

The suggested design intervenes in the first two stages; the 3rd and 4th stages are left untouched.

OVS additional code:
  new file lib/hw-pipeline.c
  new file lib/hw-pipeline.h

hw-pipeline.c implements three significant objects: an offloading thread, a pipe, and a flow tag pool.
Inserting and removing classifier rules to/from the hardware takes time, which translates to latency. A new OVS thread takes care of the insertion and deletion of rules and prevents blocking the PMD thread context.
A pipe (mkpipe()) transfers the classifier rules to the new thread context.
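The pipe-plus-thread hand-off can be sketched with plain POSIX primitives; the rule layout and names below are simplified stand-ins for the real ones (mkpipe() is replaced by pipe() here just to keep the sketch self-contained):

```c
#include <assert.h>
#include <pthread.h>
#include <unistd.h>

/* Simplified rule message; the real one carries the classifier rule. */
struct rule_msg { int op; int flow_tag; };

struct offload_ctx {
    int rd_fd;                 /* read end of the pipe */
    int last_tag;              /* records what the worker received */
};

/* Worker side: drain one message and "program" it (stubbed here).
 * Blocking in this thread keeps rule-insertion latency off the PMDs. */
static void *
offload_thread(void *arg)
{
    struct offload_ctx *ctx = arg;
    struct rule_msg msg;

    if (read(ctx->rd_fd, &msg, sizeof msg) == sizeof msg) {
        ctx->last_tag = msg.flow_tag;
    }
    return NULL;
}

/* PMD side: enqueueing a rule is a single small write(). */
static int
offload_send(int wr_fd, int op, int flow_tag)
{
    struct rule_msg msg = { op, flow_tag };
    return write(wr_fd, &msg, sizeof msg) == sizeof msg ? 0 : -1;
}
```

A small fixed-size message plus a pipe write keeps the PMD-side cost bounded regardless of how long the hardware takes to install the rule.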
For each classifier rule, a unique flow tag is required.
This same tag is attached to the packet metadata by the DPDK PMD if there is a match in the hardware. OVS uses the tag as an index to find the relevant software flow, which makes lookup processing very efficient.
To generate unique tags, the design introduces a flow tag pool: a free list implemented in an array, from which tags are allocated and returned efficiently.
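A free list threaded through an array gives O(1) allocate/free and keeps the tags dense, so they can double as array indexes into the flow table. A minimal sketch of such a pool (names and sizes are illustrative, not the patch's own):

```c
#include <assert.h>
#include <stdint.h>

#define TAG_POOL_SIZE 8
#define TAG_NONE UINT32_MAX

/* Free list threaded through an array: next[i] holds the next free
 * tag after i, so alloc and free are both O(1). */
struct tag_pool {
    uint32_t next[TAG_POOL_SIZE];
    uint32_t head;                 /* first free tag, or TAG_NONE */
};

static void
tag_pool_init(struct tag_pool *p)
{
    for (uint32_t i = 0; i < TAG_POOL_SIZE - 1; i++) {
        p->next[i] = i + 1;
    }
    p->next[TAG_POOL_SIZE - 1] = TAG_NONE;
    p->head = 0;
}

/* Returns a free tag, or TAG_NONE when the pool is exhausted. */
static uint32_t
tag_pool_alloc(struct tag_pool *p)
{
    uint32_t tag = p->head;
    if (tag != TAG_NONE) {
        p->head = p->next[tag];
    }
    return tag;
}

static void
tag_pool_free(struct tag_pool *p, uint32_t tag)
{
    p->next[tag] = p->head;
    p->head = tag;
}
```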

OVS changes

lib/dpif-netdev.c changes:

1) create_dp_netdev() initializes the new flow tag pool database, the classifier offload thread, and the pipe.
2) dp_netdev_free() frees the new flow tag pool database, the classifier offload thread, and the pipe.
3) dp_netdev_pmd_remove_flow(): if the feature exists and is active, the function sends the classifier rule through the pipe to the new thread, which removes the rule from the hardware.
4) dp_netdev_flow_add(): if the feature exists and is active, the function sends the classifier rule through the pipe to the new thread, which inserts the rule into the hardware.
5) emc_processing(): if the feature exists and is active and OVS received a valid tag from the hardware, OVS skips EMC processing.
6) fast_path_processing(): if the feature exists and is active and the function received a valid tag from the hardware:
a. OVS looks up the tag attached to the packet metadata and finds the flow according to the tag.
b. The functions dp_netdev_pmd_lookup_flow() and dpcls_lookup() are not called.
c. Grouping packets by flow and executing the actions are done the same way as before.
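Steps 5 and 6 amount to replacing the EMC/dpcls lookups with a direct array index whenever the NIC supplied a valid tag; roughly as below, where flow_by_tag[] stands in for whatever structure actually maps tags to dp_netdev_flow in the patch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_TAGS 1024

struct dp_flow { int actions; };              /* stand-in for dp_netdev_flow */
static struct dp_flow *flow_by_tag[MAX_TAGS]; /* tag -> flow, filled at insert */

/* If the packet carries a valid HW tag, resolve the flow by direct
 * indexing and skip EMC and dpcls entirely; return NULL to make the
 * caller fall back to the normal lookup path. */
static struct dp_flow *
lookup_by_tag(uint32_t flow_tag, int tag_valid)
{
    if (tag_valid && flow_tag < MAX_TAGS) {
        return flow_by_tag[flow_tag];  /* may be NULL if the flow was removed */
    }
    return NULL;
}
```

The removal race (a tag arriving for a flow already deleted in software) is exactly why the fallback to the normal lookup path still has to exist.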

In the file lib/netdev-dpdk.c:

struct netdev_class is extended with an additional function pointer, get_pipeline().
A new function, netdev_dpdk_get_pipeline(), is introduced. It reads the tag received from the hardware when the packet matched one of the inserted rules:

if (mbuf->ol_flags & PKT_RX_FDIR_ID) {
    ppl_md->id = HW_OFFLOAD_PIPELINE;
    ppl_md->flow_tag = mbuf->hash.fdir.hi;
}

The source code:
https://github.com/openvswitch/ovs/compare/master...shacharbe:dp-cls-offload-no-tunnel-rfc?expand=1
References: Intel work:
https://patchwork.ozlabs.org/patch/701623/
OVS presentation, fall 2016 summit:
http://openvswitch.org/support/ovscon2016/7/1450-stringer.pdf


Shachar Beiser (11):
  ovs/dp-cls: fetching the mark id from hw
  ovs/dp-cls: moving structures to dpif-netdev.h
  ovs/dp-cls: saving rx queue identifier
  ovs/dp-cls: initializing HW pipeline
  ovs/dp-cls: free HW pipeline
  ovs/dp-cls: remove data-path classifier rule
  ovs/dp-cls: inserting data-path classifier rule
  ovs/dp-cls: tag lookup and processing
  ovs/dp-cls: flow tag read
  ovs/dp-cls: removing flow from HW in the dp-cls offload thread
  ovs/dp-cls: inserting rule to HW from offloading thread context

 lib/automake.mk       |    4 +-
 lib/dp-packet.h       |    1 +
 lib/dpif-netdev.c     |  300 +++++--------
 lib/dpif-netdev.h     |  298 ++++++++++++-
 lib/hw-pipeline.c     | 1146 +++++++++++++++++++++++++++++++++++++++++++++++++
 lib/hw-pipeline.h     |   48 +++
 lib/netdev-bsd.c      |    1 +
 lib/netdev-dpdk.c     |   49 +++
 lib/netdev-dpdk.h     |   22 +-
 lib/netdev-dummy.c    |    1 +
 lib/netdev-linux.c    |    1 +
 lib/netdev-provider.h |    7 +-
 lib/netdev-vport.c    |    1 +
 lib/netdev.c          |    2 +
 14 files changed, 1684 insertions(+), 197 deletions(-)  create mode 100644 lib/hw-pipeline.c  create mode 100644 lib/hw-pipeline.h

--
1.8.3.1

_______________________________________________
dev mailing list
dev at openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
