[ovs-dev] [PATCH] dpif-netdev: Forwarding optimization for direct output flows.

Ilya Maximets i.maximets at samsung.com
Mon May 27 13:44:46 UTC 2019


On 24.05.2019 19:20, Ben Pfaff wrote:
> On Fri, May 24, 2019 at 03:18:50PM +0300, Ilya Maximets wrote:
>> There are some cases where users want to have simple forwarding or drop
>> rules for all packets received from particular port, i.e :
>>
>>   "in_port=1,actions=2"
>>   "in_port=1,actions=IN_PORT"
>>   "in_port=1,actions=drop"
>>
>> There are also cases where complex OF flows could be simplified down
>> to simple forwarding/drop datapath flows.
>>
>> In theory, we don't need to parse packets at all to follow these flows.
>> "Direct output forwarding" optimization is intended to speed up above
>> cases.
>>
>> Design:
>>
>> Due to various implementation restrictions userspace datapath has
>> following flow fields always in exact match (i.e. it's required to
>> match at least these fields of a packet even if the OF rule doesn't
>> need that):
>>
>>   - recirc_id
>>   - in_port
>>   - packet_type
>>   - dl_type
>>   - vlan_tci
>>   - nw_frag (for ip packets)
>>
>> Not all of these fields are related to packet itself. We already
>> know the current 'recirc_id' and the 'in_port' before starting the
>> packet processing. It also seems safe to assume that we're working
>> with Ethernet packets. dpif-netdev sets exact match on 'vlan_tci'
>> to avoid issues with flow format conversion and we don't really need
>> to match with it until ofproto layer didn't ask us to.
>> So, for the simple forwarding OF rule we need to match only with
>> 'dl_type' and 'nw_frag'.
>>
>> 'in_port', 'dl_type' and 'nw_frag' could be combined in a single
>> 64bit integer that could be used as a hash in hash map.
>>
>> New per-PMD flow table 'direct_output_table' introduced to store
>> direct output flows only. 'dp_netdev_flow_add' adds flow to the
>> usual 'flow_table' and to 'direct_output_table' if the flow meets
>> following constraints:
>>
>>   - 'recirc_id' in flow match is 0.
>>   - Flow wildcards originally had wildcarded 'vlan_tci'.
>>   - Flow has no actions (drop) or exactly one action equal to
>>     OVS_ACTION_ATTR_OUTPUT.
>>   - Flow wildcards contains only minimal set of non-wildcarded fields
>>     (listed above).
>>
>> If the number of flows for current 'in_port' in regular 'flow_table'
>> equals number of flows for current 'in_port' in 'direct_output_table',
>> we may use direct output optimization, because all the flows we have
>> are direct output flows. This means that we only need to parse
>> 'dl_type' and 'nw_frag' to perform packet matching.
>> Now we making the unique flow mark from the 'in_port', 'dl_type' and
>> 'nw_frag' and looking for it in 'direct_output_table'.
>> On successful lookup we don't need to make full 'miniflow_extract()'.
>>
>> Unsuccessful lookup technically means that we have no sufficient flow
>> in datapath and upcall will be required. We may optimize this path
>> in the future by bypassing the EMC, SMC and dpcls lookups in this case.
>>
>> Performance improvement of this solution on a 'direct output' flows
>> should be comparable with partial HW offloading, because it parses same
>> packet fields and uses similar flow lookup scheme.
>> However, unlike partial HW offloading, it works for all port types
>> including virtual ones.
>>
>> Signed-off-by: Ilya Maximets <i.maximets at samsung.com>
>> ---
>>
>> This patch was made as a point for "virtio-forwarder" discussion:
>>    https://mail.openvswitch.org/pipermail/ovs-dev/2019-May/358686.html
>> However, it might be very useful by itself for usual cases too.
>>
>> Testing is very welcome. I didn't run the performance tests on real
>> systems, so I don't know the real performance impact yet.
> 
> The changes to parse_tcp_flags() look good to me.  It may be worth
> updating the comments on that function to say that the output arguments
> are set only if the function returns nonzero.
> 
> In general, I really support the idea that OVS is general-purpose but
> that we should optimize it for important use cases.  That's basically
> what we did at Nicira/VMware for network virtualization: we wanted to
> make sure that OVS was as general as possible so that many people would
> use it and adopt it, while at the same time making sure that it
> performed well for NVP/NSX.  I mean, we didn't want it to run slowly in
> other cases, obviously, but they weren't what we were internally
> benchmarking.
> 
> In my view, this patch is entirely in line with that philosophy.

Thanks!
About comments, It's a good point. I'm going to add following description:

  * 'dl_type_p' will be set only if 'packet' is an Ethernet packet.
  * 'nw_frag_p' will be set only if 'packet' is an IP packet.

It's because I need these fields even if it's not an IP or TCP packet.
I also noticed that I didn't initialize the parameters, so I might
be using them uninitialized for non-IP packets. I'll fix that too.

Few more self-review observations:
1. No need to check if 'dp_netdev_direct_output_enabled' for each packet.
   This could be done once for a batch.
2. I missed checking that packet_type equals to PT_ETH before adding to
   'direct_output_table'.

So, I'll send v2 with the following incremental:

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index b455bdc7b..993997c3b 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -3435,8 +3435,10 @@ dp_netdev_flow_add(struct dp_netdev_pmd_thread *pmd,
                 dp_netdev_flow_hash(&flow->ufid));
     ccmap_inc(&pmd->n_flows, odp_to_u32(in_port));
 
-    if (vlan_tci_wc_faked && match->flow.recirc_id == 0 &&
-        dp_netdev_flow_is_direct_output(&match->wc, actions, actions_len)) {
+    if (vlan_tci_wc_faked
+        && match->flow.recirc_id == 0
+        && match->flow.packet_type == htonl(PT_ETH)
+        && dp_netdev_flow_is_direct_output(&match->wc, actions, actions_len)) {
         dp_netdev_direct_output_insert(pmd, flow);
     }
 
@@ -6599,6 +6601,7 @@ dfc_processing(struct dp_netdev_pmd_thread *pmd,
     bool smc_enable_db;
     size_t map_cnt = 0;
     bool batch_enable = true;
+    bool direct_output_enabled = dp_netdev_direct_output_enabled(pmd, port_no);
 
     atomic_read_relaxed(&pmd->dp->smc_enable_db, &smc_enable_db);
     pmd_perf_update_counter(&pmd->perf_stats,
@@ -6631,9 +6634,9 @@ dfc_processing(struct dp_netdev_pmd_thread *pmd,
                 }
             }
 
-            if (!flow && dp_netdev_direct_output_enabled(pmd, port_no)) {
-                ovs_be16 dl_type;
-                uint8_t nw_frag;
+            if (!flow && direct_output_enabled) {
+                ovs_be16 dl_type = 0;
+                uint8_t nw_frag = 0;
 
                 tcp_flags = parse_tcp_flags(packet, &dl_type, &nw_frag);
                 flow = dp_netdev_direct_output_lookup(pmd, port_no,
diff --git a/lib/flow.c b/lib/flow.c
index 9c4222a93..88c54a37b 100644
--- a/lib/flow.c
+++ b/lib/flow.c
@@ -1089,6 +1089,8 @@ parse_dl_type(const struct eth_header *data_, size_t size)
 
 /* Parses and return the TCP flags in 'packet', converted to host byte order.
  * If 'packet' is not an Ethernet packet embedding TCP, returns 0.
+ * 'dl_type_p' will be set only if 'packet' is an Ethernet packet.
+ * 'nw_frag_p' will be set only if 'packet' is an IP packet.
  *
  * The caller must ensure that 'packet' is at least ETH_HEADER_LEN bytes
  * long.'*/
---


Best regards, Ilya Maximets.


More information about the dev mailing list