[ovs-discuss] set dpdk packet refcnt when flow output to group.

Daniele Di Proietto diproiettod at vmware.com
Tue Oct 20 20:00:46 UTC 2015



On 20/10/2015 11:41, "David Evans" <davidjoshuaevans at gmail.com> wrote:

>Thanks Daniele 
>
>may_steal sounds like the right thing, but I couldn’t see how it’s set by
>the group OFPGT11_ALL code path.
>Also, maybe refcnt + cloning would be a more efficient option (assuming
>no packet modifications would occur after the group).

The datapath (dpif-netdev, in this case) doesn't deal with OFPGT11_ALL (or
any other OpenFlow construct); it just deals with a list of what we call
ODP (or datapath) actions.

The ofproto layer translates OpenFlow actions into ODP (or datapath)
actions. If a packet needs to go to multiple destinations, the ODP action
list will contain more than one OVS_ACTION_ATTR_OUTPUT.
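
To make the translation step concrete, here is a minimal sketch in C. It is
not the real ofproto-dpif-xlate code: `sketch_action`, `SKETCH_ACTION_OUTPUT`
and `sketch_xlate_all_group()` are hypothetical names made up for this
illustration. It only shows the idea that a group of type "all" ends up as a
flat list with one output action per bucket, with no reference back to the
group.

```c
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the real OVS types. */
enum sketch_action_type { SKETCH_ACTION_OUTPUT };

struct sketch_action {
    enum sketch_action_type type;
    unsigned int port;
};

/* Flattens a group of type "all", given as the output port of each bucket,
 * into a list of output actions: one action per bucket.  Returns the number
 * of actions written, which is at most 'max'. */
static size_t
sketch_xlate_all_group(const unsigned int *bucket_ports, size_t n_buckets,
                       struct sketch_action *actions, size_t max)
{
    size_t n = 0;

    for (size_t i = 0; i < n_buckets && n < max; i++) {
        actions[n].type = SKETCH_ACTION_OUTPUT;
        actions[n].port = bucket_ports[i];
        n++;
    }
    return n;
}
```

With three buckets that output to ports 1, 2 and 3, the resulting list holds
three output actions, and that list is all the datapath ever sees.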

The code in odp_execute_actions() (called by the datapath) contains the
logic to set `may_steal` for each call to dp_execute_cb(): `may_steal` is
true only when there are no more actions to execute; otherwise it is false
and the packet will eventually be copied.
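
A simplified sketch of that logic in C follows. It is an illustration under
assumptions, not the code in lib/odp-execute.c: the `sketch_*` names and
counters are hypothetical, the action list contains only outputs, and the
caller is assumed to always allow stealing.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for illustration; not the real OVS types. */
struct sketch_packet {
    char data[64];
    size_t len;
};

struct sketch_output {
    int port;
};

static int n_copies;    /* Times a payload was duplicated. */
static int n_stolen;    /* Times ownership was handed to the callee. */

/* Plays the role of the datapath output callback.  If 'may_steal' is true
 * the callee owns 'pkt' and frees it after "transmitting".  Otherwise it
 * transmits a private copy and leaves 'pkt' untouched. */
static void
sketch_output_cb(struct sketch_packet *pkt, int port, bool may_steal)
{
    struct sketch_packet *tx = pkt;

    if (!may_steal) {
        tx = malloc(sizeof *tx);
        if (!tx) {
            abort();
        }
        memcpy(tx, pkt, sizeof *tx);
        n_copies++;
    } else {
        n_stolen++;
    }
    printf("tx on port %d (%s)\n", port, may_steal ? "stolen" : "copy");
    free(tx);
}

/* Executes a flat list of output actions.  Takes ownership of 'pkt'.
 * Only the last action may steal the packet; earlier outputs get copies. */
static void
sketch_execute_actions(struct sketch_packet *pkt,
                       const struct sketch_output *actions, size_t n)
{
    if (n == 0) {
        free(pkt);
        return;
    }
    for (size_t i = 0; i < n; i++) {
        bool last_action = (i == n - 1);

        sketch_output_cb(pkt, actions[i].port, last_action);
    }
}
```

For a list of three outputs this makes two copies and steals the original
once, so each buffer has exactly one owner and no `refcnt` other than one is
needed.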

If you're not familiar with the distinction between the ofproto layer and
the datapath, I would advise taking a look at the paper that Ben suggested.

>My main concern is the case where there are multiple packets for a flow:
>the 2nd and following packets may not go through this path to refcnt
>and/or clone the packets for each output.

I'm not sure I understand your concern.

>This use case may not really be a part of your general switching
>direction though. I don’t expect that OFPGT11_ALL is a really popular use
>case.
>
>Thanks for replying
>
>Dave.
>
>> On Oct 20, 2015, at 1:33 PM, Daniele Di Proietto
>><diproiettod at vmware.com> wrote:
>> 
>> Hi,
>> 
>> Currently every DPDK mbuf in OVS has the `refcnt` set to one. Output to
>> multiple ports is handled by making a copy of the packet's payload (see
>> `may_steal` in dp_netdev_execute_actions(), and in netdev_send()).
>> 
>> You're right, having a `refcnt` != 1 might be necessary to use
>> rte_ipv4_fragment_packet() or to support certain offloading capabilities
>> (currently not implemented in OVS).
>> 
>> Does this answer your question?
>> 
>> Daniele
>> 
>> 
>> On 15/10/2015 12:41, "Ben Pfaff" <blp at nicira.com> wrote:
>> 
>>> I don't understand what you're asking for.
>>> 
>>> Daniele or Pravin, I think that you know the DPDK datapath well.  Do you
>>> understand what David wants or why?
>>> 
>>> On Thu, Oct 15, 2015 at 01:15:11PM -0500, David Evans wrote:
>>>> Thanks Ben,
>>>> 
>>>> If that's the case, then it would be better to add a custom action
>>>> that applies prior to this group action, to update the refcnt.
>>>> 
>>>> I expect it just has to happen some time before the first PMD has
>>>> finished processing the packet so that the packet does not get deleted
>>>> by the tx routine before other PMDs have seen the packet.
>>>> 
>>>> Cheers
>>>> Dave.
>>>> 
>>>> 
>>>> 
>>>>> On Oct 12, 2015, at 12:23 PM, Ben Pfaff <blp at nicira.com> wrote:
>>>>> 
>>>>> Your change isn't going to have much effect because most packets don't
>>>>> go through the translation process.  If you try to force all packets
>>>>> through translation, it will kill performance.
>>>>> 
>>>>> I think that you should read this paper that describes the various
>>>>> caching layers in Open vSwitch:
>>>>> 
>>>>> http://openvswitch.org/support/papers/nsdi2015.pdf
>>>>> 
>>>>> On Mon, Oct 12, 2015 at 11:56:03AM -0500, David Evans wrote:
>>>>>> Hi Ben,
>>>>>> 
>>>>>> When I use the OFPGT11_ALL group action, the packets for a flow will
>>>>>> be sent out all buckets in a group (in my case all the buckets are
>>>>>> ports to transmit out).
>>>>>> 
>>>>>> I added a group_bucket_count to the context and, in the
>>>>>> xlate_all_group function, the following:
>>>>>> 
>>>>>>     group_dpif_get_buckets(group, &buckets);
>>>>>> +    if (ctx->group_bucket_count == 0) {
>>>>>> +        LIST_FOR_EACH (bucket, list_node, buckets) {
>>>>>> +            ctx->group_bucket_count++;
>>>>>> +        }
>>>>>> +    }
>>>>>> +    if (ctx->xin->packet)
>>>>>> +        if (ctx->xin->packet->source == DPBUF_DPDK)
>>>>>> +            rte_pktmbuf_refcnt_update(&ctx->xin->packet->mbuf,
>>>>>> +                                      ctx->group_bucket_count);
>>>>>>     LIST_FOR_EACH (bucket, list_node, buckets) {
>>>>>> 
>>>>>> This stops the transmit PMDs from attempting to free the packet until
>>>>>> all the buckets (ports) have transmitted it.
>>>>>> My switch also does reassembly on rx; this refcnt is necessary for
>>>>>> handling multi-segment DPDK buffers too.
>>>>>> I also changed the segment free to rte_pktmbuf_free in netdev-dpdk.c
>>>>>> for this purpose.
>>>>>> I'm expecting it will also be important for TSO or the possibility
>>>>>> of using rte_ipv4_fragment_packet on an outgoing port.
>>>>>> 
>>>>>> I have between 6 and 12 PMDs depending on the number of DPDK ports
>>>>>> running at any time, and if I use OFPGT11_ALL with many output
>>>>>> buckets (ports), buffers will disappear from under some PMDs and
>>>>>> cause segfaults etc.
>>>>>> 
>>>>>> Cheers,
>>>>>> 
>>>>>> Dave.
>>>>>> 
>>>>>>> On Oct 12, 2015, at 11:38 AM, Ben Pfaff <blp at nicira.com> wrote:
>>>>>>> 
>>>>>>> On Wed, Oct 07, 2015 at 05:36:18PM -0500, David Evans wrote:
>>>>>>>> While using netdev-dpdk: when I add a rule for which the action is
>>>>>>>> to send to a group (type=all) containing (x) output buckets (ports),
>>>>>>>> how can I increment the dp_packet->pkt_mbuf's refcnt to (x) so that
>>>>>>>> the packet is not deleted before it has been transmitted on all
>>>>>>>> ports (buckets) in the group?
>>>>>>>> 
>>>>>>>> Perhaps in ofproto-dpif-xlate.c function xlate_all_group find the
>>>>>>>> packet and apply the ctx->xin->packet->mbuf->refcnt?  Will that work
>>>>>>>> for all packets for a ctx?
>>>>>>> 
>>>>>>> I don't understand what relationship you expect here.  A group has no
>>>>>>> direct relationship to a packet.  Translation produces a flat list of
>>>>>>> simple actions that don't refer back to the group.
>>>>>> 
>>>> 
>> 
>


