[ovs-discuss] set dpdk packet refcnt when flow output to group.

David Evans davidjoshuaevans at gmail.com
Tue Oct 20 18:41:32 UTC 2015


Thanks Daniele 

may_steal sounds like the right thing, but I couldn’t see how it’s set by the OFPGT11_ALL group code path.
Also, maybe refcnt + cloning would be a more efficient option (assuming no packet modifications occur after the group).
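
To make the trade-off concrete, here is a minimal single-threaded sketch that contrasts copying the packet for each output with raising a reference count once before the fan-out. Everything in it is hypothetical: struct toy_pkt and the toy_* and fanout_* functions are names invented for this example, not DPDK or OVS APIs, and a real multi-PMD datapath would need an atomic refcnt.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a packet buffer; NOT the DPDK rte_mbuf layout. */
struct toy_pkt {
    int refcnt;              /* owners still holding the buffer */
    size_t len;
    unsigned char data[64];
};

static int live_pkts;        /* buffers currently allocated (demo only) */

static struct toy_pkt *toy_alloc(const void *payload, size_t len)
{
    struct toy_pkt *p = malloc(sizeof *p);
    assert(p != NULL && len <= sizeof p->data);
    p->refcnt = 1;
    p->len = len;
    memcpy(p->data, payload, len);
    live_pkts++;
    return p;
}

/* Drop one reference; release the buffer when the last owner is done. */
static void toy_free(struct toy_pkt *p)
{
    assert(p->refcnt > 0);
    if (--p->refcnt == 0) {
        free(p);
        live_pkts--;
    }
}

/* Hypothetical tx routine: "sends" the packet, then frees its reference. */
static void toy_tx(struct toy_pkt *p, int port)
{
    (void) port;
    toy_free(p);
}

/* Strategy A: copy for every output except the last, which takes the
 * original.  Returns the number of copies made. */
int fanout_by_copy(struct toy_pkt *p, int n_ports)
{
    int copies = 0;
    assert(n_ports >= 1);
    for (int port = 0; port < n_ports; port++) {
        int last = (port == n_ports - 1);
        struct toy_pkt *out = last ? p : toy_alloc(p->data, p->len);
        copies += !last;
        toy_tx(out, port);
    }
    return copies;
}

/* Strategy B: take one extra reference per additional output, then let
 * every tx drop one.  No payload is copied.  Returns the copies made. */
int fanout_by_refcnt(struct toy_pkt *p, int n_ports)
{
    assert(n_ports >= 1);
    p->refcnt += n_ports - 1;
    for (int port = 0; port < n_ports; port++) {
        toy_tx(p, port);     /* the last call releases the buffer */
    }
    return 0;
}
```

With four ports, fanout_by_copy() makes three copies and fanout_by_refcnt() makes none; in both cases every buffer has been released once the last port has transmitted. The refcnt variant is only safe if nothing modifies the shared payload after the fan-out.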

My main concern is the case where there are multiple packets for a flow: the 2nd and following packets may not go through this path,
so nothing would refcnt and/or clone the packets for each output.

This use case may not really be a part of your general switching direction though. I don’t expect that OFPGT11_ALL is a really popular use case.

Thanks for replying

Dave.

> On Oct 20, 2015, at 1:33 PM, Daniele Di Proietto <diproiettod at vmware.com> wrote:
> 
> Hi,
> 
> Currently every DPDK mbuf in OVS has the `refcnt` set to one. Output to
> multiple ports is handled by making a copy of the packet's payload (see
> `may_steal` in dp_netdev_execute_actions(), and in netdev_send()).
> 
> You're right, having a `refcnt` != 1 might be necessary to use
> rte_ipv4_fragment_packet() or to support certain offloading capabilities
> (currently not implemented in OVS).
> 
> Does this answer your question?
> 
> Daniele
> 
> 
> On 15/10/2015 12:41, "Ben Pfaff" <blp at nicira.com> wrote:
> 
>> I don't understand what you're asking for.
>> 
>> Daniele or Pravin, I think that you know the DPDK datapath well.  Do you
>> understand what David wants or why?
>> 
>> On Thu, Oct 15, 2015 at 01:15:11PM -0500, David Evans wrote:
>>> Thanks Ben,
>>> 
>>> If that's the case, then it would be better to add a custom action
>>> that applies prior to this group action, to update the refcnt.
>>> 
>>> I expect it just has to happen some time before the first PMD has
>>> finished processing the packet so that the packet does not get deleted
>>> by the tx routine before other PMDs have seen the packet.
>>> 
>>> Cheers
>>> Dave.
>>> 
>>> 
>>> 
>>>> On Oct 12, 2015, at 12:23 PM, Ben Pfaff <blp at nicira.com> wrote:
>>>> 
>>>> Your change isn't going to have much effect because most packets don't
>>>> go through the translation process.  If you try to force all packets
>>>> through translation, it will kill performance.
>>>> 
>>>> I think that you should read this paper that describes the various
>>>> caching layers in Open vSwitch:
>>>> 
>>>> http://openvswitch.org/support/papers/nsdi2015.pdf
>>>> 
>>>> On Mon, Oct 12, 2015 at 11:56:03AM -0500, David Evans wrote:
>>>>> Hi Ben,
>>>>> 
>>>>> When I use the OFPGT11_ALL group action, the packets for a flow
>>>>> will be sent out all buckets in a group (in my case all the buckets
>>>>> are ports to transmit out).
>>>>> 
>>>>> I added a group_bucket_count to the context, and the following to
>>>>> the xlate_all_group function:
>>>>> 
>>>>>      group_dpif_get_buckets(group, &buckets);
>>>>> +    if(ctx->group_bucket_count == 0){
>>>>> +        LIST_FOR_EACH (bucket, list_node, buckets) {
>>>>> +            ctx->group_bucket_count++;
>>>>> +        }
>>>>> +    }
>>>>> +    if(ctx->xin->packet)
>>>>> +        if(ctx->xin->packet->source == DPBUF_DPDK)
>>>>> +            rte_pktmbuf_refcnt_update(&ctx->xin->packet->mbuf,
>>>>> +                                      ctx->group_bucket_count);
>>>>>      LIST_FOR_EACH (bucket, list_node, buckets) {
>>>>> 
>>>>> This stops the transmit PMDs from attempting to free the packet until
>>>>> all the buckets (ports) have transmitted it.
>>>>> My switch also does reassembly on rx; this refcnt is necessary for
>>>>> handling multi-segment DPDK buffers too.
>>>>> I also changed the segment free to rte_pktmbuf_free in netdev-dpdk.c
>>>>> for this purpose.
>>>>> I'm expecting it will also be important for TSO or the possibility
>>>>> of using rte_ipv4_fragment_packet on an outgoing port.
>>>>> 
>>>>> I have between 6 and 12 PMDs, depending on the number of DPDK ports
>>>>> running at any time, and if I use OFPGT11_ALL with many output
>>>>> buckets (ports), buffers will disappear from under some PMDs and cause
>>>>> segfaults etc.
>>>>> 
>>>>> Cheers,
>>>>> 
>>>>> Dave.
>>>>> 
>>>>>> On Oct 12, 2015, at 11:38 AM, Ben Pfaff <blp at nicira.com> wrote:
>>>>>> 
>>>>>> On Wed, Oct 07, 2015 at 05:36:18PM -0500, David Evans wrote:
>>>>>>> While using netdev-dpdk: when I add a rule for which the action is to
>>>>>>> send to a group (type=all) containing (x) output buckets (ports), how
>>>>>>> can I increment the dp_packet->pkt_mbuf's refcnt to (x) so that the
>>>>>>> packet is not deleted before it has been transmitted on all ports
>>>>>>> (buckets) in the group?
>>>>>>> 
>>>>>>> Perhaps in ofproto-dpif-xlate.c, function xlate_all_group, find the
>>>>>>> packet and apply the ctx->xin->packet->mbuf->refcnt?  Will that work
>>>>>>> for all packets for a ctx?
>>>>>> 
>>>>>> I don't understand what relationship you expect here.  A group has no
>>>>>> direct relationship to a packet.  Translation produces a flat list of
>>>>>> simple actions that don't refer back to the group.
>>>>> 
>>> 
> 



