[ovs-dev] [PATCH v7] datapath: Add Stateless TCP Tunneling protocol.

Pravin Shelar pshelar at nicira.com
Thu Apr 23 03:55:20 UTC 2015


On Wed, Apr 22, 2015 at 8:50 PM, Jesse Gross <jesse at nicira.com> wrote:
> On Wed, Apr 22, 2015 at 8:37 PM, Pravin Shelar <pshelar at nicira.com> wrote:
>> On Wed, Apr 22, 2015 at 8:29 PM, Jesse Gross <jesse at nicira.com> wrote:
>>> On Wed, Apr 22, 2015 at 8:22 PM, Pravin Shelar <pshelar at nicira.com> wrote:
>>>> On Wed, Apr 22, 2015 at 7:48 PM, Jesse Gross <jesse at nicira.com> wrote:
>>>>> On Wed, Apr 22, 2015 at 3:38 PM, Pravin Shelar <pshelar at nicira.com> wrote:
>>>>>> On Wed, Apr 22, 2015 at 3:24 PM, Jesse Gross <jesse at nicira.com> wrote:
>>>>>>> On Wed, Apr 22, 2015 at 3:19 PM, Pravin Shelar <pshelar at nicira.com> wrote:
>>>>>>>> On Wed, Apr 22, 2015 at 11:31 AM, Jesse Gross <jesse at nicira.com> wrote:
>>>>>>>>> On Wed, Apr 22, 2015 at 11:09 AM, Pravin Shelar <pshelar at nicira.com> wrote:
>>>>>>>>>> On Wed, Apr 22, 2015 at 10:52 AM, Jesse Gross <jesse at nicira.com> wrote:
>>>>>>>>>>> On Wed, Apr 22, 2015 at 10:03 AM, Pravin Shelar <pshelar at nicira.com> wrote:
>>>>>>>>>>>> On Mon, Apr 20, 2015 at 9:54 PM, Jesse Gross <jesse at nicira.com> wrote:
>>>>>>>>>>>>> On Mon, Apr 20, 2015 at 7:33 PM, Pravin Shelar <pshelar at nicira.com> wrote:
>>>>>>>>>>>>>> On Mon, Apr 20, 2015 at 5:56 PM, Jesse Gross <jesse at nicira.com> wrote:
>>>>>>>>>>>>>>> On Mon, Apr 20, 2015 at 3:54 PM, Pravin Shelar <pshelar at nicira.com> wrote:
>>>>>>>>>>>>>>>> On Mon, Apr 20, 2015 at 3:36 PM, Jesse Gross <jesse at nicira.com> wrote:
>>>>>>>>>>>>>>>>> On Mon, Apr 20, 2015 at 1:29 PM, Pravin B Shelar <pshelar at nicira.com> wrote:
>>>>>>>>>>>>>>>>>> diff --git a/datapath/linux/compat/stt.c b/datapath/linux/compat/stt.c
>>>>>>>>>>>>>>>>>> new file mode 100644
>>>>>>>>>>>>>>>>>> index 0000000..209bf1a
>>>>>>>>>>>>>>>>>> --- /dev/null
>>>>>>>>>>>>>>>>>> +++ b/datapath/linux/compat/stt.c
>>>>>>>>>>>>>>>>>> +static void update_headers(struct sk_buff *skb, bool head,
>>>>>>>>>>>>>>>>>> +                              unsigned int l4_offset, unsigned int hdr_len,
>>>>>>>>>>>>>>>>>> +                              bool ipv4, u32 tcp_seq)
>>>>>>>>>>>>>>>>> [...]
>>>>>>>>>>>>>>>>>> +       skb->truesize = SKB_TRUESIZE(skb_end_offset(skb)) + skb->data_len;
>>>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I wonder if there are any possible edge cases with resetting truesize
>>>>>>>>>>>>>>>>> where the packet is still in someone's transmit queue (such as if we
>>>>>>>>>>>>>>>>> are looping back packet). Do we need to orphan it first?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> ok, I will orphan it in update_headers.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Just to clarify - I was mostly just thinking aloud on orphaning it.
>>>>>>>>>>>>>>> I'm not totally sure if that is the right thing to do or if this is
>>>>>>>>>>>>>>> the right place to do it. I'm not sure what the conceptual
>>>>>>>>>>>>>>> justification would be for it and it could potentially result in the
>>>>>>>>>>>>>>> sender's buffers not being properly limited. Perhaps not resetting the
>>>>>>>>>>>>>>> truesize is the right thing too...
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I have seen warning msg if we do no keep truesize update along with
>>>>>>>>>>>>>> changes to skb.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hmm, interesting, what is the warning? I don't think that I have seen
>>>>>>>>>>>>> that before.
>>>>>>>>>>>>
>>>>>>>>>>>> Actually skb_try_coalesce() is updating it correctly. so there no need
>>>>>>>>>>>> to change truesize anymore. I will update patch accordingly.
>>>>>>>>>>>
>>>>>>>>>>> That's much nicer. I also checked and other receive side code (like
>>>>>>>>>>> TCP input) doesn't worry about the case where a local sender may still
>>>>>>>>>>> be accounting for the packet since any type of loopback device does
>>>>>>>>>>> call skb_orphan() in some form.
>>>>>>>>>>>
>>>>>>>>>>> I hate to bring this up but what about on transmit? In cases where we
>>>>>>>>>>> merge or split skbs (skb_try_coalesce() and normalize_frag_list()
>>>>>>>>>>> respectively) we do track the truesize for correctly for the result
>>>>>>>>>>> but the individual pieces might not have the right destructors or
>>>>>>>>>>> might not have their truesize updated for the destructor they do have.
>>>>>>>>>>
>>>>>>>>>> How about about we update merged skb stats (len, data_len, truesize)
>>>>>>>>>> according to *delta_truesize we get from skb_try_coalesce() and then
>>>>>>>>>> free the skb?
>>>>>>>>>
>>>>>>>>> I think that would work for the skb_try_coalesce() case (although I
>>>>>>>>> would only worry about truesize, not the lengths). For
>>>>>>>>> normalize_frag_list() I think we would either have to add a destructor
>>>>>>>>> or not update truesize. I'm not sure that the condition where we have
>>>>>>>>> frag_lists and a destructor actually ever happens in practice though.
>>>>>>>>
>>>>>>>> I am not sure why do we need a destructor for normalize_frag_list
>>>>>>>> changes. Even with the changes in the function truesize is consistent
>>>>>>>> for given skb memory usage.
>>>>>>>
>>>>>>> If you have a packet with a frag_list, all of the memory for the
>>>>>>> individual elements will be accounted for in the truesize in the top
>>>>>>> level skb. This skb could also be accounted to some socket and have a
>>>>>>> destructor. When we break apart the list, we remove the truesize from
>>>>>>> the head because the memory is now part of individual packets.
>>>>>>> However, the destructor is presumably only on the head and so only its
>>>>>>> memory will be removed from the socket accounting when it is freed but
>>>>>>> not each of the other skbs that came from it.
>>>>>>
>>>>>> ok. In that case we can not have our own destructor since there is one
>>>>>> already (I am not sure if we can use skb->cb to restore original). not
>>>>>> changing true size can complicate skb coalesce, since it does update
>>>>>> truesize. Easy option would be orphan skb if we are going to coalesce
>>>>>> its fragments.
>>>>>
>>>>> I'm not sure that we need our own destructor. What do you think about
>>>>> just replicating the original destructor onto each of the newly
>>>>> generated skbs?
>>>>>
>>>> In that case we assume there is no state associated with the skb, that
>>>> might not be always true.
>>>
>>> What state do you mean? If you mean a destructor on the individual
>>> skbs, I think that is already true because only the top level
>>> destructor will get called when the original skb is freed.
>>
>> If destructor is replicated on the individual skbs then it will be
>> called for each of those skbs when it is freed.
>
> Right; if we've correctly apportioned truesize among the individual
> skbs isn't that the goal?

It works for destructors which only care for skb-truesize. But
destructor callback also takes argument, we also need to replicate
that and we do not have enough information to do it.



More information about the dev mailing list