[ovs-dev] [PATCH] stt: linearize for CONFIG_SLUB case

pravin shelar pshelar at ovn.org
Fri Apr 1 19:58:47 UTC 2016


On Thu, Mar 31, 2016 at 9:06 PM, Jesse Gross <jesse at kernel.org> wrote:
> On Thu, Mar 31, 2016 at 2:30 PM, Pravin B Shelar <pshelar at ovn.org> wrote:
>> In the STT implementation we saw performance improvements with
>> linearizing the skb for the SLUB case.  So the following patch skips
>> the zero-copy operation for such a case.
>>
>> Tested-By: Vasmi Abidi <vabidi at vmware.com>
>> Signed-off-by: Pravin B Shelar <pshelar at ovn.org>
>
> Can you add some performance numbers to the commit message?
>
ok.

>> diff --git a/datapath/linux/compat/stt.c b/datapath/linux/compat/stt.c
>> index eb397e8..ae33d5e 100644
>> --- a/datapath/linux/compat/stt.c
>> +++ b/datapath/linux/compat/stt.c
>>  static int coalesce_skb(struct sk_buff **headp)
>>  {
>> -       struct sk_buff *frag, *head, *prev;
>> +       struct sk_buff *frag, *head;
>> +#ifndef SKIP_ZERO_COPY
>> +       struct sk_buff *prev;
>> +#endif
>>         int err;
>>
>>         err = straighten_frag_list(headp);
>
> I don't think that we need to straighten the frag list in the
> SKIP_ZERO_COPY case. It's basically just undoing what the for loop
> that forms the frag list will do. __skb_linearize() will be able to
> handle it in any case.
>
I can skip the straightening and work on the skb with a frag-list, but
I do not want to complicate the code.  This case is pretty rare after
the change in reassemble() where the complete skb is allocated on the
first segment.

>> +#ifdef SKIP_ZERO_COPY
>> +       if (skb_shinfo(head)->frag_list) {
>> +               err = __skb_linearize(head);
>> +               return err;
>> +       }
>> +#endif
>
> Maybe move this to try_to_segment()? It seems like it is a little more
> consistent.
>
But it is not the same. I think we only need to linearize if there is a
frag-list; AFAIK non-linear data in skb_shinfo can be safely handled by
skb_segment().
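
To spell out the distinction (illustrative only, the helper name is
made up):

/* skb_segment() copes with paged frags (shinfo->nr_frags); only a
 * chained frag_list needs the __skb_linearize() in this path.
 */
static bool stt_needs_linearize(const struct sk_buff *skb)
{
	return skb_has_frag_list(skb);
}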

>>  static int segment_skb(struct sk_buff **headp, bool csum_partial,
>>                        bool ipv4, bool tcp, int l4_offset)
>>  {
>> +#ifndef SKIP_ZERO_COPY
>>         int err;
>>
>>         err = coalesce_skb(headp);
>> @@ -543,6 +575,7 @@ static int segment_skb(struct sk_buff **headp, bool csum_partial,
>>         if (skb_shinfo(*headp)->frag_list)
>>                 return __try_to_segment(*headp, csum_partial,
>>                                         ipv4, tcp, l4_offset);
>> +#endif
>>         return 0;
>>  }
>
> Is this OK? It will retain a skb with a frag_list on the transmit path
> where this wasn't possible before. This used to cause kernel panics
> since the skb geometry wasn't even when the packet went to be
> segmented. I don't remember if this is still the case but if not then
> it seems like we should be able to simplify the code regardless of this
> patch.
>
right, I missed it. I will keep the segmentation here.
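
Roughly, keeping the segmentation while still skipping the coalescing
would look like this (untested sketch of the follow-up, not the final
patch):

static int segment_skb(struct sk_buff **headp, bool csum_partial,
		       bool ipv4, bool tcp, int l4_offset)
{
#ifndef SKIP_ZERO_COPY
	int err;

	err = coalesce_skb(headp);
	if (err)
		return err;
#endif
	/* Never hand a frag_list skb to the transmit path. */
	if (skb_shinfo(*headp)->frag_list)
		return __try_to_segment(*headp, csum_partial,
					ipv4, tcp, l4_offset);
	return 0;
}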

>> @@ -1093,6 +1141,15 @@ static struct sk_buff *reassemble(struct sk_buff *skb)
>>
>>         frag = lookup_frag(dev_net(skb->dev), stt_percpu, &key, hash);
>>         if (!frag->skbs) {
>> +               int err;
>> +
>> +               err = pskb_expand_head(skb, skb_headroom(skb),
>> +                                      skb->data_len + tot_len, GFP_ATOMIC);
>> +               if (likely(!err)) {
>> +                       if (unlikely(!__pskb_pull_tail(skb, skb->data_len)))
>> +                               BUG();
>> +               }
>
> This linearizes the packet even in non-SKIP_ZERO_COPY cases. I guess
> we probably don't want to do that. It's also possible that the first
> skb received isn't necessarily the first packet in the reassembled
> packet.
>
ok, I will fix it.
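
Something along these lines, i.e. only pre-allocating the full linear
buffer when SKIP_ZERO_COPY is set (rough, untested sketch; it does not
yet deal with the first skb arriving out of order):

	/* inside the !frag->skbs branch of reassemble() */
#ifdef SKIP_ZERO_COPY
		int err;

		err = pskb_expand_head(skb, skb_headroom(skb),
				       skb->data_len + tot_len, GFP_ATOMIC);
		if (likely(!err)) {
			if (unlikely(!__pskb_pull_tail(skb, skb->data_len)))
				BUG();
		}
#endif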

> This is effectively optimizing for the case where all packets are
> large and will generate a frag list after reassembly. However, it's
> possible in many situations that packets can be reassembled with zero
> copy that don't later need to be split (maybe the total packet is
> around 20k). Do you know how the performance compares vs. just
> linearizing at the end in that case? Is there a benefit to doing an
> early copy?
>
We are not trying to coalesce skbs in the SKIP_ZERO_COPY case, so any
partial skb segment will generate multiple inner segments. I have seen
better performance with this approach than with linearizing at the
end.

>> @@ -1154,8 +1216,10 @@ static struct sk_buff *reassemble(struct sk_buff *skb)
>>
>>         FRAG_CB(frag->skbs)->first.set_ecn_ce |= INET_ECN_is_ce(iph->tos);
>>         FRAG_CB(frag->skbs)->first.rcvd_len += skb->len;
>> -       FRAG_CB(frag->skbs)->first.mem_used += skb->truesize;
>> -       stt_percpu->frag_mem_used += skb->truesize;
>> +       if (!copied_skb) {
>> +               FRAG_CB(frag->skbs)->first.mem_used += skb->truesize;
>> +               stt_percpu->frag_mem_used += skb->truesize;
>> +       }
>
> pskb_expand_head() doesn't actually change the truesize so our count
> will be more inaccurate than before. However, we don't have to worry
> about the attack case of very small packets consuming a large amount
> of memory due to having many copies of struct sk_buff.

ok. I will fix the truesize accounting.
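
For the accounting, the simplest thing is probably to refresh truesize
right after the copy in reassemble(), something like this (rough and
untested; SKB_TRUESIZE only approximates the real allocation, and this
assumes skb_end_offset() is available via the compat headers):

		/* right after the successful __pskb_pull_tail():
		 * pskb_expand_head() keeps the old truesize, so refresh
		 * it before it is charged to frag_mem_used.
		 */
		skb->truesize = SKB_TRUESIZE(skb_end_offset(skb));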


