[ovs-dev] [PATCH] stt: linearize for CONFIG_SLUB case
pravin shelar
pshelar at ovn.org
Fri Apr 1 19:58:47 UTC 2016
On Thu, Mar 31, 2016 at 9:06 PM, Jesse Gross <jesse at kernel.org> wrote:
> On Thu, Mar 31, 2016 at 2:30 PM, Pravin B Shelar <pshelar at ovn.org> wrote:
>> In the STT implementation we saw performance improvements from
>> linearizing the skb in the SLUB case, so the following patch skips
>> the zero-copy operation in that case.
>>
>> Tested-By: Vasmi Abidi <vabidi at vmware.com>
>> Signed-off-by: Pravin B Shelar <pshelar at ovn.org>
>
> Can you add some performance numbers to the commit message?
>
OK.
>> diff --git a/datapath/linux/compat/stt.c b/datapath/linux/compat/stt.c
>> index eb397e8..ae33d5e 100644
>> --- a/datapath/linux/compat/stt.c
>> +++ b/datapath/linux/compat/stt.c
>> static int coalesce_skb(struct sk_buff **headp)
>> {
>> - struct sk_buff *frag, *head, *prev;
>> + struct sk_buff *frag, *head;
>> +#ifndef SKIP_ZERO_COPY
>> + struct sk_buff *prev;
>> +#endif
>> int err;
>>
>> err = straighten_frag_list(headp);
>
> I don't think that we need to straighten the frag list in the
> SKIP_ZERO_COPY case. It's basically just undoing what the for loop
> that forms the frag list will do. __skb_linearize() will be able to
> handle it in any case.
>
I can skip the straightening and work on an skb with a frag-list, but I
do not want to complicate the code. This case is pretty rare after the
change in reassemble() where the complete skb is allocated on the first
segment received.
>> +#ifdef SKIP_ZERO_COPY
>> + if (skb_shinfo(head)->frag_list) {
>> + err = __skb_linearize(head);
>> + return err;
>> + }
>> +#endif
>
> Maybe move this to try_to_segment()? It seems like it is a little more
> consistent.
>
But it is not the same. I think we only need to linearize if there is a
frag-list; AFAIK non-linear data in skb_shinfo() can be safely handled
by skb_segment().
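To make that concrete, with this change the whole of coalesce_skb() in a
SKIP_ZERO_COPY build reduces to roughly the following. This is a sketch,
not the final code; it keeps the straighten_frag_list() call as
discussed above:

    #ifdef SKIP_ZERO_COPY
    static int coalesce_skb(struct sk_buff **headp)
    {
            struct sk_buff *head;
            int err;

            /* Straightening is kept so the rest of the code only ever
             * sees a flat frag list, same as in the zero-copy build.
             */
            err = straighten_frag_list(headp);
            if (unlikely(err))
                    return err;
            head = *headp;

            /* With SKIP_ZERO_COPY, copy any remaining frag list into a
             * single linear buffer instead of coalescing it page by
             * page.  Paged frags in skb_shinfo() are left alone, since
             * skb_segment() can handle those.
             */
            if (skb_shinfo(head)->frag_list)
                    return __skb_linearize(head);

            return 0;
    }
    #endif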
>> static int segment_skb(struct sk_buff **headp, bool csum_partial,
>> bool ipv4, bool tcp, int l4_offset)
>> {
>> +#ifndef SKIP_ZERO_COPY
>> int err;
>>
>> err = coalesce_skb(headp);
>> @@ -543,6 +575,7 @@ static int segment_skb(struct sk_buff **headp, bool csum_partial,
>> if (skb_shinfo(*headp)->frag_list)
>> return __try_to_segment(*headp, csum_partial,
>> ipv4, tcp, l4_offset);
>> +#endif
>> return 0;
>> }
>
> Is this OK? It will retain an skb with a frag_list on the transmit path
> where this wasn't possible before. This used to cause kernel panics
> since the skb geometry wasn't even when the packet went to be
> segmented. I don't remember if this is still the case, but if not then
> it seems like we should be able to simplify the code regardless of this
> patch.
>
Right, I missed it. I will keep the segmentation here.
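So in v2 only the coalescing will be compiled out; the frag-list check
and the __try_to_segment() call stay in both builds. Roughly (a sketch,
not the final patch):

    static int segment_skb(struct sk_buff **headp, bool csum_partial,
                           bool ipv4, bool tcp, int l4_offset)
    {
    #ifndef SKIP_ZERO_COPY
            int err;

            /* Zero-copy build only: merge the frag list first. */
            err = coalesce_skb(headp);
            if (err)
                    return err;
    #endif
            /* Segment in both builds so an skb with a frag_list never
             * reaches the transmit path with uneven geometry.
             */
            if (skb_shinfo(*headp)->frag_list)
                    return __try_to_segment(*headp, csum_partial,
                                            ipv4, tcp, l4_offset);
            return 0;
    }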
>> @@ -1093,6 +1141,15 @@ static struct sk_buff *reassemble(struct sk_buff *skb)
>>
>> frag = lookup_frag(dev_net(skb->dev), stt_percpu, &key, hash);
>> if (!frag->skbs) {
>> + int err;
>> +
>> + err = pskb_expand_head(skb, skb_headroom(skb),
>> + skb->data_len + tot_len, GFP_ATOMIC);
>> + if (likely(!err)) {
>> + if (unlikely(!__pskb_pull_tail(skb, skb->data_len)))
>> + BUG();
>> + }
>
> This linearizes the packet even in non-SKIP_ZERO_COPY cases. I guess
> we probably don't want to do that. It's also possible that the first
> skb received isn't necessarily the first packet in the reassembled
> packet.
>
OK, I will fix it.
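The early copy will be guarded by SKIP_ZERO_COPY so the zero-copy build
is untouched; something like this (sketch only):

    #ifdef SKIP_ZERO_COPY
            if (!frag->skbs) {
                    int err;

                    /* Reserve tailroom for the whole STT frame up front
                     * so later fragments can be copied in rather than
                     * chained on a frag list.  tot_len comes from the
                     * STT header, so the full-frame reservation does
                     * not depend on this fragment being the first one
                     * of the frame on the wire.
                     */
                    err = pskb_expand_head(skb, skb_headroom(skb),
                                           skb->data_len + tot_len,
                                           GFP_ATOMIC);
                    if (likely(!err)) {
                            /* Cannot fail: we just guaranteed enough
                             * tailroom for the pull.
                             */
                            if (unlikely(!__pskb_pull_tail(skb,
                                                           skb->data_len)))
                                    BUG();
                    }
                    /* On allocation failure, fall back to the normal
                     * zero-copy reassembly path.
                     */
            }
    #endif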
> This is effectively optimizing for the case where all packets are
> large and will generate a frag list after reassembly. However, it's
> possible in many situations that packets can be reassembled with zero
> copy and do not later need to be split (maybe the total packet is
> around 20k). Do you know how the performance compares vs. just
> linearizing at the end in that case? Is there a benefit to doing an
> early copy?
>
We are not trying to coalesce skbs in the SKIP_ZERO_COPY case, so any
partially reassembled skb would generate multiple inner segments. I have
seen better performance with this approach than with linearizing at the
end.
>> @@ -1154,8 +1216,10 @@ static struct sk_buff *reassemble(struct sk_buff *skb)
>>
>> FRAG_CB(frag->skbs)->first.set_ecn_ce |= INET_ECN_is_ce(iph->tos);
>> FRAG_CB(frag->skbs)->first.rcvd_len += skb->len;
>> - FRAG_CB(frag->skbs)->first.mem_used += skb->truesize;
>> - stt_percpu->frag_mem_used += skb->truesize;
>> + if (!copied_skb) {
>> + FRAG_CB(frag->skbs)->first.mem_used += skb->truesize;
>> + stt_percpu->frag_mem_used += skb->truesize;
>> + }
>
> pskb_expand_head() doesn't actually change the truesize so our count
> will be more inaccurate than before. However, we don't have to worry
> about the attack case of very small packets consuming a large amount
> of memory due to having many copies of struct sk_buff.
OK, I will fix the truesize accounting.
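The simplest fix I see is to bump skb->truesize right after the
expansion, so the existing mem_used/frag_mem_used accounting later
charges the memory we actually hold. Roughly (a sketch;
SKB_DATA_ALIGN() only approximates what pskb_expand_head() really
allocates):

            err = pskb_expand_head(skb, skb_headroom(skb),
                                   skb->data_len + tot_len, GFP_ATOMIC);
            if (likely(!err)) {
                    /* pskb_expand_head() reallocates the data area but
                     * leaves skb->truesize alone; fix it up by hand so
                     * the frag memory accounting stays honest.
                     */
                    skb->truesize += SKB_DATA_ALIGN(skb->data_len +
                                                    tot_len);
            }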