[ovs-dev] [PATCH v8 06/13] dp-packet: Handle multi-seg mbufs in helper funcs.

Eelco Chaudron echaudro at redhat.com
Tue Jun 26 10:31:40 UTC 2018



On 22 Jun 2018, at 21:04, Lam, Tiago wrote:

> On 18/06/2018 12:50, Eelco Chaudron wrote:
>>
>>
>> On 11 Jun 2018, at 18:21, Tiago Lam wrote:
>>
>>> Most helper functions in dp-packet assume that the data held by a
>>> dp_packet is contiguous, and perform operations such as pointer
>>> arithmetic under that assumption. However, with the introduction of
>>> multi-segment mbufs, where data is non-contiguous, such assumptions
>>> are
>>> no longer possible. Some examples of Such helper functions are
>>> dp_packet_tail(), dp_packet_tailroom(), dp_packet_end(),
>>> dp_packet_get_allocated() and dp_packet_at().
>>>
>>> Thus, instead of assuming contiguous data in dp_packet, they  now
>>> iterate over the (non-contiguous) data in mbufs to perform their
>>> calculations.
>>>
>>> Finally, dp_packet_use__() has also been modified to perform the
>>> initialisation of the packet (and setting the source) before
>>> continuing
>>> to set its size and data length, which now depends on the type of
>>> packet.
>>>
>>> Co-authored-by: Mark Kavanagh <mark.b.kavanagh at intel.com>
>>>
>>> Signed-off-by: Mark Kavanagh <mark.b.kavanagh at intel.com>
>>> Signed-off-by: Tiago Lam <tiago.lam at intel.com>
>>> ---
>>>  lib/dp-packet.c |   4 +-
>>>  lib/dp-packet.h | 242
>>> +++++++++++++++++++++++++++++++++++++++++++++-----------
>>>  2 files changed, 197 insertions(+), 49 deletions(-)
>>>
>>> diff --git a/lib/dp-packet.c b/lib/dp-packet.c
>>> index 782e7c2..2aaeaae 100644
>>> --- a/lib/dp-packet.c
>>> +++ b/lib/dp-packet.c
>>> @@ -41,11 +41,11 @@ static void
>>>  dp_packet_use__(struct dp_packet *b, void *base, size_t allocated,
>>>               enum dp_packet_source source)
>>>  {
>>> +    dp_packet_init__(b, allocated, source);
>>> +
>>>      dp_packet_set_base(b, base);
>>>      dp_packet_set_data(b, base);
>>>      dp_packet_set_size(b, 0);
>>> -
>>> -    dp_packet_init__(b, allocated, source);
>>>  }
>>>
>>>  /* Initializes 'b' as an empty dp_packet that contains the
>>> 'allocated' bytes of
>>> diff --git a/lib/dp-packet.h b/lib/dp-packet.h
>>> index c301ed5..272597f 100644
>>> --- a/lib/dp-packet.h
>>> +++ b/lib/dp-packet.h
>>> @@ -133,6 +133,10 @@ static inline void *dp_packet_at(const struct
>>> dp_packet *, size_t offset,
>>>                                   size_t size);
>>>  static inline void *dp_packet_at_assert(const struct dp_packet *,
>>>                                          size_t offset, size_t 
>>> size);
>>> +#ifdef DPDK_NETDEV
>>> +static inline void * dp_packet_at_offset(const struct dp_packet *b,
>>> +                                         size_t offset);
>>> +#endif
>>>  static inline void *dp_packet_tail(const struct dp_packet *);
>>>  static inline void *dp_packet_end(const struct dp_packet *);
>>>
>>> @@ -180,40 +184,6 @@ dp_packet_delete(struct dp_packet *b)
>>>      }
>>>  }
>>>
>>> -/* If 'b' contains at least 'offset + size' bytes of data, returns 
>>> a
>>> pointer to
>>> - * byte 'offset'.  Otherwise, returns a null pointer. */
>>> -static inline void *
>>> -dp_packet_at(const struct dp_packet *b, size_t offset, size_t size)
>>> -{
>>> -    return offset + size <= dp_packet_size(b)
>>> -           ? (char *) dp_packet_data(b) + offset
>>> -           : NULL;
>>> -}
>>> -
>>> -/* Returns a pointer to byte 'offset' in 'b', which must contain at
>>> least
>>> - * 'offset + size' bytes of data. */
>>> -static inline void *
>>> -dp_packet_at_assert(const struct dp_packet *b, size_t offset, 
>>> size_t
>>> size)
>>> -{
>>> -    ovs_assert(offset + size <= dp_packet_size(b));
>>> -    return ((char *) dp_packet_data(b)) + offset;
>>> -}
>>> -
>>> -/* Returns a pointer to byte following the last byte of data in use
>>> in 'b'. */
>>> -static inline void *
>>> -dp_packet_tail(const struct dp_packet *b)
>>> -{
>>> -    return (char *) dp_packet_data(b) + dp_packet_size(b);
>>> -}
>>> -
>>> -/* Returns a pointer to byte following the last byte allocated for
>>> use (but
>>> - * not necessarily in use) in 'b'. */
>>> -static inline void *
>>> -dp_packet_end(const struct dp_packet *b)
>>> -{
>>> -    return (char *) dp_packet_base(b) + dp_packet_get_allocated(b);
>>> -}
>>> -
>>>  /* Returns the number of bytes of headroom in 'b', that is, the
>>> number of bytes
>>>   * of unused space in dp_packet 'b' before the data that is in use.
>>> (Most
>>>   * commonly, the data in a dp_packet is at its beginning, and thus
>>> the
>>> @@ -229,6 +199,14 @@ dp_packet_headroom(const struct dp_packet *b)
>>>  static inline size_t
>>>  dp_packet_tailroom(const struct dp_packet *b)
>>>  {
>>> +#ifdef DPDK_NETDEV
>>> +    if (b->source == DPBUF_DPDK) {
>>> +        struct rte_mbuf *tmbuf = CONST_CAST(struct rte_mbuf *,
>>> &b->mbuf);
>>> +        tmbuf = rte_pktmbuf_lastseg(tmbuf);
>>> +
>>
>> What if the last two chains contain no user data, we have more
>> tailroom?!? Should we not call 	(tmbuf)?
>
> I've covered this in my reply to the cover letter, hopefully it is 
> more
> clear now. This shouldn't be a possibility in the implementation. 
> Also,
> we do call rte_pktmbuf_tailroom(), just below.

I know but I do not see the reason for not calling 
rte_pktmbuf_tailroom() on the mbuf directly. This will work for all 
cases, add’s less complexity, even in the future if we move to a 
design where we will support multiple buffers. Trying to “limit” the 
usual mbuf cases does not sound right, and it will confuse people.

>>
>>> +        return rte_pktmbuf_tailroom(tmbuf);
>>> +    }
>>> +#endif
>>>      return (char *) dp_packet_end(b) - (char *) dp_packet_tail(b);
>>>  }
>>>
>>> @@ -236,8 +214,13 @@ dp_packet_tailroom(const struct dp_packet *b)
>>>  static inline void
>>>  dp_packet_clear(struct dp_packet *b)
>>>  {
>>> +#ifdef DPDK_NETDEV
>>> +    dp_packet_set_size(b, 0);
>>> +    rte_pktmbuf_reset(&b->mbuf);
>>
>> This would probably assert on none DPBUF_DPDK, do we need a “if
>> (b->source == DPBUF_DPDK)” here also?
>>
>
> We do, I'll add it to next iteration. It only asserts when using DPDK
> with `RTE_LIBRTE_MBUF_DEBUG`, but still...

Yes, but you do not want this assert when you are trying to debug 
something else ;)

>>> +#else
>>>      dp_packet_set_data(b, dp_packet_base(b));
>>>      dp_packet_set_size(b, 0);
>>> +#endif
>>>  }
>>>
>>>  /* Removes 'size' bytes from the head end of 'b', which must 
>>> contain
>>> at least
>>> @@ -252,12 +235,32 @@ dp_packet_pull(struct dp_packet *b, size_t 
>>> size)
>>>      return data;
>>>  }
>>>
>>> +#ifdef DPDK_NETDEV
>>> +/* Similar to dp_packet_try_pull() but doesn't actually pull any
>>> code, only
>>> + * returns true or false if it would be possible to actually pull 
>>> any
>>> code.
>>> + * Valid for dp_packets carrying mbufs only. */
>>> +static inline bool
>>> +dp_packet_mbuf_may_pull(const struct dp_packet *b, size_t size) {
>>> +    if (size > b->mbuf.data_len) {
>>> +        return false;
>>> +    }
>>> +
>>> +    return true;
>>> +}
>>> +#endif
>>> +
>>>  /* If 'b' has at least 'size' bytes of data, removes that many 
>>> bytes
>>> from the
>>>   * head end of 'b' and returns the first byte removed.  Otherwise,
>>> returns a
>>>   * null pointer without modifying 'b'. */
>>>  static inline void *
>>>  dp_packet_try_pull(struct dp_packet *b, size_t size)
>>>  {
>>> +#ifdef DPDK_NETDEV
>>> +    if (!dp_packet_mbuf_may_pull(b, size)) {
>>> +        return NULL;
>>> +    }
>>> +#endif
>>> +
>>>      return dp_packet_size(b) - dp_packet_l2_pad_size(b) >= size
>>>          ? dp_packet_pull(b, size) : NULL;
>>>  }
>>> @@ -311,6 +314,12 @@ dp_packet_set_l2_pad_size(struct dp_packet *b,
>>> uint8_t pad_size)
>>>  static inline void *
>>>  dp_packet_l2_5(const struct dp_packet *b)
>>>  {
>>> +#ifdef DPDK_NETDEV
>>> +    if (!dp_packet_mbuf_may_pull(b, b->l2_5_ofs)) {
>>> +        return NULL;
>>> +    }
>>> +#endif
>>> +
>>>      return b->l2_5_ofs != UINT16_MAX
>>>             ? (char *) dp_packet_data(b) + b->l2_5_ofs
>>>             : NULL;
>>> @@ -327,6 +336,12 @@ dp_packet_set_l2_5(struct dp_packet *b, void
>>> *l2_5)
>>>  static inline void *
>>>  dp_packet_l3(const struct dp_packet *b)
>>>  {
>>> +#ifdef DPDK_NETDEV
>>> +    if (!dp_packet_mbuf_may_pull(b, b->l3_ofs)) {
>>> +        return NULL;
>>> +    }
>>> +#endif
>>> +
>>
>> Can we not use the dp_packet_at_offset() function here (and the 
>> similar
>> functions)? And include the above and below checks there?
>>
>
> They are aimed at different purposes. dp_packet_at_offset() should 
> walk
> the chain of mbufs and return a pointer to the offset, while here we
> actually want to know if the offset is within the first mbuf. If these
> headers are not within the first mbuf, we don't return it as it 
> crosses
> mbufs.

My bad, I was referring to dp_packet_at(), which probably also need the 
may_pull() if there are control protocols with jumbo packets?


>>>      return b->l3_ofs != UINT16_MAX
>>>             ? (char *) dp_packet_data(b) + b->l3_ofs
>>>             : NULL;
>>> @@ -341,6 +356,12 @@ dp_packet_set_l3(struct dp_packet *b, void *l3)
>>>  static inline void *
>>>  dp_packet_l4(const struct dp_packet *b)
>>>  {
>>> +#ifdef DPDK_NETDEV
>>> +    if (!dp_packet_mbuf_may_pull(b, b->l4_ofs)) {
>>> +        return NULL;
>>> +    }
>>> +#endif
>>> +
>>>      return b->l4_ofs != UINT16_MAX
>>>             ? (char *) dp_packet_data(b) + b->l4_ofs
>>>             : NULL;
>>> @@ -355,10 +376,37 @@ dp_packet_set_l4(struct dp_packet *b, void 
>>> *l4)
>>>  static inline size_t
>>>  dp_packet_l4_size(const struct dp_packet *b)
>>>  {
>>> +#ifdef DPDK_NETDEV
>>> +    if (b->l4_ofs == UINT16_MAX) {
>>> +        return 0;
>>> +    }
>>> +
>>> +    struct rte_mbuf *hmbuf = CONST_CAST(struct rte_mbuf *, 
>>> &b->mbuf);
>>> +    struct rte_mbuf *tmbuf = dp_packet_tail(b);
>>> +
>>> +    size_t l4_size = 0;
>>> +    if (hmbuf == tmbuf) {
>>
>> Do not think this optimisation help much, as the dp_packet_tail(b) 
>> does
>> the same walk as in the "while (hmbuf)" clause below, so why not just
>> keet the else case?	
>>
>
> Yeah, good point. I'll change that in the next iteration.
>
>>> +        l4_size = hmbuf->data_len - b->l4_ofs;
>>> +    } else {
>>> +        l4_size = hmbuf->data_len - b->l4_ofs;
>>> +        hmbuf = hmbuf->next;
>>> +
>>> +        while (hmbuf) {
>>> +            l4_size += tmbuf->data_len;
>>> +
>>> +            hmbuf = hmbuf->next;
>>> +        }
>>> +    }
>>> +
>>> +    l4_size -= dp_packet_l2_pad_size(b);
>>> +
>>> +    return l4_size;
>>> +#else
>>>      return b->l4_ofs != UINT16_MAX
>>>          ? (const char *)dp_packet_tail(b) - (const char
>>> *)dp_packet_l4(b)
>>>          - dp_packet_l2_pad_size(b)
>>>          : 0;
>>> +#endif
>>>  }
>>>
>>>  static inline const void *
>>> @@ -491,7 +539,7 @@ __packet_set_data(struct dp_packet *b, uint16_t 
>>> v)
>>>  static inline uint16_t
>>>  dp_packet_get_allocated(const struct dp_packet *b)
>>>  {
>>> -    return b->mbuf.buf_len;
>>> +    return b->mbuf.nb_segs * b->mbuf.buf_len;
>>>  }
>>>
>>>  static inline void
>>> @@ -499,7 +547,107 @@ dp_packet_set_allocated(struct dp_packet *b,
>>> uint16_t s)
>>>  {
>>>      b->mbuf.buf_len = s;
>>>  }
>>> +
>>> +static inline void *
>>> +dp_packet_at_offset(const struct dp_packet *b, size_t offset)
>>> +{
>>> +    if (b->source == DPBUF_DPDK) {
>>> +        struct rte_mbuf *buf = CONST_CAST(struct rte_mbuf *,
>>> &b->mbuf);
>>> +
>>> +        while (buf && offset > buf->data_len) {
>>> +            offset -= buf->data_len;
>>> +
>>> +            buf = buf->next;
>>> +        }
>>> +        return buf ? rte_pktmbuf_mtod_offset(buf, char *, offset) :
>>> NULL;
>>> +    } else {
>>> +        return (char *) dp_packet_data(b) + offset;
>>> +    }
>>> +}
>>> +
>>> +/* If 'b' contains at least 'offset + size' bytes of data, returns 
>>> a
>>> pointer to
>>> + * byte 'offset'.  Otherwise, returns a null pointer. */
>>> +static inline void *
>>> +dp_packet_at(const struct dp_packet *b, size_t offset, size_t size)
>>> +{
>>> +    return offset + size <= dp_packet_size(b)
>>> +        ? dp_packet_at_offset(b, offset)
>>> +        : NULL;
>>> +}
>>> +
>>> +/* Returns a pointer to byte 'offset' in 'b', which must contain at
>>> least
>>> + * 'offset + size' bytes of data. */
>>> +static inline void *
>>> +dp_packet_at_assert(const struct dp_packet *b, size_t offset, 
>>> size_t
>>> size)
>>> +{
>>> +    ovs_assert(offset + size <= dp_packet_size(b));
>>> +    return dp_packet_at_offset(b, offset);
>>> +}
>>> +
>>> +/* Returns a pointer to byte following the last byte of data in use
>>> in 'b'. */
>>> +static inline void *
>>> +dp_packet_tail(const struct dp_packet *b)
>>> +{
>>> +    struct rte_mbuf *buf = CONST_CAST(struct rte_mbuf *, &b->mbuf);
>>> +    if (b->source == DPBUF_DPDK) {
>>> +        /* Find last segment where data ends, meaning the tail of 
>>> the
>>> chained
>>> +           mbufs is there */
>>> +        buf = rte_pktmbuf_lastseg(buf);
>>> +    }
>>> +
>>
>> What if the last two chains contains no user data, we should find the
>> last buffer with data.
>> Looks like we assume only the last in the chain contains room, and we
>> achieve this by stripping off buffers (but we never add them again 
>> when
>> needed, so something is wrong in this implementation).
>>
>
> Does my reply to the cover letter help here as well? That's a valid
> point, that if the size is decreased and mbufs are freed then mbufs 
> are
> not allocated again if size needs to be incremented. But as explained,
> this is to enforce that data is contiguous and so the tail points to 
> the
> last mbuf. I haven't yet encountered cases where this would lead to
> complications, but maybe you have something in mind?

I feel like we should not design the general APIs to force the current 
assumed limitation.
As rte_pktmbuf_lastseg() is already walking the three, we could do it 
here instead and checking for the mailroom.


More information about the dev mailing list