[ovs-dev] [PATCH v8 06/13] dp-packet: Handle multi-seg mbufs in helper funcs.

Lam, Tiago tiago.lam at intel.com
Wed Jun 27 10:09:23 UTC 2018


On 26/06/2018 11:31, Eelco Chaudron wrote:
> 
> 
> On 22 Jun 2018, at 21:04, Lam, Tiago wrote:
> 
>> On 18/06/2018 12:50, Eelco Chaudron wrote:
>>>
>>>
>>> On 11 Jun 2018, at 18:21, Tiago Lam wrote:
>>>
>>>> Most helper functions in dp-packet assume that the data held by a
>>>> dp_packet is contiguous, and perform operations such as pointer
>>>> arithmetic under that assumption. However, with the introduction of
>>>> multi-segment mbufs, where data is non-contiguous, such assumptions
>>>> are
>>>> no longer possible. Some examples of Such helper functions are
>>>> dp_packet_tail(), dp_packet_tailroom(), dp_packet_end(),
>>>> dp_packet_get_allocated() and dp_packet_at().
>>>>
>>>> Thus, instead of assuming contiguous data in dp_packet, they  now
>>>> iterate over the (non-contiguous) data in mbufs to perform their
>>>> calculations.
>>>>
>>>> Finally, dp_packet_use__() has also been modified to perform the
>>>> initialisation of the packet (and setting the source) before
>>>> continuing
>>>> to set its size and data length, which now depends on the type of
>>>> packet.
>>>>
>>>> Co-authored-by: Mark Kavanagh <mark.b.kavanagh at intel.com>
>>>>
>>>> Signed-off-by: Mark Kavanagh <mark.b.kavanagh at intel.com>
>>>> Signed-off-by: Tiago Lam <tiago.lam at intel.com>
>>>> ---
>>>>  lib/dp-packet.c |   4 +-
>>>>  lib/dp-packet.h | 242
>>>> +++++++++++++++++++++++++++++++++++++++++++++-----------
>>>>  2 files changed, 197 insertions(+), 49 deletions(-)
>>>>
>>>> diff --git a/lib/dp-packet.c b/lib/dp-packet.c
>>>> index 782e7c2..2aaeaae 100644
>>>> --- a/lib/dp-packet.c
>>>> +++ b/lib/dp-packet.c
>>>> @@ -41,11 +41,11 @@ static void
>>>>  dp_packet_use__(struct dp_packet *b, void *base, size_t allocated,
>>>>               enum dp_packet_source source)
>>>>  {
>>>> +    dp_packet_init__(b, allocated, source);
>>>> +
>>>>      dp_packet_set_base(b, base);
>>>>      dp_packet_set_data(b, base);
>>>>      dp_packet_set_size(b, 0);
>>>> -
>>>> -    dp_packet_init__(b, allocated, source);
>>>>  }
>>>>
>>>>  /* Initializes 'b' as an empty dp_packet that contains the
>>>> 'allocated' bytes of
>>>> diff --git a/lib/dp-packet.h b/lib/dp-packet.h
>>>> index c301ed5..272597f 100644
>>>> --- a/lib/dp-packet.h
>>>> +++ b/lib/dp-packet.h
>>>> @@ -133,6 +133,10 @@ static inline void *dp_packet_at(const struct
>>>> dp_packet *, size_t offset,
>>>>                                   size_t size);
>>>>  static inline void *dp_packet_at_assert(const struct dp_packet *,
>>>>                                          size_t offset, size_t 
>>>> size);
>>>> +#ifdef DPDK_NETDEV
>>>> +static inline void * dp_packet_at_offset(const struct dp_packet *b,
>>>> +                                         size_t offset);
>>>> +#endif
>>>>  static inline void *dp_packet_tail(const struct dp_packet *);
>>>>  static inline void *dp_packet_end(const struct dp_packet *);
>>>>
>>>> @@ -180,40 +184,6 @@ dp_packet_delete(struct dp_packet *b)
>>>>      }
>>>>  }
>>>>
>>>> -/* If 'b' contains at least 'offset + size' bytes of data, returns 
>>>> a
>>>> pointer to
>>>> - * byte 'offset'.  Otherwise, returns a null pointer. */
>>>> -static inline void *
>>>> -dp_packet_at(const struct dp_packet *b, size_t offset, size_t size)
>>>> -{
>>>> -    return offset + size <= dp_packet_size(b)
>>>> -           ? (char *) dp_packet_data(b) + offset
>>>> -           : NULL;
>>>> -}
>>>> -
>>>> -/* Returns a pointer to byte 'offset' in 'b', which must contain at
>>>> least
>>>> - * 'offset + size' bytes of data. */
>>>> -static inline void *
>>>> -dp_packet_at_assert(const struct dp_packet *b, size_t offset, 
>>>> size_t
>>>> size)
>>>> -{
>>>> -    ovs_assert(offset + size <= dp_packet_size(b));
>>>> -    return ((char *) dp_packet_data(b)) + offset;
>>>> -}
>>>> -
>>>> -/* Returns a pointer to byte following the last byte of data in use
>>>> in 'b'. */
>>>> -static inline void *
>>>> -dp_packet_tail(const struct dp_packet *b)
>>>> -{
>>>> -    return (char *) dp_packet_data(b) + dp_packet_size(b);
>>>> -}
>>>> -
>>>> -/* Returns a pointer to byte following the last byte allocated for
>>>> use (but
>>>> - * not necessarily in use) in 'b'. */
>>>> -static inline void *
>>>> -dp_packet_end(const struct dp_packet *b)
>>>> -{
>>>> -    return (char *) dp_packet_base(b) + dp_packet_get_allocated(b);
>>>> -}
>>>> -
>>>>  /* Returns the number of bytes of headroom in 'b', that is, the
>>>> number of bytes
>>>>   * of unused space in dp_packet 'b' before the data that is in use.
>>>> (Most
>>>>   * commonly, the data in a dp_packet is at its beginning, and thus
>>>> the
>>>> @@ -229,6 +199,14 @@ dp_packet_headroom(const struct dp_packet *b)
>>>>  static inline size_t
>>>>  dp_packet_tailroom(const struct dp_packet *b)
>>>>  {
>>>> +#ifdef DPDK_NETDEV
>>>> +    if (b->source == DPBUF_DPDK) {
>>>> +        struct rte_mbuf *tmbuf = CONST_CAST(struct rte_mbuf *,
>>>> &b->mbuf);
>>>> +        tmbuf = rte_pktmbuf_lastseg(tmbuf);
>>>> +
>>>
>>> What if the last two chains contain no user data, we have more
>>> tailroom?!? Should we not call 	(tmbuf)?
>>
>> I've covered this in my reply to the cover letter, hopefully it is 
>> more
>> clear now. This shouldn't be a possibility in the implementation. 
>> Also,
>> we do call rte_pktmbuf_tailroom(), just below.
> 
> I know but I do not see the reason for not calling 
> rte_pktmbuf_tailroom() on the mbuf directly. This will work for all 
> cases, add’s less complexity, even in the future if we move to a 
> design where we will support multiple buffers. Trying to “limit” the 
> usual mbuf cases does not sound right, and it will confuse people.
>

I think there's some confusion here. rte_pktmbuf_tailroom() doesn't
calculate the case above where "the last two chains contain no user
data". It only calculates the tailroom for the passed mbuf.

In fact, most of the rte_pktmbuf_* functions does not deal with such
cases. They only do the calculations for the passed mbuf, and it is up
to the user of the library to add logic on top (using helper functions
such as rte_pktmbuf_lastseg() to traverse the chain). Since this series
doesn't allow for an mbuf in the chain to be free of data after the
tail, I didn't add the extra logic.

Hopefully this clarifies why we are calling rte_pktmbuf_tailroom() in
the tail mbuf only.

Tiago.

>>>
>>>> +        return rte_pktmbuf_tailroom(tmbuf);
>>>> +    }
>>>> +#endif
>>>>      return (char *) dp_packet_end(b) - (char *) dp_packet_tail(b);
>>>>  }
>>>>
>>>> @@ -236,8 +214,13 @@ dp_packet_tailroom(const struct dp_packet *b)
>>>>  static inline void
>>>>  dp_packet_clear(struct dp_packet *b)
>>>>  {
>>>> +#ifdef DPDK_NETDEV
>>>> +    dp_packet_set_size(b, 0);
>>>> +    rte_pktmbuf_reset(&b->mbuf);
>>>
>>> This would probably assert on none DPBUF_DPDK, do we need a “if
>>> (b->source == DPBUF_DPDK)” here also?
>>>
>>
>> We do, I'll add it to next iteration. It only asserts when using DPDK
>> with `RTE_LIBRTE_MBUF_DEBUG`, but still...
> 
> Yes, but you do not want this assert when you are trying to debug 
> something else ;)
> 
>>>> +#else
>>>>      dp_packet_set_data(b, dp_packet_base(b));
>>>>      dp_packet_set_size(b, 0);
>>>> +#endif
>>>>  }
>>>>
>>>>  /* Removes 'size' bytes from the head end of 'b', which must 
>>>> contain
>>>> at least
>>>> @@ -252,12 +235,32 @@ dp_packet_pull(struct dp_packet *b, size_t 
>>>> size)
>>>>      return data;
>>>>  }
>>>>
>>>> +#ifdef DPDK_NETDEV
>>>> +/* Similar to dp_packet_try_pull() but doesn't actually pull any
>>>> code, only
>>>> + * returns true or false if it would be possible to actually pull 
>>>> any
>>>> code.
>>>> + * Valid for dp_packets carrying mbufs only. */
>>>> +static inline bool
>>>> +dp_packet_mbuf_may_pull(const struct dp_packet *b, size_t size) {
>>>> +    if (size > b->mbuf.data_len) {
>>>> +        return false;
>>>> +    }
>>>> +
>>>> +    return true;
>>>> +}
>>>> +#endif
>>>> +
>>>>  /* If 'b' has at least 'size' bytes of data, removes that many 
>>>> bytes
>>>> from the
>>>>   * head end of 'b' and returns the first byte removed.  Otherwise,
>>>> returns a
>>>>   * null pointer without modifying 'b'. */
>>>>  static inline void *
>>>>  dp_packet_try_pull(struct dp_packet *b, size_t size)
>>>>  {
>>>> +#ifdef DPDK_NETDEV
>>>> +    if (!dp_packet_mbuf_may_pull(b, size)) {
>>>> +        return NULL;
>>>> +    }
>>>> +#endif
>>>> +
>>>>      return dp_packet_size(b) - dp_packet_l2_pad_size(b) >= size
>>>>          ? dp_packet_pull(b, size) : NULL;
>>>>  }
>>>> @@ -311,6 +314,12 @@ dp_packet_set_l2_pad_size(struct dp_packet *b,
>>>> uint8_t pad_size)
>>>>  static inline void *
>>>>  dp_packet_l2_5(const struct dp_packet *b)
>>>>  {
>>>> +#ifdef DPDK_NETDEV
>>>> +    if (!dp_packet_mbuf_may_pull(b, b->l2_5_ofs)) {
>>>> +        return NULL;
>>>> +    }
>>>> +#endif
>>>> +
>>>>      return b->l2_5_ofs != UINT16_MAX
>>>>             ? (char *) dp_packet_data(b) + b->l2_5_ofs
>>>>             : NULL;
>>>> @@ -327,6 +336,12 @@ dp_packet_set_l2_5(struct dp_packet *b, void
>>>> *l2_5)
>>>>  static inline void *
>>>>  dp_packet_l3(const struct dp_packet *b)
>>>>  {
>>>> +#ifdef DPDK_NETDEV
>>>> +    if (!dp_packet_mbuf_may_pull(b, b->l3_ofs)) {
>>>> +        return NULL;
>>>> +    }
>>>> +#endif
>>>> +
>>>
>>> Can we not use the dp_packet_at_offset() function here (and the 
>>> similar
>>> functions)? And include the above and below checks there?
>>>
>>
>> They are aimed at different purposes. dp_packet_at_offset() should 
>> walk
>> the chain of mbufs and return a pointer to the offset, while here we
>> actually want to know if the offset is within the first mbuf. If these
>> headers are not within the first mbuf, we don't return it as it 
>> crosses
>> mbufs.
> 
> My bad, I was referring to dp_packet_at(), which probably also need the 
> may_pull() if there are control protocols with jumbo packets?
> 
> 
>>>>      return b->l3_ofs != UINT16_MAX
>>>>             ? (char *) dp_packet_data(b) + b->l3_ofs
>>>>             : NULL;
>>>> @@ -341,6 +356,12 @@ dp_packet_set_l3(struct dp_packet *b, void *l3)
>>>>  static inline void *
>>>>  dp_packet_l4(const struct dp_packet *b)
>>>>  {
>>>> +#ifdef DPDK_NETDEV
>>>> +    if (!dp_packet_mbuf_may_pull(b, b->l4_ofs)) {
>>>> +        return NULL;
>>>> +    }
>>>> +#endif
>>>> +
>>>>      return b->l4_ofs != UINT16_MAX
>>>>             ? (char *) dp_packet_data(b) + b->l4_ofs
>>>>             : NULL;
>>>> @@ -355,10 +376,37 @@ dp_packet_set_l4(struct dp_packet *b, void 
>>>> *l4)
>>>>  static inline size_t
>>>>  dp_packet_l4_size(const struct dp_packet *b)
>>>>  {
>>>> +#ifdef DPDK_NETDEV
>>>> +    if (b->l4_ofs == UINT16_MAX) {
>>>> +        return 0;
>>>> +    }
>>>> +
>>>> +    struct rte_mbuf *hmbuf = CONST_CAST(struct rte_mbuf *, 
>>>> &b->mbuf);
>>>> +    struct rte_mbuf *tmbuf = dp_packet_tail(b);
>>>> +
>>>> +    size_t l4_size = 0;
>>>> +    if (hmbuf == tmbuf) {
>>>
>>> Do not think this optimisation help much, as the dp_packet_tail(b) 
>>> does
>>> the same walk as in the "while (hmbuf)" clause below, so why not just
>>> keet the else case?	
>>>
>>
>> Yeah, good point. I'll change that in the next iteration.
>>
>>>> +        l4_size = hmbuf->data_len - b->l4_ofs;
>>>> +    } else {
>>>> +        l4_size = hmbuf->data_len - b->l4_ofs;
>>>> +        hmbuf = hmbuf->next;
>>>> +
>>>> +        while (hmbuf) {
>>>> +            l4_size += tmbuf->data_len;
>>>> +
>>>> +            hmbuf = hmbuf->next;
>>>> +        }
>>>> +    }
>>>> +
>>>> +    l4_size -= dp_packet_l2_pad_size(b);
>>>> +
>>>> +    return l4_size;
>>>> +#else
>>>>      return b->l4_ofs != UINT16_MAX
>>>>          ? (const char *)dp_packet_tail(b) - (const char
>>>> *)dp_packet_l4(b)
>>>>          - dp_packet_l2_pad_size(b)
>>>>          : 0;
>>>> +#endif
>>>>  }
>>>>
>>>>  static inline const void *
>>>> @@ -491,7 +539,7 @@ __packet_set_data(struct dp_packet *b, uint16_t 
>>>> v)
>>>>  static inline uint16_t
>>>>  dp_packet_get_allocated(const struct dp_packet *b)
>>>>  {
>>>> -    return b->mbuf.buf_len;
>>>> +    return b->mbuf.nb_segs * b->mbuf.buf_len;
>>>>  }
>>>>
>>>>  static inline void
>>>> @@ -499,7 +547,107 @@ dp_packet_set_allocated(struct dp_packet *b,
>>>> uint16_t s)
>>>>  {
>>>>      b->mbuf.buf_len = s;
>>>>  }
>>>> +
>>>> +static inline void *
>>>> +dp_packet_at_offset(const struct dp_packet *b, size_t offset)
>>>> +{
>>>> +    if (b->source == DPBUF_DPDK) {
>>>> +        struct rte_mbuf *buf = CONST_CAST(struct rte_mbuf *,
>>>> &b->mbuf);
>>>> +
>>>> +        while (buf && offset > buf->data_len) {
>>>> +            offset -= buf->data_len;
>>>> +
>>>> +            buf = buf->next;
>>>> +        }
>>>> +        return buf ? rte_pktmbuf_mtod_offset(buf, char *, offset) :
>>>> NULL;
>>>> +    } else {
>>>> +        return (char *) dp_packet_data(b) + offset;
>>>> +    }
>>>> +}
>>>> +
>>>> +/* If 'b' contains at least 'offset + size' bytes of data, returns 
>>>> a
>>>> pointer to
>>>> + * byte 'offset'.  Otherwise, returns a null pointer. */
>>>> +static inline void *
>>>> +dp_packet_at(const struct dp_packet *b, size_t offset, size_t size)
>>>> +{
>>>> +    return offset + size <= dp_packet_size(b)
>>>> +        ? dp_packet_at_offset(b, offset)
>>>> +        : NULL;
>>>> +}
>>>> +
>>>> +/* Returns a pointer to byte 'offset' in 'b', which must contain at
>>>> least
>>>> + * 'offset + size' bytes of data. */
>>>> +static inline void *
>>>> +dp_packet_at_assert(const struct dp_packet *b, size_t offset, 
>>>> size_t
>>>> size)
>>>> +{
>>>> +    ovs_assert(offset + size <= dp_packet_size(b));
>>>> +    return dp_packet_at_offset(b, offset);
>>>> +}
>>>> +
>>>> +/* Returns a pointer to byte following the last byte of data in use
>>>> in 'b'. */
>>>> +static inline void *
>>>> +dp_packet_tail(const struct dp_packet *b)
>>>> +{
>>>> +    struct rte_mbuf *buf = CONST_CAST(struct rte_mbuf *, &b->mbuf);
>>>> +    if (b->source == DPBUF_DPDK) {
>>>> +        /* Find last segment where data ends, meaning the tail of 
>>>> the
>>>> chained
>>>> +           mbufs is there */
>>>> +        buf = rte_pktmbuf_lastseg(buf);
>>>> +    }
>>>> +
>>>
>>> What if the last two chains contains no user data, we should find the
>>> last buffer with data.
>>> Looks like we assume only the last in the chain contains room, and we
>>> achieve this by stripping off buffers (but we never add them again 
>>> when
>>> needed, so something is wrong in this implementation).
>>>
>>
>> Does my reply to the cover letter help here as well? That's a valid
>> point, that if the size is decreased and mbufs are freed then mbufs 
>> are
>> not allocated again if size needs to be incremented. But as explained,
>> this is to enforce that data is contiguous and so the tail points to 
>> the
>> last mbuf. I haven't yet encountered cases where this would lead to
>> complications, but maybe you have something in mind?
> 
> I feel like we should not design the general APIs to force the current 
> assumed limitation.
> As rte_pktmbuf_lastseg() is already walking the three, we could do it 
> here instead and checking for the mailroom.
> 


More information about the dev mailing list