[ovs-dev] [PATCH v2 05/14] dp-packet: Fix data_len handling multi-seg mbufs.

Eelco Chaudron echaudro at redhat.com
Thu Jul 5 10:03:07 UTC 2018



On 5 Jul 2018, at 11:54, Lam, Tiago wrote:

> On 05/07/2018 09:59, Eelco Chaudron wrote:
>>
>>
>> On 4 Jul 2018, at 20:06, Tiago Lam wrote:
>>
>>> When a dp_packet is from a DPDK source, and it contains multi-segment
>>> mbufs, the data_len is not equal to the packet size, pkt_len. Instead,
>>> the data_len of each mbuf in the chain should be considered while
>>> distributing the new (provided) size.
>>>
>>> To account for the above dp_packet_set_size() has been changed so
>>> that,
>>> in the multi-segment mbufs case, only the data_len on the last mbuf of
>>> the chain and the total size of the packet, pkt_len, are changed. The
>>> data_len on the intermediate mbufs preceeding the last mbuf is not
>>> changed by dp_packet_set_size(). Furthermore, in some cases
>>> dp_packet_set_size() may be used to set a smaller size than the
>>> current
>>> packet size, thus effectively trimming the end of the packet. In the
>>> multi-segment mbufs case this may lead to lingering mbufs that may
>>> need
>>> freeing.
>>>
>>> __dp_packet_set_data() now also updates an mbufs' data_len after
>>> setting
>>> the data offset. This is so that both fields are always in sync for
>>> each
>>> mbuf in a chain.
>>>
>>> Co-authored-by: Michael Qiu <qiudayu at chinac.com>
>>> Co-authored-by: Mark Kavanagh <mark.b.kavanagh at intel.com>
>>> Co-authored-by: Przemyslaw Lal <przemyslawx.lal at intel.com>
>>> Co-authored-by: Marcin Ksiadz <mksiadz at gmail.com>
>>> Co-authored-by: Yuanhan Liu <yliu at fridaylinux.org>
>>>
>>> Signed-off-by: Michael Qiu <qiudayu at chinac.com>
>>> Signed-off-by: Mark Kavanagh <mark.b.kavanagh at intel.com>
>>> Signed-off-by: Przemyslaw Lal <przemyslawx.lal at intel.com>
>>> Signed-off-by: Marcin Ksiadz <mksiadz at gmail.com>
>>> Signed-off-by: Yuanhan Liu <yliu at fridaylinux.org>
>>> Signed-off-by: Tiago Lam <tiago.lam at intel.com>
>>> ---
>>>  lib/dp-packet.h | 76
>>> ++++++++++++++++++++++++++++++++++++++++++++++++---------
>>>  1 file changed, 64 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/lib/dp-packet.h b/lib/dp-packet.h
>>> index c612ad4..fcbaf60 100644
>>> --- a/lib/dp-packet.h
>>> +++ b/lib/dp-packet.h
>>> @@ -429,17 +429,49 @@ dp_packet_size(const struct dp_packet *b)
>>>  static inline void
>>>  dp_packet_set_size(struct dp_packet *b, uint32_t v)
>>>  {
>>> -    /* netdev-dpdk does not currently support segmentation;
>>> consequently, for
>>> -     * all intents and purposes, 'data_len' (16 bit) and 'pkt_len'
>>> (32 bit) may
>>> -     * be used interchangably.
>>> -     *
>>> -     * On the datapath, it is expected that the size of packets
>>> -     * (and thus 'v') will always be <= UINT16_MAX; this means that
>>> there is no
>>> -     * loss of accuracy in assigning 'v' to 'data_len'.
>>> -     */
>>> -    b->mbuf.data_len = (uint16_t)v;  /* Current seg length. */
>>> -    b->mbuf.pkt_len = v;             /* Total length of all segments
>>> linked to
>>> -                                      * this segment. */
>>> +    if (b->source == DPBUF_DPDK) {
>>> +        struct rte_mbuf *mbuf = &b->mbuf;
>>> +        uint16_t new_len = v;
>>> +        uint16_t data_len;
>>> +        uint16_t nb_segs = 0;
>>> +        uint16_t pkt_len = 0;
>>> +
>>> +        /* Trim 'v' length bytes from the end of the chained buffers,
>>> freeing
>>> +           any buffers that may be left floating */
>>> +        while (mbuf) {
>>> +            data_len = MIN(new_len, mbuf->data_len);
>>> +            mbuf->data_len = data_len;
>>> +
>>> +            if (new_len - data_len <= 0) {
>>> +                /* Free the rest of chained mbufs */
>>> +                free_dpdk_buf(CONTAINER_OF(mbuf->next, struct
>>> dp_packet,
>>> +                                           mbuf));
>>> +                mbuf->next = NULL;
>>> +            } else if (!mbuf->next) {
>>> +                /* Don't assign more than what we have available */
>>> +                mbuf->data_len = MIN(new_len,
>>> +                                     mbuf->buf_len - mbuf->data_off);
>>> +            }
>>> +
>>> +            new_len -= data_len;
>>> +            nb_segs += 1;
>>> +            pkt_len += mbuf->data_len;
>>> +            mbuf = mbuf->next;
>>> +        }
>>> +
>>> +        /* pkt_len != v would effectively mean that pkt_len < than
>>> 'v' (as
>>> +         * being bigger is logically impossible). Being < than 'v'
>>> would mean
>>> +         * the 'v' provided was bigger than the available room, which
>>> is the
>>> +         * responsibility of the caller to make sure there is enough
>>> room */
>>> +        ovs_assert(pkt_len == v);
>>> +
>>> +        b->mbuf.nb_segs = nb_segs;
>>> +        b->mbuf.pkt_len = pkt_len;
>>> +    } else {
>>> +        b->mbuf.data_len = v;
>>> +        /* Total length of all segments linked to this segment. */
>>> +        b->mbuf.pkt_len = v;
>>> +    }
>>>  }
>>>
>>>  static inline uint16_t
>>> @@ -451,7 +483,27 @@ __packet_data(const struct dp_packet *b)
>>>  static inline void
>>>  __packet_set_data(struct dp_packet *b, uint16_t v)
>>>  {
>>> -    b->mbuf.data_off = v;
>>> +    if (b->source == DPBUF_DPDK) {
>>> +        /* Moving data_off away from the first mbuf in the chain is
>>> not a
>>> +         * possibility using DPBUF_DPDK dp_packets */
>>> +        ovs_assert(v == UINT16_MAX || v <= b->mbuf.buf_len -
>>> b->mbuf.data_off);
>>
>> Are you ok this check is ok? Should it not just be “.. || v<=
>> b->mbuf.buf_len”?
>> Assuming you have a 1K buffer, data_off is currently 520, now you want
>> to set the offset to 530?
>>
>> 530 <= 1024 - 520 —> ASSERT!
>>
>
> You're right, this won't assert correctly. I somehow forgot that 'v' is
> already passed as an offset to the base address, and not a value to be
> incremented to `data_off`, hence why I was checking on the available
> tailroom.
>
> I'll prepare v3 and run some more tests with it before sending here.

Perfect, if this is the only change for v3, I’ll ack the series!

Cheers,

Eelco


More information about the dev mailing list