[ovs-dev] [PATCH v5 08/14] dp-packet: Handle multi-seg mbufs in resize__().

Lam, Tiago tiago.lam at intel.com
Mon Jul 16 08:37:46 UTC 2018


On 13/07/2018 18:54, Darrell Ball wrote:
> Thanks for the patch.
> 
> A few queries inline.
> 

Hi Darrell,

Thanks for your inputs. I've replied in-line as well.

> On Wed, Jul 11, 2018 at 11:23 AM, Tiago Lam <tiago.lam at intel.com
> <mailto:tiago.lam at intel.com>> wrote:
> 
>     When enabled with DPDK OvS relies on mbufs allocated by mempools to
>     receive and output data on DPDK ports. Until now, each OvS dp_packet has
>     had only one mbuf associated, which is allocated with the maximum
>     possible size, taking the MTU into account. This approach, however,
>     doesn't allow us to increase the allocated size in an mbuf, if needed,
>     since an mbuf is allocated and initialised upon mempool creation. Thus,
>     in the current implementatin this is dealt with by calling
>     OVS_NOT_REACHED() and terminating OvS.
> 
>     To avoid this, and allow the (already) allocated space to be better
>     used, dp_packet_resize__() now tries to use the available room, both the
>     tailroom and the headroom, to make enough space for the new data. Since
>     this happens for packets of source DPBUF_DPDK, the single-segment mbuf
>     case mentioned above is also covered by this new aproach in resize__().
> 
>     Signed-off-by: Tiago Lam <tiago.lam at intel.com
>     <mailto:tiago.lam at intel.com>>
>     Acked-by: Eelco Chaudron <echaudro at redhat.com
>     <mailto:echaudro at redhat.com>>
>     ---
>      lib/dp-packet.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++--
>      1 file changed, 46 insertions(+), 2 deletions(-)
> 
>     diff --git a/lib/dp-packet.c b/lib/dp-packet.c
>     index d6e19eb..87af459 100644
>     --- a/lib/dp-packet.c
>     +++ b/lib/dp-packet.c
>     @@ -237,9 +237,51 @@ dp_packet_resize__(struct dp_packet *b, size_t
>     new_headroom, size_t new_tailroom
>          new_allocated = new_headroom + dp_packet_size(b) + new_tailroom;
> 
>          switch (b->source) {
>     +    /* When resizing mbufs, both a single mbuf and multi-segment
>     mbufs (where
>     +     * data is not contigously held in memory), both the headroom
>     and the
>     +     * tailroom available will be used to make more space for where
>     data needs
>     +     * to be inserted. I.e if there's not enough headroom, data may
>     be shifted
>     +     * right if there's enough tailroom.
>     +     * However, this is not bulletproof and in some cases the space
>     available
>     +     * won't be enough - in those cases, an error should be
>     returned and the
>     +     * packet dropped. */
>          case DPBUF_DPDK:
>     -        OVS_NOT_REACHED();
> 
> 
> Previously, it was a coding error to call this function for a DPDK mbuf
> case, which is pretty
> clear. But with this patch, presumably that is not longer the case and
> the calling the API is
> now ok for DPDK mbufs.
> 

As it stands, it will still be an error to call dp_packet_resize__() for
any DPDK packet, or by extension any of the other functions that call
it, such as dp_packet_prealloc_tailroom() and
dp_packet_prealloc_headroom(). This patch only tries to alleviate that
by accommodating space from the headroom or tailroom, if possible, and
create just enough space for the new data. My preferred approach would
be to return an error if not possible, but since the API doesn't deal
with errors as is, the previous behavior of manually asserting was left
as is. As reported in [1] (I comment more on that below), the behavior
of manually asserting can lead to undesired behavior in some use cases.

>  
> 
>     +    {
>     +        size_t miss_len;
>     +
>     +        if (new_headroom == dp_packet_headroom(b)) {
>     +            /* This is a tailroom adjustment. Since there's no
>     tailroom space
>     +             * left, try and shift data towards the head to free up
>     tail space,
>     +             * if there's enough headroom */
>     +
>     +            miss_len = new_tailroom - dp_packet_tailroom(b);
>     +
>     +            if (miss_len <= new_headroom) {
>     +                dp_packet_shift(b, -miss_len);
>     +            } else {
>     +                /* XXX: Handle error case and report error to caller */
>     +                OVS_NOT_REACHED();
> 
> 
> 
> This will not just drop the packet, it will fail the daemon, because a
> packet cannot be resized.
> If the system is completely depleted of memory, that may be ok, but in
> the case, no such
> assumption is implied.
> 
> Also, why is XXX still left in the patch?
> 

Because there's still work to do in that regard. The whole process
shouldn't be brought down if there's not enough space to put some data
in one single packet. However, this was intentionally left out of this
series or otherwise it would increase its complexity considerably.

As others have pointed out in [1], this is not a simple change, which
would have to be propagated to higher levels in other parts of the code
base. I've proposed an alternative (vs refactoring the whole dp_packet
API to handle and return errors) in [2], but that seems to have gone
stale. Going forward I see that approach merging with this new piece in
dp_packet_resize__(), where an error can be returned to the caller if
there's not enough space.

> Also, the preexisting API handles two cases:
> 1/ Tailroom only adjustment
> 2/ headroom and/or tailroom adjustment
> 
> meaning it handles all cases.
>
> The new DPDK addition (part of the same API) defines 2 cases
> 
> 1/ tailroom only adjustment
> 2/ headroom only adjustment
> 
> So, it looks like a different API, that also does not handle all cases.
> 
>  

You have a point there, support for point 2/ "headroom and tailroom
adjustment" is missed. It doesn't seem to be used anywhere at the
moment, the only callers being dp_packet_prealloc_tailroom() and
dp_packet_prealloc_headroom(), but I'll submit an incremental patch to
deal with this. Thanks for pointing it out.

Tiago.

[1] https://mail.openvswitch.org/pipermail/ovs-dev/2018-May/346649.html
[2] https://mail.openvswitch.org/pipermail/ovs-dev/2018-July/348908.html

> 
>     +            }
>     +        } else {
>     +            /* Otherwise, this is a headroom adjustment. Try to
>     shift data
>     +             * towards the tail to free up head space, if there's
>     enough
>     +             * tailroom */
>     +
>     +            miss_len = new_headroom - dp_packet_headroom(b);
> 
>     +
>     +            if (miss_len <= new_tailroom) {
>     +                dp_packet_shift(b, miss_len);
>     +            } else {
>     +                /* XXX: Handle error case and report error to caller */
>     +                OVS_NOT_REACHED();
> 
> 
> 
> same comments as above.
> 
>  
> 
>     +            }
>     +        }
>     +
>     +        new_base = dp_packet_base(b);
>     +
>     +        break;
>     +    }
>          case DPBUF_MALLOC:
>              if (new_headroom == dp_packet_headroom(b)) {
>                  new_base = xrealloc(dp_packet_base(b), new_allocated);
>     @@ -263,7 +305,9 @@ dp_packet_resize__(struct dp_packet *b, size_t
>     new_headroom, size_t new_tailroom
>              OVS_NOT_REACHED();
>          }
> 
>     -    dp_packet_set_allocated(b, new_allocated);
>     +    if (b->source != DPBUF_DPDK) {
>     +        dp_packet_set_allocated(b, new_allocated);
>     +    }
>          dp_packet_set_base(b, new_base);
> 
>          new_data = (char *) new_base + new_headroom;
>     -- 
>     2.7.4
> 
>     _______________________________________________
>     dev mailing list
>     dev at openvswitch.org <mailto:dev at openvswitch.org>
>     https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>     <https://mail.openvswitch.org/mailman/listinfo/ovs-dev>
> 
> 


More information about the dev mailing list