[ovs-dev] [PATCH v5 08/14] dp-packet: Handle multi-seg mbufs in resize__().

Darrell Ball dlu998 at gmail.com
Wed Jul 18 23:02:07 UTC 2018


On Tue, Jul 17, 2018 at 12:59 AM, Lam, Tiago <tiago.lam at intel.com> wrote:

> On 16/07/2018 09:37, Lam, Tiago wrote:
> > On 13/07/2018 18:54, Darrell Ball wrote:
> >> Thanks for the patch.
> >>
> >> A few queries inline.
> >>
> >
> > Hi Darrell,
> >
> > Thanks for your inputs. I've replied in-line as well.
> >
> >> On Wed, Jul 11, 2018 at 11:23 AM, Tiago Lam <tiago.lam at intel.com
> >> <mailto:tiago.lam at intel.com>> wrote:
> >>
> >>     When enabled with DPDK OvS relies on mbufs allocated by mempools to
> >>     receive and output data on DPDK ports. Until now, each OvS
> dp_packet has
> >>     had only one mbuf associated, which is allocated with the maximum
> >>     possible size, taking the MTU into account. This approach, however,
> >>     doesn't allow us to increase the allocated size in an mbuf, if
> needed,
> >>     since an mbuf is allocated and initialised upon mempool creation.
> Thus,
> >>     in the current implementatin this is dealt with by calling
> >>     OVS_NOT_REACHED() and terminating OvS.
> >>
> >>     To avoid this, and allow the (already) allocated space to be better
> >>     used, dp_packet_resize__() now tries to use the available room,
> both the
> >>     tailroom and the headroom, to make enough space for the new data.
> Since
> >>     this happens for packets of source DPBUF_DPDK, the single-segment
> mbuf
> >>     case mentioned above is also covered by this new aproach in
> resize__().
> >>
> >>     Signed-off-by: Tiago Lam <tiago.lam at intel.com
> >>     <mailto:tiago.lam at intel.com>>
> >>     Acked-by: Eelco Chaudron <echaudro at redhat.com
> >>     <mailto:echaudro at redhat.com>>
> >>     ---
> >>      lib/dp-packet.c | 48 ++++++++++++++++++++++++++++++
> ++++++++++++++++--
> >>      1 file changed, 46 insertions(+), 2 deletions(-)
> >>
> >>     diff --git a/lib/dp-packet.c b/lib/dp-packet.c
> >>     index d6e19eb..87af459 100644
> >>     --- a/lib/dp-packet.c
> >>     +++ b/lib/dp-packet.c
> >>     @@ -237,9 +237,51 @@ dp_packet_resize__(struct dp_packet *b, size_t
> >>     new_headroom, size_t new_tailroom
> >>          new_allocated = new_headroom + dp_packet_size(b) +
> new_tailroom;
> >>
> >>          switch (b->source) {
> >>     +    /* When resizing mbufs, both a single mbuf and multi-segment
> >>     mbufs (where
> >>     +     * data is not contigously held in memory), both the headroom
> >>     and the
> >>     +     * tailroom available will be used to make more space for where
> >>     data needs
> >>     +     * to be inserted. I.e if there's not enough headroom, data may
> >>     be shifted
> >>     +     * right if there's enough tailroom.
> >>     +     * However, this is not bulletproof and in some cases the space
> >>     available
> >>     +     * won't be enough - in those cases, an error should be
> >>     returned and the
> >>     +     * packet dropped. */
> >>          case DPBUF_DPDK:
> >>     -        OVS_NOT_REACHED();
> >>
> >>
> >> Previously, it was a coding error to call this function for a DPDK mbuf
> >> case, which is pretty
> >> clear. But with this patch, presumably that is not longer the case and
> >> the calling the API is
> >> now ok for DPDK mbufs.
> >>
> >
> > As it stands, it will still be an error to call dp_packet_resize__() for
> > any DPDK packet, or by extension any of the other functions that call
> > it, such as dp_packet_prealloc_tailroom() and
> > dp_packet_prealloc_headroom(). This patch only tries to alleviate that
> > by accommodating space from the headroom or tailroom, if possible, and
> > create just enough space for the new data. My preferred approach would
> > be to return an error if not possible, but since the API doesn't deal
> > with errors as is, the previous behavior of manually asserting was left
> > as is. As reported in [1] (I comment more on that below), the behavior
> > of manually asserting can lead to undesired behavior in some use cases.
> >
> >>
> >>
> >>     +    {
> >>     +        size_t miss_len;
> >>     +
> >>     +        if (new_headroom == dp_packet_headroom(b)) {
> >>     +            /* This is a tailroom adjustment. Since there's no
> >>     tailroom space
> >>     +             * left, try and shift data towards the head to free up
> >>     tail space,
> >>     +             * if there's enough headroom */
> >>     +
> >>     +            miss_len = new_tailroom - dp_packet_tailroom(b);
> >>     +
> >>     +            if (miss_len <= new_headroom) {
> >>     +                dp_packet_shift(b, -miss_len);
> >>     +            } else {
> >>     +                /* XXX: Handle error case and report error to
> caller */
> >>     +                OVS_NOT_REACHED();
> >>
> >>
> >>
> >> This will not just drop the packet, it will fail the daemon, because a
> >> packet cannot be resized.
> >> If the system is completely depleted of memory, that may be ok, but in
> >> the case, no such
> >> assumption is implied.
> >>
> >> Also, why is XXX still left in the patch?
> >>
> >
> > Because there's still work to do in that regard. The whole process
> > shouldn't be brought down if there's not enough space to put some data
> > in one single packet. However, this was intentionally left out of this
> > series or otherwise it would increase its complexity considerably.
> >
> > As others have pointed out in [1], this is not a simple change, which
> > would have to be propagated to higher levels in other parts of the code
> > base. I've proposed an alternative (vs refactoring the whole dp_packet
> > API to handle and return errors) in [2], but that seems to have gone
> > stale. Going forward I see that approach merging with this new piece in
> > dp_packet_resize__(), where an error can be returned to the caller if
> > there's not enough space.
> >
> >> Also, the preexisting API handles two cases:
> >> 1/ Tailroom only adjustment
> >> 2/ headroom and/or tailroom adjustment
> >>
> >> meaning it handles all cases.
> >>
> >> The new DPDK addition (part of the same API) defines 2 cases
> >>
> >> 1/ tailroom only adjustment
> >> 2/ headroom only adjustment
> >>
> >> So, it looks like a different API, that also does not handle all cases.
> >>
> >>
> >
> > You have a point there, support for point 2/ "headroom and tailroom
> > adjustment" is missed. It doesn't seem to be used anywhere at the
> > moment, the only callers being dp_packet_prealloc_tailroom() and
> > dp_packet_prealloc_headroom(), but I'll submit an incremental patch to
> > deal with this. Thanks for pointing it out.
> >
> I've had a look into this and it doesn't seem that case number 2/
> "headroom and tailroom adjustment" above would make sense for the
> DPBUF_DPDK case. The reason being that if both `new_tailroom` and
> `new_headroom` are being increased (in respect to dp_packet_tailroom()
> and dp_packet_headroom()), there won't be enough space as the tail and
> head would be competing for the same available space. So it makes sense
> to make it an exclusive operation for the DPBUF_DPDK case.
>

Maybe you can explain why that would be the case, in general?

However, if that is really the case, then that is fine; an "else if" case
would
need to be added and a modified 'return error else case'.



>
> The work I mentioned before in [1] should help here as we would then be
> able to return an error (suach as `EINVAL`) if both "new_tailroom" and
> "new_headroom" are incremented for the DPBUF_DPDK case.
>
> What do you think?
>
> [1] https://mail.openvswitch.org/pipermail/ovs-dev/2018-July/348908.html
>


More information about the dev mailing list