[ovs-dev] [PATCH v5 08/14] dp-packet: Handle multi-seg mbufs in resize__().

Darrell Ball dlu998 at gmail.com
Mon Jul 23 22:48:37 UTC 2018


On Fri, Jul 20, 2018 at 10:10 AM, Lam, Tiago <tiago.lam at intel.com> wrote:

> On 19/07/2018 00:02, Darrell Ball wrote:
> >
> >
> > On Tue, Jul 17, 2018 at 12:59 AM, Lam, Tiago <tiago.lam at intel.com
> > <mailto:tiago.lam at intel.com>> wrote:
> >
> >     On 16/07/2018 09:37, Lam, Tiago wrote:
> >     > On 13/07/2018 18:54, Darrell Ball wrote:
> >     >> Thanks for the patch.
> >     >>
> >     >> A few queries inline.
> >     >>
> >     >
> >     > Hi Darrell,
> >     >
> >     > Thanks for your inputs. I've replied in-line as well.
> >     >
> >     >> On Wed, Jul 11, 2018 at 11:23 AM, Tiago Lam <tiago.lam at intel.com
> >     <mailto:tiago.lam at intel.com>
> >     >> <mailto:tiago.lam at intel.com <mailto:tiago.lam at intel.com>>> wrote:
> >     >>
> >     >>     When enabled with DPDK OvS relies on mbufs allocated by
> >     mempools to
> >     >>     receive and output data on DPDK ports. Until now, each OvS
> >     dp_packet has
> >     >>     had only one mbuf associated, which is allocated with the
> maximum
> >     >>     possible size, taking the MTU into account. This approach,
> >     however,
> >     >>     doesn't allow us to increase the allocated size in an mbuf,
> >     if needed,
> >     >>     since an mbuf is allocated and initialised upon mempool
> >     creation. Thus,
> >     >>     in the current implementatin this is dealt with by calling
> >     >>     OVS_NOT_REACHED() and terminating OvS.
> >     >>
> >     >>     To avoid this, and allow the (already) allocated space to be
> >     better
> >     >>     used, dp_packet_resize__() now tries to use the available
> >     room, both the
> >     >>     tailroom and the headroom, to make enough space for the new
> >     data. Since
> >     >>     this happens for packets of source DPBUF_DPDK, the
> >     single-segment mbuf
> >     >>     case mentioned above is also covered by this new aproach in
> >     resize__().
> >     >>
> >     >>     Signed-off-by: Tiago Lam <tiago.lam at intel.com
> >     <mailto:tiago.lam at intel.com>
> >     >>     <mailto:tiago.lam at intel.com <mailto:tiago.lam at intel.com>>>
> >     >>     Acked-by: Eelco Chaudron <echaudro at redhat.com
> >     <mailto:echaudro at redhat.com>
> >     >>     <mailto:echaudro at redhat.com <mailto:echaudro at redhat.com>>>
> >     >>     ---
> >     >>      lib/dp-packet.c | 48
> >     ++++++++++++++++++++++++++++++++++++++++++++++--
> >     >>      1 file changed, 46 insertions(+), 2 deletions(-)
> >     >>
> >     >>     diff --git a/lib/dp-packet.c b/lib/dp-packet.c
> >     >>     index d6e19eb..87af459 100644
> >     >>     --- a/lib/dp-packet.c
> >     >>     +++ b/lib/dp-packet.c
> >     >>     @@ -237,9 +237,51 @@ dp_packet_resize__(struct dp_packet *b,
> >     size_t
> >     >>     new_headroom, size_t new_tailroom
> >     >>          new_allocated = new_headroom + dp_packet_size(b) +
> >     new_tailroom;
> >     >>
> >     >>          switch (b->source) {
> >     >>     +    /* When resizing mbufs, both a single mbuf and
> multi-segment
> >     >>     mbufs (where
> >     >>     +     * data is not contigously held in memory), both the
> >     headroom
> >     >>     and the
> >     >>     +     * tailroom available will be used to make more space
> >     for where
> >     >>     data needs
> >     >>     +     * to be inserted. I.e if there's not enough headroom,
> >     data may
> >     >>     be shifted
> >     >>     +     * right if there's enough tailroom.
> >     >>     +     * However, this is not bulletproof and in some cases
> >     the space
> >     >>     available
> >     >>     +     * won't be enough - in those cases, an error should be
> >     >>     returned and the
> >     >>     +     * packet dropped. */
> >     >>          case DPBUF_DPDK:
> >     >>     -        OVS_NOT_REACHED();
> >     >>
> >     >>
> >     >> Previously, it was a coding error to call this function for a
> >     DPDK mbuf
> >     >> case, which is pretty
> >     >> clear. But with this patch, presumably that is not longer the
> >     case and
> >     >> the calling the API is
> >     >> now ok for DPDK mbufs.
> >     >>
> >     >
> >     > As it stands, it will still be an error to call
> >     dp_packet_resize__() for
> >     > any DPDK packet, or by extension any of the other functions that
> call
> >     > it, such as dp_packet_prealloc_tailroom() and
> >     > dp_packet_prealloc_headroom(). This patch only tries to alleviate
> that
> >     > by accommodating space from the headroom or tailroom, if possible,
> and
> >     > create just enough space for the new data. My preferred approach
> would
> >     > be to return an error if not possible, but since the API doesn't
> deal
> >     > with errors as is, the previous behavior of manually asserting was
> >     left
> >     > as is. As reported in [1] (I comment more on that below), the
> behavior
> >     > of manually asserting can lead to undesired behavior in some use
> >     cases.
> >     >
> >     >>
> >     >>
> >     >>     +    {
> >     >>     +        size_t miss_len;
> >     >>     +
> >     >>     +        if (new_headroom == dp_packet_headroom(b)) {
> >     >>     +            /* This is a tailroom adjustment. Since there's
> no
> >     >>     tailroom space
> >     >>     +             * left, try and shift data towards the head to
> >     free up
> >     >>     tail space,
> >     >>     +             * if there's enough headroom */
> >     >>     +
> >     >>     +            miss_len = new_tailroom - dp_packet_tailroom(b);
> >     >>     +
> >     >>     +            if (miss_len <= new_headroom) {
> >     >>     +                dp_packet_shift(b, -miss_len);
> >     >>     +            } else {
> >     >>     +                /* XXX: Handle error case and report error
> >     to caller */
> >     >>     +                OVS_NOT_REACHED();
> >     >>
> >     >>
> >     >>
> >     >> This will not just drop the packet, it will fail the daemon,
> >     because a
> >     >> packet cannot be resized.
> >     >> If the system is completely depleted of memory, that may be ok,
> >     but in
> >     >> the case, no such
> >     >> assumption is implied.
> >     >>
> >     >> Also, why is XXX still left in the patch?
> >     >>
> >     >
> >     > Because there's still work to do in that regard. The whole process
> >     > shouldn't be brought down if there's not enough space to put some
> data
> >     > in one single packet. However, this was intentionally left out of
> this
> >     > series or otherwise it would increase its complexity considerably.
> >     >
> >     > As others have pointed out in [1], this is not a simple change,
> which
> >     > would have to be propagated to higher levels in other parts of the
> >     code
> >     > base. I've proposed an alternative (vs refactoring the whole
> dp_packet
> >     > API to handle and return errors) in [2], but that seems to have
> gone
> >     > stale. Going forward I see that approach merging with this new
> >     piece in
> >     > dp_packet_resize__(), where an error can be returned to the caller
> if
> >     > there's not enough space.
> >     >
> >     >> Also, the preexisting API handles two cases:
> >     >> 1/ Tailroom only adjustment
> >     >> 2/ headroom and/or tailroom adjustment
> >     >>
> >     >> meaning it handles all cases.
> >     >>
> >     >> The new DPDK addition (part of the same API) defines 2 cases
> >     >>
> >     >> 1/ tailroom only adjustment
> >     >> 2/ headroom only adjustment
> >     >>
> >     >> So, it looks like a different API, that also does not handle all
> >     cases.
> >     >>
> >     >>
> >     >
> >     > You have a point there, support for point 2/ "headroom and tailroom
> >     > adjustment" is missed. It doesn't seem to be used anywhere at the
> >     > moment, the only callers being dp_packet_prealloc_tailroom() and
> >     > dp_packet_prealloc_headroom(), but I'll submit an incremental
> patch to
> >     > deal with this. Thanks for pointing it out.
> >     >
> >     I've had a look into this and it doesn't seem that case number 2/
> >     "headroom and tailroom adjustment" above would make sense for the
> >     DPBUF_DPDK case. The reason being that if both `new_tailroom` and
> >     `new_headroom` are being increased (in respect to
> dp_packet_tailroom()
> >     and dp_packet_headroom()), there won't be enough space as the tail
> and
> >     head would be competing for the same available space. So it makes
> sense
> >     to make it an exclusive operation for the DPBUF_DPDK case.
> >
> >
> > Maybe you can explain why that would be the case, in general?
> >
>
> DPDK mbufs have a fixed layout. That means we can't really resize the
> mbufs per say, as `buf_len` are fixed. For example, while in a normal
> DPBUF_MALLOC packet we can reallocate data and copy everything to a
> smaller buffer, decreasing the tailroom that way, in a DPDK mbuf we are
> stuck with the same `buf_len` (meaning we can't change the tailroom
> without affecting the `data_len`, thus modifying data). What we can do
> is shift data to try and create more or less tailroom, taking or adding
> space from the headroom.
>

The above is fine, albeit obvious; but my question was something different.
I was asking why you thought a packet could not adjust both headroom and
tailroom
for dpdk mbuf packets. data_len may change will headroom and/or tailroom
changes.



>
> I've gave it some thought and I'm now convinced that reusing resize__()
> here is probably not the best approach. Given its implementation and
> description ("Reallocates 'b' so that it has exactly 'new_headroom' and
> 'new_tailroom'"), that doesn't sound like it would be a good fit for a
> DPDK packet. Maybe a completely separate function would be the best
> approach here, setting up different expectations.
>

That is part of the issue.




>
> However, as I said before, this was implemented to try to alleviate the
> call to OVS_NOT_REACHED() in dp_packet_resize__(). But since it's adding
> more complications than what it's actually solving (at least at this
> point in time), I'm going to drop this patch (I'll send a revert patch
> with the needed changes) in favor of dealing properly with
> dp_packet_resize__() for DPDK packets, and stop calling it for DPDK
> packets.
>


I think it would be better to modify the original patchset, rather than try
to add code with
the multisegment mbuf patchset and then immediately revert it.
Also, I don't see the api, dp_packet_mbuf_shift() in patch 7 being used
without the resize__() in
this patch 8, I think it would be best to remove the 'shift' api from patch
7.



>
> > However, if that is really the case, then that is fine; an "else if"
> > case would
> > need to be added and a modified 'return error else case'.
> >
> >
> >
> >
> >     The work I mentioned before in [1] should help here as we would then
> be
> >     able to return an error (suach as `EINVAL`) if both "new_tailroom"
> and
> >     "new_headroom" are incremented for the DPBUF_DPDK case.
> >
> >     What do you think?
> >
> >     [1]
> >     https://mail.openvswitch.org/pipermail/ovs-dev/2018-July/348908.html
> >     <https://mail.openvswitch.org/pipermail/ovs-dev/2018-July/
> 348908.html>
> >
> >
>


More information about the dev mailing list