[ovs-dev] [PATCH v5 08/14] dp-packet: Handle multi-seg mbufs in resize__().

Lam, Tiago tiago.lam at intel.com
Tue Jul 24 07:36:19 UTC 2018


On 23/07/2018 23:48, Darrell Ball wrote:
> 
> 
> On Fri, Jul 20, 2018 at 10:10 AM, Lam, Tiago <tiago.lam at intel.com
> <mailto:tiago.lam at intel.com>> wrote:
> 
>     On 19/07/2018 00:02, Darrell Ball wrote:
>     > 
>     > 
>     > On Tue, Jul 17, 2018 at 12:59 AM, Lam, Tiago <tiago.lam at intel.com <mailto:tiago.lam at intel.com>
>     > <mailto:tiago.lam at intel.com <mailto:tiago.lam at intel.com>>> wrote:
>     > 
>     >     On 16/07/2018 09:37, Lam, Tiago wrote:
>     >     > On 13/07/2018 18:54, Darrell Ball wrote:
>     >     >> Thanks for the patch.
>     >     >>
>     >     >> A few queries inline.
>     >     >>
>     >     >
>     >     > Hi Darrell,
>     >     >
>     >     > Thanks for your inputs. I've replied in-line as well.
>     >     >
>     >     >> On Wed, Jul 11, 2018 at 11:23 AM, Tiago Lam <tiago.lam at intel.com <mailto:tiago.lam at intel.com>
>     >     <mailto:tiago.lam at intel.com <mailto:tiago.lam at intel.com>>
>     >     >> <mailto:tiago.lam at intel.com <mailto:tiago.lam at intel.com>
>     <mailto:tiago.lam at intel.com <mailto:tiago.lam at intel.com>>>> wrote:
>     >     >>
>     >     >>     When enabled with DPDK OvS relies on mbufs allocated by
>     >     mempools to
>     >     >>     receive and output data on DPDK ports. Until now, each OvS
>     >     dp_packet has
>     >     >>     had only one mbuf associated, which is allocated with
>     the maximum
>     >     >>     possible size, taking the MTU into account. This approach,
>     >     however,
>     >     >>     doesn't allow us to increase the allocated size in an mbuf,
>     >     if needed,
>     >     >>     since an mbuf is allocated and initialised upon mempool
>     >     creation. Thus,
>     >     >>     in the current implementatin this is dealt with by calling
>     >     >>     OVS_NOT_REACHED() and terminating OvS.
>     >     >>
>     >     >>     To avoid this, and allow the (already) allocated space
>     to be
>     >     better
>     >     >>     used, dp_packet_resize__() now tries to use the available
>     >     room, both the
>     >     >>     tailroom and the headroom, to make enough space for the new
>     >     data. Since
>     >     >>     this happens for packets of source DPBUF_DPDK, the
>     >     single-segment mbuf
>     >     >>     case mentioned above is also covered by this new aproach in
>     >     resize__().
>     >     >>
>     >     >>     Signed-off-by: Tiago Lam <tiago.lam at intel.com
>     <mailto:tiago.lam at intel.com>
>     >     <mailto:tiago.lam at intel.com <mailto:tiago.lam at intel.com>>
>     >     >>     <mailto:tiago.lam at intel.com
>     <mailto:tiago.lam at intel.com> <mailto:tiago.lam at intel.com
>     <mailto:tiago.lam at intel.com>>>>
>     >     >>     Acked-by: Eelco Chaudron <echaudro at redhat.com <mailto:echaudro at redhat.com>
>     >     <mailto:echaudro at redhat.com <mailto:echaudro at redhat.com>>
>     >     >>     <mailto:echaudro at redhat.com
>     <mailto:echaudro at redhat.com> <mailto:echaudro at redhat.com
>     <mailto:echaudro at redhat.com>>>>
>     >     >>     ---
>     >     >>      lib/dp-packet.c | 48
>     >     ++++++++++++++++++++++++++++++++++++++++++++++--
>     >     >>      1 file changed, 46 insertions(+), 2 deletions(-)
>     >     >>
>     >     >>     diff --git a/lib/dp-packet.c b/lib/dp-packet.c
>     >     >>     index d6e19eb..87af459 100644
>     >     >>     --- a/lib/dp-packet.c
>     >     >>     +++ b/lib/dp-packet.c
>     >     >>     @@ -237,9 +237,51 @@ dp_packet_resize__(struct
>     dp_packet *b,
>     >     size_t
>     >     >>     new_headroom, size_t new_tailroom
>     >     >>          new_allocated = new_headroom + dp_packet_size(b) +
>     >     new_tailroom;
>     >     >>
>     >     >>          switch (b->source) {
>     >     >>     +    /* When resizing mbufs, both a single mbuf and
>     multi-segment
>     >     >>     mbufs (where
>     >     >>     +     * data is not contigously held in memory), both the
>     >     headroom
>     >     >>     and the
>     >     >>     +     * tailroom available will be used to make more space
>     >     for where
>     >     >>     data needs
>     >     >>     +     * to be inserted. I.e if there's not enough headroom,
>     >     data may
>     >     >>     be shifted
>     >     >>     +     * right if there's enough tailroom.
>     >     >>     +     * However, this is not bulletproof and in some cases
>     >     the space
>     >     >>     available
>     >     >>     +     * won't be enough - in those cases, an error
>     should be
>     >     >>     returned and the
>     >     >>     +     * packet dropped. */
>     >     >>          case DPBUF_DPDK:
>     >     >>     -        OVS_NOT_REACHED();
>     >     >>
>     >     >>
>     >     >> Previously, it was a coding error to call this function for a
>     >     DPDK mbuf
>     >     >> case, which is pretty
>     >     >> clear. But with this patch, presumably that is not longer the
>     >     case and
>     >     >> the calling the API is
>     >     >> now ok for DPDK mbufs.
>     >     >>
>     >     >
>     >     > As it stands, it will still be an error to call
>     >     dp_packet_resize__() for
>     >     > any DPDK packet, or by extension any of the other functions
>     that call
>     >     > it, such as dp_packet_prealloc_tailroom() and
>     >     > dp_packet_prealloc_headroom(). This patch only tries to
>     alleviate that
>     >     > by accommodating space from the headroom or tailroom, if
>     possible, and
>     >     > create just enough space for the new data. My preferred
>     approach would
>     >     > be to return an error if not possible, but since the API
>     doesn't deal
>     >     > with errors as is, the previous behavior of manually
>     asserting was
>     >     left
>     >     > as is. As reported in [1] (I comment more on that below),
>     the behavior
>     >     > of manually asserting can lead to undesired behavior in some use
>     >     cases.
>     >     >
>     >     >>  
>     >     >>
>     >     >>     +    {
>     >     >>     +        size_t miss_len;
>     >     >>     +
>     >     >>     +        if (new_headroom == dp_packet_headroom(b)) {
>     >     >>     +            /* This is a tailroom adjustment. Since
>     there's no
>     >     >>     tailroom space
>     >     >>     +             * left, try and shift data towards the
>     head to
>     >     free up
>     >     >>     tail space,
>     >     >>     +             * if there's enough headroom */
>     >     >>     +
>     >     >>     +            miss_len = new_tailroom -
>     dp_packet_tailroom(b);
>     >     >>     +
>     >     >>     +            if (miss_len <= new_headroom) {
>     >     >>     +                dp_packet_shift(b, -miss_len);
>     >     >>     +            } else {
>     >     >>     +                /* XXX: Handle error case and report error
>     >     to caller */
>     >     >>     +                OVS_NOT_REACHED();
>     >     >>
>     >     >>
>     >     >>
>     >     >> This will not just drop the packet, it will fail the daemon,
>     >     because a
>     >     >> packet cannot be resized.
>     >     >> If the system is completely depleted of memory, that may be ok,
>     >     but in
>     >     >> the case, no such
>     >     >> assumption is implied.
>     >     >>
>     >     >> Also, why is XXX still left in the patch?
>     >     >>
>     >     >
>     >     > Because there's still work to do in that regard. The whole
>     process
>     >     > shouldn't be brought down if there's not enough space to put
>     some data
>     >     > in one single packet. However, this was intentionally left
>     out of this
>     >     > series or otherwise it would increase its complexity
>     considerably.
>     >     >
>     >     > As others have pointed out in [1], this is not a simple
>     change, which
>     >     > would have to be propagated to higher levels in other parts
>     of the
>     >     code
>     >     > base. I've proposed an alternative (vs refactoring the whole
>     dp_packet
>     >     > API to handle and return errors) in [2], but that seems to
>     have gone
>     >     > stale. Going forward I see that approach merging with this new
>     >     piece in
>     >     > dp_packet_resize__(), where an error can be returned to the
>     caller if
>     >     > there's not enough space.
>     >     >
>     >     >> Also, the preexisting API handles two cases:
>     >     >> 1/ Tailroom only adjustment
>     >     >> 2/ headroom and/or tailroom adjustment
>     >     >>
>     >     >> meaning it handles all cases.
>     >     >>
>     >     >> The new DPDK addition (part of the same API) defines 2 cases
>     >     >>
>     >     >> 1/ tailroom only adjustment
>     >     >> 2/ headroom only adjustment
>     >     >>
>     >     >> So, it looks like a different API, that also does not
>     handle all
>     >     cases.
>     >     >>
>     >     >>  
>     >     >
>     >     > You have a point there, support for point 2/ "headroom and
>     tailroom
>     >     > adjustment" is missed. It doesn't seem to be used anywhere
>     at the
>     >     > moment, the only callers being dp_packet_prealloc_tailroom() and
>     >     > dp_packet_prealloc_headroom(), but I'll submit an
>     incremental patch to
>     >     > deal with this. Thanks for pointing it out.
>     >     >
>     >     I've had a look into this and it doesn't seem that case number 2/
>     >     "headroom and tailroom adjustment" above would make sense for the
>     >     DPBUF_DPDK case. The reason being that if both `new_tailroom` and
>     >     `new_headroom` are being increased (in respect to
>     dp_packet_tailroom()
>     >     and dp_packet_headroom()), there won't be enough space as the
>     tail and
>     >     head would be competing for the same available space. So it
>     makes sense
>     >     to make it an exclusive operation for the DPBUF_DPDK case.
>     >
>     >
>     > Maybe you can explain why that would be the case, in general?
>     >
> 
>     DPDK mbufs have a fixed layout. That means we can't really resize the
>     mbufs per say, as `buf_len` are fixed. For example, while in a normal
>     DPBUF_MALLOC packet we can reallocate data and copy everything to a
>     smaller buffer, decreasing the tailroom that way, in a DPDK mbuf we are
>     stuck with the same `buf_len` (meaning we can't change the tailroom
>     without affecting the `data_len`, thus modifying data). What we can do
>     is shift data to try and create more or less tailroom, taking or adding
>     space from the headroom.
> 
> 
> The above is fine, albeit obvious; but my question was something different.
> I was asking why you thought a packet could not adjust both headroom and
> tailroom
> for dpdk mbuf packets. data_len may change will headroom and/or tailroom
> changes.
> 
>  

My understanding is that `data_len` needs to remain fixed (as `size_` in
non-DPDK packets remains the same in DPBUF_MALLOC). If you change
`data_len` then you're effectively affecting the data. Either adding
random data to the packet, by increasing `data_len`, or trunking the
packet by decreasing `data_len`.

Given the above, if you want to increase the tailroom then you can only
shift the data left (if there's enough headroom to do so), and that will
affect the headroom available space.

> 
> 
>     I've gave it some thought and I'm now convinced that reusing resize__()
>     here is probably not the best approach. Given its implementation and
>     description ("Reallocates 'b' so that it has exactly 'new_headroom' and
>     'new_tailroom'"), that doesn't sound like it would be a good fit for a
>     DPDK packet. Maybe a completely separate function would be the best
>     approach here, setting up different expectations.
> 
> 
> That is part of the issue.
> 
> 
>  
> 
> 
>     However, as I said before, this was implemented to try to alleviate the
>     call to OVS_NOT_REACHED() in dp_packet_resize__(). But since it's adding
>     more complications than what it's actually solving (at least at this
>     point in time), I'm going to drop this patch (I'll send a revert patch
>     with the needed changes) in favor of dealing properly with
>     dp_packet_resize__() for DPDK packets, and stop calling it for DPDK
>     packets.
> 
> 
> 
> I think it would be better to modify the original patchset, rather than
> try to add code with
> the multisegment mbuf patchset and then immediately revert it.
> Also, I don't see the api, dp_packet_mbuf_shift() in patch 7 being used
> without the resize__() in 
> this patch 8, I think it would be best to remove the 'shift' api from
> patch 7.
> 

I would be fine with modifying the original patchset, except your
comments came after the merge to the `dpdk_merge` branch. And it becomes
somewhat even more delicate because it is very close to the deadline for
accepting new features.

Now, we could modify the `dpdk_merge` branch, since it hasn't been
merged to master yet, but at expenses of Ian's time... I'm not sure what
the policy is here, so I'm CC'ing Ian to clarify. Should I submit v6
here, or leave it as is (with a revert patch queued to be applied)?


More information about the dev mailing list