[ovs-dev] [PATCH v5 08/14] dp-packet: Handle multi-seg mbufs in resize__().
Lam, Tiago
tiago.lam at intel.com
Fri Jul 20 17:09:27 UTC 2018
On 18/07/2018 23:53, Darrell Ball wrote:
> sorry, several distractions delayed response.
>
> On Mon, Jul 16, 2018 at 1:37 AM, Lam, Tiago <tiago.lam at intel.com
> <mailto:tiago.lam at intel.com>> wrote:
>
> On 13/07/2018 18:54, Darrell Ball wrote:
> > Thanks for the patch.
> >
> > A few queries inline.
> >
>
> Hi Darrell,
>
> Thanks for your inputs. I've replied in-line as well.
>
> > On Wed, Jul 11, 2018 at 11:23 AM, Tiago Lam <tiago.lam at intel.com <mailto:tiago.lam at intel.com>
> > <mailto:tiago.lam at intel.com <mailto:tiago.lam at intel.com>>> wrote:
> >
> > When enabled with DPDK OvS relies on mbufs allocated by mempools to
> > receive and output data on DPDK ports. Until now, each OvS dp_packet has
> > had only one mbuf associated, which is allocated with the maximum
> > possible size, taking the MTU into account. This approach, however,
> > doesn't allow us to increase the allocated size in an mbuf, if needed,
> > since an mbuf is allocated and initialised upon mempool creation. Thus,
> > in the current implementatin this is dealt with by calling
> > OVS_NOT_REACHED() and terminating OvS.
> >
> > To avoid this, and allow the (already) allocated space to be better
> > used, dp_packet_resize__() now tries to use the available room, both the
> > tailroom and the headroom, to make enough space for the new data. Since
> > this happens for packets of source DPBUF_DPDK, the single-segment mbuf
> > case mentioned above is also covered by this new aproach in resize__().
> >
> > Signed-off-by: Tiago Lam <tiago.lam at intel.com <mailto:tiago.lam at intel.com>
> > <mailto:tiago.lam at intel.com <mailto:tiago.lam at intel.com>>>
> > Acked-by: Eelco Chaudron <echaudro at redhat.com
> <mailto:echaudro at redhat.com>
> > <mailto:echaudro at redhat.com <mailto:echaudro at redhat.com>>>
> > ---
> > lib/dp-packet.c | 48
> ++++++++++++++++++++++++++++++++++++++++++++++--
> > 1 file changed, 46 insertions(+), 2 deletions(-)
> >
> > diff --git a/lib/dp-packet.c b/lib/dp-packet.c
> > index d6e19eb..87af459 100644
> > --- a/lib/dp-packet.c
> > +++ b/lib/dp-packet.c
> > @@ -237,9 +237,51 @@ dp_packet_resize__(struct dp_packet *b,
> size_t
> > new_headroom, size_t new_tailroom
> > new_allocated = new_headroom + dp_packet_size(b) +
> new_tailroom;
> >
> > switch (b->source) {
> > + /* When resizing mbufs, both a single mbuf and multi-segment
> > mbufs (where
> > + * data is not contigously held in memory), both the headroom
> > and the
> > + * tailroom available will be used to make more space for
> where
> > data needs
> > + * to be inserted. I.e if there's not enough headroom,
> data may
> > be shifted
> > + * right if there's enough tailroom.
> > + * However, this is not bulletproof and in some cases the
> space
> > available
> > + * won't be enough - in those cases, an error should be
> > returned and the
> > + * packet dropped. */
> > case DPBUF_DPDK:
> > - OVS_NOT_REACHED();
> >
> >
> > Previously, it was a coding error to call this function for a DPDK
> mbuf
> > case, which is pretty
> > clear. But with this patch, presumably that is not longer the case and
> > the calling the API is
> > now ok for DPDK mbufs.
> >
>
> As it stands, it will still be an error to call dp_packet_resize__() for
> any DPDK packet, or by extension any of the other functions that call
> it, such as dp_packet_prealloc_tailroom() and
> dp_packet_prealloc_headroom().
>
>
>
> yep, the existing code fails in a very clear way; i.e. whenever it is
> called for a dpdk packet.
> So the user would need to handle in some other way, which is not being
> done today, I know.
>
>
>
> This patch only tries to alleviate that
> by accommodating space from the headroom or tailroom, if possible, and
> create just enough space for the new data.
>
>
> The new code will fail is some yet undefined way, occasionally working
> and failing
> in the "other" cases.
>
>
>
> My preferred approach would
> be to return an error if not possible, but since the API doesn't deal
> with errors as is, the previous behavior of manually asserting was left
> as is. As reported in [1] (I comment more on that below), the behavior
>
> of manually asserting can lead to undesired behavior in some use cases.
>
>
>
> I am familiar with the issue.
> As part of the change, dp_packet_put_uninit()
> and dp_packet_push_uninit() could be modified to return NULL
> and that could be percolated and checked for.
>
> Those APIs could simply check (by calling a helper API) if they would
> fail a priori to trigger returning
> NULL for dpdk buf cases.
>
>
Ok, I'm glad we're on the same page here. That's what [2] is doing. I'm
planning to bring that forward. If you could have a look as well, that
would be great.
>
>
> >
> >
> > + {
> > + size_t miss_len;
> > +
> > + if (new_headroom == dp_packet_headroom(b)) {
> > + /* This is a tailroom adjustment. Since there's no
> > tailroom space
> > + * left, try and shift data towards the head to free up
> > tail space,
> > + * if there's enough headroom */
> > +
> > + miss_len = new_tailroom - dp_packet_tailroom(b);
> > +
> > + if (miss_len <= new_headroom) {
> > + dp_packet_shift(b, -miss_len);
> > + } else {
> > + /* XXX: Handle error case and report error to caller */
> > + OVS_NOT_REACHED();
> >
> >
> >
> > This will not just drop the packet, it will fail the daemon, because a
> > packet cannot be resized.
> > If the system is completely depleted of memory, that may be ok, but in
> > the case, no such
> > assumption is implied.
> >
> > Also, why is XXX still left in the patch?
> >
>
> Because there's still work to do in that regard. The whole process
> shouldn't be brought down if there's not enough space to put some data
> in one single packet. However, this was intentionally left out of this
> series or otherwise it would increase its complexity considerably.
>
>
> It seems unnecessary to add a bunch of code to a series that tries to handle
> 'resize', but handles it partially in practical cases. It also seems
> undefined when
> it works and when it does not from a API caller POV.
Granted that the dp_packet_resize__() supports more operations than what
this implementation provides, but dp_packet API callers won't notice a
difference since the only callers are
dp_packet_prealloc_tailroom() and dp_packet_prealloc_headroom(). And the
implementation covers for those functions, which only either modify the
tailroom or the headroom (and not both).
> I think patch 7 is also there to only support this patch 8.
>
That's a wrong assumption. Patch 7/14 was added to support the
dp_packet_shift() operation.
In this patch 8/14, because of the noncontiguous nature of chained
mbufs, there's no possibility of using general functions like `memmove`,
and re-using the shift operations was just easier.
> Ideally, It would seem like a modified patch 7 and patch 8 would belong
> with the rest of the
> fix for the dpdk packet memory preallocation constraint issue.
>
> Also, ideally, 'XXX' is removed from patches.
>
>
>
>
> As others have pointed out in [1], this is not a simple change, which
> would have to be propagated to higher levels in other parts of the code
> base. I've proposed an alternative (vs refactoring the whole dp_packet
> API to handle and return errors) in [2], but that seems to have gone
> stale. Going forward I see that approach merging with this new piece in
> dp_packet_resize__(), where an error can be returned to the caller if
> there's not enough space.
>
>
> The full change is outside the scope of this series.
>
>
>
>
> > Also, the preexisting API handles two cases:
> > 1/ Tailroom only adjustment
> > 2/ headroom and/or tailroom adjustment
> >
> > meaning it handles all cases.
> >
> > The new DPDK addition (part of the same API) defines 2 cases
> >
> > 1/ tailroom only adjustment
> > 2/ headroom only adjustment
> >
> > So, it looks like a different API, that also does not handle all cases.
> >
> >
>
> You have a point there, support for point 2/ "headroom and tailroom
> adjustment" is missed. It doesn't seem to be used anywhere at the
> moment, the only callers being dp_packet_prealloc_tailroom() and
> dp_packet_prealloc_headroom(), but I'll submit an incremental patch to
> deal with this. Thanks for pointing it out.
>
> Tiago.
>
> [1]
> https://mail.openvswitch.org/pipermail/ovs-dev/2018-May/346649.html
> <https://mail.openvswitch.org/pipermail/ovs-dev/2018-May/346649.html>
> [2]
> https://mail.openvswitch.org/pipermail/ovs-dev/2018-July/348908.html
> <https://mail.openvswitch.org/pipermail/ovs-dev/2018-July/348908.html>
>
> >
> > + }
> > + } else {
> > + /* Otherwise, this is a headroom adjustment. Try to
> > shift data
> > + * towards the tail to free up head space, if there's
> > enough
> > + * tailroom */
> > +
> > + miss_len = new_headroom - dp_packet_headroom(b);
> >
> > +
> > + if (miss_len <= new_tailroom) {
> > + dp_packet_shift(b, miss_len);
> > + } else {
> > + /* XXX: Handle error case and report error to
> caller */
> > + OVS_NOT_REACHED();
> >
> >
> >
> > same comments as above.
> >
> >
> >
> > + }
> > + }
> > +
> > + new_base = dp_packet_base(b);
> > +
> > + break;
> > + }
> > case DPBUF_MALLOC:
> > if (new_headroom == dp_packet_headroom(b)) {
> > new_base = xrealloc(dp_packet_base(b),
> new_allocated);
> > @@ -263,7 +305,9 @@ dp_packet_resize__(struct dp_packet *b, size_t
> > new_headroom, size_t new_tailroom
> > OVS_NOT_REACHED();
> > }
> >
> > - dp_packet_set_allocated(b, new_allocated);
> > + if (b->source != DPBUF_DPDK) {
> > + dp_packet_set_allocated(b, new_allocated);
> > + }
> > dp_packet_set_base(b, new_base);
> >
> > new_data = (char *) new_base + new_headroom;
> > --
> > 2.7.4
> >
> > _______________________________________________
> > dev mailing list
> > dev at openvswitch.org <mailto:dev at openvswitch.org>
> <mailto:dev at openvswitch.org <mailto:dev at openvswitch.org>>
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> <https://mail.openvswitch.org/mailman/listinfo/ovs-dev>
> > <https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> <https://mail.openvswitch.org/mailman/listinfo/ovs-dev>>
> >
> >
>
>
More information about the dev
mailing list