[ovs-dev] [PATCH 6/7] datapath: Add tunnel header caching.

Ben Pfaff blp at nicira.com
Wed Sep 22 17:44:06 UTC 2010


On Tue, Sep 21, 2010 at 03:46:23PM -0700, Jesse Gross wrote:
> On Tue, Sep 21, 2010 at 10:18 AM, Ben Pfaff <blp at nicira.com> wrote:
> > I notice that move_port() can fail after the vport is removed but before
> > it is re-added elsewhere.  Do we leak the vport in that case?
> 
> The port itself is not leaked in the sense that it still exists and
> will be fully cleaned up when it is deleted.  However, if the re-add
> fails it will not be able to receive any packets.  This somewhat lame
> behavior is the same as before.
> 
> In the future (hopefully not too distant) I would like to improve this:
> 
> * Make the port/flow hash table use linked lists for the buckets
> instead of an array.  Since we try to keep the bucket size equal to 1
> this shouldn't make a noticeable difference in performance.  It will
> enable us to avoid allocating memory on insertion / deletion and
> therefore avoid the potential for failure in these situations.
> * Fix up all of these exit/deletion cases that don't really have a
> good way to handle failure and therefore are probably full of errors.
> * In this particular case, link the port config into the new hash
> chain before removing it from the old one.  This will make the
> switchover atomic when we do the rcu_assign_pointer().

A linked list would be better than an array in many ways.  The reason
that an array is used now is that resizing (primarily, enlarging) the
hash table is easy with an array.  With a linked list, the best design
that I had was to include two separate "next" pointers in the table node
(call them next[0] and next[1]), only one of which is in use at a time.
To resize, build a new tabled using the pointers that are not currently
in use, flip to the new table, and free the old hash table when it is
safe.  (You can't blindly use call_rcu() for this: a given hash table
can safely resized only once between RCU grace periods.  After the first
time you have to synchronize_rcu().)

If you think that design is OK, or if you have a better design, then
that's fine.

My own plan, that I've had in the back of my mind for a while, is to
support deleting when memory allocation fails simply by writing NULL to
the pointer in the array.  The search code would then have to tolerate
NULLs, but that's probably a low cost.  (Another way would be to
duplicate one of the adjacent entries.  It would get hit twice on
searches, but it would be cache-hot the second time and we'd avoid a
test-and-branch on the fast path.)

> > In build_cache() it is too bad that we have to build up an skb just to
> > pass it to flow_extract() and free it.  Do you think it is expensive and
> > frequent enough to optimize?
> 
> It is unfortunate but I'm not sure whether the benefit is high enough
> to justify the cost.  Building the cache is somewhat expensive and
> hopefully rare anyways.  I see three options:
> * Make the normal flow_extract() capable of handling this case.  I
> don't want to do that since it will make the common case slower to
> accommodate the rare case.
> * Have a second specialized function.  I'm afraid that it might
> diverge from the primary one over time, subtly breaking things.
> * Keep it the way it is.

I don't think it's that important, so of those options "keep it the way
it is" is probably the best choice.

Isn't the header basically fixed in form?  I would think that you could
easily write simple code in-line there to initialize odp_flow_key
properly.  Maybe you didn't want to maintain it as odp_flow_key changes.

> > I had a hard time convincing myself that locking was OK in
> > cache_cleaner_cb().
> 
> Can you be more specific?

Oh, it looks OK when I look again, but I was a little surprised that
tbl_foreach() was itself safe, since I think that everywhere else we
call it only when we have a lock against the table itself changing.

> > Does check_headroom() cause double-accounting?  I think that the
> > destructor will be called for both new and old packets.
> 
> It will be called twice but that should be the correct result.  The
> destructor will be called on the old packet, giving back the memory
> that it was using.  Then the socket will be charged for the new packet
> and the destructor set.    When the new packet actually gets sent its
> destructor will get called, releasing that memory.

OK, great, I understand now.

> > I'm surprised that there's no easy way to just ensure that struct
> > tnl_cache is padded out to a suitable alignment, instead of needing to
> > deal with that when we allocate it.
> >
> 
> There may be some GCC extension to do this but I'm not sure what it
> is.  There's __attribute__ ((aligned (size))) but that's more for the
> starting point of a variable declaration or member within a structure.
>  I don't think that it does anything for a struct definition.  Do you
> know of something?

I don't.  Oh well.

Thank you for all the explanations.  I'm happy with this, whenever you
think it is ready.




More information about the dev mailing list