[ovs-discuss] Can't send to controller when doing a resubmit and learn (regression)

Tue Apr 22 15:54:04 UTC 2014

On Sat, Apr 19, 2014 at 07:50:31PM -0700, Murphy McCauley wrote:
> I recently found that a technique I'd used with OVS 1.9 no longer works under OVS built from master a few days ago.  Here's a pretty minimal example:
> 
> table=0, actions=resubmit(,2),resubmit(,1)
> table=1, reg1=0 actions=learn(table=2,hard_timeout=1,load:1->NXM_NX_REG1[]),controller
> 
> In this example, it's a poor man's controller rate limiter.  The previous (and expected) behavior is that you can spam packets (e.g., ping -i 0.1) and only one per second goes to the controller.  The observed behavior on new versions of OVS is that nothing ever comes to the controller.
> 
> Adding a reg1=1 match to table 1, it was clear the matching was working right (the packet counts of the table 1 rules summed to the packet count of the table 0 rule).  But still nothing at the controller.  A flood action, however, works just fine -- one per second.  This got me thinking it's a fast path/slow path issue.  I did some digging and found:
> 
> Before 4dff909 (Move odp_actions from subfacet to facet), things worked as expected.  After this commit, it didn't work, but I found a workaround based on a glance through the diff and a hunch: if I put a controller action in the table 0 rule too, both controller actions worked.  I was inspired to try this by the change around line 5027.  Without the table 0 controller action, facet_revalidate() gives up when the facet goes from fast path to slow path.  With it, I am guessing it starts out on the slow path and never changes.  Whether any of that is significant or not, by sending to a nonexistent controller ID in table 0, I had the behavior I wanted again.
> 
> Unfortunately, this workaround didn't work on master.  So more digging.  It turns out that after 3d9c5e5 (Handle learn action flow mods asynchronously), the workaround wasn't required anymore and things were back to working as expected.
> 
> Obviously this didn't last forever.  Specifically, once 9129672 (Move "learn" actions into individual threads) more or less undid the previous change, even the workaround stopped working.
> 
> I tried to find anything related on the mailing list and didn't come up with anything.  Is it unknown?  Is there any reason why this *shouldn't* work?  Any thoughts on getting it to work again?

At a glance, this should work (although it's not a use case I've
considered before).  It's not obvious to me why it doesn't.  If you
figure out a fix (though I'd like to take a look myself, I don't have
the time), please submit it, and then we'll add a test to avoid future
regressions.
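
For reference, the behavior the two quoted flows are expected to produce can
be modeled in a few lines of Python.  This is only a rough sketch of the
intended semantics, not of how OVS implements them: the names Switch,
process, and simulate are made up for illustration, time is counted in
integer milliseconds, and the learned flow is assumed to expire exactly
1000 ms after it is installed (real hard-timeout expiry is coarser).

```python
HARD_TIMEOUT_MS = 1000  # hard_timeout=1 from the learn() action, in ms


class Switch:
    """Toy model of the table 0 / table 1 / table 2 pipeline (hypothetical)."""

    def __init__(self):
        # Install time (ms) of the learned table 2 flow, or None if absent.
        self.learned_at = None
        # Timestamps of packets that reached the controller action.
        self.to_controller = []

    def process(self, now_ms):
        reg1 = 0
        # table 0, resubmit(,2): a live learned flow loads 1 into reg1.
        if self.learned_at is not None:
            if now_ms - self.learned_at >= HARD_TIMEOUT_MS:
                self.learned_at = None  # assumed exact hard-timeout expiry
            else:
                reg1 = 1
        # table 0, resubmit(,1): the table 1 rule only matches reg1=0.
        if reg1 == 0:
            self.learned_at = now_ms           # learn(table=2,...)
            self.to_controller.append(now_ms)  # controller


def simulate(interval_ms=100, count=30):
    """Send `count` packets spaced `interval_ms` apart, like ping -i 0.1."""
    sw = Switch()
    for i in range(count):
        sw.process(i * interval_ms)
    return sw.to_controller


if __name__ == "__main__":
    print(simulate())  # prints [0, 1000, 2000]
```

With packets every 100 ms for three seconds, only the packets at 0, 1000,
and 2000 ms reach the controller in this model, which is the one-per-second
behavior described for OVS 1.9.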