[ovs-dev] Deferring ofproto_class::destruct vs. ovs-appctl exit

Jarno Rajahalme jarno at ovn.org
Thu Sep 29 20:32:46 UTC 2016


Btw. I don’t see this problem triggered by the testsuite even if I set the ‘ofproto->backer' pointer to NULL right after the free() call. Do you see this happening on an unmodified OVS 2.5? If so, could you send the steps to reproduce. Just need to know if this is a potential crashing bug in OVS 2.5(.1) that needs to be fixed, or if this is something that affects your development.

Thanks,

  Jarno

> On Sep 29, 2016, at 12:54 PM, Jarno Rajahalme <jarno at ovn.org> wrote:
> 
> This may not be the cleanest solution, but how about changing the last line of close_dpif_backer() in ofproto/ofproto-dpif.c like this:
> 
> diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
> index 83dcc9c..5b42b7e 100644
> --- a/ofproto/ofproto-dpif.c
> +++ b/ofproto/ofproto-dpif.c
> @@ -864,7 +864,7 @@ close_dpif_backer(struct dpif_backer *backer)
>     free(backer->type);
>     free(backer->dp_version_string);
>     dpif_close(backer->dpif);
> -    free(backer);
> +    ovsrcu_postpone(free, backer);
> }
> 
> /* Datapath port slated for removal from datapath. */
> 
> Maybe this would solve the problem you found? That is, if the problem is that the backer reference is stale, keeping the memory around for the RCU thread should help. Obviously most of the backer has already been destructed at this point, but it this case it should not matter.
> 
>  Jarno
> 
>> On Sep 29, 2016, at 11:32 AM, Petr Machata <petrm at mellanox.com> wrote:
>> 
>> Hi,
>> 
>> On 2.5, we are seeing the following problem when removing a bridge:
>> 
>> - ofproto_destroy calls ofproto_flush__, which eventually calls
>>   ovsrcu_postpone(remove_rules_rcu)
>> 
>> - ofproto_destroy also calls p->ofproto_class->destruct, which
>>   eventually leads to release of DPIF backer (close_dpif_backer)
>> 
>> - at some later point, remove_rules_rcu is picked up by the RCU
>>   thread.  That calls rule_delete, calls complete_operation, and that
>>   references the backer, which is however already gone:
>>     ofproto->backer->need_revalidate = REV_FLOW_TABLE;
>> 
>> My first idea is to do this:
>> 
>> modified   ofproto/ofproto.c
>> @@ -1588,9 +1588,16 @@ ofproto_destroy__(struct ofproto *ofproto)
>> * - 1st we defer the removal of the rules from the classifier
>> * - 2nd we defer the actual destruction of the rules. */
>> static void
>> +ofproto_class_destruct__(struct ofproto *ofproto)
>> +{
>> +    ofproto->ofproto_class->destruct(ofproto);
>> +}
>> +
>> +static void
>> ofproto_destroy_defer__(struct ofproto *ofproto)
>>    OVS_EXCLUDED(ofproto_mutex)
>> {
>> +    ovsrcu_postpone(ofproto_class_destruct__, ofproto);
>>    ovsrcu_postpone(ofproto_destroy__, ofproto);
>> }
>> 
>> @@ -1623,8 +1630,6 @@ ofproto_destroy(struct ofproto *p, bool del)
>>        free(usage);
>>    }
>> 
>> -    p->ofproto_class->destruct(p);
>> -
>>    /* We should not postpone this because it involves deleting a listening
>>     * socket which we may want to reopen soon. 'connmgr' should not be used
>>     * by other threads */
>> 
>> That seems to fix the issue.
>> 
>> But "ovs-appctl exit" (or rather the ovs-vswitchd exit action that
>> ovs-appctl exit triggers) doesn't wait for the RCU thread to do all the
>> deferred work, so this ends up not calling the cleanup at all.  We can
>> work around by writing something like "ovs-appctl mlnx/wait-br-cleanup",
>> but that's unsatisfactory.
>> 
>> It seems like ovs-vswitchd's exit handling should actually wait for
>> deferred work to get done.
>> 
>> Thoughts?  How would I go about implementing this?
>> 
>> Thanks,
>> Petr
>> _______________________________________________
>> dev mailing list
>> dev at openvswitch.org
>> http://openvswitch.org/mailman/listinfo/dev
> 




More information about the dev mailing list