[ovs-dev] [PATCH] datapath: fix lock up issue in ovs_vport_cmd_set()

Jesse Gross jesse at nicira.com
Fri Mar 23 17:54:56 UTC 2012


On Thu, Mar 22, 2012 at 2:19 PM, Ansis Atteka <aatteka at nicira.com> wrote:
>
>
> On Thu, Mar 22, 2012 at 10:37 AM, Jesse Gross <jesse at nicira.com> wrote:
>>
>> On Wed, Mar 21, 2012 at 5:13 PM, Ansis Atteka <aatteka at nicira.com> wrote:
>> > diff --git a/datapath/datapath.c b/datapath/datapath.c
>> > index d64fc32..aa5be89 100644
>> > --- a/datapath/datapath.c
>> > +++ b/datapath/datapath.c
>> > @@ -1885,7 +1885,7 @@ static int ovs_vport_cmd_set(struct sk_buff *skb,
>> > struct genl_info *info)
>> >                err = PTR_ERR(reply);
>> >                netlink_set_err(GENL_SOCK(sock_net(skb->sk)), 0,
>> >                                ovs_dp_vport_multicast_group.id, err);
>> > -               return 0;
>> > +               goto exit_unlock;
>>
>> This has a side effect of changing the error code from 0 to -ENOMEM
>> (presumably that's why building the message failed), which I don't
>> think is right because the actual command succeeded.  This is
>> obviously related to the other problem that you noticed with sending
>> messages on failure but we should try to avoid introducing side
>> effects in this patch.
>
>
> You are right. But what would be the correct behavior if message
> building in ovs_vport_cmd_build_info() failed:
> 1. revert the altered config to the previous state and return error
> code (similarly as ovs_vport_cmd_new() does)
> 2. do not revert the altered config, but reply with 0 (the same behavior
> as before my patch)?

The problem is that it's not necessarily possible to roll back the
config in this case.  The primary reason for message building to fail
is a memory allocation error.  Due to the use of RCU, reverting the
config back to the original also requires memory allocation, so in
this case it's likely to fail as well.  In ovs_vport_cmd_new() it's
easy to undo the work (just free the new port) so we do it there and
it prevents everyone from having to flush cached information.

That said, the more common behavior is generally #2.  That's because
the two steps of changing the config with reporting success/failure to
caller and reporting the change to other people are really separate
items.  As long as we call netlink_set_err() when something has
changed and we couldn't send a notification things are still correct.



More information about the dev mailing list