[ovs-dev] [LNG] Re: [ovs-discuss] OVS Support for RT Kernel

Jesse Gross jesse at nicira.com
Sat Aug 31 00:57:02 UTC 2013


On Wed, Aug 28, 2013 at 2:57 AM, Viresh Kumar <viresh.kumar at linaro.org> wrote:
> On 28 August 2013 05:48, Jesse Gross <jesse at nicira.com> wrote:
>
>> The implementation is actually pretty much exactly the same as before.
>> The only reason there are no longer separate process/interrupt
>> counters is that we started disabling bottom halves when processing
>> packets from userspace. However, with your change that is no longer
>> the case, and it's possible to be interrupted/rescheduled while using
>> a per-CPU counter. This will produce non-deterministic results since
>> the processing for multiple packets shares the same loop counter.
>
> Ahh, I didn't want to change how the system works today, and I
> incorrectly assumed that get_cpu_var() would also disable bottom
> halves; but it only disables preemption, not bh/irqs..
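>
> For reference, get_cpu_var() (simplified from include/linux/percpu.h)
> only wraps preempt_disable(), so bh/irq handlers can still run on top
> of the critical section:
>
>     /* simplified: task preemption is disabled, softirqs are not */
>     #define get_cpu_var(var) \
>         (*({ preempt_disable(); &__get_cpu_var(var); }))
>     #define put_cpu_var(var) \
>         do { (void)&(var); preempt_enable(); } while (0)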
>
> So my original patch should have been like this:
>
> diff --git a/datapath/actions.c b/datapath/actions.c
> index fa4b904..37b29d6 100644
> --- a/datapath/actions.c
> +++ b/datapath/actions.c
> @@ -607,6 +607,9 @@ int ovs_execute_actions(struct datapath *dp, struct sk_buff *skb)
>         int error;
>
>         /* Check whether we've looped too much. */
> +       spin_lock_bh(some-lock);
>         loop = &__get_cpu_var(loop_counters);
>         if (unlikely(++loop->count > MAX_LOOPS))
>                 loop->looping = true;
> @@ -629,5 +632,8 @@ out_loop:
>         if (!--loop->count)
>                 loop->looping = false;
>
> +       spin_unlock_bh(some-lock);
>         return error;
>  }
> diff --git a/datapath/datapath.c b/datapath/datapath.c
> index 27deec8..a0fa3de 100644
> --- a/datapath/datapath.c
> +++ b/datapath/datapath.c
> @@ -940,9 +940,7 @@ static int ovs_packet_cmd_execute(struct sk_buff *skb, struct genl_info *info)
>         if (!dp)
>                 goto err_unlock;
>
> -       local_bh_disable();
>         err = ovs_execute_actions(dp, packet);
> -       local_bh_enable();
>         rcu_read_unlock();
>
>         ovs_flow_free(flow, false);
>
> Which should have worked for both RT and non-RT kernels..

I think that will work for the loop checker but executing the actions
could be a very long code path - lots of actions, trips through the IP
stack, etc. It doesn't seem like a great idea to hold a spin lock for
this amount of time, particularly if the goal is preemptibility.

>>>> The current
>>>> method will yield non-deterministic results in the presence of
>>>> preemption.
>>>
>>> I see one problem: we may access these per-cpu variables
>>> concurrently from process context and bh, so we need
>>> something to guarantee serialization here.. And then we probably
>>> don't need get_cpu_var() for the non-RT case..
>>>
>>> Do we have some existing lock that can be reused here? Otherwise
>>> I will create a new one.. A spinlock would be better than a mutex,
>>> as we would be accessing this code from BH too..
>>
>> There's something similar in dev.c, so you could check whether they
>> did anything to that in the RT patchset.
>
> net/core/dev.c??

Yes, that's right.

> There is only one place where they use local_bh_disable() for the
> RT kernel, with the following patch:

I doubt that the solution would be to turn off bottom halves here. I
would look for the counter code and see what they do with it
specifically.
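
For example, the xmit recursion guard in net/core/dev.c is a similar
per-CPU loop counter; simplified, it looks like the sketch below. It is
safe only because dev_queue_xmit() runs under rcu_read_lock_bh(), i.e.
with bottom halves off:

    #define RECURSION_LIMIT 10
    static DEFINE_PER_CPU(int, xmit_recursion);

    /* in dev_queue_xmit(), after rcu_read_lock_bh(): */
    if (__this_cpu_read(xmit_recursion) > RECURSION_LIMIT)
            goto recursion_alert;

    __this_cpu_inc(xmit_recursion);
    rc = dev_hard_start_xmit(skb, dev, txq);
    __this_cpu_dec(xmit_recursion);

Whatever the RT patchset does to that counter is probably what we want
here too.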

>>>> Have you also audited the other uses of per-CPU variables?
>>>
>>> Yes, I did, but I couldn't find anything that would be broken with
>>> RT.. Per-cpu data doesn't have a problem with RT per se; only the
>>> access to it can be an issue.. And if things are serialized enough,
>>> then there shouldn't be any issue, I hope.
>>
>> Can't you be interrupted/preempted while modifying the counters?
>
> Yes, you can be. But that can happen today (non-RT) as well without
> proper locking or something like local_bh_disable()..
>
> And I assumed that wherever we aren't doing local_bh_disable(),
> proper locking is in place with spin_lock_bh() or similar.. And I do
> see such locking in place in most of those places..

Most of the network stack operates with bottom halves disabled, so
packets ingressing OVS from anywhere except userspace will already
have them turned off. Since we also turn off bottom halves for
packets coming from userspace, this serves as a synchronization
mechanism.
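
Roughly (an annotated sketch, not the exact call chains):

    /* kernel receive path: runs in softirq context, BHs already off */
    netif_receive_skb()
      -> ovs_vport_receive()
        -> ovs_dp_process_received_packet()
          -> ovs_execute_actions()

    /* userspace path: BHs turned off explicitly to match */
    ovs_packet_cmd_execute()
        local_bh_disable();
        ovs_execute_actions(dp, packet);
        local_bh_enable();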

By the way, I don't have a lot of knowledge about the RT patchset,
particularly recent changes. You might want to ask someone with more
direct knowledge.


