[ovs-discuss] [LNG] Re: OVS Support for RT Kernel

Viresh Kumar viresh.kumar at linaro.org
Wed Aug 28 09:57:58 UTC 2013


On 28 August 2013 05:48, Jesse Gross <jesse at nicira.com> wrote:

> The implementation is actually pretty much exactly the same as before.
> The only reason why there are no longer separate process/interrupt
> counters is because we started disabling bottom halves when processing
> packets for userspace. However, with your change that is longer the
> case and it's possible to be interrupted/rescheduled while using a
> per-CPU counter. This will produce non-deterministic results since the
> processing for multiple packets are sharing the same loop counter.

Ahh, I didn't wanted to change how the system works today and assumed
incorrectly that get_cpu_var() will also disable bh but it only disables
preemption, not bh/irqs..

So my original patch should have been like this:

diff --git a/datapath/actions.c b/datapath/actions.c
index fa4b904..37b29d6 100644
--- a/datapath/actions.c
+++ b/datapath/actions.c
@@ -607,6 +607,9 @@ int ovs_execute_actions(struct datapath *dp,
struct sk_buff *skb)
        int error;

        /* Check whether we've looped too much. */
        spin_lock_bh(some-lock);
        loop = &__get_cpu_var(loop_counters);
        if (unlikely(++loop->count > MAX_LOOPS))
                loop->looping = true;
@@ -629,5 +632,8 @@ out_loop:
        if (!--loop->count)
                loop->looping = false;

       spin_unlock_bh(some-lock);
       return error;
 }
diff --git a/datapath/datapath.c b/datapath/datapath.c
index 27deec8..a0fa3de 100644
--- a/datapath/datapath.c
+++ b/datapath/datapath.c
@@ -940,9 +940,7 @@ static int ovs_packet_cmd_execute(struct sk_buff
*skb, struct genl_info *info)
        if (!dp)
                goto err_unlock;

-       local_bh_disable();
        err = ovs_execute_actions(dp, packet);
-       local_bh_enable();
        rcu_read_unlock();

        ovs_flow_free(flow, false);
(END)

Which should have worked for both RT and non-RT kernels..

>>> The current
>>> method will yield non-deterministic results in the presence of
>>> preemption.
>>
>> I see one problem: We may access these per-cpu variables
>> concurrently from process context and bh.. And so we need
>> something to guarantee the serialization here.. And then probably
>> we don't need to do: get_cpu_var() for non-RT case..
>>
>> Do we have some existing lock that can be reused here? Otherwise
>> I will create a new one.. spinlock would be better then a mutex as we
>> would be accessing this code from BH too..
>
> There's something similar in dev.c so you could check to see if they
> did anything to that in the RT patchset.

net/core/dev.c??

There is only one place where they are using local_bh_disable() for
RT kernel with following patch:

commit 2aac00eb41a5b8b180fc59324886b2cebbf78124
Author: Thomas Gleixner <tglx at linutronix.de>
Date:   Sun Oct 28 15:12:49 2012 +0000

    net: Use local_bh_disable in netif_rx_ni()

    This code triggers the new WARN in __raise_softirq_irqsoff() though it
    actually looks at the softirq pending bit and calls into the softirq
    code, but that fits not well with the context related softirq model of
    RT. It's correct on mainline though, but going through
    local_bh_disable/enable here is not going to hurt badly.

    Signed-off-by: Thomas Gleixner <tglx at linutronix.de>
---
 net/core/dev.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 4466dfd..dc8ec3e 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3201,11 +3201,9 @@ int netif_rx_ni(struct sk_buff *skb)
 {
        int err;

-       migrate_disable();
+       local_bh_disable();
        err = netif_rx(skb);
-       if (local_softirq_pending())
-               thread_do_softirq();
-       migrate_enable();
+       local_bh_enable();

        return err;
 }


>>> Have you also audited the other use of per-CPU variables?
>>
>> Yes, I did. But couldn't find anything that would be broken with RT..
>> Per-cpu doesn't have a problem with RT, only the access to those
>> can be a issue.. And if things are serialized enough, then there
>> shouldn't be any issue, I hope.
>
> Can't you be interrupted/preempted while modifying the counters?

Yes, you can be. But that will happen today (non-RT) as well without
proper locking or something like local_bh_disable()..

And I thought if we aren't doing local_bh_disable() at multiple places
then proper locking is there with spin_lock_bh() or similar.. And I see
such lockings in place at most of the places..

So, even if we get preempted then also serialization should be
guaranteed, as is guaranteed in non-RT kernel today..

I am pretty much new to Networking domain, thanks for answering my
stupid questions :)

--
viresh



More information about the discuss mailing list