[ovs-dev] OpenVSwitch and libvirt integration problem at shutdown/reboot

Ansis Atteka aatteka at nicira.com
Wed Mar 6 20:53:59 UTC 2013


On Wed, Mar 6, 2013 at 7:41 AM, Ernesto Domato <edomat at gmail.com> wrote:
> Sorry for the late response.
>
> On Mon, Mar 4, 2013 at 7:06 PM, Ansis Atteka <aatteka at nicira.com> wrote:
>> On Mon, Mar 4, 2013 at 12:08 PM, Ernesto Domato <edomat at gmail.com> wrote:
>>
>> If you do not block on interface creation and libvirt/Open vSwitch
>> init.d dependencies are not right, then I think you might end up with
>> another race condition, where VM automatic start-up would fail.
>> Imagine that:
>> 1. Neither Open vSwitch nor libvirt is running
>> 2. libvirt starts up
>> 3. libvirt tries to spin up a VM and executes the "ovs-vsctl --no-wait
>> --timeout=5 -- del-port ... -- add-port ..." command. After 5 seconds
>> this command times out, because Open vSwitch wasn't running
>> 4. After 6 seconds Open vSwitch starts up, but the VM still remains down.
>>
>> This means that you will have to manually start the VM one more time.
>>
>
> Ok, so I guess that to solve this problem, the right solution would be
> for libvirt to wait until OVS is up instead of timing out, right?

I actually see two solutions:
1. get the daemon dependencies right (I see that Eric from the libvirt
project also recommended this); or
2. don't time out when executing the "ovs-vsctl add-port" command from
libvirt

I would prefer solution #1. If we went with solution #2, then I am
worried that someone else will later complain that libvirt gets stuck
again (when OVS is not running).
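
As a rough illustration of solution #1 on a sysvinit/LSB system, the
dependency could be declared in the header of the libvirt init script.
This is only a sketch: the "libvirt-bin" name and the Required-*/Default-*
values are assumptions for illustration, and only "openvswitch-switch" is
taken from the service name used in this thread.

```shell
### BEGIN INIT INFO
# Provides:          libvirt-bin
# Required-Start:    $network $local_fs $remote_fs
# Required-Stop:     $network $local_fs $remote_fs
# Should-Start:      openvswitch-switch
# Should-Stop:       openvswitch-switch
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# Short-Description: libvirt management daemon (illustrative header)
### END INIT INFO
```

Should-Start asks for Open vSwitch to be started first when it is
installed, and Should-Stop asks for it to be stopped only after libvirt,
which covers both the boot and the shutdown/reboot ordering.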


>
> I think that could be a good idea, I'll write a new patch for libvirt
> then and send it to them for comments.
>
>>>
>>> I also added "--no-wait --timeout 5" when libvirt goes down so it can
>>> time out if ovs-switch is down.
>> Just curious, but wasn't this part already solved with libvirt commit
>> 98e732fc34a47ad9dfdb64aa4207623ee4c1ebcd (network: prevent infinite
>> hang if ovs-vswitchd isn't running)? Are you using libvirt that has
>> this patch?
>>
>
> Ok, I'm using the stable version of libvirt and that fix is in the
> experimental package. Anyway, this fix only adds the timeout flag when
> deleting the interface from OVS, so that libvirt doesn't hang
> trying to delete the interface if OVS is down. But it doesn't fix the
> issue that the OVS DB still has a reference to the virtual
> interface and so, when you bring the virtual machine up again, it
> doesn't respond because of this. As you (Ansis Atteka I guess)

What do you mean by "doesn't respond"? Even if you pass
"--timeout=5" and "--may-exist" to the "add-port" command, does it
still sometimes block indefinitely?

> recommended before, my patch makes libvirt try to delete (with the
> --if-exists flag) the interface before adding it again, so that problem
> is resolved.

It was a long time ago, but I think that adding the "--may-exist" flag to
the "add-port" command was sufficient. Can you debug this a little bit
more and tell me what exactly is failing here (e.g. the output of
"ps -Af | grep ovs-vsctl" while it is blocked should be fine)?
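
To make the comparison concrete, here is a minimal sketch of the two
invocation shapes discussed in this thread. It only prints the command
lines rather than running them, and the function names are made up for
illustration.

```shell
#!/bin/sh
# Sketch only: prints ovs-vsctl command lines instead of executing them.
# The function names are hypothetical; the flags are the ones quoted in
# this thread.

# Variant A: drop any stale port, then add it, in a single transaction.
cmd_del_then_add() {
    bridge=$1
    port=$2
    echo "ovs-vsctl --timeout=5 -- --if-exists del-port $port -- add-port $bridge $port"
}

# Variant B: let add-port tolerate a port that is already in the database.
cmd_may_exist_add() {
    bridge=$1
    port=$2
    echo "ovs-vsctl --timeout=5 -- --may-exist add-port $bridge $port"
}
```

For example, "cmd_may_exist_add br0 vnet0" prints the variant B command
for a bridge "br0" and a port "vnet0" (both placeholder names, not values
from the report).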

If this indeed turns out to be a problem, then I think a long term
solution would be to:
1. Mark OVS ports created by libvirt with something like
other_config:created-by-libvirt=true
2. If libvirt did not have a chance to delete old ports on shutdown,
then at startup it should iterate over all ports and delete unused
ports where other_config:created-by-libvirt=true
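
A rough shell sketch of that mark-and-sweep idea follows. It is not how
libvirt does it (libvirt would do this from C); "is_port_in_use" is a
hypothetical hook that the caller has to supply, and the OVS_VSCTL
variable exists only so the tool can be substituted for a dry run.

```shell
#!/bin/sh
# Sketch under assumptions: the tag name is the one suggested above;
# is_port_in_use is a hypothetical function defined by the caller.
OVS_VSCTL=${OVS_VSCTL:-ovs-vsctl}

# Add a port and tag it as libvirt-owned in the same transaction.
# $1 = bridge, $2 = port
add_tagged_port() {
    $OVS_VSCTL --timeout=5 -- --may-exist add-port "$1" "$2" \
        -- set Port "$2" other_config:created-by-libvirt=true
}

# Print the names of all tagged ports (blank lines may separate records).
list_tagged_ports() {
    $OVS_VSCTL --timeout=5 --bare --columns=name find Port \
        other_config:created-by-libvirt=true
}

# Delete every tagged port that the hook does not report as in use.
sweep_stale_ports() {
    list_tagged_ports | while read -r port; do
        [ -n "$port" ] || continue
        is_port_in_use "$port" ||
            $OVS_VSCTL --timeout=5 -- --if-exists del-port "$port" </dev/null
    done
}
```

The sweep would run once at libvirt startup, before any VM is started,
so that ports left behind by an unclean shutdown are removed first.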

>
>> By the way I tried to execute the same commands as libvirt would have
>> executed. Except I ran them directly in the shell. What I observed is
>> that ovs-vsctl did not block indefinitely for this corner case:
>>
>> root@ubuntu:~# service openvswitch-switch stop
>>  * ovs-brcompatd is not running
>>  * Killing ovs-vswitchd (13202)
>>  * Killing ovsdb-server (13193)
>> root@ubuntu:~# ovs-vsctl --timeout=5  -- --if-exists del-port p0
>> Mar 04 13:39:14|00002|stream_unix|ERR|/tmp/stream-unix.13222.0:
>> connection to /var/run/openvswitch/db.sock failed: No such file or
>> directory
>> Mar 04 13:39:14|00003|reconnect|WARN|unix:/var/run/openvswitch/db.sock:
>> connection attempt failed (No such file or directory)
>> Mar 04 13:39:15|00004|stream_unix|ERR|/tmp/stream-unix.13222.1:
>> connection to /var/run/openvswitch/db.sock failed: No such file or
>> directory
>> Mar 04 13:39:15|00005|reconnect|WARN|unix:/var/run/openvswitch/db.sock:
>> connection attempt failed (No such file or directory)
>> Mar 04 13:39:17|00006|stream_unix|ERR|/tmp/stream-unix.13222.2:
>> connection to /var/run/openvswitch/db.sock failed: No such file or
>> directory
>> Mar 04 13:39:17|00007|reconnect|WARN|unix:/var/run/openvswitch/db.sock:
>> connection attempt failed (No such file or directory)
>> Alarm clock
>> root@ubuntu:~#
>>
>> I tried this with Open vSwitch: 1.4.3-0ubuntu2.1. Are you seeing the
>> same effect with your Open vSwitch?
>
> Yes, I see the same behavior, but after applying the patch that adds
> the --timeout flag to the older version of libvirt that is currently in
> Debian stable (the patch is already incorporated in the experimental
> package).
>
> So, summarizing, what I'll do is download the Git version of
> libvirt and make a patch that just tries to delete the interface before
> adding it again when the virtual machine is coming up, and does so in a
> separate call to ovs-vsctl so that, when adding the interface, it waits
> for OVS to be up.
>
> Is that the way to solve this problem? What do you think? :-)
Let me know the answers to the other questions I asked above. That will
help to figure out the right strategy.

Also, as Ben suggested, in the long run we should get rid of "--timeout=5"
and use something like "--try-once" in ovs-vsctl when the OVS DB is
running locally. Otherwise, for example, with 10 ports the aggregate
timeout can be up to 50 seconds.
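
The arithmetic behind that estimate, as a small sketch. It assumes one
ovs-vsctl call per port, run one after another, with each call hitting
the full timeout; the function name is made up for illustration.

```shell
#!/bin/sh
# Worst-case total wait in seconds: number of ports times per-call timeout.
# $1 = number of ports, $2 = per-call timeout in seconds
worst_case_wait() {
    echo $(( $1 * $2 ))
}
```

With 10 ports and "--timeout=5", "worst_case_wait 10 5" gives the 50
seconds mentioned above.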

>
> Thanks
> Ernesto


