[ovs-dev] OpenVSwitch and libvirt integration problem at shutdown/reboot

Ernesto Domato edomat at gmail.com
Thu Mar 7 17:02:35 UTC 2013


OK, I'm cross-posting to the Open vSwitch devel and libvirt-users
mailing lists so that we are all on the same page :-)

If you didn't read the original mail that I sent, it is about an issue
that I'm having on Debian: when I reboot or shut down the host system,
OVS is stopped before the KVM guests are taken down, and when libvirt
tries to delete the virtual interface from OVS it hangs. The libvirt
package version that I'm using doesn't have the "timeout" flag for OVS
that prevents this behavior; that flag is in newer libvirt versions.

Sorry for the duplicate to those that are on both lists.

On Wed, Mar 6, 2013 at 5:53 PM, Ansis Atteka <aatteka at nicira.com> wrote:
> On Wed, Mar 6, 2013 at 7:41 AM, Ernesto Domato <edomat at gmail.com> wrote:
>> On Mon, Mar 4, 2013 at 7:06 PM, Ansis Atteka <aatteka at nicira.com> wrote:
>>> On Mon, Mar 4, 2013 at 12:08 PM, Ernesto Domato <edomat at gmail.com> wrote:
>>>
>>> If you do not block on interface creation and libvirt/Open vSwitch
>>> init.d dependencies are not right, then I think you might end up with
>>> another race condition, where VM automatic start-up would fail.
>>> Imagine that:
>>> 1. Neither Open vSwitch nor libvirt is running
>>> 2. libvirt starts up
>>> 3. libvirt tries to spin up VM and executes "ovs-vsctl --no-wait
>>> --timeout=5 -- del-port ... -- add-port ..." command. After 5 seconds
>>> this command times out, because Open vSwitch wasn't running
>>> 4. After 6 seconds Open vSwitch starts up, but VM still remains down.
>>>
>>> This means that you will have to manually start the VM one more time.
>>>
>>
>> Ok, so I guess that to solve this problem, the right solution would be
>> for libvirt to wait until OVS is up by not timing out, right?
>
> I actually see two solutions:
> 1. get daemon dependencies right (I see that Eric from libvirt project
> also recommended this); or
> 2. don't time out when executing "ovs-vsctl add-port" command from libvirt
>
> I would prefer solution #1. If we went with solution #2, then I am
> worried that someone else later on will complain that libvirt is stuck
> again (when OVS was not running).
>

I also think that solution #1 is the best one, but I tried to
implement it on my box and it didn't work. I changed the priority of
OVS at runlevels 0 (shutdown) and 6 (reboot) to be higher than that of
the libvirt-bin script (that's the script in Debian that takes down
the libvirt daemon), but for some reason OVS still goes down before
libvirt-bin finishes. I guess I should take a deeper look at this, but
if someone can help me with a standard Debian installation, that
would be great :-)
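
For illustration only, here is a minimal Python sketch of how a classic
SysV rc script orders the stop links in a runlevel directory: K* links
run in lexical order, so the lower number is stopped first. The link
names are hypothetical examples, not taken from a real system. On Debian
releases that use dependency-based boot ordering (insserv), the numbers
may be recomputed from the LSB headers of the init scripts, which is an
assumption worth checking before relying on hand-edited priorities.

```python
# Sketch: stop ordering in a classic SysV runlevel directory.
# The link names below are hypothetical, chosen only for illustration.

def stop_order(links):
    """Return the K* links in the order a classic SysV rc would run them."""
    return sorted(name for name in links if name.startswith("K"))


# Hypothetical contents of /etc/rc6.d: libvirt-bin must stop before OVS.
rc6_links = ["K02openvswitch-switch", "K01libvirt-bin"]
order = stop_order(rc6_links)

for name in order:
    print(name)
```

With numbers like these, libvirt-bin would be stopped first and OVS
afterwards, which is the order wanted at shutdown.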

About solution #2, which is what the patch attached to this mail
implements against the latest libvirt git version: I'm not sure that a
VM being stuck waiting for OVS to come up is really a bad idea, since
if the VM uses OVS for a virtual interface, that interface will not
work until OVS is up anyway, if I'm not wrong. The patch just tries to
delete the virtual interface from OVS before adding it again in the
"virNetDevOpenvswitchAddPort" libvirt function.

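The delete-then-add idea can be sketched as a single ovs-vsctl
invocation. This is not the attached patch (which is C code inside
libvirt), only a Python sketch that builds the argument list; the
helper name, the bridge name "br0" and the interface name "vnet0" are
made up for the example. The "--timeout", "--if-exists del-port" and
"add-port" pieces are the ones discussed in this thread.

```python
import shlex


def build_readd_port_cmd(bridge, ifname, timeout=5):
    """Build an ovs-vsctl argv that drops a stale port and adds it back.

    Hypothetical helper for illustration; it only builds the command,
    it does not run it.
    """
    return [
        "ovs-vsctl", f"--timeout={timeout}",
        # Remove any leftover record for this interface; no error if absent.
        "--", "--if-exists", "del-port", ifname,
        # Re-create the port on the bridge within the same transaction.
        "--", "add-port", bridge, ifname,
    ]


cmd = build_readd_port_cmd("br0", "vnet0")
print(shlex.join(cmd))
```

Because both commands go into one ovs-vsctl call, the stale record and
the new port are handled in the same database transaction.
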
>>>> I also added "--no-wait --timeout 5" when libvirt goes down so it can
>>>> timeout if ovs-switch is down.
>>> Just curious, but wasn't this part already solved with libvirt commit
>>> 98e732fc34a47ad9dfdb64aa4207623ee4c1ebcd (network: prevent infinite
>>> hang if ovs-vswitchd isn't running)? Are you using libvirt that has
>>> this patch?
>>>
>>
>> Ok, I'm using the stable version of libvirt and that fix is in the
>> experimental package. Anyway, this fix only adds the timeout flag when
>> deleting the interface from OVS, which keeps libvirt from hanging
>> while trying to delete the interface if OVS is down. But it doesn't fix
>> the issue that the OVS-DB still has the reference to the virtual
>> interface and so, when you bring the virtual machine up again, it
>> doesn't response because of this.
>
> What do you mean by "doesn't response"? Even if you pass
> "--timeout=5" and "--may-exist" to the "add-port" command, does it
> still sometimes block indefinitely?
>

Sorry for the confusion, but I think I expressed myself badly, since
English is not my native language :-)

What I meant by "doesn't response" is that the VM doesn't hang (it
comes up without problems) but you can't reach it through the virtual
interface attached to OVS. It seems that the old reference to the
virtual interface (which remains in the OVS-DB because del-port timed
out while OVS was down on reboot or shutdown) confuses the routing of
OVS to the VM. The only way that I found to make it respond again is
to manually run "del-port" and "add-port" for the virtual interface of
the unresponsive VM.

I hope I explained myself better this time :-)

>> my patch makes libvirt try to delete (with the
>> --if-exists flag) the interface before adding it again so that problem
>> is resolved.
>
> It was a long time ago, but I think that adding the "--may-exist" flag
> to the "add-port" command was sufficient. Can you debug this a little
> bit more and tell me what exactly is failing here (e.g. the output of
> "ps -Af | grep ovs-vsctl" while this blocking happens should be fine)?
>

This is what libvirt already does, even in the Debian package that I'm
testing (the backport one), without any patch, and with it I still
have the problem that I mentioned above.

> If this indeed turns out to be a problem, then I think a long term
> solution would be to:
> 1. mark ovs ports created by libvirt with something like
> other_config:created-by-libvirt=true
> 2. If libvirt did not have a chance to delete old ports on shutdown,
> then at the startup it should iterate over all ports and delete unused
> ports where other_config:created-by-libvirt=true
>

Well, this is, in a way, what the patch that I'm sending does. It
deletes the interface (if it already exists) from OVS before adding it
again.
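
The marker-based cleanup proposed in the quoted text could look roughly
like the following Python sketch. It is only the selection logic, under
assumptions: the "created-by-libvirt" key is the one suggested in this
thread and is not an existing convention, and the port records and
interface names are invented sample data rather than real OVS-DB output.

```python
# Key suggested in the thread; treated here as a hypothetical marker.
MARKER_KEY = "created-by-libvirt"


def stale_libvirt_ports(ports, active_ifaces):
    """Return names of marked ports that no running VM is using.

    `ports` is a list of dicts with "name" and "other_config" entries;
    `active_ifaces` is the set of interface names currently in use.
    """
    return [
        port["name"]
        for port in ports
        if port.get("other_config", {}).get(MARKER_KEY) == "true"
        and port["name"] not in active_ifaces
    ]


# Invented sample data: vnet0 was left behind, vnet1 is still in use,
# eth0 was not created by libvirt and must never be touched.
ports = [
    {"name": "vnet0", "other_config": {MARKER_KEY: "true"}},
    {"name": "vnet1", "other_config": {MARKER_KEY: "true"}},
    {"name": "eth0", "other_config": {}},
]
stale = stale_libvirt_ports(ports, active_ifaces={"vnet1"})
print(stale)
```

Only the ports returned by this filter would then be passed to
"del-port" at startup.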

> Also, as Ben suggested, in the long run we should get rid of
> "--timeout=5" and use something like "--try-once" in ovs-vsctl when
> the OVS DB is running locally. Otherwise, for example, with 10 ports
> the aggregate timeout can be up to 50 seconds.
>

I agree with this, but I don't know whether libvirt stops all VMs in
parallel or in sequence. In the first case, there should be no
aggregation.
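
The arithmetic behind the 50 seconds figure, as a small Python sketch.
It assumes every ovs-vsctl call hits its full timeout and, for the
parallel case, that all calls really do run at the same time; whether
libvirt stops VMs that way is exactly the open question above. The
function name is made up for the example.

```python
def worst_case_wait(num_ports, timeout_s=5, parallel=False):
    """Upper bound on time spent waiting for ovs-vsctl timeouts."""
    if num_ports <= 0:
        return 0
    # In sequence the timeouts add up; in parallel they overlap.
    return timeout_s if parallel else num_ports * timeout_s


sequential = worst_case_wait(10)
concurrent = worst_case_wait(10, parallel=True)
print(sequential, concurrent)
```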

Thanks.
Ernesto
-------------- next part --------------
A non-text attachment was scrubbed...
Name: virnetdevopenvswitch.c.patch
Type: application/octet-stream
Size: 719 bytes
Desc: not available
URL: <http://mail.openvswitch.org/pipermail/ovs-dev/attachments/20130307/42e888b7/attachment-0005.obj>

