[ovs-dev] Multiple OVS vswitchd controlling same kernel path?
fbl at redhat.com
Tue Mar 31 19:04:48 UTC 2020
On Wed, Mar 25, 2020 at 11:34:57PM +0530, Numan Siddique wrote:
> On Wed, Mar 25, 2020 at 9:04 PM William Tu <u9012063 at gmail.com> wrote:
> > On Wed, Mar 25, 2020 at 08:21:38AM -0700, William Tu wrote:
> > > On Wed, Mar 25, 2020 at 10:08:14AM -0400, Tim Rozet wrote:
> > > > Hi All,
> > > > I've run into this question several times, and several folks have different
> > > > opinions. I was hoping we could resolve it. The question centers around
> > >
> > > Me too. Thanks for bringing this up.
> > >
> > > > having multiple OVS vswitchd instances in different netns on the same host
> > > > using kernel data path. In the past several folks have told me this does
> > > > not work, because there will be potential conflicts in the kernel data
> > > > path, or each OVS may flush the kernel path flows without regard for
> > > > what another OVS has programmed. In contrast, some other folks have told me
> > > > this should work perfectly fine as each OVS is using its own DPID and
> > > > there should be 0 conflicts.
> > >
> > > There was a talk about this in 2015 that mentioned a couple of issues:
> > > https://www.openvswitch.org/support/ovscon2015/17/1555-benc.pdf
> > > Maybe we could go over and test the issues in these slides?
> > >
> > > but in reality, I do see people running multiple ovs-vswitchd in multiple
> > > containers sharing one ovs kernel datapath, without any problem.
> > >
> > > >
> > > > One use case around this is being able to use Kubernetes In Docker (KIND),
> > > > where we are running multiple Docker containers acting as "nodes" with ovs
> > > > containers inside them, to simulate a large k8s deployment on a single
> > > > host. Antrea has added support for using netdev mode and claims that kernel
> > > > data path with multiple OVS will not work:
> > > > https://github.com/vmware-tanzu/antrea/issues/14
> > > > https://github.com/vmware-tanzu/antrea/blob/master/docs/kind.md#why-is-the-yaml-manifest-different-when-using-kind
> > > >
> > How about doing this experiment:
> > 1) start 2 containers
> > 2) in each container, run 'make check-kernel' or 'make check-kmod' at the same time
> > 3) see if the test cases pass or behave the same as with a single container
> > William
> In the case of ovn testing, I use ovn-fake-multinode, which
> creates containers to act as fake nodes.
> In my testing I noticed that output of "ovs-dpctl dump-flows" is
> different for each container.
> Out of curiosity, I put some prints in the ovs kernel module and I
> could see that the "struct datapath"
> is different for each container.
> In my opinion, multiple instances of ovs-vswitchd should run fine as
> long as they run in different network namespaces.
The OvS kernel module keeps track of multiple datapaths within the same
netns, and also across multiple netns. So, in theory, you could have a
combination of both. If I recall correctly, the kernel takes the netns
of the netlink socket created by the tools as the netns to operate
in. The kernel DP sends upcalls to the netns where the DP was created,
using IDs to identify each message. The revalidator thread does the
same.
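To make the point above concrete, here is a toy model (not the kernel's
actual code or data layout) of datapaths kept in a separate table per
netns, with the netns taken from the caller's socket. The class, its
method names, the "netns-a"/"netns-b" labels and the flow string are all
made up for illustration; "ovs-system" is only used as an example
datapath name.

```python
# Toy model only: this is NOT the kernel's code or data layout.
from collections import defaultdict


class ToyDatapathRegistry:
    """Keeps datapaths in a separate table for each netns."""

    def __init__(self):
        # netns identifier -> {datapath name -> list of flows}
        self._by_netns = defaultdict(dict)

    def create(self, sock_netns, name):
        # The netns comes from the caller's socket, never from the request.
        table = self._by_netns[sock_netns]
        if name in table:
            raise ValueError(f"datapath {name!r} already exists in {sock_netns}")
        table[name] = []

    def add_flow(self, sock_netns, name, flow):
        self._by_netns[sock_netns][name].append(flow)

    def dump_flows(self, sock_netns, name):
        # Only the table of the caller's netns is ever consulted.
        return list(self._by_netns[sock_netns][name])


registry = ToyDatapathRegistry()
registry.create("netns-a", "ovs-system")
registry.create("netns-b", "ovs-system")  # same name, different netns: no clash
registry.add_flow("netns-a", "ovs-system", "in_port(1),actions:2")
```

In this model a flow dump from "netns-b" never sees what "netns-a"
installed, which would be consistent with the per-container
"ovs-dpctl dump-flows" output Numan describes above.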
Therefore, I think if you are just moving everything you had in the
root namespace along with its child netns together into a new netns
then it should work. And you could have multiple instances of that.
Otherwise there are limitations. For instance, you can't run one OvS
in the parent netns and one OvS in the child netns, because the one
running in the parent listens to all netlink messages, including the
child's. So, ifaces with the same name will conflict. You can't
use tools from a different netns, because the kernel uses the
socket's netns as the netns in the kernel. Same goes for the daemons,
because they are required to have sockets in the same netns.
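Since the tools and the daemons have to share a netns, a quick sanity
check is to compare the netns of two processes. A minimal sketch,
assuming a Linux /proc where /proc/<pid>/ns/net is a symlink whose
target looks like "net:[4026531992]"; the function names are mine, and
reading another process's link may need extra privileges.

```python
import os
import re


def parse_netns_id(link_target):
    """Extract the inode number from a target such as 'net:[4026531992]'."""
    match = re.fullmatch(r"net:\[(\d+)\]", link_target)
    if match is None:
        raise ValueError(f"unexpected netns link target: {link_target!r}")
    return int(match.group(1))


def netns_id_of(pid="self"):
    """Return the netns inode of a process by reading /proc/<pid>/ns/net."""
    return parse_netns_id(os.readlink(f"/proc/{pid}/ns/net"))


def same_netns(pid_a, pid_b="self"):
    """True when both processes are in the same network namespace."""
    return netns_id_of(pid_a) == netns_id_of(pid_b)
```

For example, same_netns(<pid of ovs-vswitchd>) run from the shell where
you intend to use the tools tells you whether that shell sits in the
daemon's netns.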
There are other scenarios which were not possible before, like
adding an internal port from one OvS to another one. I don't think
it will work. Using veth pairs most probably is the safest solution
because the interface remains in the same netns as the daemons and
yet you can cross netns in a generic and secure way.
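As a rough sketch of the veth approach, the helper below only builds
the command lines; it does not run anything. It assumes two named netns
created with "ip netns add" and one OvS instance per netns, each with
its own ovsdb socket. The netns names, bridge names, veth names and
socket paths are all hypothetical.

```python
def veth_link_commands(netns_a, bridge_a, db_a, netns_b, bridge_b, db_b,
                       veth_a="veth-a", veth_b="veth-b"):
    """Return argv lists that join two OvS instances with one veth pair."""
    cmds = [
        # Create the pair, then move one end into each netns.
        ["ip", "link", "add", veth_a, "type", "veth", "peer", "name", veth_b],
        ["ip", "link", "set", veth_a, "netns", netns_a],
        ["ip", "link", "set", veth_b, "netns", netns_b],
    ]
    for netns, bridge, db, veth in ((netns_a, bridge_a, db_a, veth_a),
                                    (netns_b, bridge_b, db_b, veth_b)):
        in_ns = ["ip", "netns", "exec", netns]
        cmds.append(in_ns + ["ip", "link", "set", veth, "up"])
        # Each end is attached by the OvS instance living in that netns.
        cmds.append(in_ns + ["ovs-vsctl", f"--db=unix:{db}",
                             "add-port", bridge, veth])
    return cmds


for argv in veth_link_commands("ns-a", "br-a", "/run/ovs-a/db.sock",
                               "ns-b", "br-b", "/run/ovs-b/db.sock"):
    print(" ".join(argv))
```

Each ovs-vsctl call runs inside the netns of the instance it configures,
so every interface stays in the same netns as the daemon that owns it.
With containers that do not register a named netns, the "ip" commands
would need adapting (for instance by targeting a PID instead of a name).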