[ovs-dev] [patch_v2 1/3] ovn: Skip logical switch "router type" port arp responder install.

Darrell Ball dlu998 at gmail.com
Tue Oct 4 23:53:27 UTC 2016


On Tue, Oct 4, 2016 at 3:48 PM, Mickey Spiegel <mickeys.dev at gmail.com>
wrote:

> On Mon, Oct 3, 2016 at 2:21 PM, Darrell Ball <dlu998 at gmail.com> wrote:
>
>> On Mon, Oct 3, 2016 at 10:54 AM, Han Zhou <zhouhan at gmail.com> wrote:
>>
>> >
>> >
>> > On Sun, Oct 2, 2016 at 2:14 PM, Darrell Ball <dlu998 at gmail.com> wrote:
>> > >
>> > >
>> > >
>> > > On Sun, Oct 2, 2016 at 11:27 AM, Han Zhou <zhouhan at gmail.com> wrote:
>> > >>
>> > >> On Sat, Oct 1, 2016 at 4:34 PM, Darrell Ball <dlu998 at gmail.com>
>> wrote:
>> > >> >
>> > >> > Do not install any potential logical switch "router type"
>> > >> > port arp responders.  Logical router port arp responders
>> > >> > should be sufficient in this respect.
>> > >> > It seems a little weird for a logical switch that is not proxying
>> > >> > for a remote VIF to be responding to arp requests, and we are not
>> > >> > functionally using this capability in ovn.
>> > >> >
>> > >> Hi Darrell,
>> > >>
>> > >> The arp responder for the patch port is useful, e.g. when a VM pings
>> > the default gateway IP. Would removing the flow cause the arp request to
>> > get flooded? And what is the benefit of removing it here?
>>
>
> I agree with Han that removing the flow would cause the ARP request to
> get flooded in more cases, so there would be some performance impact.
>

As I stated in a previous e-mail, a VM's mac binding for the local DLR is
not very likely to time out and cause flooding. There are multiple "users"
of the DLR mac binding in a given VM, and hence the entry should not time
out very often. Meaning the VM should rarely need to issue arp requests
for the DLR in the first place.
Meaning, it is not a super great optimization.

arp aging timeout is more likely for the other proxied cases, which are
NOT being debated here.


>
> > >
>> > >
>> > > 1) Modelling: I would expect the L3 gateway arp responder to be
>> > associated with the L3
>> > > gateway router datapath, at the very least. That way, the modeling is
>> > correct and we don't have a situation where, for example, a phantom
>> gateway
>> > router is never even downloaded to a HV,
>> > > but is "responding" or rather appearing to respond to arp requests.
>> > >
>> >
>> > Ok, I see your concern. To achieve this expectation, it may be done in
>> > a way similar to the regular LS ports: reply to ARP only if
>> > Logical_Switch_Port.up = true. When the gateway router is bound to a
>> > chassis, we can set the LS patch port up to true. And for distributed
>> > routers we can set the patch port up directly. This way we can avoid
>> > responding to ARP before the gateway router is bound.
>> >
>>
>> I think you missed the main aspect.
>> There is a layering violation in doing this and also a modeling issue.
>> The key idea can be summarized as "A logical router should respond to arps
>> to itself" rather than some logical switch proxying that.
>>
>
> I don't think this is a layering violation. You are objecting to an ARP
> response
> being generated by the switch on a port far away from the actual
> destination
> port.
> Why is it so different whether the endpoint is a router or a VIF?
> Both routers and VIFs can and do generate their own ARP responses, but
> that does not rule out the optimization that responds to ARPs immediately
> at the source switch port.
>
>

I don't think we are going to agree on this one.
Here we have a case where a logical switch is responding to arp requests
on behalf of the logical router. Meaning these datapaths are not
independent.

We also have both logical switch ports and logical router ports sharing
the same MAC and IP addresses. This apparently is what is done in
OpenStack. We have no test case where we really test this, as usual.
Maybe you can add one.

Just to reiterate:
As I stated in a previous e-mail, a VM's mac binding for the local DLR is
not very likely to time out and cause flooding. There are multiple users
of the DLR mac binding in a given VM, and hence the entry should not time
out very often.
Meaning, it is not a super great optimization.

arp timeout is more likely for the other proxied cases, which are NOT
being debated.




> This has implications for cases where an IP address is shared by several
>> gateways
>> and then the binding is used to designate the gateway used.
>>
>> If there are cases where an IP address can appear on the same network
> with different MAC addresses, then you have a problem. We would need
> to know more about this use case.
>
> Note that you pretty much do have a knob already to control this behavior.
> If the addresses specified on the switch's "router" type port are only
> ethernet addresses, then the switch will not generate any ARP replies.
> If the addresses specified on the switch's "router" type port include both
> ethernet and IP addresses, then the switch will generate ARP replies for
> each specified ethernet/IP address combination.
>
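To make the knob described above concrete, here is a sketch of the two configurations with ovn-nbctl. The switch, router-port, and address values are hypothetical; only the presence or absence of an IP in the addresses column changes the ARP responder behavior:

```shell
# Hypothetical names: "sw0-r0" is a logical switch port of type "router"
# patched to logical router port "r0-sw0".
ovn-nbctl lsp-add sw0 sw0-r0
ovn-nbctl lsp-set-type sw0-r0 router
ovn-nbctl lsp-set-options sw0-r0 router-port=r0-sw0

# Ethernet address only: the switch installs no ARP responder flow.
ovn-nbctl lsp-set-addresses sw0-r0 "00:00:00:00:00:01"

# Ethernet plus IP address: the switch installs an ARP responder
# replying for 10.0.0.1 with 00:00:00:00:00:01.
ovn-nbctl lsp-set-addresses sw0-r0 "00:00:00:00:00:01 10.0.0.1"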

Yeah, we can assign the same mac and ip to both logical switch and
logical router ports. That is not what I mean about exposing the ability
properly or modularity of datapaths.

The logical switch arp responder will make the logical router arp
responder unused, as well.

What I mean by a configuration knob is the logical router datapath code
being the master of its own arp responder, including early usage, and
making that transparent. Meaning, the proxying action in code is
initiated from the logical router datapath under the control of an
external knob.
That is not the case today.
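Whichever datapath installs it, the arp responder logical flow amounts to rewriting the request in place and sending it back out the ingress port. A minimal Python sketch of that rewrite follows; the dict-based packet model is purely illustrative, with field names mirroring the logical-flow action list:

```python
def arp_respond(pkt, mac, ip):
    """Mirror of the OVN arp responder action list:
    eth.dst = eth.src; eth.src = E; arp.op = 2; arp.tha = arp.sha;
    arp.sha = E; arp.tpa = arp.spa; arp.spa = A; outport = inport; output;
    """
    # Only answer ARP requests (op == 1) targeting our own IP.
    if pkt["arp.op"] != 1 or pkt["arp.tpa"] != ip:
        return None
    reply = dict(pkt)
    reply["eth.dst"] = pkt["eth.src"]   # reply goes back to the requester
    reply["eth.src"] = mac
    reply["arp.op"] = 2                 # ARP reply
    reply["arp.tha"] = pkt["arp.sha"]
    reply["arp.sha"] = mac
    reply["arp.tpa"] = pkt["arp.spa"]
    reply["arp.spa"] = ip
    reply["outport"] = pkt["inport"]    # hairpin out the ingress port
    return reply
```

Requests for any other target IP fall through (return None) and would continue down the pipeline, e.g. to L2 flooding.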




>
> The only other place where it looks like a switch port IP address is used
> is for IPAM and DHCP. I did not look into this in any more detail, so I am
> not sure of all the implications of leaving out the IP addresses from the
> switch "router" type port.
>
>>
>>
> >
>> > However, I wonder whether even this change is needed. To my
>> > understanding, ARP is just to resolve addresses. Do you see any real
>> > problem with replying even if the gateway router is not yet bound? I
>> > don't think this is a problem of modeling. It might look weird just
>> > because it behaves slightly differently from the traditional view. I
>> > would prefer to keep the simplicity.
>> >
>>
>> Having both logical switch and logical router arp responders for the
>> same gateway router is not simpler; it is more complicated.
>> I suggest having a single arp responder built by the associated logical
>> router.
>>
>>
>>
>> >
>> >
>> > > 2) We install an arp responder for the logical routers, including L3
>> > gateway(s) today (see below).
>> > > We check for inport in this rule and this inport is only associated
>> with
>> > the L3 gateway HV.
>> > > So only the L3 gateway HV should respond. Meaning, if there is a
>> > response, the L3 gateway
>> > > datapath is really there.
>> >
>> > But the L2 flooding would still happen, right?
>> >
>>
>> Of course.
>> Since an L3 gateway resides on a remote HV only, the packets need to
>> traverse the network to confirm reachability and binding of that L3
>> gateway.
>>
>>
>>
>> >
>> > >
>> > > 3) Usually, there are a limited number of L3 gateways and therefore
>> > associated bindings.
>> > > Also, for VMs participating in south<->north traffic, the bindings
>> > > are less likely to time out since there are multiple uses of the L3
>> > > gateway for each VM.
>> > >
>> >
>> > With a big L2, even a small percentage of VMs doing ARP will cause
>> > annoying flooding. Moreover, considering that containers come and go
>> > frequently, this would be more common. So I think it is still better
>> > to suppress ARP for south-north if there is no real problem.
>> >
>>
>> I don't buy it.
>>
>> Today, we skip using arp responders for packets arriving on localnet and
>> vtep ports,
>> meaning the arp requests go to all VMs.
>>
>
> We should be clear why we are skipping ARP responders for packets
> arriving on "localnet" and "vtep" ports. It has nothing to do with
> performance.
>

The reason why we skip them is clearly documented in a patch I sent out,
and it is not for performance reasons.
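For reference, the skip amounts to a high-priority flow in the switch's ARP-response stage that passes such traffic through untouched. Roughly, in logical-flow notation (the port name is hypothetical and the stage name approximate):

```
table=ls_in_arp_rsp, priority=100
  match  : inport == "ln-localnet-port"
  action : next;
```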



>
> If the switch port type is such that there might be a traditional L2
> network
> behind it, then there are two potentially serious problems:
>
>    1. Due to flooding of ARP requests in the traditional L2 network, it is
>    possible to receive multiple copies of the same ARP request on
>    different hypervisors. If they all reply to the ARP request, then the
>    source of the ARP request will receive multiple replies.
>
>
You just re-phrased what I put in the documentation patch.



>
>    2. If the ARP replies have the same MAC address from different
>    attachment points to the traditional L2 network, then they can
>    mess up L2 learning.
>
>
No kidding Mickey - see my documentation patch.



> For switch port types other than "localnet", "vtep" and "l2gateway", it
> seems like the switch ARP response is replying right at the source of
> the ARP request. If there is no flooding of the ARP request, and no L2
> learning implication, then how does the switch ARP responder cause
> any problems?
>

We were never debating the advantage of early arp response or static
arp mapping - I think that is obvious.

The discussion is specific to the logical switch arp responder for the
logical router ports and how much of an optimization that is versus
the modeling and modularity issues.



>
> For "l2gateway", I guess the current supported scenarios support only
> one "l2gateway" port to each attached traditional L2 network?
> If that is the case, then even in this case the ARP response would not
> cause any problems.
>

Hmm, I guess that is why I stated that in the previous documentation patch.


> This would be a much more serious issue since external abuse is possible.
>>
>> This L3 gateway case is more limited and other approaches are possible to
>> mitigate
>> this.
>> We discussed this internally, and we are otherwise thinking of having
>> a user-visible configuration for arp responders in general.
>>
>
> Do you need a new knob, or would it be good enough to leave the IP
> addresses out of the switch "router" type port's addresses, specifying
> only the ethernet addresses, as described above?
>
>
I think there is some work to be done here and maybe even some
optimization opportunity.
It is not the highest priority right now.



> Mickey
>
>
>> If we really cannot tolerate a few containers coming and going, then we
>> have a serious problem that already exists for the localnet and vtep
>> cases as well as pure L2 forwarding decisions.
>>
>>
>>
>>
>> >
>> > >
>> > >>
>> > >>
>> > >>
>> > >> Han
>> > >
>> > >
>> >
>> _______________________________________________
>> dev mailing list
>> dev at openvswitch.org
>> http://openvswitch.org/mailman/listinfo/dev
>>
>
>
