[ovs-dev] [PATCH v7 3/7] ovn: Introduce "chassisredirect" port binding

Fri Jan 13 22:19:21 UTC 2017

On Thu, Jan 12, 2017 at 5:12 PM, Mickey Spiegel <mickeys.dev at gmail.com>
wrote:

>
> On Sun, Jan 8, 2017 at 10:30 PM, Mickey Spiegel <mickeys.dev at gmail.com>
> wrote:
>
>>
>> On Fri, Jan 6, 2017 at 8:31 PM, Mickey Spiegel <mickeys.dev at gmail.com>
>> wrote:
>>
>>>
>>> On Fri, Jan 6, 2017 at 4:21 PM, Mickey Spiegel <mickeys.dev at gmail.com>
>>> wrote:
>>>
>>>>
>>>> On Fri, Jan 6, 2017 at 4:11 PM, Ben Pfaff <blp at ovn.org> wrote:
>>>>
>>>>> On Fri, Jan 06, 2017 at 03:47:03PM -0800, Mickey Spiegel wrote:
>>>>> > On Fri, Jan 6, 2017 at 3:20 PM, Ben Pfaff <blp at ovn.org> wrote:
>>>>> >
>>>>> > > On Fri, Jan 06, 2017 at 12:00:30PM -0800, Mickey Spiegel wrote:
>>>>> > > > Currently OVN handles all logical router ports in a distributed manner,
>>>>> > > > creating instances on each chassis.  The logical router ingress and
>>>>> > > > egress pipelines are traversed locally on the source chassis.
>>>>> > > >
>>>>> > > > In order to support advanced features such as one-to-many NAT (aka IP
>>>>> > > > masquerading), where multiple private IP addresses spread across
>>>>> > > > multiple chassis are mapped to one public IP address, it will be
>>>>> > > > necessary to handle some of the logical router processing on a specific
>>>>> > > > chassis in a centralized manner.
>>>>> > > >
>>>>> > > > The goal of this patch is to develop abstractions that allow for a
>>>>> > > > subset of router gateway traffic to be handled in a centralized manner
>>>>> > > > (e.g. one-to-many NAT traffic), while allowing for other subsets of
>>>>> > > > router gateway traffic to be handled in a distributed manner (e.g.
>>>>> > > > floating IP traffic).
>>>>> > > >
>>>>> > > > This patch introduces a new type of SB port_binding called
>>>>> > > > "chassisredirect".  A "chassisredirect" port represents a particular
>>>>> > > > instance, bound to a specific chassis, of an otherwise distributed
>>>>> > > > port.  The ovn-controller on that chassis populates the "chassis"
>>>>> > > > column for this record as an indication for other ovn-controllers of
>>>>> > > > its physical location.  Other ovn-controllers do not treat this port
>>>>> > > > as a local port.
>>>>> > > >
>>>>> > > > A "chassisredirect" port should never be used as an "inport".  When an
>>>>> > > > ingress pipeline sets the "outport", it may set the value to a logical
>>>>> > > > port of type "chassisredirect".  This will cause the packet to be
>>>>> > > > directed to a specific chassis to carry out the egress logical router
>>>>> > > > pipeline, in the same way that a logical switch forwards egress traffic
>>>>> > > > to a VIF port residing on a specific chassis.  At the beginning of the
>>>>> > > > egress pipeline, the "outport" will be reset to the value of the
>>>>> > > > distributed port.
>>>>> > > >
>>>>> > > > For outbound traffic to be handled in a centralized manner, the
>>>>> > > > "outport" should be set to the "chassisredirect" port representing
>>>>> > > > centralized gateway functionality in the otherwise distributed router.
>>>>> > > > For outbound traffic to be handled in a distributed manner, locally on
>>>>> > > > the source chassis, the "outport" should be set to the existing "patch"
>>>>> > > > port representing distributed gateway functionality.
>>>>> > > >
>>>>> > > > Inbound traffic will be directed to the appropriate chassis by
>>>>> > > > restricting source MAC address usage and ARP responses to that chassis,
>>>>> > > > or by running dynamic routing protocols.
>>>>> > > >
>>>>> > > > Note that "chassisredirect" ports have no associated IP or MAC addresses.
>>>>> > > > Any pipeline stages that depend on port specific IP or MAC addresses
>>>>> > > > should be carried out in the context of the distributed port.
>>>>> > > >
>>>>> > > > Although the abstraction represented by the "chassisredirect" port
>>>>> > > > binding is generalized, in this patch the "chassisredirect" port binding
>>>>> > > > is only created for NB logical router ports that specify the new
>>>>> > > > "redirect-chassis" option.  There is no explicit notion of a
>>>>> > > > "chassisredirect" port in the NB database.  The expectation is when
>>>>> > > > capabilities are implemented that take advantage of "chassisredirect"
>>>>> > > > ports (e.g. NAT), the addition of flows specifying a "chassisredirect"
>>>>> > > > port as the outport will also be triggered by the presence of the
>>>>> > > > "redirect-chassis" option.  Such flows are added for NB logical router
>>>>> > > > ports that specify the "redirect-chassis" option.
>>>>> > > >
>>>>> > > > Signed-off-by: Mickey Spiegel <mickeys.dev at gmail.com>
>>>>> > >
>>>>> > > chassisredirect ports seem incredibly similar to vif ports.  Is the only
>>>>> > > difference that the output port is changed at the beginning of the
>>>>> > > egress pipeline?  That's something that could be implemented in the
>>>>> > > logical egress pipeline with 'outport = "...";'.  We do say that the
>>>>> > > outport isn't supposed to be modified in an egress pipeline, but nothing
>>>>> > > enforces that and if it's actually useful then we could just change the
>>>>> > > documentation.
>>>>> > >
>>>>> >
>>>>> > I don't get the similarity to vif ports.
>>>>> >
>>>>> > I need to create two different ports for each logical router port
>>>>> > specifying a "redirect-chassis". One represents the centralized
>>>>> > instance, for traffic that needs to be centralized. The other
>>>>> > represents the distributed instance, i.e. just take the local patch
>>>>> > port and go to/from the local logical router instance. I wanted the
>>>>> > egress pipeline processing to be the same regardless of whether
>>>>> > the packet arrived at the egress pipeline on the port representing
>>>>> > the centralized instance, or whether the packet arrived at the
>>>>> > egress pipeline on the port representing the distributed instance.
>>>>> >
>>>>> > There is no pipeline processing of the chassisredirect port,
>>>>> > except as the outport in the ingress pipeline. Everything else
>>>>> > happens in tables 32 and 33.
>>>>>
>>>>> OK, then I'm having trouble following the description.  For me, here's
>>>>> the key paragraphs that led me to my conclusions:
>>>>>
>>>>>     This patch introduces a new type of SB port_binding called
>>>>>     "chassisredirect".  A "chassisredirect" port represents a particular
>>>>>     instance, bound to a specific chassis, of an otherwise distributed
>>>>>     port.  The ovn-controller on that chassis populates the "chassis"
>>>>>     column for this record as an indication for other ovn-controllers of
>>>>>     its physical location.  Other ovn-controllers do not treat this port
>>>>>     as a local port.
>>>>>
>>>>>     A "chassisredirect" port should never be used as an "inport".  When
>>>>>     an ingress pipeline sets the "outport", it may set the value to a
>>>>>     logical port of type "chassisredirect".  This will cause the packet
>>>>>     to be directed to a specific chassis to carry out the egress logical
>>>>>     router pipeline, in the same way that a logical switch forwards
>>>>>     egress traffic to a VIF port residing on a specific chassis.  At the
>>>>>     beginning of the egress pipeline, the "outport" will be reset to the
>>>>>     value of the distributed port.
>>>>>
>>>>> The first paragraph appears to say that a chassisredirect port is a port
>>>>> on a particular chassis and that its chassis column says what chassis
>>>>> it's on.  OK, that's the same as a vif port, right?
>>>>>
>>>>
>>>> Yes, the same as vif, l2gateway, or l3gateway in the sense that this
>>>> port is bound to a chassis. No differences there.
>>>>
>>>>>
>>>>> The second paragraph appears to me to say, first, that packets would
>>>>> never originate from a chassisredirect port.  OK, fine, no problem.
>>>>> Second, it directly makes an analogy to vif ports, and then says that
>>>>> the outport changes.  No problem.
>>>>>
>>>>
>>>> Two main differences from vif:
>>>> 1. The outport changes. I want the ct_zone assignments in table 33
>>>>    and the loopback check in table 34 to be according to the new
>>>>    outport.
>>>>
>>>> 2. There is no pipeline processing of this port. This port has no
>>>>    addresses or other configuration. The purpose of the port is to
>>>>    tell table 32 to go to a particular chassis, and then tell table 33
>>>>    what the real outport should be.
>>>>
>>>> I got to this notion because a port is the way to tell table 32 to
>>>> go to a particular chassis. The first thought was two regular patch
>>>> ports, but the idea of two patch ports with the same addresses
>>>> is confusing and dangerous. By changing back to the real patch
>>>> port right away in the egress pipeline, it avoids those problems.
>>>>
>>>> Mickey
>>>>
>>>
>>> Let me go back to first principles. I need three sorts of chassis
>>> specific behaviors for distributed NAT:
>>> 1. Install some flows only on the chassis where a certain logical
>>>    port resides. That is is_chassis_resident which you already
>>>    reviewed and acked. The nat flows patch at the end of the
>>>    patch set uses this mechanism.
>>> 2. Install a different set of flows associated with the distributed
>>>    gateway port only on the redirect-chassis. There are several
>>>    such flows in this patch.
>>> 3. Direct some traffic with outport being the distributed gateway
>>>    port to the instance of the distributed gateway port on the
>>>    redirect-chassis. When this traffic hits table 32, it gets
>>>    sent through the normal tunnel to the redirect-chassis.
>>>
>>> I needed some handle that triggers 3. I decided to make that
>>> handle be a port, which I called a "chassisredirect" port. That
>>> also allows me to use is_chassis_resident(chassisredirect_port)
>>> to solve 2.
>>>
>>> It is possible to make that handle be something other than a
>>> port, as long as table 32 is modified to act on that. In that case,
>>> I will need another match "condition" (as I called it) based on
>>> that handle, similar to is_chassis_resident but based on
>>> whatever handle we decide on instead of port.
>>>
>>
>> I realized earlier tonight that there is a straightforward
>> alternative, though it does have one potentially confusing
>> aspect.
>>
>> For some reason, I had been assuming that a port_binding is
>> either exclusive to a chassis (in the previous implementation
>> with OVS patch ports, it had an ofport), or the port_binding
>> exists everywhere and does not have a chassis association
>> (is_remote in the previous implementation with OVS patch
>> ports).
>>
>> If this is relaxed and we allow logical patch ports to be
>> associated with a chassis, then all I need is a new
>> MLF_FORCE_CHASSIS_REDIRECT flag rather than
>> a second port_binding with a new "chassisredirect" type.
>>
>> The potentially confusing aspect is that even though the
>> mechanism for associating a logical patch port with a
>> chassis is identical to that for other port_binding types such
>> as "l3gateway", the association of a chassis with a logical
>> patch port has a different meaning than the association of a
>> chassis with a VIF, a type "l3gateway" port_binding, or a
>> type "l2gateway" port_binding.  For the latter, the association
>> is exclusive, i.e. the port only exists on that chassis.  For
>> logical patch ports, whether there is an association with a
>> chassis or not, the logical patch port exists everywhere
>> (subject to the constraints of conditional monitoring).
>>
>> The chassis association would only be used for a new
>> table 32 flow similar to other flows sending packets to
>> remote hypervisors for other port_binding types, but with
>> a different match condition:
>>     match_set_metadata(&match, htonll(dp_key));
>>     match_set_reg(&match, MFF_LOG_OUTPORT - MFF_REG0, port_key);
>>     match_set_reg_masked(&match, MFF_LOG_FLAGS - MFF_REG0,
>>                          1, MLF_FORCE_CHASSIS_REDIRECT);
>>
>> Depending on whether the
>> MLF_FORCE_CHASSIS_REDIRECT flag is set, the
>> packet would either be sent to the remote hypervisor,
>> or it would fall through to the table 32 priority 0 fallback
>> flow and be processed locally.
>>
>> The chassis association could also be used for
>> evaluation of is_chassis_resident("l3dgw_port") functions
>> in flow matches.
>>
>> If you agree that this approach is more promising than
>> type "chassisredirect" ports, I can code this up tomorrow.
>>
>
> I am having trouble making this approach work with the
> ARP request table. With the approach of replacing the
> logical outport, the ARP request goes to the controller
> with the new outport of type "chassisredirect". When the
> packet is reinjected, it does eventually end up at the
> redirect chassis.
>
> With the approach of using a flag, the packet is not
> hitting the table 32 entry matching the flag. I am not sure
> what happens to the packet after it goes up to the
> controller, and I am not sure how to debug it further or
> what to change to make it work.
>

I found the bug. It was affecting all packets, not just arp, and
was a simple fix. I am still checking all scenarios, but I think
I have the approach with the flag instead of a new port type
working. I can move forward with either approach, a flag or
a new port type as originally proposed.
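
For comparison, the table 32 decision in the flag approach can be
pictured roughly as below. This is only a standalone sketch, not the
ovn-controller code: every name prefixed with sketch_ or SKETCH_ is
made up for illustration, and the bit position chosen for the flag is
arbitrary.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for MLF_FORCE_CHASSIS_REDIRECT; the bit
 * position here is arbitrary and only for illustration. */
#define SKETCH_MLF_FORCE_CHASSIS_REDIRECT (1u << 2)

enum sketch_action {
    SKETCH_OUTPUT_TUNNEL,    /* send to the redirect chassis */
    SKETCH_PROCESS_LOCALLY   /* table 32 priority 0 fallback */
};

/* Logical fields of a packet that the new table 32 flow matches on. */
struct sketch_pkt {
    uint64_t dp_key;     /* logical datapath (metadata) */
    uint32_t outport;    /* logical outport key */
    uint32_t flags;      /* logical flags register */
};

/* Match fields of the new table 32 flow for one logical patch port
 * that has a chassis association. */
struct sketch_redirect_flow {
    uint64_t dp_key;
    uint32_t port_key;
};

/* Models the decision: the packet is tunneled only when datapath,
 * outport and the force-redirect flag all match; otherwise it falls
 * through and is processed locally. */
static enum sketch_action
sketch_table32(const struct sketch_pkt *pkt,
               const struct sketch_redirect_flow *flow)
{
    bool flag_set = (pkt->flags & SKETCH_MLF_FORCE_CHASSIS_REDIRECT) != 0;

    if (pkt->dp_key == flow->dp_key
        && pkt->outport == flow->port_key
        && flag_set) {
        return SKETCH_OUTPUT_TUNNEL;
    }
    return SKETCH_PROCESS_LOCALLY;
}
```

With the new port type, the same choice is made by which outport the
ingress pipeline sets, so no flag match is needed in table 32.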

Mickey

>
> Mickey
>
>
>> Mickey
>>
>>
>>
>>> Mickey
>>>
>>>
>>>>
>>>>> I guess that I must be missing important points, but that's why I
>>>>> interpreted the text as I did.  Can you help me figure out why I'm not
>>>>> following?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Ben.
>>>>>
>>>>
>>>>
>>>
>>
>