[ovs-dev] OVN: Setting custom data

Mark Michelson mmichels at redhat.com
Fri Feb 8 13:16:30 UTC 2019


Thanks for your reply Ben. Comments in-line below.

On 2/7/19 10:06 PM, Ben Pfaff wrote:
> On Thu, Feb 07, 2019 at 02:24:08PM -0500, Mark Michelson wrote:
>> The general problem I have is that I'm developing a feature for OVN. When a
>> relevant packet arrives, I want to set some data and have that data
>> available the entire time that the packet is being processed. My initial
>> idea was to use one of reg0-reg9 for this. However, those register values
>> get reset to 0 when the packet changes from the ingress to the egress
>> pipeline and when the packet moves from one logical datapath to another.
>>
>> My question is, is it possible (or acceptable) for me to use one of
>> reg0-reg9 as a special purpose register and not to reset its value to 0
>> automatically? Or is there some other method that would be appropriate?
> 
> The reason that the registers get reset to 0 on transition between
> pipelines is that the transition between pipelines is where packets get
> transmitted across tunnels from one hypervisor to another.  The
> registers don't get transmitted in the tunnels, so they would naturally
> get reset to 0.  We want the logical pipeline to be independent of the
> physical pipeline, so we reset all the registers even when the packets
> are not transmitted over a tunnel.
> 
> This could easily be changed, if we want to assume the use of Geneve,
> since we can transmit as many registers as we want as Geneve TLVs.  But
> that would rule out the use of STT (and further restrict the use of
> VXLAN).  In the past, there's been little willingness to do that.

Yes, this is an excellent point. If we can avoid having to modify 
encapsulation metadata that'd be swell.
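For what it's worth, here's a rough sketch (Python, with made-up option class/type values, not OVN's assigned ones) of what carrying a 32-bit register across the tunnel as a Geneve option TLV would amount to, following the generic TLV layout from the Geneve draft: 16-bit option class, 8-bit type, and a 5-bit length counted in 4-byte words:

```python
import struct

def encode_geneve_option(opt_class, opt_type, value):
    """Pack one 32-bit register value as a single Geneve option TLV.

    Header layout: 16-bit class, 8-bit type, then 3 reserved bits plus
    a 5-bit length field counted in 4-byte words (header excluded).
    """
    body = struct.pack("!I", value)
    length_words = len(body) // 4            # one 4-byte word here
    header = struct.pack("!HBB", opt_class, opt_type, length_words & 0x1F)
    return header + body

def decode_geneve_option(data):
    """Unpack a single option TLV back into (class, type, value)."""
    opt_class, opt_type, flags_len = struct.unpack("!HBB", data[:4])
    length = (flags_len & 0x1F) * 4
    (value,) = struct.unpack("!I", data[4:4 + length])
    return opt_class, opt_type, value
```

The per-register cost is small (8 bytes each), which is why this is easy over Geneve but a non-starter for STT, which has a fixed 64-bit metadata field.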

> 
>> Here's a more detailed explanation of the problem. OpenShift is migrating
>> from using their own custom use of OVS to using OVN (via ovn-kubernetes).
>> One feature they currently offer is for any pod in a namespace to be able to
>> send a multicast packet and have it reach all other pods in the same
>> namespace. The actual multicast destination address is not important. This
>> is on purpose so that separate namespaces can use the same multicast
>> destination without the worry of collision. They want to maintain this
>> feature when switching to OVN.
>>
>> You may recall me talking about doing something like this quite some time
>> ago, but the actual method has changed a bit since then.
>>
>> My approach for implementing this is to create a new northbound table for
>> multicast groups. This table contains a list of logical switch ports that
>> represent members of the group. In ovn-northd, we do some magic to ensure
>> that if any of these logical switch ports are on separate logical switches,
>> then appropriate logical flows get installed so that the multicast packet
>> attempts to traverse the logical router(s) that separate the logical
>> switches.
>>
>> When the packet initially arrives on a logical switch, we can use the
>> logical input port, coupled with the multicast destination, to determine the
>> multicast group that this packet should be sent to. However, once the packet
>> reaches a logical router, there's nothing about the packet that we can use
>> to determine which multicast group is the appropriate destination. Same
>> thing occurs when the packet arrives in a logical switch from a logical
>> router.
>>
>> My idea for fixing this is to store the multicast group ID in a
>> general-purpose register.  But as stated above, if I set this value in a
>> register, the register's value will get reset to 0 when the packet reaches
>> the logical router pipeline. Hence my questions from the beginning.
> 
> It seems like this could be done by allocating multicast IP
> addresses to the groups, and then mapping from multicast IP addresses to
> the OVN multicast group.  But I'm really naive about IP multicast so
> maybe this doesn't make sense.
> 

You're exactly right. When I first started implementing multicast 
groups, I did it this way. However, as I mentioned above, I'm trying to 
keep OpenShift's current usage in mind, and they wish to be able to send 
to any multicast address and keep the packet in the current namespace.
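A toy illustration of why the (input port, destination) lookup works at the first switch but breaks down after a router hop (all names here are made up; the real flows are generated by ovn-northd):

```python
# Hypothetical flow table keyed on (logical input port, multicast dst).
# Two namespaces deliberately reuse the same multicast address.
MCAST_GROUPS = {
    ("ns1-pod1", "239.1.1.1"): 7,
    ("ns2-pod1", "239.1.1.1"): 8,
}

def classify(inport, mcast_dst):
    """Return the multicast group ID, or None if no flow matches."""
    return MCAST_GROUPS.get((inport, mcast_dst))

# First hop: the pod's own logical port disambiguates the namespace.
assert classify("ns1-pod1", "239.1.1.1") == 7
assert classify("ns2-pod1", "239.1.1.1") == 8

# After a router hop the input port is the router's attachment port,
# which is shared by every namespace, so the key no longer identifies
# a group -- the group ID has to travel with the packet somehow.
assert classify("lr0-to-ls1", "239.1.1.1") is None
```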

I thought about this some more this morning, and I may be able to do 
this by repurposing the logical output port. The output port is already 
carried in the encapsulation metadata, so if I can get away with that, I 
might have what I need.
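To make that concrete: OVN's Geneve encapsulation carries the logical datapath in the VNI and the logical ingress/egress ports in a single 32-bit option body, so the egress-port field does survive the tunnel. The bit layout in this Python sketch is illustrative only; ovn-architecture(7) has the authoritative encoding:

```python
def pack_port_metadata(ingress_port, egress_port):
    """Sketch: pack logical ingress (15 bits) and egress (16 bits)
    port keys into one 32-bit Geneve option body. Layout is
    illustrative; see ovn-architecture(7) for the real encoding."""
    assert 0 <= ingress_port < (1 << 15)
    assert 0 <= egress_port < (1 << 16)
    return (ingress_port << 16) | egress_port

def unpack_port_metadata(word):
    """Recover (ingress_port, egress_port) from the 32-bit body."""
    return (word >> 16) & 0x7FFF, word & 0xFFFF
```

If the multicast group ID were encoded as a special logical output port key, the receiving hypervisor could recover it here without any new metadata fields, at the cost of carving group IDs out of the port-key space.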
