[ovs-dev] [PATCH] sFlow export: include standard tunnel structures (for GRE, VXLAN etc.)

Fri Oct 25 17:49:23 UTC 2013

On Oct 24, 2013, at 5:46 PM, Jesse Gross <jesse at nicira.com> wrote:

> On Thu, Oct 24, 2013 at 3:39 PM, Romain Lenglet <rlenglet at vmware.com> wrote:
>> ----- Original Message -----
>>> From: "Jesse Gross" <jesse at nicira.com>
>>> To: "Romain Lenglet" <rlenglet at vmware.com>
>>> Cc: dev at openvswitch.org
>>> Sent: Tuesday, October 22, 2013 3:46:54 PM
>>> Subject: Re: [ovs-dev] [PATCH] sFlow export: include standard tunnel structures (for GRE, VXLAN etc.)
>>> 
>>> On Mon, Oct 21, 2013 at 2:33 PM, Romain Lenglet <rlenglet at vmware.com> wrote:
>>>> ----- Original Message -----
>>>>> From: "Romain Lenglet" <rlenglet at vmware.com>
>>>>> To: "Jesse Gross" <jesse at nicira.com>
>>>>> Cc: dev at openvswitch.org
>>>>> Sent: Friday, October 18, 2013 6:46:05 PM
>>>>> Subject: Re: [ovs-dev] [PATCH] sFlow export: include standard tunnel
>>>>> structures (for GRE, VXLAN etc.)
>>>>> 
>>>>> ----- Original Message -----
>>>>>> From: "Jesse Gross" <jesse at nicira.com>
>>>>>> To: "Romain Lenglet" <rlenglet at vmware.com>
>>>>>> Cc: "Neil Mckee" <neil.mckee at inmon.com>, dev at openvswitch.org
>>>>>> Sent: Friday, October 18, 2013 6:23:23 PM
>>>>>> Subject: Re: [ovs-dev] [PATCH] sFlow export: include standard tunnel
>>>>>> structures (for GRE, VXLAN etc.)
>>>>>> 
>>>>>> On Fri, Oct 18, 2013 at 5:58 PM, Romain Lenglet <rlenglet at vmware.com>
>>>>>> wrote:
>>>>>>> ----- Original Message -----
>>>>>>>> From: "Jesse Gross" <jesse at nicira.com>
>>>>>>>> To: "Romain Lenglet" <rlenglet at vmware.com>
>>>>>>>> Cc: "Neil Mckee" <neil.mckee at inmon.com>, dev at openvswitch.org
>>>>>>>> Sent: Friday, October 18, 2013 5:50:05 PM
>>>>>>>> Subject: Re: [ovs-dev] [PATCH] sFlow export: include standard tunnel
>>>>>>>> structures (for GRE, VXLAN etc.)
>>>>>>>> 
>>>>>>>> On Fri, Oct 18, 2013 at 5:43 PM, Romain Lenglet <rlenglet at vmware.com>
>>>>>>>> wrote:
>>>>>>>>> ----- Original Message -----
>>>>>>>>>> From: "Romain Lenglet" <rlenglet at vmware.com>
>>>>>>>>>> To: "Neil Mckee" <neil.mckee at inmon.com>
>>>>>>>>>> Cc: dev at openvswitch.org
>>>>>>>>>> Sent: Wednesday, October 9, 2013 10:30:17 AM
>>>>>>>>>> Subject: Re: [ovs-dev] [PATCH] sFlow export: include standard tunnel
>>>>>>>>>> structures (for GRE, VXLAN etc.)
>>>>>>>>>> 
>>>>>>>>>> On Oct 8, 2013, at 10:09 PM, Neil Mckee <neil.mckee at inmon.com>
>>>>>>>>>> wrote:
>>>>>>>>>>> +    /* Indicate 0==unknown for the src_port. It may be set to a random
>>>>>>>>>>> +       number on a flow-by-flow basis to increase entropy for ECMP fabrics.
>>>>>>>>>>> +       The assumption being made here is that it is not so important to
>>>>>>>>>>> +       report this.  At least not important enough to justify the effort
>>>>>>>>>>> +       of making it accessible here. */
>>>>>>>>>> 
>>>>>>>>>> Exporting the UDP source port is essential.
>>>>>>>>>> You also have to export the tunnel key: GRE key (32- or 64-bit), VNI
>>>>>>>>>> (24-bit), etc.
>>>>>>>>>> I don't see how this feature could be useful without the UDP source
>>>>>>>>>> port and tunnel key.
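To make the list of required fields concrete, here is a minimal C sketch of what a per-sample tunnel record might hold. The struct and function names are hypothetical and are not the OVS or sFlow definitions; the only fixed facts used are that UDP ports are 16 bits, a VXLAN VNI is 24 bits, and the key is carried in a 64-bit field, as OVS does for tunnel IDs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-sample tunnel record (not the OVS or sFlow struct). */
struct tunnel_sample {
    uint32_t outer_src_ip;    /* Outer IPv4 source, host byte order. */
    uint32_t outer_dst_ip;    /* Outer IPv4 destination. */
    uint8_t  outer_proto;     /* 47 for GRE, 17 for UDP (e.g. VXLAN). */
    uint16_t outer_src_port;  /* 0 means unknown, as in the patch. */
    uint16_t outer_dst_port;  /* Tunnel's UDP destination port. */
    uint64_t tunnel_key;      /* GRE key or VXLAN VNI. */
};

/* A VXLAN VNI is a 24-bit field, so larger keys cannot be a VNI. */
static bool vni_is_valid(uint64_t key)
{
    return key <= 0xFFFFFFu;
}

/* For a UDP-based tunnel such as VXLAN, a sample is only useful for path
 * analysis if it carries a known source port and a valid VNI. */
static bool vxlan_sample_is_complete(const struct tunnel_sample *s)
{
    return s->outer_proto == 17 && s->outer_src_port != 0
           && vni_is_valid(s->tunnel_key);
}
```

With the patch as posted, outer_src_port would always be 0, so vxlan_sample_is_complete() would never hold.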
>>>>>>>>> 
>>>>>>>>> I thought more about this. Exporting the source UDP port is really
>>>>>>>>> important. Since the source port is calculated in the tunnel port at
>>>>>>>>> egress during encapsulation and is lost at ingress during decapsulation,
>>>>>>>>> and the sampling here is done before encapsulation or after
>>>>>>>>> decapsulation, the easiest way I can imagine to determine the source
>>>>>>>>> port is to redo the hashing here. This would require factoring the
>>>>>>>>> hashing code out into a function that can be used in userspace in this
>>>>>>>>> code.
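A minimal sketch of what such a shared helper could look like, assuming the port is derived from a 32-bit flow hash that is scaled into a source-port range by multiply-and-shift (similar in spirit to the LISP vport's get_src_port()). The FNV-1a hash over a hypothetical 5-tuple struct is only a stand-in: userspace would reproduce the datapath's port only if it used exactly the same hash function and inputs.

```c
#include <stdint.h>

/* Hypothetical inner-flow key; the datapath hashes its own flow key. */
struct flow_tuple {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint8_t  proto;
};

/* Feed the 4 bytes of 'v' into an FNV-1a state. */
static uint32_t fnv1a_u32(uint32_t h, uint32_t v)
{
    for (int i = 0; i < 4; i++) {
        h ^= (v >> (8 * i)) & 0xFF;
        h *= 16777619u;              /* 32-bit FNV prime. */
    }
    return h;
}

/* Hash field by field so struct padding never affects the result. */
static uint32_t flow_hash(const struct flow_tuple *t)
{
    uint32_t h = 2166136261u;        /* 32-bit FNV offset basis. */
    h = fnv1a_u32(h, t->src_ip);
    h = fnv1a_u32(h, t->dst_ip);
    h = fnv1a_u32(h, ((uint32_t) t->src_port << 16) | t->dst_port);
    h = fnv1a_u32(h, t->proto);
    return h;
}

/* Scale a 32-bit hash into [low, high] without division:
 * (hash * range) >> 32 is always less than range. */
static uint16_t src_port_from_hash(uint32_t hash, uint16_t low, uint16_t high)
{
    uint32_t range = (uint32_t) high - low + 1;
    return (uint16_t) ((((uint64_t) hash * range) >> 32) + low);
}
```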
>>>>>>>> 
>>>>>>>> I don't think that it's really viable to regenerate the hash used to
>>>>>>>> compute the source port. In the best case, we are the ones generating
>>>>>>>> it, but the kernel hash function might change or the hash might come
>>>>>>>> from the NIC. In the worst case, when we receive a packet, the hash
>>>>>>>> could have been generated by a non-OVS device with an unknown hash
>>>>>>>> algorithm.
>>>>>>>> 
>>>>>>> 
>>>>>>> Yes, agreed, that's a problem.
>>>>>>> The only other alternative I can imagine to get the source UDP port is
>>>>>>> to do the sampling in the port (esp. in the tunnel port) in the datapath.
>>>>>>> This would be quite intrusive and complicated, as it would require the
>>>>>>> ports to do sampling and upcalls.
>>>>>>> I'd prefer to avoid that.
>>>>>>> Do you see any other alternative?
>>>>>> 
>>>>>> I guess it's not entirely clear to me at this point why it's important
>>>>>> to record the UDP source port. Can you explain?
>>>>> 
>>>>> Identifying all the flows for a tunnel in the network is useful to detect
>>>>> changes in the routing of tunnel flows, which can e.g. be due to network
>>>>> failures (e.g. a link went down, and the flows are rerouted), and might
>>>>> impact the tunnel as a whole. This is useful for root cause analysis.
>>>>> If we didn't get all the tunnel flow headers from the hosts, we would lose
>>>>> some of the information.
>>>> 
>>>> More importantly, we want to be able to map a logical flow to a specific
>>>> tunnel flow (i.e. the tunnel's IP+transport header), to determine the path
>>>> taken by a logical flow in the physical fabric.
>>>> This is possible because the tunnel header, incl. the transport source
>>>> port, uniquely identifies that tunnel flow in the physical network.
>>>> If we don't have the source port from OVS, we can't do that mapping.
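The mapping argument in miniature: two tunnel flows between the same pair of hosts may differ only in the outer source port, so a key that omits it collapses them into one. A hypothetical sketch (the struct and function names are illustrative only):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical outer (physical) header of a UDP-encapsulated tunnel flow. */
struct outer_tuple {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint8_t  proto;
};

/* Full 5-tuple comparison; ECMP fabrics typically hash on these fields,
 * so they determine the physical path. */
static bool same_physical_flow(const struct outer_tuple *a,
                               const struct outer_tuple *b)
{
    return a->src_ip == b->src_ip && a->dst_ip == b->dst_ip
           && a->src_port == b->src_port && a->dst_port == b->dst_port
           && a->proto == b->proto;
}

/* Same comparison with the source port reported as 0 (unknown), as in the
 * original patch: distinct tunnel flows become indistinguishable. */
static bool same_flow_without_src_port(const struct outer_tuple *a,
                                       const struct outer_tuple *b)
{
    struct outer_tuple x = *a, y = *b;
    x.src_port = 0;
    y.src_port = 0;
    return same_physical_flow(&x, &y);
}
```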
>>>> 
>>>> Here's a proposal:
>>>> 
>>>> - Factor get_src_port() out of datapath/vport-lisp.c so that it can be
>>>>  shared by all vport types.
>>>> 
>>>> - Modify datapath/vport-vxlan.c to call get_src_port() instead of
>>>>  vxlan_src_port(). The VXLAN RFC doesn't specify any specific hashing
>>>>  algorithm, so it should be fine to just use the same get_src_port()
>>>>  hashing as for LISP.
>>>> 
>>>> - Always calculate the hash in kernelspace for each packet sent in an
>>>>  upcall, or only for some types of upcalls, e.g. sFlow / IPFIX sampling
>>>>  upcalls, and send it in the upcall so that userspace gets the transport
>>>>  source port independently of the input or output tunnel type.
>>> 
>>> To clarify, I think this would need to have two parts:
>>> - For received packets include the source port of the outer UDP
>>> header. This can't simply be computed because the original sender
>>> might have used an unknown hash algorithm.
>>> - Compute the hash for all packets because they might be sent to a tunnel
>>> port.
>>> 
>>> Is that right?
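The two parts can be read as one rule for filling in the source port reported with a sample. A sketch with hypothetical names (the direction enum and function are not OVS APIs), assuming the transmit side scales the packet hash into a port range by multiply-and-shift:

```c
#include <stdint.h>

/* Hypothetical: which side of the tunnel the sampled packet is on. */
enum sample_dir { SAMPLE_RX, SAMPLE_TX };

/* Received packets: read the port from the outer UDP header, because the
 * remote sender's hash algorithm is unknown.  Packets that may be sent to
 * a tunnel port: compute the port from the packet hash over the same
 * [low, high] range that encapsulation would use. */
static uint16_t sampled_src_port(enum sample_dir dir, uint16_t outer_hdr_port,
                                 uint32_t pkt_hash, uint16_t low, uint16_t high)
{
    if (dir == SAMPLE_RX) {
        return outer_hdr_port;
    }
    uint32_t range = (uint32_t) high - low + 1;
    return (uint16_t) ((((uint64_t) pkt_hash * range) >> 32) + low);
}
```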
>> 
>> Correct.
>> 
>>> The second one in particular seems a little odd to me. The other thing
>>> that I think is important to be careful of is how this will interact
>>> with megaflows. In the traditional OVS case with a very wide exact
>>> match, it was likely (although perhaps not guaranteed) that the hash
>>> computed for the source port was fixed for a given flow. This is
>>> definitely not true any longer and while it may not matter if it is
>>> only needed on a per-sampled-packet basis, it affects where and how it
>>> is attached to a flow or upcall.
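The megaflow concern in miniature: a megaflow that wildcards the inner L4 ports matches packets whose per-packet hashes differ, so the flow entry cannot carry a single fixed source port. Hypothetical key/mask structs and a toy hash, for illustration only:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical inner flow key. */
struct key {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
};

/* A megaflow: a key plus a mask; only masked bits must match. */
struct megaflow {
    struct key match;
    struct key mask;
};

static bool megaflow_matches(const struct megaflow *mf, const struct key *k)
{
    return ((k->src_ip ^ mf->match.src_ip) & mf->mask.src_ip) == 0
           && ((k->dst_ip ^ mf->match.dst_ip) & mf->mask.dst_ip) == 0
           && ((k->src_port ^ mf->match.src_port) & mf->mask.src_port) == 0
           && ((k->dst_port ^ mf->match.dst_port) & mf->mask.dst_port) == 0;
}

/* Toy per-packet hash over all fields, standing in for the real one. */
static uint32_t packet_hash(const struct key *k)
{
    return k->src_ip ^ k->dst_ip
           ^ (((uint32_t) k->src_port << 16) | k->dst_port);
}
```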
>> 
>> I would insert sample actions just before each output action, so whatever
>> flow is sampled is exactly the same flow that will be output. So if it's
>> calculated in the datapath for both the sample upcall and the output, the
>> hashing should be done on the exact same flow?
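The placement rule, sketched over a hypothetical flat action list (the enum and struct are illustrative and are not the OVS netlink action encoding):

```c
#include <stddef.h>
#include <stdint.h>

enum act_type { ACT_OUTPUT, ACT_SAMPLE, ACT_SET_FIELD };

struct action {
    enum act_type type;
    uint32_t arg;   /* Port number for ACT_OUTPUT; unused otherwise. */
};

/* Copy 'in' to 'out', inserting a sample action immediately before every
 * output action, so each sample sees exactly the packet that is output
 * next.  'out' must have room for 2 * n_in actions.  Returns the number
 * of actions written. */
static size_t add_samples_before_outputs(const struct action *in, size_t n_in,
                                         struct action *out)
{
    size_t n_out = 0;
    for (size_t i = 0; i < n_in; i++) {
        if (in[i].type == ACT_OUTPUT) {
            out[n_out++] = (struct action) { ACT_SAMPLE, 0 };
        }
        out[n_out++] = in[i];
    }
    return n_out;
}
```

Because a set-field action before the first output changes the packet, sampling per output (rather than once per flow) keeps each sample consistent with what actually leaves that port.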
> 
> I think it's OK in the sampling case because, as you say, it's based
> on a particular packet. The part that is potentially a little odd is
> that we typically use the same flow format for all types of upcalls so
> we would either have to strip it out in other cases or find some
> reasonable semantics.
> 
>> Would it make any difference to sample just after an output, instead of
>> just before? It doesn't matter either way from a sampling viewpoint.
> 
> I guess if we can find a way to make this work at the physical layer
> (when the packet goes through OVS after encapsulation) that seems
> best. The upcall would have the entire packet and could dissect it
> arbitrarily deeply. I realize that this has issues with IPsec, but
> maybe this is an edge case, or we can mark the packet somehow before
> encryption to get the necessary information?
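If sampling happens after encapsulation, userspace receives the whole packet and can read the outer UDP source port directly. A minimal sketch that handles only untagged Ethernet + IPv4 + UDP (no VLAN, IPv6, or IPsec); the function name is hypothetical:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Read a big-endian 16-bit value. */
static uint16_t get_be16(const uint8_t *p)
{
    return (uint16_t) ((p[0] << 8) | p[1]);
}

/* Extract the outer UDP source port from an untagged Ethernet/IPv4/UDP
 * frame.  Returns false if the frame is truncated, is not IPv4/UDP, or is
 * a non-first fragment (which carries no UDP header). */
static bool outer_udp_src_port(const uint8_t *pkt, size_t len, uint16_t *port)
{
    const size_t eth_len = 14;
    if (len < eth_len + 20 || get_be16(pkt + 12) != 0x0800) {
        return false;                          /* Not IPv4. */
    }
    const uint8_t *ip = pkt + eth_len;
    size_t ihl = (size_t) (ip[0] & 0x0F) * 4;  /* IPv4 header length. */
    if ((ip[0] >> 4) != 4 || ihl < 20 || ip[9] != 17) {
        return false;                          /* Not IPv4, or not UDP. */
    }
    if (get_be16(ip + 6) & 0x1FFF) {
        return false;                          /* Non-first fragment. */
    }
    if (len < eth_len + ihl + 8) {
        return false;                          /* Truncated UDP header. */
    }
    *port = get_be16(ip + ihl);                /* UDP source port. */
    return true;
}
```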

I agree that the vport-*.c modules would be the best place to do the sampling,
because we have all the information we need there.
However, that would require:
- Adding hooks into all vport-*.c modules for packet sampling.
- Defining new upcalls for sending packets sampled at ingress / egress within
  a port.
- Defining new configuration options for ports (of all types) to enable/disable
  sampling, set the sampling probability, enable/disable sampling at ingress
  and/or egress, enable/disable sampling of tunnel headers, etc.
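The per-port options in the last item might look roughly like this. The struct and function are hypothetical; the probability encoding (0 samples nothing, UINT32_MAX samples every packet, intermediate values sample proportionally) is modeled on how the existing sample action expresses its probability.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-port sampling configuration. */
struct port_sampling_cfg {
    bool     ingress;          /* Sample packets received on the port. */
    bool     egress;           /* Sample packets sent on the port. */
    bool     tunnel_headers;   /* Include outer tunnel headers in samples. */
    uint32_t probability;      /* 0 = never, UINT32_MAX = every packet. */
};

/* Decide whether to sample one packet, given its direction and a uniformly
 * distributed 32-bit random value supplied by the caller. */
static bool port_should_sample(const struct port_sampling_cfg *cfg,
                               bool is_ingress, uint32_t rnd)
{
    if (is_ingress ? !cfg->ingress : !cfg->egress) {
        return false;
    }
    if (cfg->probability == UINT32_MAX) {
        return true;
    }
    return rnd < cfg->probability;
}
```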

It seems like more work to me, and it's more intrusive.
I had hoped we could implement this with only minimal changes to the current
sample datapath action and upcalls.
But if you think it's worth instrumenting the datapath vports, I'll think more about
that solution.

Thanks!
--
Romain Lenglet