[ovs-dev] [PATCH 1/1 V4] Add support for tun_key to OVS datapath

Jesse Gross jesse at nicira.com
Fri Sep 7 23:34:52 UTC 2012


On Fri, Sep 7, 2012 at 12:56 PM, Kyle Mestery (kmestery)
<kmestery at cisco.com> wrote:
> On Sep 6, 2012, at 10:24 PM, Jesse Gross wrote:
>> On Wed, Sep 5, 2012 at 2:58 PM, Kyle Mestery <kmestery at cisco.com> wrote:
>>> @@ -1185,7 +1207,12 @@ int ovs_flow_metadata_from_nlattrs(u32 *priority, u16 *in_port, __be64 *tun_id,
>>>                                break;
>>>
>>>                        case OVS_KEY_ATTR_TUN_ID:
>>> -                               *tun_id = nla_get_be64(nla);
>>> +                               tun_key->tun_id = nla_get_be64(nla);
>>> +                               break;
>>> +
>>> +                       case OVS_KEY_ATTR_IPV4_TUNNEL:
>>> +                               memcpy(tun_key, nla_data(nla),
>>> +                                      sizeof(*tun_key));
>>
>> We should enforce consistency here as well (this one is probably more
>> likely to be broken anyways because flow setups just echo back what
>> the kernel supplied but this is generated "by hand").  In this case we
>> can't enforce that both come together.
>>
> I don't think there is an implicit order to the netlink attributes here, so what I did
> was something like this:
>
> diff --git a/datapath/flow.c b/datapath/flow.c
> index 1c4eb99..5be492e 100644
> --- a/datapath/flow.c
> +++ b/datapath/flow.c
> @@ -1187,9 +1187,10 @@ int ovs_flow_metadata_from_nlattrs(u32 *priority, u16 *in_port,
>  {
>         const struct nlattr *nla;
>         int rem;
> +       __be64 tun_id = 0;
>
>         *in_port = DP_MAX_PORTS;
> -       memset(tun_key, 0, sizeof(struct ovs_key_ipv4_tunnel));
> +       memset(tun_key, 0, sizeof(*tun_key));
>         *priority = 0;
>
>         nla_for_each_nested(nla, attr, rem) {
> @@ -1205,12 +1206,19 @@ int ovs_flow_metadata_from_nlattrs(u32 *priority, u16 *in_port,
>                                 break;
>
>                         case OVS_KEY_ATTR_TUN_ID:
> -                               tun_key->tun_id = nla_get_be64(nla);
> +                               tun_id = nla_get_be64(nla);
> +                               if (tun_key->tun_id != 0 &&
> +                                   tun_key->tun_id != tun_id)
> +                                       return -EINVAL;
>                                 break;
>
>                         case OVS_KEY_ATTR_IPV4_TUNNEL:
> +                               if (tun_key->tun_id != 0)
> +                                       tun_id = tun_key->tun_id;
>                                 memcpy(tun_key, nla_data(nla),
>                                        sizeof(*tun_key));
> +                               if (tun_id != 0 && tun_id != tun_key->tun_id)
> +                                       return -EINVAL;
>                                 break;

It's true that there's no ordering requirements here.  However, we do
have to be able to accept either or both attributes.  Current
userspace will generate OVS_KEY_ATTR_TUN_ID but it looks like this
will only store the value unless we also get OVS_KEY_ATTR_IPV4_TUNNEL.
 I guess I would probably also add an explicit flag to indicate that
tun_id has been set rather than relying on nonzero to have that
meaning.

>>> diff --git a/datapath/flow.h b/datapath/flow.h
>>> index 02c563a..5d4fcde 100644
>>> --- a/datapath/flow.h
>>> +++ b/datapath/flow.h
>>> @@ -42,7 +42,6 @@ struct sw_flow_actions {
>>>
>>> struct sw_flow_key {
>>>        struct {
>>> -               __be64  tun_id;         /* Encapsulating tunnel ID. */
>>>                u32     priority;       /* Packet QoS priority. */
>>>                u16     in_port;        /* Input switch port (or DP_MAX_PORTS). */
>>>        } phy;
>>> @@ -92,6 +91,9 @@ struct sw_flow_key {
>>>                        } nd;
>>>                } ipv6;
>>>        };
>>> +        struct {
>>> +               struct ovs_key_ipv4_tunnel tun_key;  /* Encapsulating tunnel key. */
>>> +       } tun;
>>> };
>>
>> The solution for this is a little more complicated because for
>> efficiency we only match on as much of the flow struct as is
>> necessary.  Since we currently calculate how far we need to search
>> (i.e. TCP/IPv4) without taking into account a tunnel header at the end
>> this will never match on the tunnel.  The other less serious problem
>> is that this reduces the effectiveness of the partial length match
>> when tunneling because IPv4 packets will have to match over a series
>> of zeros where the IPv6 header exists.  Basically we want to stick
>> tunnel headers at the end of the current match, where ever that might
>> be.
>>
>> One way of doing this is to add a copy of the tun_key to each union
>> that could possibly be the end of the struct.  We essentially do that
>> with the TCP/UDP ports which could follow either a IPv4 or IPv6
>> header.  However in the case of tunnels there are many more possible
>> end locations (I count 7) and whereas L4 ports logically follows the
>> L3 header that's not really the case with tunnels, we're just moving
>> them around because we can and the benefit is large.
>>
>> I think what I would do is to make struct tun completely independent
>> of struct sw_flow_key and then have it "float" over where ever we
>> decide is the end of the flow struct.  It's a little messy if you want
>> random access to it because you have to compute the size of the
>> earlier members but in practice don't think we ever do so it shouldn't
>> be too bad.
>>
> I think I see what you're looking for here. I'm going to make this a separate patch
> built on top of the other stuff, since it seems this may be a little more complicated
> as you indicate. I'll send it out with the other patch, though.

Breaking it out into a separate patch definitely makes sense.  I'll
just want to apply them both at the same time so that we don't have
any performance regressions.



More information about the dev mailing list