[ovs-dev] [PATCH v4] ovn: DNAT and SNAT on a gateway router.

Flaviof flavio at flaviof.com
Tue Jun 21 17:29:06 UTC 2016


On Tue, Jun 21, 2016 at 10:46 AM, Guru Shetty <guru at ovn.org> wrote:

>
>
> On 20 June 2016 at 19:36, Flaviof <flavio at flaviof.com> wrote:
>
>> On Mon, Jun 13, 2016 at 6:45 AM, Gurucharan Shetty <guru at ovn.org> wrote:
>>
>> > For traffic from physical space to virtual space we need DNAT.
>> > The DNAT happens in the gateway router and reaches the logical
>> > port. The return traffic should be unDNATed.
>> >
>> > Traffic originating in virtual space heading to physical space
>> > should be SNATed. The return traffic is unSNATted.
>> >
>> > East-west traffic with the public destination IP address needs
>> > a DNAT. This traffic is punted to the l3 gateway where DNAT
>> > takes place. This traffic is also SNATed and eventually loops back to
>> > its destination. The SNAT is needed because we need the reverse traffic
>> > to go back to the l3 gateway and not short-circuit directly to the
>> > source.
>> >
>> > This commit introduces 4 new logical actions.
>> > 1. ct_snat: To send the packet through SNAT zone to unSNAT packets.
>> > 2. ct_snat(IP): To SNAT to the provided IP address.
>> > 3. ct_dnat: To send the packet through DNAT zone to unDNAT packets.
>> > 4. ct_dnat(IP): To DNAT to the provided IP.
>> >
>> > This commit only provides the ability to do IP based NAT. This will
>> > eventually be enhanced to do PORT based NAT too.
>> >
>> > Command hints:
>> >
>> > Consider a distributed router "R1" that has switch foo (192.168.1.0/24)
>> > with a lport foo1 (192.168.1.2) and bar (192.168.2.0/24) with lport bar1
>> > (192.168.2.2) connected to it. You connect "R1" to
>> > a gateway router "R2" via a switch "join" in (20.0.0.0/24) network.
>> >
>> > R2 has a switch "alice" (172.16.1.0/24) connected to it (to simulate
>> > external network).
>> >
>> > case1: Add pure DNAT (north-south)
>> >
>> > Add a DNAT rule in R2:
>> > ovn-nbctl -- --id=@nat create nat type="dnat" logical_ip=192.168.1.2 \
>> > external_ip=30.0.0.2 -- add logical_router R2 nat @nat
>> >
>> > Now alice1 should be able to ping 192.168.1.2 via 30.0.0.2.
>> >
>> > case2: Add pure SNAT (south-north)
>> >
>> > Add a SNAT rule in R2:
>> >
>> > ovn-nbctl -- --id=@nat create nat type="snat" logical_ip=192.168.2.2 \
>> > external_ip=30.0.0.1 -- add logical_router R2 nat @nat
>> >
>> > (You need a static route in R1 to send packets destined to outside
>> > world to go through R2. The logical_ip can be a subnet.)
>> >
>> > When bar1 pings alice1, alice1 receives traffic from 30.0.0.1.
>> >
>> > case3: SNAT and DNAT (east-west traffic)
>> >
>> > When bar1 pings 30.0.0.2, the traffic jumps to the gateway router
>> > and loops back to foo1 with a source IP address of 30.0.0.1.
>> >
>> >
>> So, is the 30.0.0.0/x network an external network that R2 has a port in?
>>
>
> The example above does not have that. In the above example, 30.0.0.0/x is
> being treated as a virtual address. But in a real setup (non-simulated), you
> are right. R2 will be connected to a 30.0.0.0/x network and will have a
> port in it. It will also have a static route (0.0.0.0/0) or a
> default_gateway to point to the physical router IP address as its next hop.
> (I have not tested it as I do not have a real setup at hand, but based on
> the simulation, it should ideally work.)
>
>
>> What is the next hop that R2 would use to reach a destination beyond
>> that subnet?
>>
> Answered above.
>

Ack!


>
>>
>> I think this may be clearer when a test is added to ovn.at that uses foo,
>> bar, join and alice.
>>
> The unit tests do not have the ability to do conntrack NAT right now. I
> think we should add one once Daniele introduces NAT to userspace conntrack.
> But the unit test "ovn -- 2 HVs, 2 LRs connected via LS, gateway router"
> does something very similar (it has foo - R1 - join - R2 - alice).
>

Right, I saw that test and it makes perfect sense. Adding the 'bar' logical
switch, net 30.0.0.x and the nat rules are the few lines that it currently
does not have.


>
>>
>> Based on the code and my little test setup, there seems to be a high cost
>> for DNAT entries in that an ARP response rule will be added per DNAT x all
>> router ports.
>
> The intention was to add only on the router where the DNAT entry is defined
> and not on all router ports of all routers. Is that not true? (If so, this is
> a bug.) The for loop which adds this entry only looks at that datapath's
> NAT entries.
>
> On the gateway router itself, there would typically be two DNAT entries:
> one connected to the internal network (for east-west) and another at the
> external port (facing the physical router).
>
>
Understood.


>
>
>> In the example used by the commit message, ingress table 1 of
>> the logical router will have arp response entries for inports alice and
>> R2_join.
>>
> Right. That is because as explained above, I need to do DNAT for both
> east-west as well as north-south. (It is very possible that I did not
> understand your concern)
>

Nah, you set me straight. If there were multiple internal subnets, I imagine
we would need a DNAT rule for each, since the response needs to be slightly
different for each router port. Not an issue, just an observation.


>
>
>>
>>
>> Table 3: do we really intend to apply the actions 'inport = ""; ct_dnat;'
>> to all ip packets that do not have an explicit dnat mapping?
>>
> Yes. This is a little tricky. I have tried to explain the rationale in a
> comment above. The general idea is that in a gateway router, there will be
> at least one DNAT or SNAT entry. Otherwise, why have a gateway router? Also,
> a re-circulation is considered to be very expensive. What we want is to
> minimize re-circulations. With the code above, we have a minimum of
> one re-circulation no matter what and a maximum of two re-circulations. I
> have tried different ways to optimize it. There was a possibility of 3
> re-circulations as a worst case if I did not force the minimum one
> re-circulation. Probably there is a different way to optimize it (that I
> haven't thought about).
>
>
>
Thanks for the clarification. I don't know enough about the implications of
calling the ct_dnat action, but I imagine that is just noise and, as you
point out, this is only in the gateway router and saves on recirculations.



>
>
>>
>> SNAT: do we need ARP reply rules for the SNAT addresses, similar to the
>> ones added for DNAT?
>>
> I don't think we need ARP reply rules for SNAT entries. What is the use
> case?
>

This is likely a moot point on my part. In my example, the gateway router
did not have a port in the 30.0.0.x network, so it was not obvious to me
that if it did, it would have the ARP response rule for its own address,
which masks the internal IPs for foo and bar. Sorry for not understanding
that before making the noise. :)


>
>>
>> SNAT: looking at the OpenFlow table, I see no mention of the address
>> added to support SNAT. Is that because that is all handled by the
>> connection tracker and there is nothing to be done via OpenFlow? Or maybe
>> part of another patchset?
>>
>
> We do add SNAT specific rules. Search for S_ROUTER_IN_UNSNAT
> and S_ROUTER_OUT_SNAT.
>
>

Ack, I missed that in the egress datapath. *facepalm*



>
>> Thanks,
>>
>> -- flaviof
>>
>>
>>
>>
>> > Signed-off-by: Gurucharan Shetty <guru at ovn.org>
>>
>

Acked-by: Flavio Fernandes <flavio at flaviof.com>






>> > ---
>> >  ovn/lib/actions.c           |  83 ++++++++++++++++++++
>> >  ovn/northd/ovn-northd.8.xml | 131 ++++++++++++++++++++++++++++---
>> >  ovn/northd/ovn-northd.c     | 187 ++++++++++++++++++++++++++++++++++++++++++--
>> >  ovn/ovn-nb.ovsschema        |  19 ++++-
>> >  ovn/ovn-nb.xml              |  65 +++++++++++++--
>> >  ovn/ovn-sb.xml              |  41 ++++++++++
>> >  ovn/utilities/ovn-nbctl.c   |   5 ++
>> >  tests/ovn.at                |  17 ++++
>> >  8 files changed, 524 insertions(+), 24 deletions(-)
>> >
>> > diff --git a/ovn/lib/actions.c b/ovn/lib/actions.c
>> > index 5f0bf19..4a486a0 100644
>> > --- a/ovn/lib/actions.c
>> > +++ b/ovn/lib/actions.c
>> > @@ -442,6 +442,85 @@ emit_ct(struct action_context *ctx, bool recirc_next, bool commit)
>> >      add_prerequisite(ctx, "ip");
>> >  }
>> >
>> > +static void
>> > +parse_ct_nat(struct action_context *ctx, bool snat)
>> > +{
>> > +    const size_t ct_offset = ctx->ofpacts->size;
>> > +    ofpbuf_pull(ctx->ofpacts, ct_offset);
>> > +
>> > +    struct ofpact_conntrack *ct = ofpact_put_CT(ctx->ofpacts);
>> > +
>> > +    if (ctx->ap->cur_ltable < ctx->ap->n_tables) {
>> > +        ct->recirc_table = ctx->ap->first_ptable + ctx->ap->cur_ltable + 1;
>> > +    } else {
>> > +        action_error(ctx,
>> > +                     "\"ct_[sd]nat\" action not allowed in last table.");
>> > +        return;
>> > +    }
>> > +
>> > +    if (snat) {
>> > +        ct->zone_src.field = mf_from_id(MFF_LOG_SNAT_ZONE);
>> > +    } else {
>> > +        ct->zone_src.field = mf_from_id(MFF_LOG_DNAT_ZONE);
>> > +    }
>> > +    ct->zone_src.ofs = 0;
>> > +    ct->zone_src.n_bits = 16;
>> > +    ct->flags = 0;
>> > +    ct->alg = 0;
>> > +
>> > +    add_prerequisite(ctx, "ip");
>> > +
>> > +    struct ofpact_nat *nat;
>> > +    size_t nat_offset;
>> > +    nat_offset = ctx->ofpacts->size;
>> > +    ofpbuf_pull(ctx->ofpacts, nat_offset);
>> > +
>> > +    nat = ofpact_put_NAT(ctx->ofpacts);
>> > +    nat->flags = 0;
>> > +    nat->range_af = AF_UNSPEC;
>> > +
>> > +    int commit = 0;
>> > +    if (lexer_match(ctx->lexer, LEX_T_LPAREN)) {
>> > +        ovs_be32 ip;
>> > +        if (ctx->lexer->token.type == LEX_T_INTEGER
>> > +            && ctx->lexer->token.format == LEX_F_IPV4) {
>> > +            ip = ctx->lexer->token.value.ipv4;
>> > +        } else {
>> > +            action_syntax_error(ctx, "invalid ip");
>> > +            return;
>> > +        }
>> > +
>> > +        nat->range_af = AF_INET;
>> > +        nat->range.addr.ipv4.min = ip;
>> > +        if (snat) {
>> > +            nat->flags |= NX_NAT_F_SRC;
>> > +        } else {
>> > +            nat->flags |= NX_NAT_F_DST;
>> > +        }
>> > +        commit = NX_CT_F_COMMIT;
>> > +        lexer_get(ctx->lexer);
>> > +        if (!lexer_match(ctx->lexer, LEX_T_RPAREN)) {
>> > +            action_syntax_error(ctx, "expecting `)'");
>> > +            return;
>> > +        }
>> > +    }
>> > +
>> > +    ctx->ofpacts->header = ofpbuf_push_uninit(ctx->ofpacts, nat_offset);
>> > +    ct = ctx->ofpacts->header;
>> > +    ct->flags |= commit;
>> > +
>> > +    /* XXX: For performance reasons, we try to prevent additional
>> > +     * recirculations.  So far, ct_snat which is used in a gateway router
>> > +     * does not need a recirculation. ct_snat(IP) does need a recirculation.
>> > +     * Should we consider a method to let the actions specify whether an
>> > +     * action needs recirculation if there are more use cases? */
>> > +    if (!commit && snat) {
>> > +        ct->recirc_table = NX_CT_RECIRC_NONE;
>> > +    }
>> > +    ofpact_finish(ctx->ofpacts, &ct->ofpact);
>> > +    ofpbuf_push_uninit(ctx->ofpacts, ct_offset);
>> > +}
>> > +
>> >  static bool
>> >  parse_action(struct action_context *ctx)
>> >  {
>> > @@ -469,6 +548,10 @@ parse_action(struct action_context *ctx)
>> >          emit_ct(ctx, true, false);
>> >      } else if (lexer_match_id(ctx->lexer, "ct_commit")) {
>> >          emit_ct(ctx, false, true);
>> > +    } else if (lexer_match_id(ctx->lexer, "ct_dnat")) {
>> > +        parse_ct_nat(ctx, false);
>> > +    } else if (lexer_match_id(ctx->lexer, "ct_snat")) {
>> > +        parse_ct_nat(ctx, true);
>> >      } else if (lexer_match_id(ctx->lexer, "arp")) {
>> >          parse_arp_action(ctx);
>> >      } else if (lexer_match_id(ctx->lexer, "get_arp")) {
>> > diff --git a/ovn/northd/ovn-northd.8.xml b/ovn/northd/ovn-northd.8.xml
>> > index 1983812..c237604 100644
>> > --- a/ovn/northd/ovn-northd.8.xml
>> > +++ b/ovn/northd/ovn-northd.8.xml
>> > @@ -517,11 +517,40 @@ next;
>> >
>> >        <li>
>> >          <p>
>> > -          Reply to ARP requests.  These flows reply to ARP requests for the
>> > -          router's own IP address.  For each router port <var>P</var> that owns
>> > -          IP address <var>A</var> and Ethernet address <var>E</var>, a
>> > -          priority-90 flow matches <code>inport == <var>P</var> &amp;&amp;
>> > -          arp.op == 1 &amp;&amp; arp.tpa == <var>A</var></code> (ARP request)
>> > +          Reply to ARP requests.
>> > +        </p>
>> > +
>> > +        <p>
>> > +          These flows reply to ARP requests for the router's own IP address.
>> > +          For each router port <var>P</var> that owns IP address <var>A</var>
>> > +          and Ethernet address <var>E</var>, a priority-90 flow matches
>> > +          <code>inport == <var>P</var> &amp;&amp; arp.op == 1 &amp;&amp;
>> > +          arp.tpa == <var>A</var></code> (ARP request) with the following
>> > +          actions:
>> > +        </p>
>> > +
>> > +        <pre>
>> > +eth.dst = eth.src;
>> > +eth.src = <var>E</var>;
>> > +arp.op = 2; /* ARP reply. */
>> > +arp.tha = arp.sha;
>> > +arp.sha = <var>E</var>;
>> > +arp.tpa = arp.spa;
>> > +arp.spa = <var>A</var>;
>> > +outport = <var>P</var>;
>> > +inport = ""; /* Allow sending out inport. */
>> > +output;
>> > +        </pre>
>> > +      </li>
>> > +
>> > +      <li>
>> > +        <p>
>> > +          These flows reply to ARP requests for the virtual IP addresses
>> > +          configured in the router for DNAT. For a configured DNAT IP address
>> > +          <var>A</var>, for each router port <var>P</var> with Ethernet
>> > +          address <var>E</var>, a priority-90 flow matches
>> > +          <code>inport == <var>P</var> &amp;&amp; arp.op == 1 &amp;&amp;
>> > +          arp.tpa == <var>A</var></code> (ARP request)
>> >            with the following actions:
>> >          </p>
>> >
>> > @@ -663,7 +692,62 @@ icmp4 {
>> >        </li>
>> >      </ul>
>> >
>> > -    <h3>Ingress Table 2: IP Routing</h3>
>> > +    <h3>Ingress Table 2: UNSNAT</h3>
>> > +
>> > +    <p>
>> > +      This is for already established connections' reverse traffic.
>> > +      That is, SNAT has already been done in the egress pipeline and now
>> > +      the packet has entered the ingress pipeline as part of a reply.  It is
>> > +      unSNATted here.
>> > +    </p>
>> > +
>> > +    <ul>
>> > +      <li>
>> > +        <p>
>> > +          For each configuration in the OVN Northbound database that asks
>> > +          to change the source IP address of a packet from <var>A</var> to
>> > +          <var>B</var>, a priority-100 flow matches <code>ip &amp;&amp;
>> > +          ip4.dst == <var>B</var></code> with an action
>> > +          <code>ct_snat; next;</code>.
>> > +        </p>
>> > +
>> > +        <p>
>> > +          A priority-0 logical flow with match <code>1</code> has actions
>> > +          <code>next;</code>.
>> > +        </p>
>> > +      </li>
>> > +    </ul>
>> > +
>> > +    <h3>Ingress Table 3: DNAT</h3>
>> > +
>> > +    <p>
>> > +      Packets enter the pipeline with a destination IP address that needs to
>> > +      be DNATted from a virtual IP address to a real IP address.  Packets
>> > +      in the reverse direction need to be unDNATed.
>> > +    </p>
>> > +    <ul>
>> > +      <li>
>> > +        <p>
>> > +          For each configuration in the OVN Northbound database that asks
>> > +          to change the destination IP address of a packet from <var>A</var> to
>> > +          <var>B</var>, a priority-100 flow matches <code>ip &amp;&amp;
>> > +          ip4.dst == <var>A</var></code> with an action <code>inport = "";
>> > +          ct_dnat(<var>B</var>);</code>.
>> > +        </p>
>> > +
>> > +        <p>
>> > +          For all IP packets of a Gateway router, a priority-50 flow with
>> > +          match <code>ip</code> has actions <code>inport = ""; ct_dnat;</code>.
>> > +        </p>
>> > +
>> > +        <p>
>> > +          A priority-0 logical flow with match <code>1</code> has actions
>> > +          <code>next;</code>.
>> > +        </p>
>> > +      </li>
>> > +    </ul>
>> > +
>> > +    <h3>Ingress Table 4: IP Routing</h3>
>> >
>> >      <p>
>> >        A packet that arrives at this table is an IP packet that should be routed
>> > @@ -672,7 +756,7 @@ icmp4 {
>> >        <code>ip4.dst</code>, the packet's final destination, unchanged) and
>> >        advances to the next table for ARP resolution.  It also sets
>> >        <code>reg1</code> to the IP address owned by the selected router port
>> > -      (which is used later in table 4 as the IP source address for an ARP
>> > +      (which is used later in table 6 as the IP source address for an ARP
>> >        request, if needed).
>> >      </p>
>> >
>> > @@ -743,7 +827,7 @@ icmp4 {
>> >        </li>
>> >      </ul>
>> >
>> > -    <h3>Ingress Table 3: ARP Resolution</h3>
>> > +    <h3>Ingress Table 5: ARP Resolution</h3>
>> >
>> >      <p>
>> >        Any packet that reaches this table is an IP packet whose next-hop IP
>> > @@ -798,7 +882,7 @@ icmp4 {
>> >        </li>
>> >      </ul>
>> >
>> > -    <h3>Ingress Table 4: ARP Request</h3>
>> > +    <h3>Ingress Table 6: ARP Request</h3>
>> >
>> >      <p>
>> >        In the common case where the Ethernet destination has been resolved, this
>> > @@ -823,7 +907,7 @@ arp {
>> >          </pre>
>> >
>> >          <p>
>> > -          (Ingress table 2 initialized <code>reg1</code> with the IP address
>> > +          (Ingress table 4 initialized <code>reg1</code> with the IP address
>> >            owned by <code>outport</code>.)
>> >          </p>
>> >
>> > @@ -838,7 +922,32 @@ arp {
>> >        </li>
>> >      </ul>
>> >
>> > -    <h3>Egress Table 0: Delivery</h3>
>> > +    <h3>Egress Table 0: SNAT</h3>
>> > +
>> > +    <p>
>> > +      Packets that are configured to be SNATed get their source IP address
>> > +      changed based on the configuration in the OVN Northbound database.
>> > +    </p>
>> > +    <ul>
>> > +      <li>
>> > +        <p>
>> > +          For each configuration in the OVN Northbound database that asks
>> > +          to change the source IP address of a packet from an IP address of
>> > +          <var>A</var> or to change the source IP address of a packet that
>> > +          belongs to network <var>A</var> to <var>B</var>, a flow matches
>> > +          <code>ip &amp;&amp; ip4.src == <var>A</var></code> with an action
>> > +          <code>ct_snat(<var>B</var>);</code>.  The priority of the flow
>> > +          is calculated based on the mask of <var>A</var>, with matches
>> > +          having larger masks getting higher priorities.
>> > +        </p>
>> > +        <p>
>> > +          A priority-0 logical flow with match <code>1</code> has actions
>> > +          <code>next;</code>.
>> > +        </p>
>> > +      </li>
>> > +    </ul>
>> > +
>> > +    <h3>Egress Table 1: Delivery</h3>
>> >
>> >      <p>
>> >        Packets that reach this table are ready for delivery.  It contains
>> > diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
>> > index cac0148..4683780 100644
>> > --- a/ovn/northd/ovn-northd.c
>> > +++ b/ovn/northd/ovn-northd.c
>> > @@ -105,12 +105,15 @@ enum ovn_stage {
>> >      /* Logical router ingress stages. */                              \
>> >      PIPELINE_STAGE(ROUTER, IN,  ADMISSION,   0, "lr_in_admission")    \
>> >      PIPELINE_STAGE(ROUTER, IN,  IP_INPUT,    1, "lr_in_ip_input")     \
>> > -    PIPELINE_STAGE(ROUTER, IN,  IP_ROUTING,  2, "lr_in_ip_routing")   \
>> > -    PIPELINE_STAGE(ROUTER, IN,  ARP_RESOLVE, 3, "lr_in_arp_resolve")  \
>> > -    PIPELINE_STAGE(ROUTER, IN,  ARP_REQUEST, 4, "lr_in_arp_request")  \
>> > +    PIPELINE_STAGE(ROUTER, IN,  UNSNAT,      2, "lr_in_unsnat")       \
>> > +    PIPELINE_STAGE(ROUTER, IN,  DNAT,        3, "lr_in_dnat")         \
>> > +    PIPELINE_STAGE(ROUTER, IN,  IP_ROUTING,  4, "lr_in_ip_routing")   \
>> > +    PIPELINE_STAGE(ROUTER, IN,  ARP_RESOLVE, 5, "lr_in_arp_resolve")  \
>> > +    PIPELINE_STAGE(ROUTER, IN,  ARP_REQUEST, 6, "lr_in_arp_request")  \
>> >                                                                        \
>> >      /* Logical router egress stages. */                               \
>> > -    PIPELINE_STAGE(ROUTER, OUT, DELIVERY,    0, "lr_out_delivery")
>> > +    PIPELINE_STAGE(ROUTER, OUT, SNAT,      0, "lr_out_snat")          \
>> > +    PIPELINE_STAGE(ROUTER, OUT, DELIVERY,  1, "lr_out_delivery")
>> >
>> >  #define PIPELINE_STAGE(DP_TYPE, PIPELINE, STAGE, TABLE, NAME)   \
>> >      S_##DP_TYPE##_##PIPELINE##_##STAGE                          \
>> > @@ -1998,6 +2001,51 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
>> >          free(match);
>> >          free(actions);
>> >
>> > +        /* ARP handling for external IP addresses.
>> > +         *
>> > +         * DNAT IP addresses are external IP addresses that need ARP
>> > +         * handling. */
>> > +        for (int i = 0; i < op->od->nbr->n_nat; i++) {
>> > +            const struct nbrec_nat *nat;
>> > +
>> > +            nat = op->od->nbr->nat[i];
>> > +
>> > +            if (!strcmp(nat->type, "snat")) {
>> > +                continue;
>> > +            }
>> > +
>> > +            ovs_be32 ip;
>> > +            if (!ip_parse(nat->external_ip, &ip) || !ip) {
>> > +                static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1);
>> > +                VLOG_WARN_RL(&rl, "bad ip address %s in dnat configuration "
>> > +                             "for router %s", nat->external_ip, op->key);
>> > +                continue;
>> > +            }
>> > +
>> > +            match = xasprintf(
>> > +                "inport == %s && arp.tpa == "IP_FMT" && arp.op == 1",
>> > +                op->json_key, IP_ARGS(ip));
>> > +            actions = xasprintf(
>> > +                "eth.dst = eth.src; "
>> > +                "eth.src = "ETH_ADDR_FMT"; "
>> > +                "arp.op = 2; /* ARP reply */ "
>> > +                "arp.tha = arp.sha; "
>> > +                "arp.sha = "ETH_ADDR_FMT"; "
>> > +                "arp.tpa = arp.spa; "
>> > +                "arp.spa = "IP_FMT"; "
>> > +                "outport = %s; "
>> > +                "inport = \"\"; /* Allow sending out inport. */ "
>> > +                "output;",
>> > +                ETH_ADDR_ARGS(op->mac),
>> > +                ETH_ADDR_ARGS(op->mac),
>> > +                IP_ARGS(ip),
>> > +                op->json_key);
>> > +            ovn_lflow_add(lflows, op->od, S_ROUTER_IN_IP_INPUT, 90,
>> > +                          match, actions);
>> > +            free(match);
>> > +            free(actions);
>> > +        }
>> > +
>> >          /* Drop IP traffic to this router. */
>> >          match = xasprintf("ip4.dst == "IP_FMT, IP_ARGS(op->ip));
>> >          ovn_lflow_add(lflows, op->od, S_ROUTER_IN_IP_INPUT, 60,
>> > @@ -2005,6 +2053,135 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
>> >          free(match);
>> >      }
>> >
>> > +    /* NAT in Gateway routers. */
>> > +    HMAP_FOR_EACH (od, key_node, datapaths) {
>> > +        if (!od->nbr) {
>> > +            continue;
>> > +        }
>> > +
>> > +        /* Packets are allowed by default. */
>> > +        ovn_lflow_add(lflows, od, S_ROUTER_IN_UNSNAT, 0, "1", "next;");
>> > +        ovn_lflow_add(lflows, od, S_ROUTER_OUT_SNAT, 0, "1", "next;");
>> > +        ovn_lflow_add(lflows, od, S_ROUTER_IN_DNAT, 0, "1", "next;");
>> > +
>> > +        /* NAT rules are only valid on Gateway routers. */
>> > +        if (!smap_get(&od->nbr->options, "chassis")) {
>> > +            continue;
>> > +        }
>> > +
>> > +        for (int i = 0; i < od->nbr->n_nat; i++) {
>> > +            const struct nbrec_nat *nat;
>> > +
>> > +            nat = od->nbr->nat[i];
>> > +
>> > +            ovs_be32 ip, mask;
>> > +
>> > +            char *error = ip_parse_masked(nat->external_ip, &ip, &mask);
>> > +            if (error || mask != OVS_BE32_MAX) {
>> > +                static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1);
>> > +                VLOG_WARN_RL(&rl, "bad external ip %s for nat",
>> > +                             nat->external_ip);
>> > +                free(error);
>> > +                continue;
>> > +            }
>> > +
>> > +            /* Check the validity of nat->logical_ip. 'logical_ip' can
>> > +             * be a subnet when the type is "snat". */
>> > +            error = ip_parse_masked(nat->logical_ip, &ip, &mask);
>> > +            if (!strcmp(nat->type, "snat")) {
>> > +                if (error) {
>> > +                    static struct vlog_rate_limit rl =
>> > +                        VLOG_RATE_LIMIT_INIT(5, 1);
>> > +                    VLOG_WARN_RL(&rl, "bad ip network or ip %s for snat "
>> > +                                 "in router "UUID_FMT"",
>> > +                                 nat->logical_ip, UUID_ARGS(&od->key));
>> > +                    free(error);
>> > +                    continue;
>> > +                }
>> > +            } else {
>> > +                if (error || mask != OVS_BE32_MAX) {
>> > +                    static struct vlog_rate_limit rl =
>> > +                        VLOG_RATE_LIMIT_INIT(5, 1);
>> > +                    VLOG_WARN_RL(&rl, "bad ip %s for dnat in router "
>> > +                        ""UUID_FMT"", nat->logical_ip, UUID_ARGS(&od->key));
>> > +                    free(error);
>> > +                    continue;
>> > +                }
>> > +            }
>> > +
>> > +
>> > +            char *match, *actions;
>> > +
>> > +            /* Ingress UNSNAT table: It is for already established
>> > +             * connections' reverse traffic. i.e., SNAT has already been done
>> > +             * in egress pipeline and now the packet has entered the ingress
>> > +             * pipeline as part of a reply. We undo the SNAT here.
>> > +             *
>> > +             * Undoing SNAT has to happen before DNAT processing.  This is
>> > +             * because when the packet was DNATed in ingress pipeline, it did
>> > +             * not know about the possibility of eventual additional SNAT in
>> > +             * egress pipeline. */
>> > +            if (!strcmp(nat->type, "snat")
>> > +                || !strcmp(nat->type, "dnat_and_snat")) {
>> > +                match = xasprintf("ip && ip4.dst == %s", nat->external_ip);
>> > +                ovn_lflow_add(lflows, od, S_ROUTER_IN_UNSNAT, 100,
>> > +                              match, "ct_snat; next;");
>> > +                free(match);
>> > +            }
>> > +
>> > +            /* Ingress DNAT table: Packets enter the pipeline with destination
>> > +             * IP address that needs to be DNATted from an external IP address
>> > +             * to a logical IP address. */
>> > +            if (!strcmp(nat->type, "dnat")
>> > +                || !strcmp(nat->type, "dnat_and_snat")) {
>> > +                /* Packet when it goes from the initiator to destination.
>> > +                 * We need to zero the inport because the router can
>> > +                 * send the packet back through the same interface. */
>> > +                match = xasprintf("ip && ip4.dst == %s", nat->external_ip);
>> > +                actions = xasprintf("inport = \"\"; ct_dnat(%s);",
>> > +                                    nat->logical_ip);
>> > +                ovn_lflow_add(lflows, od, S_ROUTER_IN_DNAT, 100,
>> > +                              match, actions);
>> > +                free(match);
>> > +                free(actions);
>> > +            }
>> > +
>> > +            /* Egress SNAT table: Packets enter the egress pipeline with
>> > +             * source ip address that needs to be SNATted to an external ip
>> > +             * address. */
>> > +            if (!strcmp(nat->type, "snat")
>> > +                || !strcmp(nat->type, "dnat_and_snat")) {
>> > +                match = xasprintf("ip && ip4.src == %s", nat->logical_ip);
>> > +                actions = xasprintf("ct_snat(%s);", nat->external_ip);
>> > +
>> > +                /* The priority here is calculated such that the
>> > +                 * nat->logical_ip with the longest mask gets a higher
>> > +                 * priority. */
>> > +                ovn_lflow_add(lflows, od, S_ROUTER_OUT_SNAT,
>> > +                              count_1bits(ntohl(mask)) + 1, match, actions);
>> > +                free(match);
>> > +                free(actions);
>> > +            }
>> > +        }
>> > +
>> > +        /* Re-circulate every packet through the DNAT zone.
>> > +         * This helps with two things.
>> > +         *
>> > +         * 1. Any packet that needs to be unDNATed in the reverse
>> > +         * direction gets unDNATed. Ideally this could be done in
>> > +         * the egress pipeline. But since the gateway router
>> > +         * does not have any feature that depends on the source
>> > +         * ip address being external IP address for IP routing,
>> > +         * we can do it here, saving a future re-circulation.
>> > +         *
>> > +         * 2. Any packet that was sent through SNAT zone in the
>> > +         * previous table automatically gets re-circulated to get
>> > +         * back the new destination IP address that is needed for
>> > +         * routing in the openflow pipeline. */
>> > +        ovn_lflow_add(lflows, od, S_ROUTER_IN_DNAT, 50,
>> > +                      "ip", "inport = \"\"; ct_dnat;");
>> > +    }
>> > +
>> >      /* Logical router ingress table 2: IP Routing.
>> >       *
>> >       * A packet that arrives at this table is an IP packet that should be
>> > @@ -2205,7 +2382,7 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
>> >          ovn_lflow_add(lflows, od, S_ROUTER_IN_ARP_REQUEST, 0, "1", "output;");
>> >      }
>> >
>> > -    /* Logical router egress table 0: Delivery (priority 100).
>> > +    /* Logical router egress table 1: Delivery (priority 100).
>> >       *
>> >       * Priority 100 rules deliver packets to enabled logical ports. */
>> >      HMAP_FOR_EACH (op, key_node, ports) {
>> > diff --git a/ovn/ovn-nb.ovsschema b/ovn/ovn-nb.ovsschema
>> > index fa21b30..ac6ca14 100644
>> > --- a/ovn/ovn-nb.ovsschema
>> > +++ b/ovn/ovn-nb.ovsschema
>> > @@ -1,7 +1,7 @@
>> >  {
>> >      "name": "OVN_Northbound",
>> > -    "version": "2.1.2",
>> > -    "cksum": "429668869 5325",
>> > +    "version": "2.1.3",
>> > +    "cksum": "3631923697 6121",
>> >      "tables": {
>> >          "Logical_Switch": {
>> >              "columns": {
>> > @@ -78,6 +78,11 @@
>> >                                     "max": "unlimited"}},
>> >                  "default_gw": {"type": {"key": "string", "min": 0, "max": 1}},
>> >                  "enabled": {"type": {"key": "boolean", "min": 0, "max": 1}},
>> > +                "nat": {"type": {"key": {"type": "uuid",
>> > +                                         "refTable": "NAT",
>> > +                                         "refType": "strong"},
>> > +                                 "min": 0,
>> > +                                 "max": "unlimited"}},
>> >                  "options": {
>> >                       "type": {"key": "string",
>> >                                "value": "string",
>> > @@ -104,6 +109,16 @@
>> >                  "ip_prefix": {"type": "string"},
>> >                  "nexthop": {"type": "string"},
>> >                  "output_port": {"type": {"key": "string", "min": 0, "max": 1}}},
>> > +            "isRoot": false},
>> > +        "NAT": {
>> > +            "columns": {
>> > +                "external_ip": {"type": "string"},
>> > +                "logical_ip": {"type": "string"},
>> > +                "type": {"type": {"key": {"type": "string",
>> > +                                           "enum": ["set", ["dnat",
>> > +                                                             "snat",
>> > +                                                             "dnat_and_snat"
>> > +                                                               ]]}}}},
>> >              "isRoot": false}
>> >      }
>> >  }
>> > diff --git a/ovn/ovn-nb.xml b/ovn/ovn-nb.xml
>> > index 130b63b..36d1158 100644
>> > --- a/ovn/ovn-nb.xml
>> > +++ b/ovn/ovn-nb.xml
>> > @@ -631,18 +631,31 @@
>> >        router has all ingress and egress traffic dropped.
>> >      </column>
>> >
>> > +    <column name="nat">
>> > +      One or more NAT rules for the router. NAT rules only work on
>> > +      Gateway routers.
>> > +    </column>
>> > +
>> >      <group title="Options">
>> >        <p>
>> >          Additional options for the logical router.
>> >        </p>
>> >
>> >        <column name="options" key="chassis">
>> > -        If set, indicates that the logical router in question is
>> > -        a Gateway router (which is centralized) and resides in the set
>> > -        chassis.  The same value is also used by <code>ovn-controller</code>
>> > -        to uniquely identify the chassis in the OVN deployment and
>> > -        comes from <code>external_ids:system-id</code> in the
>> > -        <code>Open_vSwitch</code> table of Open_vSwitch database.
>> > +        <p>
>> > +          If set, indicates that the logical router in question is a Gateway
>> > +          router (which is centralized) and resides in the set chassis.  The
>> > +          same value is also used by <code>ovn-controller</code> to
>> > +          uniquely identify the chassis in the OVN deployment and
>> > +          comes from <code>external_ids:system-id</code> in the
>> > +          <code>Open_vSwitch</code> table of the Open_vSwitch database.
>> > +        </p>
>> > +
>> > +        <p>
>> > +          The Gateway router can only be connected to a distributed router
>> > +          via a switch if SNAT and DNAT are to be configured in the Gateway
>> > +          router.
>> > +        </p>
>> >        </column>
>> >      </group>
>> >
>> > @@ -765,4 +778,44 @@
>> >      </column>
>> >    </table>
>> >
>> > +  <table name="NAT" title="NAT rules for a Gateway router.">
>> > +    <p>
>> > +      Each record represents a NAT rule in a Gateway router.
>> > +    </p>
>> > +
>> > +    <column name="type">
>> > +      <p>Type of the NAT rule.</p>
>> > +      <ul>
>> > +        <li>
>> > +          When <ref column="type"/> is <code>dnat</code>, the externally
>> > +          visible IP address <ref column="external_ip"/> is DNATted to the IP
>> > +          address <ref column="logical_ip"/> in the logical space.
>> > +        </li>
>> > +        <li>
>> > +          When <ref column="type"/> is <code>snat</code>, IP packets
>> > +          whose source IP address either matches the IP address
>> > +          in <ref column="logical_ip"/> or is in the network provided by
>> > +          <ref column="logical_ip"/> are SNATed into the IP address in
>> > +          <ref column="external_ip"/>.
>> > +        </li>
>> > +        <li>
>> > +          When <ref column="type"/> is <code>dnat_and_snat</code>, the
>> > +          externally visible IP address <ref column="external_ip"/> is
>> > +          DNATted to the IP address <ref column="logical_ip"/> in the
>> > +          logical space. In addition, IP packets whose source IP
>> > +          address matches <ref column="logical_ip"/> are SNATed into
>> > +          the IP address in <ref column="external_ip"/>.
>> > +        </li>
>> > +      </ul>
>> > +    </column>
>> > +
>> > +    <column name="external_ip">
>> > +      An IPv4 address.
>> > +    </column>
>> > +
>> > +    <column name="logical_ip">
>> > +      An IPv4 network (e.g. 192.168.1.0/24) or an IPv4 address.
>> > +    </column>
>> > +  </table>
>> > +
>> >  </database>
>> > diff --git a/ovn/ovn-sb.xml b/ovn/ovn-sb.xml
>> > index 1231b4e..5665871 100644
>> > --- a/ovn/ovn-sb.xml
>> > +++ b/ovn/ovn-sb.xml
>> > @@ -951,6 +951,47 @@
>> >            </p>
>> >          </dd>
>> >
>> > +        <dt><code>ct_dnat;</code></dt>
>> > +        <dt><code>ct_dnat(<var>IP</var>);</code></dt>
>> > +        <dd>
>> > +          <p>
>> > +            <code>ct_dnat</code> sends the packet through the DNAT zone in
>> > +            the connection tracking table to unDNAT any packet that was DNATed
>> > +            in the opposite direction.  The packet is then automatically sent
>> > +            to the next tables as if followed by a <code>next;</code> action.
>> > +            The next tables will see the changes in the packet caused by
>> > +            the connection tracker.
>> > +          </p>
>> > +          <p>
>> > +            <code>ct_dnat(<var>IP</var>)</code> sends the packet through the
>> > +            DNAT zone to change the destination IP address of the packet to
>> > +            the one provided inside the parentheses and commits the connection.
>> > +            The packet is then automatically sent to the next tables as if
>> > +            followed by a <code>next;</code> action.  The next tables will see
>> > +            the changes in the packet caused by the connection tracker.
>> > +          </p>
>> > +        </dd>
>> > +
>> > +        <dt><code>ct_snat;</code></dt>
>> > +        <dt><code>ct_snat(<var>IP</var>);</code></dt>
>> > +        <dd>
>> > +          <p>
>> > +            <code>ct_snat</code> sends the packet through the SNAT zone to
>> > +            unSNAT any packet that was SNATed in the opposite direction.  If
>> > +            the packet needs to be sent to the next tables, then it should be
>> > +            followed by a <code>next;</code> action.  The next tables will not
>> > +            see the changes in the packet caused by the connection tracker.
>> > +          </p>
>> > +          <p>
>> > +            <code>ct_snat(<var>IP</var>)</code> sends the packet through the
>> > +            SNAT zone to change the source IP address of the packet to
>> > +            the one provided inside the parentheses and commits the connection.
>> > +            The packet is then automatically sent to the next tables as if
>> > +            followed by a <code>next;</code> action.  The next tables will see
>> > +            the changes in the packet caused by the connection tracker.
>> > +          </p>
>> > +        </dd>
>> > +
>> >          <dt><code>arp { <var>action</var>; </code>...<code> };</code></dt>
>> >          <dd>
>> >            <p>
>> > diff --git a/ovn/utilities/ovn-nbctl.c b/ovn/utilities/ovn-nbctl.c
>> > index 321040e..b821307 100644
>> > --- a/ovn/utilities/ovn-nbctl.c
>> > +++ b/ovn/utilities/ovn-nbctl.c
>> > @@ -1449,6 +1449,11 @@ static const struct ctl_table_class tables[] = {
>> >         NULL},
>> >        {NULL, NULL, NULL}}},
>> >
>> > +    {&nbrec_table_nat,
>> > +     {{&nbrec_table_nat, NULL,
>> > +       NULL},
>> > +      {NULL, NULL, NULL}}},
>> > +
>> >      {NULL, {{NULL, NULL, NULL}, {NULL, NULL, NULL}}}
>> >  };
>> >
>> > diff --git a/tests/ovn.at b/tests/ovn.at
>> > index 633cf35..19d5c73 100644
>> > --- a/tests/ovn.at
>> > +++ b/tests/ovn.at
>> > @@ -507,6 +507,23 @@ ip.ttl => Syntax error at end of input expecting `--'.
>> >  ct_next; => actions=ct(table=27,zone=NXM_NX_REG5[0..15]), prereqs=ip
>> >  ct_commit; => actions=ct(commit,zone=NXM_NX_REG5[0..15]), prereqs=ip
>> >
>> > +# dnat
>> > +ct_dnat; => actions=ct(table=27,zone=NXM_NX_REG3[0..15],nat), prereqs=ip
>> > +ct_dnat(192.168.1.2); => actions=ct(commit,table=27,zone=NXM_NX_REG3[0..15],nat(dst=192.168.1.2)), prereqs=ip
>> > +ct_dnat(192.168.1.2, 192.168.1.3); => Syntax error at `,' expecting `)'.
>> > +ct_dnat(foo); => Syntax error at `foo' invalid ip.
>> > +ct_dnat(foo, bar); => Syntax error at `foo' invalid ip.
>> > +ct_dnat(); => Syntax error at `)' invalid ip.
>> > +
>> > +# snat
>> > +ct_snat; => actions=ct(zone=NXM_NX_REG4[0..15],nat), prereqs=ip
>> > +ct_snat(192.168.1.2); => actions=ct(commit,table=27,zone=NXM_NX_REG4[0..15],nat(src=192.168.1.2)), prereqs=ip
>> > +ct_snat(192.168.1.2, 192.168.1.3); => Syntax error at `,' expecting `)'.
>> > +ct_snat(foo); => Syntax error at `foo' invalid ip.
>> > +ct_snat(foo, bar); => Syntax error at `foo' invalid ip.
>> > +ct_snat(); => Syntax error at `)' invalid ip.
>> > +
>> > +
>> >  # arp
>> >  arp { eth.dst = ff:ff:ff:ff:ff:ff; output; }; => actions=controller(userdata=00.00.00.00.00.00.00.00.00.19.00.10.80.00.06.06.ff.ff.ff.ff.ff.ff.00.00.ff.ff.00.10.00.00.23.20.00.0e.ff.f8.40.00.00.00), prereqs=ip4
>> >
>> > --
>> > 1.9.1
>> >
>> >
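The three rule types documented for the new NAT table in ovn-nb.xml reduce to two questions. Is a packet's destination DNATted, and is its source SNATed? The C sketch below is a hypothetical illustration of that documented behavior. It is not code from this patch or from ovn-northd. The names struct nat_rule, nat_rule_dnats(), nat_rule_snats() and parse_ipv4_cidr() are invented. The sketch also assumes that for dnat_and_snat the source must equal a single logical_ip address, while snat also accepts a network prefix.

```c
/* Hypothetical sketch of the OVN_Northbound NAT rule semantics.  None of
 * these names exist in OVN; they only illustrate the documented behavior. */
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>

enum nat_type { NAT_DNAT, NAT_SNAT, NAT_DNAT_AND_SNAT };

struct nat_rule {
    enum nat_type type;
    const char *external_ip;    /* An IPv4 address. */
    const char *logical_ip;     /* IPv4 address or network, e.g. 192.168.1.0/24. */
};

/* Parses "a.b.c.d" or "a.b.c.d/len" into host-order 'addr' and 'mask'.
 * A bare address gets a /32 mask.  Returns false on malformed input. */
static bool
parse_ipv4_cidr(const char *s, uint32_t *addr, uint32_t *mask)
{
    char buf[INET_ADDRSTRLEN + 4];   /* Room for "255.255.255.255/32". */
    unsigned long plen = 32;
    struct in_addr in;

    if (strlen(s) >= sizeof buf) {
        return false;
    }
    strcpy(buf, s);

    char *slash = strchr(buf, '/');
    if (slash) {
        char *end;
        *slash = '\0';
        plen = strtoul(slash + 1, &end, 10);
        if (end == slash + 1 || *end != '\0' || plen > 32) {
            return false;
        }
    }
    if (inet_pton(AF_INET, buf, &in) != 1) {
        return false;
    }
    *addr = ntohl(in.s_addr);
    *mask = plen ? UINT32_MAX << (32 - plen) : 0;
    return true;
}

/* "dnat" and "dnat_and_snat": traffic addressed to external_ip is DNATted
 * to logical_ip.  Returns true if a packet with destination 'dst' matches. */
bool
nat_rule_dnats(const struct nat_rule *rule, const char *dst)
{
    uint32_t ext, ext_mask, addr, addr_mask;

    assert(rule != NULL);
    return rule->type != NAT_SNAT
           && parse_ipv4_cidr(rule->external_ip, &ext, &ext_mask)
           && parse_ipv4_cidr(dst, &addr, &addr_mask)
           && ext_mask == UINT32_MAX && addr_mask == UINT32_MAX
           && addr == ext;
}

/* "snat": a source that equals logical_ip or lies in the logical_ip network
 * is SNATed to external_ip.  "dnat_and_snat": assumed here to require an
 * exact match against a single logical_ip address. */
bool
nat_rule_snats(const struct nat_rule *rule, const char *src)
{
    uint32_t net, mask, addr, addr_mask;

    assert(rule != NULL);
    if (rule->type == NAT_DNAT
        || !parse_ipv4_cidr(rule->logical_ip, &net, &mask)
        || !parse_ipv4_cidr(src, &addr, &addr_mask)
        || addr_mask != UINT32_MAX) {
        return false;
    }
    if (rule->type == NAT_DNAT_AND_SNAT) {
        return mask == UINT32_MAX && addr == net;
    }
    return (addr & mask) == (net & mask);
}
```

Consider an snat rule whose logical_ip is 192.168.1.0/24. For it, nat_rule_snats() accepts a source of 192.168.1.77 and rejects 192.168.2.2. A dnat rule never SNATs, and an snat rule never DNATs.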
>> _______________________________________________
>> dev mailing list
>> dev at openvswitch.org
>> http://openvswitch.org/mailman/listinfo/dev
>>
>
>
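The ct_dnat/ct_snat expectations in the tests/ovn.at hunk act as a small translation table from logical actions to OpenFlow ct() actions. The C sketch below reproduces those expected strings and nothing more. It is not the patch's actions parser, and format_ct_nat() is a hypothetical name. Table 27 and the REG3 (DNAT zone) and REG4 (SNAT zone) registers are copied from the test output. Treat them as assumptions tied to this version of the pipeline.

```c
/* Hypothetical sketch: reproduces the ct() strings expected by the
 * ct_dnat/ct_snat tests in tests/ovn.at.  Not OVN's actions parser. */
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Values copied from the test expectations; assumed, not general. */
#define NAT_NEXT_TABLE 27
#define DNAT_ZONE_REG "NXM_NX_REG3[0..15]"
#define SNAT_ZONE_REG "NXM_NX_REG4[0..15]"

/* Writes the OpenFlow action for ct_dnat/ct_snat into 'out'.
 * 'dnat' selects ct_dnat (true) or ct_snat (false); 'ip' is NULL for the
 * bare form or the address given in parentheses.  Returns 0 on success,
 * -1 for an invalid IPv4 address or a too-small buffer. */
int
format_ct_nat(bool dnat, const char *ip, char *out, size_t size)
{
    const char *zone = dnat ? DNAT_ZONE_REG : SNAT_ZONE_REG;
    struct in_addr addr;
    int n;

    assert(out != NULL && size > 0);
    if (ip == NULL) {
        /* Bare ct_dnat resubmits to the next table, which then sees the
         * unDNATed packet.  Bare ct_snat has no table=, so the packet is
         * not resubmitted and an explicit "next;" must follow it. */
        n = dnat
            ? snprintf(out, size, "ct(table=%d,zone=%s,nat)",
                       NAT_NEXT_TABLE, zone)
            : snprintf(out, size, "ct(zone=%s,nat)", zone);
    } else {
        /* ct_dnat(foo) and similar are rejected as "invalid ip". */
        if (inet_pton(AF_INET, ip, &addr) != 1) {
            return -1;
        }
        /* ct_dnat(IP)/ct_snat(IP) commit the connection, rewrite the
         * destination/source address, and continue to the next table. */
        n = snprintf(out, size, "ct(commit,table=%d,zone=%s,nat(%s=%s))",
                     NAT_NEXT_TABLE, zone, dnat ? "dst" : "src", ip);
    }
    return (n >= 0 && (size_t) n < size) ? 0 : -1;
}
```

The asymmetry visible in the tests is that bare ct_snat carries no table= argument. The packet is therefore not resubmitted. That is why the ovn-sb.xml text says ct_snat must be followed by next; and that later tables do not see the unSNATed packet.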


