[ovs-dev] [L3 In-Band 9/9] in-band: Implement L3-based in-band control

Justin Pettit jpettit at nicira.com
Tue Sep 1 21:52:51 UTC 2009


On Sep 1, 2009, at 10:10 AM, Ben Pfaff wrote:

> Justin Pettit <jpettit at nicira.com> writes:
>
>> Previously, in-band control was L2-based.  This worked well when the
>> controller was on the same network segment as the switch.  However,  
>> many
>> configuration are not setup this way.  These changes allow a switch  
>> and
>
> "configurations".  Also, "setup" is a noun; "set up" is the verb.

Man, I could have been in trouble if I played Scrabble and someone  
challenged me to use "setup" in a sentence and I gave the example, "I  
setup the stereo for my friend."  Thanks!

>> controller to be on different subnets.
>>
>> This set of changes also fixes some problems related to passing DHCP
>> traffic as described in Bug #1618.
>
> Could you add something to the change log to indicate that you
> are going to add further explanations?  Then the reader of the
> change log will know to look for them.

Okay, done.

> This code has changed in master, so the merge may be a bit of
> trouble.

Yeah, I agree.

>> +    char dev_name[IFNAMSIZ];
>
> I noticed that you used IFNAMSIZ here and IF_NAMESIZE earlier.
> It would be better to use just one of them consistently.
> IFNAMSIZ is Linux-specific, IF_NAMESIZE is in POSIX, so the
> latter is a better choice.

This problem went away with the change in netdev_get_next_hop(), which  
returns the device name as a pointer to a newly allocated string.  I  
didn't realize that one was Linux-specific and the other POSIX,  
though, so thank you.

>> +    if (now >= ib->next_remote_refresh) {
>> +        c_in4.s_addr = ib->controller_ip;
>> +        memset(ib->remote_mac, 0, sizeof ib->remote_mac);
>> +        retval = netdev_get_next_hop(&c_in4, &r_in4, dev_name);
>> +        if (retval) {
>> +            VLOG_WARN("cannot find route for controller  
>> ("IP_FMT"): %s",
>> +                    IP_ARGS(&ib->controller_ip), strerror(retval));
>> +            return NULL;
>
> I think that this will fail to update next_remote_refresh in the
> failure case, so that in that case we will try to refresh every
> time we arrive here until we succeed.  Was that change
> intentional?  I was trying to reduce the throttle the number of
> potentially expensive system calls here.

Thanks.  I made it just back off for one second.

>> +        have_mac = !eth_addr_is_zero(ib->remote_mac);
>> +
>> +        if (have_mac
>> +                && memcmp(ib->last_remote_mac, ib->remote_mac,  
>> ETH_ADDR_LEN)) {
>
> It would be nice to use !eth_addr_equals(ib->last_remote_mac,
> ib->remote_mac) here.  (I realize that you only moved this code
> around.)

What do you think of adding Lolcat variable name requirement in the  
Coding Style document?  For example, "have_mac" would be changed to  
"i_has_mac".  Well, think about it.  Regardless, I made the requested  
change.

>> @@ -214,72 +206,163 @@ setup_flow(struct in_band *in_band, int  
>> rule_idx, const flow_t *flow,
>>     }
>> }
>>
>> +/* Returns true if 'packet' should be sent to the local port  
>> regardless
>> + * of the flow table. */
>> +bool
>> +in_band_msg_in_hook(struct in_band *in_band, const flow_t *flow,
>> +                    const struct ofpbuf *packet)
>> +{
>> +    if (!in_band) {
>> +        return false;
>> +    }
>> +
>> +    /* Regardless of how the flow table is configured, we want to be
>> +     * able to see replies to our DHCP requests. */
>> +    if ((flow->nw_proto == IP_TYPE_UDP)
>> +            && (flow->tp_src == htons(DHCP_SERVER_PORT))
>> +            && (flow->tp_dst == htons(DHCP_CLIENT_PORT))
>> +            && (packet->size >= DHCP_HEADER_LEN)) {
>> +        struct dhcp_header *dhcp = packet->l7;
>
> Shouldn't this check for dl_type == htons(ETH_TYPE_IP)?

I don't think it's strictly required, because flow_extract() (which  
ran before this), zeros out the flow structure.  However, I added it  
for clarity.  I also removed the extra parens, so if any kernel mavens  
peruse the code that they don't vomit.

> I think that the size test is ignoring the size of the Ethernet
> and IP headers.  And I'm kind of paranoid about using packet->l7
> without testing it for nonnull (even though I *think* it should
> always be nonnull here).
>
> How about:
>    && packet->l7) {
>        struct dhcp_header *dhcp;
>        const uint8_t *local_mac;
>
>        dhcp = ofpbuf_at(packet, (char *) packet->l7 - packet->data,
>                         sizeof *dhcp);
>        if (!dhcp) {
>            return false;
>        }
>
> Or you could test the test against the length of the L7 header
> directly if you like that better.

I went with your former suggestion and took out the old, improper  
length check.

>> +        const uint8_t *local_mac;
>> +
>> +        local_mac = get_local_mac(in_band);
>> +        if (!memcmp(dhcp->chaddr, local_mac, ETH_ADDR_LEN)) {
>
> eth_addr_equals() again.

Got it.  Thanks.

>> /* Returns true if the rule that would match 'flow' with 'actions' is
>> * allowed to be setup in the datapath. */
>> bool
>> in_band_rule_check(struct in_band *in_band, const flow_t *flow,
>>                   const struct odp_actions *actions)
>
> Hmm, is there a reason that we disable setting up a flow instead
> of adding a deliver-to-local-port action?  Probably, but I don't
> recall it.

I believe the reason was that there's no way to distinguish our DHCP  
traffic from others without doing L7 processing; DHCP uses IP  
broadcast and the only distinguishing data is in the CHADDR field of  
the payload.  To allow DHCP to work for other hosts (the general  
case), we would need to setup a "flood" action, which would then not  
allow the controller to apply policy restrictions on DHCP traffic.

>> +    /* Don't allow flows that would prevent DHCP replies from  
>> being seen
>> +     * by the local port. */
>> +    if ((flow->nw_proto == IP_TYPE_UDP)
>> +            && (flow->tp_src == htons(DHCP_SERVER_PORT))
>> +            && (flow->tp_dst == htons(DHCP_CLIENT_PORT))) {
>
> Also test dl_type?

Once again, I don't think this is strictly necessary, but I added it  
for clarity and future possible problems.

>> -    if (time_now() < MIN(in_band->next_refresh, in_band- 
>> >next_local_refresh)) {
>> +    if (now < MIN(in_band->next_remote_refresh, in_band- 
>> >next_local_refresh)) {
>
> MIN is such an ugly macro.  Now that we have time_now() factored
> out, let's rewrite this as
>    if (now < in_band->next_remote_refresh
>        && now < in_band->next_local_refresh)
> Even though it needs to be broken across lines, I still like it
> better.

Okay.

>> -    /* Switch traffic sent by the local port. */
>> +    /* Allow DHCP requests to be sent from the local port. */
>>     memset(&flow, 0, sizeof flow);
>>     flow.in_port = ODPP_LOCAL;
>> -    setup_flow(in_band, IBR_FROM_LOCAL_PORT, &flow, OFPFW_IN_PORT,
>> -               OFPP_NORMAL);
>> +    flow.dl_type = htons(ETH_TYPE_IP);
>> +    flow.nw_proto = IP_TYPE_UDP;
>> +    flow.tp_src = htons(DHCP_CLIENT_PORT);
>> +    flow.tp_dst = htons(DHCP_SERVER_PORT);
>> +    setup_flow(in_band, IBR_FROM_LOCAL_DHCP, &flow,
>> +               (OFPFW_IN_PORT | OFPFW_DL_TYPE | OFPFW_NW_PROTO
>> +                | OFPFW_TP_SRC | OFPFW_TP_DST), OFPP_NORMAL);
>
> Can we fix dl_src as local_mac here?

Probably.  Since we've bound it to the local port, I didn't know if  
was strictly necessary.  I've added it, though.

>> +    if (controller_ip) {
>> +        /* Allow ARP requests for the controller's IP. */
>>         memset(&flow, 0, sizeof flow);
>>         flow.dl_type = htons(ETH_TYPE_ARP);
>> -        memcpy(flow.dl_dst, eth_addr_broadcast, ETH_ADDR_LEN);
>> -        memcpy(flow.dl_src, controller_mac, ETH_ADDR_LEN);
>> -        setup_flow(in_band, IBR_ARP_FROM_CTL, &flow,
>> -                   OFPFW_DL_TYPE | OFPFW_DL_DST | OFPFW_DL_SRC,
>> +        flow.nw_proto = ARP_OP_REQUEST;
>> +        flow.nw_src = controller_ip;
>> +        setup_flow(in_band, IBR_TO_CTL_ARP, &flow,
>> +                   (OFPFW_DL_TYPE | OFPFW_NW_PROTO |  
>> OFPFW_NW_SRC_MASK),
>>                    OFPP_NORMAL);
>
> The comment on this flow setup is confusing, since an ARP request
> for the controller's IP will be sent *to* the controller (nw_dst
> == controller_ip), but the flow here comes *from the controller
> (nw_src == controller_ip).

Okay, I tried to clarify it.

>> +        /* Allow ARP replies from the controller's IP. */
>> +        memset(&flow, 0, sizeof flow);
>> +        flow.dl_type = htons(ETH_TYPE_ARP);
>> +        flow.nw_proto = ARP_OP_REPLY;
>> +        flow.nw_dst = controller_ip;
>> +        setup_flow(in_band, IBR_FROM_CTL_ARP, &flow,
>> +                   (OFPFW_DL_TYPE | OFPFW_NW_PROTO |  
>> OFPFW_NW_DST_MASK),
>> +                   OFPP_NORMAL);
>
> Imagine a comment similar to the previous one here, in reverse.

Ditto.

>> @@ -2123,6 +2123,13 @@ xlate_actions(const union ofp_action *in,  
>> size_t n_in,
>>     ctx.tags = tags ? tags : &no_tags;
>>     ctx.may_setup_flow = true;
>>     do_xlate_actions(in, n_in, &ctx);
>> +
>> +    /* Check with in-band control to see if we're allowed to setup  
>> this
>> +     * flow. */
>
> ...to "set up" this flow.

Shouldn't the function you wrote "setup_flow" be "set_up_flow"?  ;-)

>> +    if (!in_band_rule_check(ofproto->in_band, flow, out)) {
>> +        ctx.may_setup_flow = false;
>> +    }
>> +
>
>> @@ -3034,6 +3041,21 @@ handle_odp_msg(struct ofproto *p, struct  
>> ofpbuf *packet)
>>     struct ofpbuf payload;
>>     flow_t flow;
>>
>> +    payload.data = msg + 1;
>> +    payload.size = msg->length - sizeof *msg;
>> +    flow_extract(&payload, msg->port, &flow);
>> +
>> +    /* Check with in-band control to see if this packet should be  
>> sent
>> +     * to the local port regardless of the flow table. */
>> +    if (in_band_msg_in_hook(p->in_band, &flow, &payload)) {
>> +        union odp_action action;
>> +
>> +        memset(&action, 0, sizeof(action));
>> +        action.output.type = ODPAT_OUTPUT;
>> +        action.output.port = ODPP_LOCAL;
>> +        dpif_execute(&p->dpif, flow.in_port, &action, 1, &payload);
>> +    }
>
> I think that this can safely go below the _ODPL_ACTION_NR test,
> to avoid extracting the flow data in that case.  _ODPL_ACTION_NR
> can only be sent by a flow, and in the case we care about here we
> will never set up a flow.
>
>>     /* Handle controller actions. */
>>     if (msg->type == _ODPL_ACTION_NR) {
>>         COVERAGE_INC(ofproto_ctlr_action);

Sounds good.

> Otherwise, the code looks fine.  I have no idea whether it works,
> since it's magic, but as long as you think so...

Me, too.  I'm going to do some testing of these changes, and then I'll  
push.

Thanks for the excellent and detailed review!

--Justin






More information about the dev mailing list