[ovs-dev] [RFC recirc-fix] recirculation: Do not change handling of packets after recirculation.

Alex Wang alexw at nicira.com
Sun Dec 21 01:17:04 UTC 2014


Hey Yamamoto,

Sorry for this delayed reply,

Your concern is valid.  We will need to discuss it more the coming week,

For a correct fix that covers all cases, we are thinking about making
userspace restore the flow format of pre-recirc pkt so that the miss
handling of the recirc pkts could go through the same pipeline as the
pre-recirc pkt (and reach the recirc_ofproto from recv_ofproto)~

The above idea may require more careful design and implementation.
Our current priority is fixing the balance-tcp bond case first.  And we are
thinking it is very rare to use 'patch' port for other types of
recirculation.
Therefore, the setting of inport to OFPP_NONE is proposed as fix in the
short run.

Would like to hear your feedback about both the short/long fix~

Thanks,
Alex Wang,

On Thu, Dec 18, 2014 at 9:43 PM, YAMAMOTO Takashi <yamamoto at valinux.co.jp>
wrote:

> > Hey Yamamoto,
> >
> > Upon further offline discussion, we'd like to take the following approach
> > to fix the bug,
> >
> > - Keep a map between 'recirc_id' and corresponding 'ofproto_dpif',
> > - When handling misses of 'recirc'ed pkts, we find the 'recirc_ofproto'
> via
> >   lookup the map with 'recirc_id', and start the handing from that
> >   'recirc_ofproto'.
> > -  One special case is:
> >    If the 'recirc_ofproto' is not the same as the 'recv_ofproto'
> > ('ofproto_dpif'
> >    on which the 'recirc'ed pkts are received), we set the 'in_port' as
> >    OFPP_NONE, since the original 'in_port' is a port in the
> 'recv_ofproto'.
> >
> > At this time, there is only one such special case, which is a packet
> > received in bridge A, goes through the patch port, and output to a
> > balance-tcp
> > bond on bridge B.  And for this case, we do not care the in_port of
> > 'recirc'ed
> > packets.  So we just set it to OFPP_NONE.
> >
> > Do these make sense?
> >
> > The encoding of OpenFlow in_port in 'recirc_id' could work, but it is not
> > necessary.  So, we just go for this way and have a map,
>
> i agree it would work for bond.
> however, for non-bond cases, we need to restore the correct in_port.
> (the port in recirc_ofproto)
>
> YAMAMOTO Takashi
>
> >
> > I'm sending of a new RFC patch soon,
> >
> > Thanks,
> > Alex Wang,
> >
> > On Thu, Dec 18, 2014 at 9:06 PM, YAMAMOTO Takashi <
> yamamoto at valinux.co.jp>
> > wrote:
> >>
> >> i agree on all points.
> >>
> >> YAMAMOTO Takashi
> >>
> >> > Yes, I agree we need to encode (bridge, ofp_port).  recirc_id (the
> >> > upper 16 bit part) is globally unique, so bridge can be recovered by
> >> > maintaining recirc_id ->bridge mapping in the user space.
> >> >
> >> > I agree odp_port could be used as well. For the problem at hand, using
> >> > recirc_id seems to be simpler.
> >> >
> >> >
> >> > On Wed, Dec 17, 2014 at 11:11 PM, YAMAMOTO Takashi
> >> > <yamamoto at valinux.co.jp> wrote:
> >> >> i agree that the suggested patch was too bond specific.
> >> >>
> >> >> ofp port number is not enough.  you need to encode (bridge, ofp_port)
> >> >> pair in some way.  it's why i suggested to assign odp port numbers to
> >> >> every pseudo ports.
> >> >>
> >> >> YAMAMOTO Takashi
> >> >>
> >> >>> This is probably obvious: the bond internal flows now should only
> >> match 16 bits.
> >> >>>
> >> >>> On Wed, Dec 17, 2014 at 3:56 PM, Alex Wang <alexw at nicira.com>
> wrote:
> >> >>>> This sounds reasonable,
> >> >>>>
> >> >>>> I'll just use the 'ofport' in the 'recirc_id' instead of
> 'OFPP_LOCAL'
> >> >>>> in my implementation.
> >> >>>>
> >> >>>> Thanks,
> >> >>>> Alex Wang,
> >> >>>>
> >> >>>> On Wed, Dec 17, 2014 at 3:44 PM, Andy Zhou <azhou at nicira.com>
> wrote:
> >> >>>>>
> >> >>>>> This solution seems too bond specific.  How about the following:
> >> >>>>>
> >> >>>>> We use the top 16 bits of the recirc_id, as how we use it today.
> For
> >> >>>>> example, in bond implementation, it represents a bond port.
> >> >>>>>
> >> >>>>> The lower 16 bits can then be used to represent OF port number
> within
> >> >>>>> the bridge. This representation is opaque to the kernel,  Since we
> >> >>>>> only use 16 bits to represent OF port in user space, it can
> >> >>>>> be simply 1 to 1 mapping for the user space.
> >> >>>>>
> >> >>>>>
> >> >>>>> On Wed, Dec 17, 2014 at 1:19 PM, Alex Wang <alexw at nicira.com>
> wrote:
> >> >>>>> > After commit 0c7812e5e (recirculation: Do not drop packet when
> >> >>>>> > there is no match from internal table.), if flow keys are
> modified
> >> >>>>> > before the recirculation action (e.g. set vlan ID), the miss
> >> >>>>> > handling of recirculated packets may take different path than
> the
> >> >>>>> > ones.
> >> >>>>> >
> >> >>>>> > This commit adds an unittest that captures this bug.  Moreover,
> >> >>>>> > to solve this bug, this commit makes OVS directly input the
> >> >>>>> > recirculated packets from local port of the bridge that owns
> >> >>>>> > the correpsonding bond.  Thus, those packets are always handled
> >> >>>>> > from the correct bridge.
> >> >>>>> >
> >> >>>>> > Signed-off-by: Alex Wang <alexw at nicira.com>
> >> >>>>> > ---
> >> >>>>> >  ofproto/bond.c               |   26 +++++++++++++++++++
> >> >>>>> >  ofproto/bond.h               |    2 ++
> >> >>>>> >  ofproto/ofproto-dpif-xlate.c |   20 ++++++++++++--
> >> >>>>> >  tests/ofproto-dpif.at        |   59
> >> >>>>> > ++++++++++++++++++++++++++++++++++++++++++
> >> >>>>> >  4 files changed, 105 insertions(+), 2 deletions(-)
> >> >>>>> >
> >> >>>>> > diff --git a/ofproto/bond.c b/ofproto/bond.c
> >> >>>>> > index c4b3a3a..08900fa 100644
> >> >>>>> > --- a/ofproto/bond.c
> >> >>>>> > +++ b/ofproto/bond.c
> >> >>>>> > @@ -254,6 +254,32 @@ bond_ref(const struct bond *bond_)
> >> >>>>> >      return bond;
> >> >>>>> >  }
> >> >>>>> >
> >> >>>>> > +struct ofproto_dpif *
> >> >>>>> > +bond_get_ofproto(const struct bond *bond)
> >> >>>>> > +{
> >> >>>>> > +    return bond->ofproto;
> >> >>>>> > +}
> >> >>>>> > +
> >> >>>>> > +/* Searches bond that uses 'recirc_id'.  Returns an
> 'ref_count'ed
> >> >>>>> > + * reference to 'bond'. */
> >> >>>>> > +struct bond *
> >> >>>>> > +bond_find_by_recirc_id(const uint32_t recirc_id)
> >> >>>>> > +{
> >> >>>>> > +    struct bond *bond;
> >> >>>>> > +
> >> >>>>> > +    ovs_rwlock_rdlock(&rwlock);
> >> >>>>> > +    HMAP_FOR_EACH (bond, hmap_node, all_bonds) {
> >> >>>>> > +        if (bond->recirc_id == recirc_id) {
> >> >>>>> > +            bond_ref(bond);
> >> >>>>> > +            ovs_rwlock_unlock(&rwlock);
> >> >>>>> > +            return bond;
> >> >>>>> > +        }
> >> >>>>> > +    }
> >> >>>>> > +    ovs_rwlock_unlock(&rwlock);
> >> >>>>> > +
> >> >>>>> > +    return NULL;
> >> >>>>> > +}
> >> >>>>> > +
> >> >>>>> >  /* Frees 'bond'. */
> >> >>>>> >  void
> >> >>>>> >  bond_unref(struct bond *bond)
> >> >>>>> > diff --git a/ofproto/bond.h b/ofproto/bond.h
> >> >>>>> > index c7b6308..43600df 100644
> >> >>>>> > --- a/ofproto/bond.h
> >> >>>>> > +++ b/ofproto/bond.h
> >> >>>>> > @@ -68,6 +68,8 @@ struct bond *bond_create(const struct
> >> bond_settings *,
> >> >>>>> >                           struct ofproto_dpif *ofproto);
> >> >>>>> >  void bond_unref(struct bond *);
> >> >>>>> >  struct bond *bond_ref(const struct bond *);
> >> >>>>> > +struct bond *bond_find_by_recirc_id(const uint32_t recirc_id);
> >> >>>>> > +struct ofproto_dpif *bond_get_ofproto(const struct bond*);
> >> >>>>> >
> >> >>>>> >  bool bond_reconfigure(struct bond *, const struct bond_settings
> >> *);
> >> >>>>> >  void bond_slave_register(struct bond *, void *slave_,
> ofp_port_t
> >> >>>>> > ofport, struct netdev *);
> >> >>>>> > diff --git a/ofproto/ofproto-dpif-xlate.c
> >> b/ofproto/ofproto-dpif-xlate.c
> >> >>>>> > index c1327a6..dd0f329 100644
> >> >>>>> > --- a/ofproto/ofproto-dpif-xlate.c
> >> >>>>> > +++ b/ofproto/ofproto-dpif-xlate.c
> >> >>>>> > @@ -981,9 +981,25 @@ static struct ofproto_dpif *
> >> >>>>> >  xlate_lookup_ofproto_(const struct dpif_backer *backer, const
> >> struct
> >> >>>>> > flow *flow,
> >> >>>>> >                        ofp_port_t *ofp_in_port, const struct
> xport
> >> >>>>> > **xportp)
> >> >>>>> >  {
> >> >>>>> > -    const struct xport *xport;
> >> >>>>> > +    const struct xport *xport = NULL;
> >> >>>>> > +
> >> >>>>> > +    /* If 'flow->recirc_id' is set, uses the OFPP_LOCAL port of
> >> the
> >> >>>>> > +     * 'xbridge' that owns the bond.  Using 'OFPP_LOCAL'
> should be
> >> >>>>> > fine,
> >> >>>>> > +     * since the lookup only matches 'recirc_id' and
> 'dp_hash'. */
> >> >>>>> > +    if (flow->recirc_id) {
> >> >>>>> > +        struct bond *bond =
> >> bond_find_by_recirc_id(flow->recirc_id);
> >> >>>>> > +
> >> >>>>> > +        if (bond) {
> >> >>>>> > +            struct xlate_cfg *xcfg = ovsrcu_get(struct
> xlate_cfg
> >> *,
> >> >>>>> > &xcfgp);
> >> >>>>> > +            struct ofproto_dpif *ofproto =
> bond_get_ofproto(bond);
> >> >>>>> >
> >> >>>>> > -    *xportp = xport = xlate_lookup_xport(backer, flow);
> >> >>>>> > +            bond_unref(bond);
> >> >>>>> > +            *xportp = xport = get_ofp_port(xbridge_lookup(xcfg,
> >> >>>>> > ofproto),
> >> >>>>> > +                                           OFPP_LOCAL);
> >> >>>>> > +        }
> >> >>>>> > +    } else {
> >> >>>>> > +        *xportp = xport = xlate_lookup_xport(backer, flow);
> >> >>>>> > +    }
> >> >>>>> >
> >> >>>>> >      if (xport) {
> >> >>>>> >          if (ofp_in_port) {
> >> >>>>> > diff --git a/tests/ofproto-dpif.at b/tests/ofproto-dpif.at
> >> >>>>> > index baa942f..5e4e245 100644
> >> >>>>> > --- a/tests/ofproto-dpif.at
> >> >>>>> > +++ b/tests/ofproto-dpif.at
> >> >>>>> > @@ -144,6 +144,65 @@ AT_CHECK([ovs-appctl dpif/dump-flows br1
> >> |grep tcp
> >> >>>>> > > br1_flows.txt])
> >> >>>>> >  AT_CHECK([test `grep in_port.4 br1_flows.txt |wc -l` -gt 24])
> >> >>>>> >  AT_CHECK([test `grep in_port.5 br1_flows.txt |wc -l` -gt 24])
> >> >>>>> >  AT_CHECK([test `grep in_port.6 br1_flows.txt |wc -l` -gt 24])
> >> >>>>> > +
> >> >>>>> > +OVS_VSWITCHD_STOP()
> >> >>>>> > +AT_CLEANUP
> >> >>>>> > +
> >> >>>>> > +# Makes sure recirculation does not change the way packet is
> >> handled.
> >> >>>>> > +AT_SETUP([ofproto-dpif, balance-tcp bonding, different recirc
> >> flow ])
> >> >>>>> > +OVS_VSWITCHD_START(
> >> >>>>> > +  [add-bond br0 bond0 p1 p2 bond_mode=balance-tcp lacp=active \
> >> >>>>> > +       other-config:lacp-time=fast
> >> >>>>> > other-config:bond-rebalance-interval=0 -- \
> >> >>>>> > +   set interface p1 type=dummy
> >> >>>>> > options:pstream=punix:$OVS_RUNDIR/p1.sock ofport_request=1 -- \
> >> >>>>> > +   set interface p2 type=dummy
> >> >>>>> > options:pstream=punix:$OVS_RUNDIR/p2.sock ofport_request=2 -- \
> >> >>>>> > +   add-br br1 -- \
> >> >>>>> > +   set bridge br1 other-config:hwaddr=aa:66:aa:66:00:00 -- \
> >> >>>>> > +   set bridge br1 datapath-type=dummy
> >> other-config:datapath-id=1234 \
> >> >>>>> > +       fail-mode=standalone -- \
> >> >>>>> > +   add-bond br1 bond1 p3 p4 bond_mode=balance-tcp lacp=active \
> >> >>>>> > +       other-config:lacp-time=fast
> >> >>>>> > other-config:bond-rebalance-interval=0 -- \
> >> >>>>> > +   set interface p3 type=dummy
> >> options:stream=unix:$OVS_RUNDIR/p1.sock
> >> >>>>> > ofport_request=3 -- \
> >> >>>>> > +   set interface p4 type=dummy
> >> options:stream=unix:$OVS_RUNDIR/p2.sock
> >> >>>>> > ofport_request=4 -- \
> >> >>>>> > +   add-port br1 br1- -- set interface br1- type=patch
> >> options:peer=br1+
> >> >>>>> > ofport_request=100 -- \
> >> >>>>> > +   set port br1- tag=1 -- \
> >> >>>>> > +   add-br br-int -- \
> >> >>>>> > +   set bridge br-int other-config:hwaddr=aa:77:aa:77:00:00 -- \
> >> >>>>> > +   set bridge br-int datapath-type=dummy
> >> other-config:datapath-id=1235
> >> >>>>> > \
> >> >>>>> > +       fail-mode=secure -- \
> >> >>>>> > +   add-port br-int br1+ -- set interface br1+ type=patch
> >> >>>>> > options:peer=br1- ofport_request=101 -- \
> >> >>>>> > +   add-port br-int p5 -- set interface p5 ofport_request=5
> >> type=dummy
> >> >>>>> > +])
> >> >>>>> > +AT_CHECK([ovs-appctl netdev-dummy/set-admin-state up], 0, [OK
> >> >>>>> > +])
> >> >>>>> > +
> >> >>>>> > +# The dl_vlan flow should not be ever matched,
> >> >>>>> > +# since recirculation should not change the flow handling.
> >> >>>>> > +AT_DATA([flows.txt], [dnl
> >> >>>>> > +table=0 priority=1 in_port=5 actions=output(101)
> >> >>>>> > +table=0 priority=2 in_port=5 dl_vlan=1 actions=drop
> >> >>>>> > +])
> >> >>>>> > +AT_CHECK([ovs-ofctl add-flows br-int flows.txt])
> >> >>>>> > +
> >> >>>>> > +# Sends a packet to trigger recirculation.
> >> >>>>> > +# Should generate recirc_id(0x12d),dp_hash(0xc1261ba2/0xff).
> >> >>>>> > +AT_CHECK([ovs-appctl netdev-dummy/receive p5
> >> >>>>> >
> >>
> "in_port(5),eth(src=50:54:00:00:00:05,dst=50:54:00:00:01:00),eth_type(0x0800),ipv4(src=10.0.0.2,dst=10.0.0.1)"])
> >> >>>>> > +
> >> >>>>> > +# Collects flow stats.
> >> >>>>> > +AT_CHECK([ovs-appctl revalidator/purge], [0])
> >> >>>>> > +
> >> >>>>> > +# Checks the flow stats in br1, should only be one flow with
> >> non-zero
> >> >>>>> > +# 'n_packets' from internal table.
> >> >>>>> > +AT_CHECK([ovs-appctl bridge/dump-flows br1 | ofctl_strip |
> grep --
> >> >>>>> > "n_packets" | grep -- "table_id" | sed -e
> >> 's/output:[[0-9]]\+/output/'],
> >> >>>>> > [0], [dnl
> >> >>>>> > +table_id=254, n_packets=1, n_bytes=64,
> >> >>>>> > priority=20,recirc_id=0x12d,dp_hash=0xa2/0xff,actions=output
> >> >>>>> > +])
> >> >>>>> > +
> >> >>>>> > +# Checks the flow stats in br-int, should be only one match.
> >> >>>>> > +AT_CHECK([ovs-ofctl dump-flows br-int | ofctl_strip | sort],
> [0],
> >> [dnl
> >> >>>>> > + n_packets=1, n_bytes=60, priority=1,in_port=5
> actions=output:101
> >> >>>>> > + priority=2,in_port=5,dl_vlan=1 actions=drop
> >> >>>>> > +NXST_FLOW reply:
> >> >>>>> > +])
> >> >>>>> > +
> >> >>>>> >  OVS_VSWITCHD_STOP()
> >> >>>>> >  AT_CLEANUP
> >> >>>>> >
> >> >>>>> > --
> >> >>>>> > 1.7.9.5
> >> >>>>> >
> >> >>> _______________________________________________
> >> >>> dev mailing list
> >> >>> dev at openvswitch.org
> >> >>> http://openvswitch.org/mailman/listinfo/dev
> >>
>



More information about the dev mailing list