[ovs-dev] [OVN] Potential scalability bug in ovn-northd on creating and binding large number of lports

Hui Kang kangh at us.ibm.com
Mon Jun 27 01:08:47 UTC 2016



Ryan Moats/Omaha/IBM wrote on 06/26/2016 08:53:19 PM:

> From: Ryan Moats/Omaha/IBM
> To: Hui Kang/Watson/IBM at IBMUS
> Cc: Ben Pfaff <blp at ovn.org>, dev at openvswitch.org
> Date: 06/26/2016 08:53 PM
> Subject: Re: [ovs-dev] [OVN] Potential scalability bug in ovn-northd
> on creating and binding large number of lports
>
> Hui Kang/Watson/IBM wrote on 06/26/2016 07:11:27 PM:
>
> > From: Hui Kang/Watson/IBM
> > To: Ryan Moats/Omaha/IBM at IBMUS
> > Cc: Ben Pfaff <blp at ovn.org>, dev at openvswitch.org
> > Date: 06/26/2016 07:11 PM
> > Subject: Re: [ovs-dev] [OVN] Potential scalability bug in ovn-northd
> > on creating and binding large number of lports
> >
> > Ryan Moats/Omaha/IBM wrote on 06/25/2016 09:07:39 PM:
> >
> > > From: Ryan Moats/Omaha/IBM
> > > To: Hui Kang/Watson/IBM at IBMUS
> > > Cc: Ben Pfaff <blp at ovn.org>, dev at openvswitch.org
> > > Date: 06/25/2016 09:07 PM
> > > Subject: Re: [ovs-dev] [OVN] Potential scalability bug in ovn-northd
> > > on creating and binding large number of lports
> > >
> > > Hui Kang/Watson/IBM wrote on 06/25/2016 07:53:36 PM:
> > >
> > > > From: Hui Kang/Watson/IBM
> > > > To: Ryan Moats/Omaha/IBM at IBMUS
> > > > Cc: Ben Pfaff <blp at ovn.org>, dev at openvswitch.org
> > > > Date: 06/25/2016 07:53 PM
> > > > Subject: Re: [ovs-dev] [OVN] Potential scalability bug in
ovn-northd
> > > > on creating and binding large number of lports
> > > >
> > > > > >
> > > > > > Actually, I take that back.  The cycles/port for all the cases
above
> > > > > > demonstrate only slightly nonlinear scaling: 200/25 is 8
> Mcycles/port,
> > > > > > 1200/125 is 9.6 Mcycles/port.
> > > > > >
> > > > > > So the issue is not that it does not scale.  The issue is that
it is
> > > > > > slow.
> > > > >
> > > > > Er? When I do the ratios, I come up with 125 Kcycles/port at 200
> > > ports going
> > > > > down to slightly more than 104 Kcycles/port at 1200 ports, which
> > > is slightly
> > > > > sub-linear (and I do think that's a good thing).
> > > > >
> > > > > However, I'm left wondering if it would be possible to make
> things even
> > > > > better through judicial use of persistence and incremental
processing.
> > > > >
> > > > > Right now the ports logic looks to me like:
> > > > > - Build a list of all ports known via port bindings in the sb db.
> > > > > - For each port known via the nb db:
> > > > >   - Look for the port in the sb list.
> > > > >   - If found, move the port from the sb list to the both list
> > > > >   - If not found, create a new entry in the nb_only list.
> > > > > (After the above finishes, we have three lists: sb_only,
> > > nb_only, and both)
> > > > > - For each entry in the both list, do modifications to align the
port
> > > > >   binding with nb information.
> > > > > - For each entry in the nb_only list, create port_binding
> > information in
> > > > >   the sb db.
> > > > >   [If I were updating the port lists, I'd move the port from
> the nb_only
> > > > >   list to both list]
> > > > > - For each entry in the sb_only list, remove from the
> > port_binding table.
> > > > >   [If I were updating the sb_only list, I'd remove it from the
sb_only
> > > > >   list]
> > > >
> > > > Hi, Ryan
> > > > Thanks for drafting the pseudo-code.
> > > > Please allow me to add number bullets in your original version to
> > > accommodate
> > > > further discussions.
> > >
> > > That's fine, I updated from sb list to sb_only to be more clear as
well
> > >
> > > >
> > > > 1. Build a list of all ports known via port bindings in the sb_only
db.
> > > > 2. For each port known via the nb db:
> > > >    2.1 Look for the port in the sb_only list.
> > > >    2.2 If found, move the port from the sb_only list to the both
list
> > > >    2.3 If not found, create a new entry in the nb_only list.
> > > > (After the above finishes, we have three lists: sb_only,
> > nb_only, and both)
> > > > 3. For each entry in the both list, do modifications to align the
port
> > > >    binding with nb information.
> > > > 4. For each entry in the nb_only list, create port_binding
> information in
> > > >    the sb db.
> > > >    [If I were updating the port lists, I'd move the port from
> the nb_only
> > > >    list to both list]
> > > > 5. For each entry in the sb_only list, remove from the
> port_binding table.
> > > >    [If I were updating the sb_only list, I'd remove it from the
sb_only
> > > >    list]
> > > >
> > > > In square bracket of step 4., do you mean "If I were updating the
> > > nb_lists in
> > > > step 2.3.,  ..."?
> > >
> > > No, that is part of the "if I were going to persist all the port
lists,
> > > what would I need to do"
> > >
> > > > Similarly, in step 5, do you mean "If I were updating the
> sb_only list in
> > > > step 2.2,..."?
> > >
> > > Ditto the above explanation.
> > >
> > > > In my opinion, step 4 and step 5 could be avoided with your
> > logic in square
> > > > bracket. Is my understanding correct?
> > >
> > > No, as those both still need to be performed whether I persist
> theport lists
> > > in ovn-northd or not.
> > >
> > > > >
> > > > > I *think* if I were to consider persisting the sb_only,
> > nb_only, and both
> > > > > lists and follow the extra logic I've added in square
> bracketsabove, I'd
> > > > > only have entries in the both list at the end of the
> > calculationset, so I
> > > > > should only need to persist the both table.
> > > >
> > > > What do you mean by "persisting"? A global linked list to store
> > the elements
> > > > of struct ovn_ports?
> > >
> > > That's exactly what I mean. I'm looking at trading memory for
> > execution time.
> > >
> > > > > Further, I *think* if I were to then apply change tracking
> to the first
> > > > > part of the process above, the logic changes to:
> > > >
> > > > Which step of the above pseudo-code should the following code be
> > > > embedded into ?
> > >
> > > The following replaces the entire list above. The good thing about
writing
> > > this down is that I can come back to it later and realize where I
goofed -
> > > see below.
> > >
> > > > >
> > > > > - For each tracked entry in the port bindings table
> >
> > Is this really a For loop? Since northd is monitoring the chassis
column of
> > southbound database, I think the above For loop are actually OVSDB
> > "notification" events. Therefore, when the the both list is persisted,
> > there is no need to iterate all entries in the port_binding and
> logical_switch
> > table, thereby cutting down the processing time.
> >
> > So the logic for the For loop could be elaborated as follows:
> >
> >      while (! blocked)
> >          - json_rpc_recv(msg);
> >          - if (msg is trigged by Chassis column in southbound database)
> >              - sb := the entry in port_binding table of SB
> > triggering this event
> >              - if sb is an "inserted" entry, check for it in the both
list
> >                   - if it is not there, then add it to the sb_only list
> >              - if sb is a "modified" entry, find it in the both list
> > and update the
> >                sb information contained in the entry
> >
> >          else if (msg is trigged by Logical_swtich_port of
> > Northbound database)
> >              - (use the logic in the "For each port known via the nb
> > db" in your orignal
> >                 post)
> >
> > Is my understanding correct? Thanks.
> >
> > - Hui
>
> Honestly, I don't believe so.  This join_logical_port code currently
> looks at the Port_Binding table, not the Chassis table when building the
> sb_only list.  I'm thinking of just turning IDL change tracking on,
> which will give me access to Port_Binding rows that have changed in
> each cycle via the SBREC_PORT_BINDING_FOR_EACH_TRACKED macro.  It would

Thanks. I did not realize there is such macro to track the changes in the
porting_binding table from SB :)
Then there is no need to extrapolate the OVSDB notification RPCs.

- Hui

> have been more correct to say "For each *changed* entry," and the macro
> may actually expand to a while loop, but I'm trying to find the least
> invasive change that I think will address the hotspot, so I'm looking
> to retain as much of the current structure as possible.
>
> Ryan
>
> >
> > > > >   - if it is a deleted entry, remove from the both list (if
> > there is still
> > > > >     a nb entry, we'll recreate it further on)
> > > > >   - if it is a new entry, add it to the sb_only list
> > >
> > > The above isn't quite right - since we create port binding entries
ourself
> > > in response to unmatched ports in the nb_only list, we need to check
that
> > > there isn't already a port in the both list. So the above changes to:
> > >
> > >       - if it is a new entry, check for it in the both list
> > >         - if it is not there, then add it to the sb_only list
> > >
> > > > >   - if it is a modified entry, find it in the both list and
> update the
> > > > >     sb information contained in the entry
> > > > > - For each port known via the nb db:
> > > > >   - if the entry is found in the both list, update the nb
> datacontained
> > > > >     in the entry
> > > > >   - if the entry is not in the both list, but is in the sb_only
list,
> > > > >     move the entry from the sb_list to the both list
> > > > >   - if the entry is not in either the both or the sb_only list,
create
> > > > >     a new entry in the nb_only list
> > > > > - For each entry in the both list, do modifications to align the
port
> > > > >   binding with nb information.
> > > > > - For each entry in the nb_only list, create port_binding
> > information in
> > > > >   the sb db and move the entry from the nb_only to the both list
> > > > > - For each entry in the sb_only list, remove from the
> > port_binding table.
> > > > >
> > > > > Now, I'm pretty sure this will cut down the number of
> cycles, but before
> > > > > I go off and code it [and potentially break something ala
yesterday's
> > > > > excitement], I'm looking for some verification of both my
> conclusion of
> > > > > persisting just the both list and the modified logic
incorporating the
> > > > > persisted both list and port binding change tracking
adjustments). Do
> > > > > these make sense or have I missed something?
> > >
> > > Ryan



More information about the dev mailing list