[ovs-git] [openvswitch/ovs] 16e3a8: ovsdb-server: Fix schema leak while reading db.

Thu May 28 16:57:09 UTC 2020

  Branch: refs/heads/master
  Home:   https://github.com/openvswitch/ovs
  Commit: 16e3a80cf646f6c53d22ef98599d5aecb8310414
      https://github.com/openvswitch/ovs/commit/16e3a80cf646f6c53d22ef98599d5aecb8310414
  Author: Ilya Maximets <i.maximets at ovn.org>
  Date:   2020-05-28 (Thu, 28 May 2020)

  Changed paths:
    M ovsdb/ovsdb-server.c

  Log Message:
  -----------
  ovsdb-server: Fix schema leak while reading db.

parse_txn() function doesn't always take ownership of the 'schema'
passed.  So, if the schema of the clustered db has same version as the
one that already in use, parse_txn() will not use it, resulting with a
memory leak:

 7,827 (56 direct, 7,771 indirect) bytes in 1 blocks are definitely lost
    at 0x483BB1A: calloc (vg_replace_malloc.c:762)
    by 0x44AD02: xcalloc (util.c:121)
    by 0x40E70E: ovsdb_schema_create (ovsdb.c:41)
    by 0x40EA6D: ovsdb_schema_from_json (ovsdb.c:217)
    by 0x415EDD: ovsdb_storage_read (storage.c:280)
    by 0x408968: read_db (ovsdb-server.c:607)
    by 0x40733D: main_loop (ovsdb-server.c:227)
    by 0x40733D: main (ovsdb-server.c:469)

While we could put ovsdb_schema_destroy() in a few places inside
'parse_txn()', from the users' point of view it seems better to have a
constant argument and just clone the 'schema' if needed.  The caller
will be responsible for destroying the 'schema' it owns.

Fixes: 1b1d2e6daa56 ("ovsdb: Introduce experimental support for clustered databases.")
Acked-by: Han Zhou <hzhou at ovn.org>
Signed-off-by: Ilya Maximets <i.maximets at ovn.org>

  Commit: 8c2c503bdb0da1ce6044a53d462f905fd4f8acf5
      https://github.com/openvswitch/ovs/commit/8c2c503bdb0da1ce6044a53d462f905fd4f8acf5
  Author: Ilya Maximets <i.maximets at ovn.org>
  Date:   2020-05-28 (Thu, 28 May 2020)

  Changed paths:
    M ovsdb/raft-private.c
    M ovsdb/raft-private.h
    M ovsdb/raft.c

  Log Message:
  -----------
  raft: Avoid sending equal snapshots.

Snapshots are huge.  In some cases we could receive several outdated
append replies from the remote server.  This could happen in high
scale cases if the remote server is overloaded and not able to process
all the raft requests in time.  As an action to each outdated append
reply we're sending full database snapshot.  While remote server is
already overloaded those snapshots will stuck in jsonrpc backlog for
a long time making it grow up to few GB.  Since remote server wasn't
able to timely process incoming messages it will likely not able to
process snapshots leading to the same situation with low chances to
recover.  Remote server will likely stuck in 'candidate' state, other
servers will grow their memory consumption due to growing jsonrpc
backlogs:

jsonrpc|INFO|excessive sending backlog, jsonrpc: ssl:192.16.0.3:6644,
             num of msgs: 3795, backlog: 8838994624.

This patch is trying to avoid that situation by avoiding sending of
equal snapshot install requests.  This helps maintain reasonable memory
consumption and allows the cluster to recover on a larger scale.

Acked-by: Han Zhou <hzhou at ovn.org>
Signed-off-by: Ilya Maximets <i.maximets at ovn.org>

  Commit: a6117059904bb692039c926221964dd6d49b3bfd
      https://github.com/openvswitch/ovs/commit/a6117059904bb692039c926221964dd6d49b3bfd
  Author: Eiichi Tsukata <eiichi.tsukata at nutanix.com>
  Date:   2020-05-28 (Thu, 28 May 2020)

  Changed paths:
    M lib/classifier.c
    M lib/classifier.h
    M tests/test-classifier.c

  Log Message:
  -----------
  classifier: Prevent tries vs n_tries race leading to NULL dereference.

Currently classifier tries and n_tries can be updated not atomically,
there is a race condition which can lead to NULL dereference.
The race can happen when main thread updates a classifier tries and
n_tries in classifier_set_prefix_fields() and at the same time revalidator
or handler thread try to lookup them in classifier_lookup__(). Such race
can be triggered when user changes prefixes of flow_table.

Race(user changes flow_table prefixes: ip_dst,ip_src => none):

  [main thread]             [revalidator/handler thread]
  ===========================================================
                            /* cls->n_tries == 2 */
                            for (int i = 0; i < cls->n_tries; i++) {
  trie_init(cls, i, NULL);
  /* n_tries == 0 */
  cls->n_tries = n_tries;
                            /* cls->tries[i]->feild is NULL */
                            trie_ctx_init(&trie_ctx[i],&cls->tries[i]);
                            /* trie->field is NULL */
                            ctx->be32ofs = trie->field->flow_be32ofs;

To prevent the race, instead of re-introducing internal mutex
implemented in the commit fccd7c092e09 ("classifier: Remove internal
mutex."), this patch makes trie field RCU protected and checks it after
read.

Fixes: fccd7c092e09 ("classifier: Remove internal mutex.")
Signed-off-by: Eiichi Tsukata <eiichi.tsukata at nutanix.com>
Signed-off-by: Ilya Maximets <i.maximets at ovn.org>

  Commit: 21ad0088bedbd9b4070f4c5e3e6b29c7d9349beb
      https://github.com/openvswitch/ovs/commit/21ad0088bedbd9b4070f4c5e3e6b29c7d9349beb
  Author: Ilya Maximets <i.maximets at ovn.org>
  Date:   2020-05-28 (Thu, 28 May 2020)

  Changed paths:
    M AUTHORS.rst

  Log Message:
  -----------
  AUTHORS: Add Eiichi Tsukata.

Signed-off-by: Ilya Maximets <i.maximets at ovn.org>

  Commit: 9ba57fc7cccca85a753bc3d5c12271defb5619c1
      https://github.com/openvswitch/ovs/commit/9ba57fc7cccca85a753bc3d5c12271defb5619c1
  Author: Han Zhou <hzhou at ovn.org>
  Date:   2020-05-28 (Thu, 28 May 2020)

  Changed paths:
    M acinclude.m4
    M datapath/datapath.c
    M datapath/datapath.h
    M datapath/linux/compat/include/linux/skbuff.h

  Log Message:
  -----------
  datapath: Add hash info to upcall.

This patch backports below upstream patches, and add __skb_set_hash
to compat for older kernels.

commit b5ab1f1be6180a2e975eede18731804b5164a05d
Author: Jakub Kicinski <kuba at kernel.org>
Date:   Mon Mar 2 21:05:18 2020 -0800

    openvswitch: add missing attribute validation for hash

    Add missing attribute validation for OVS_PACKET_ATTR_HASH
    to the netlink policy.

    Fixes: bd1903b7c459 ("net: openvswitch: add hash info to upcall")
    Signed-off-by: Jakub Kicinski <kuba at kernel.org>
    Reviewed-by: Greg Rose <gvrose8192 at gmail.com>
    Signed-off-by: David S. Miller <davem at davemloft.net>

commit bd1903b7c4596ba6f7677d0dfefd05ba5876707d
Author: Tonghao Zhang <xiangxia.m.yue at gmail.com>
Date:   Wed Nov 13 23:04:49 2019 +0800

    net: openvswitch: add hash info to upcall

    When using the kernel datapath, the upcall don't
    include skb hash info relatived. That will introduce
    some problem, because the hash of skb is important
    in kernel stack. For example, VXLAN module uses
    it to select UDP src port. The tx queue selection
    may also use the hash in stack.

    Hash is computed in different ways. Hash is random
    for a TCP socket, and hash may be computed in hardware,
    or software stack. Recalculation hash is not easy.

    Hash of TCP socket is computed:
    tcp_v4_connect
        -> sk_set_txhash (is random)

    __tcp_transmit_skb
        -> skb_set_hash_from_sk

    There will be one upcall, without information of skb
    hash, to ovs-vswitchd, for the first packet of a TCP
    session. The rest packets will be processed in Open vSwitch
    modules, hash kept. If this tcp session is forward to
    VXLAN module, then the UDP src port of first tcp packet
    is different from rest packets.

    TCP packets may come from the host or dockers, to Open vSwitch.
    To fix it, we store the hash info to upcall, and restore hash
    when packets sent back.

    +---------------+          +-------------------------+
    |   Docker/VMs  |          |     ovs-vswitchd        |
    +----+----------+          +-+--------------------+--+
         |                       ^                    |
         |                       |                    |
         |                       |  upcall            v restore packet hash
(not recalculate)
         |                     +-+--------------------+--+
         |  tap netdev         |                         |   vxlan module
         +--------------->     +-->  Open vSwitch ko     +-->
           or internal type    |                         |
                               +-------------------------+

    Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2019-October/364062.html
    Signed-off-by: Tonghao Zhang <xiangxia.m.yue at gmail.com>
    Acked-by: Pravin B Shelar <pshelar at ovn.org>
    Signed-off-by: David S. Miller <davem at davemloft.net>

Tested-by: Aliasgar Ginwala <aginwala at ebay.com>
Acked-by: Tonghao Zhang <xiangxia.m.yue at gmail.com>
Signed-off-by: Han Zhou <hzhou at ovn.org>
Signed-off-by: Ilya Maximets <i.maximets at ovn.org>

  Commit: 89b522aee379f7ebd21ec67ffb622118af7e9db1
      https://github.com/openvswitch/ovs/commit/89b522aee379f7ebd21ec67ffb622118af7e9db1
  Author: Mark Michelson <mmichels at redhat.com>
  Date:   2020-05-28 (Thu, 28 May 2020)

  Changed paths:
    M lib/ovsdb-idl.c
    M lib/ovsdb-idl.h

  Log Message:
  -----------
  ovsdb-idl: Add function to reset min_index.

If an administrator removes all of the databases in a cluster from
disk, then ovsdb IDL clients will have a problem. The databases will all
reset their stored indexes to 0, so The IDL client's min_index will be
higher than the indexes of all databases in the cluster. This results in
the client constantly connecting to databases, detecting the data as
"stale", and then attempting to connect to another.

This function provides a way to reset the IDL to an initial state with
min_index of 0. This way, the client will not wrongly detect the
database data as stale and will recover properly.

Notice that this function is not actually used anywhere in this patch.
This will be used by OVN, though, since OVN is the primary user of
clustered OVSDB.

Signed-off-by: Mark Michelson <mmichels at redhat.com>
Acked-by: Han Zhou <hzhou at ovn.org>
Signed-off-by: Ilya Maximets <i.maximets at ovn.org>

Compare: https://github.com/openvswitch/ovs/compare/ede446678f5d...89b522aee379