[ovs-discuss] ovs_assert(classifier_is_empty(&table->cls)) failed when restart openvswitch service.

Zhanghaibo (Euler) zhanghaibo5 at huawei.com
Sat Dec 9 07:58:14 UTC 2017


Hi Ben,

Thanks very much for your reply.

We checked the bug-fix patches committed since 2.7.0 was released, but did not find any that looks like it would fix this problem, so we investigated the issue further.

Here are our findings. Would you kindly give us your comments? Thank you.

1.      In the flow learning procedure of a “handler” thread, if an old rule already exists, it is deleted. Later, in add_flow_finish(), remove_rule_rcu() is added to the RCU cbset of the “handler” thread; remove_rule_rcu() removes the old rule from the classifier of the oftable. But even after it has been added to the RCU cbset, it will not be called before the size of cbs[] reaches its maximum.

      /* To be called after version bump. */
      static void
      add_flow_finish(struct ofproto *ofproto, struct ofproto_flow_mod *ofm,
                      const struct openflow_mod_requester *req)
          OVS_REQUIRES(ofproto_mutex)
      {
          struct rule *old_rule = rule_collection_n(&ofm->old_rules)
              ? rule_collection_rules(&ofm->old_rules)[0] : NULL;
          struct rule *new_rule = rule_collection_rules(&ofm->new_rules)[0];
          struct ovs_list dead_cookies = OVS_LIST_INITIALIZER(&dead_cookies);

          replace_rule_finish(ofproto, ofm, req, old_rule, new_rule, &dead_cookies);
          learned_cookies_flush(ofproto, &dead_cookies);
          if (old_rule) {
              ovsrcu_postpone(remove_rule_rcu, old_rule);
          } else {
              ofmonitor_report(ofproto->connmgr, new_rule, NXFME_ADDED, 0,
                               req ? req->ofconn : NULL,
                               req ? req->request->xid : 0, NULL);

              /* Send Vacancy Events for OF1.4+. */
              send_table_status(ofproto, new_rule->table_id);
          }
      }


2.      In the bridge destroy procedure of the main thread, ofproto_destroy() adds ofproto_destroy_defer__() to the cbset of the “main” thread; it will not be called before the size of cbs[] reaches its maximum.
      void
      ofproto_destroy(struct ofproto *p, bool del)
          OVS_EXCLUDED(ofproto_mutex)
      {
          struct ofport *ofport, *next_ofport;
          struct ofport_usage *usage;

          if (!p) {
              return;
          }

          if (p->meters) {
              meter_delete(p, 1, p->meter_features.max_meters);
              p->meter_features.max_meters = 0;
              free(p->meters);
              p->meters = NULL;
          }

          ofproto_flush__(p);
          HMAP_FOR_EACH_SAFE (ofport, next_ofport, hmap_node, &p->ports) {
              ofport_destroy(ofport, del);
          }

          HMAP_FOR_EACH_POP (usage, hmap_node, &p->ofport_usage) {
              free(usage);
          }

      #ifdef DPDK_EVS
          // must before p->ofproto_class->destruct(p, del);
          ofproto_free_sf_oftable(p);
          p->ofproto_class->destruct(p, del);
      #else
          p->ofproto_class->destruct(p);
      #endif

          /* We should not postpone this because it involves deleting a listening
           * socket which we may want to reopen soon. 'connmgr' may be used by other
           * threads only if they take the ofproto_mutex and read a non-NULL
           * 'ofproto->connmgr'. */
          ovs_mutex_lock(&ofproto_mutex);
          connmgr_destroy(p->connmgr);
          p->connmgr = NULL;
          ovs_mutex_unlock(&ofproto_mutex);

          /* Destroying rules is deferred, must have 'ofproto' around for them. */
          ovsrcu_postpone(ofproto_destroy_defer__, p);
      }

3.      Because the “handler” and “main” threads each have their own cbs[], it is possible that ofproto_destroy_defer__() is called earlier than remove_rule_rcu(), while the classifier of the oftable is not yet empty. This causes the assertion in oftable_destroy() to fail:

      /* Destroys 'table', including its classifier and eviction groups.
       *
       * The caller is responsible for freeing 'table' itself. */
      static void
      oftable_destroy(struct oftable *table)
      {
          ovs_assert(classifier_is_empty(&table->cls));

          ovs_mutex_lock(&ofproto_mutex);
          oftable_configure_eviction(table, 0, NULL, 0);
          ovs_mutex_unlock(&ofproto_mutex);

          hmap_destroy(&table->eviction_groups_by_id);
          heap_destroy(&table->eviction_groups_by_size);
          classifier_destroy(&table->cls);
          free(table->name);
      }
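
To make the ordering in points 1 to 3 concrete, here is a minimal single-threaded sketch. It is a toy model and not OVS code: every toy_* name, the struct layouts, and the batch capacity of 16 are assumptions made for illustration, and the two threads are replaced by explicit flush calls so that the order is deterministic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names are OVS APIs. */
#define TOY_CBSET_MAX 16            /* Assumed batch capacity. */

typedef void toy_cb_func(void *aux);

struct toy_cb {
    toy_cb_func *function;
    void *aux;
};

/* One batch of pending callbacks per simulated thread. */
struct toy_cbset {
    struct toy_cb cbs[TOY_CBSET_MAX];
    size_t n_cbs;
};

struct toy_table {
    int n_rules;             /* Rules still present in the classifier. */
    int n_rules_at_destroy;  /* Snapshot taken by the destroy callback. */
};

/* Runs every queued callback in order, then empties the batch. */
static void
toy_flush(struct toy_cbset *set)
{
    for (size_t i = 0; i < set->n_cbs; i++) {
        set->cbs[i].function(set->cbs[i].aux);
    }
    set->n_cbs = 0;
}

/* Queues a callback.  Nothing runs until the batch is flushed, which
 * happens here only when the batch becomes full. */
static void
toy_postpone(struct toy_cbset *set, toy_cb_func *function, void *aux)
{
    assert(set->n_cbs < TOY_CBSET_MAX);
    set->cbs[set->n_cbs].function = function;
    set->cbs[set->n_cbs].aux = aux;
    set->n_cbs++;
    if (set->n_cbs >= TOY_CBSET_MAX) {
        toy_flush(set);
    }
}

/* Stands in for remove_rule_rcu(): takes one rule out of the table. */
static void
toy_remove_rule(void *table_)
{
    struct toy_table *table = table_;
    table->n_rules--;
}

/* Stands in for the deferred destroy: records how many rules are left
 * at the point where the real code would assert that there are none. */
static void
toy_destroy_tables(void *table_)
{
    struct toy_table *table = table_;
    table->n_rules_at_destroy = table->n_rules;
}

/* Returns the number of rules still in the table when the deferred
 * destroy callback runs, for the chosen flush order. */
int
toy_rules_left_at_destroy(bool main_flushes_first)
{
    struct toy_table table = { .n_rules = 1, .n_rules_at_destroy = -1 };
    struct toy_cbset handler = { .n_cbs = 0 };
    struct toy_cbset main_thread = { .n_cbs = 0 };

    toy_postpone(&handler, toy_remove_rule, &table);        /* Point 1. */
    toy_postpone(&main_thread, toy_destroy_tables, &table); /* Point 2. */

    if (main_flushes_first) {                               /* Point 3. */
        toy_flush(&main_thread);
        toy_flush(&handler);
    } else {
        toy_flush(&handler);
        toy_flush(&main_thread);
    }
    return table.n_rules_at_destroy;
}
```

In this model, toy_rules_left_at_destroy(true) returns 1, which corresponds to the non-empty classifier that trips the assertion, while toy_rules_left_at_destroy(false) returns 0. The sketch only illustrates that two independent batches give no ordering guarantee between the two callbacks; it does not model RCU grace periods or any other condition under which a batch might be flushed.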

-----Original Message-----
From: Ben Pfaff [mailto:blp at ovn.org]
Sent: December 7, 2017 6:25
To: Zhanghaibo (Euler) <zhanghaibo5 at huawei.com>
Cc: ovs-discuss at openvswitch.org; Yinpeijun <yinpeijun at huawei.com>; liucheng (J) <liucheng11 at huawei.com>; caihe <caihe at huawei.com>; Huangjian (J) <huangjian.huangjian at huawei.com>; Zhoujingbin <zhoujingbin at huawei.com>
Subject: Re: [ovs-discuss] ovs_assert(classifier_is_empty(&table->cls)) failed when restart openvswitch service.

On Wed, Dec 06, 2017 at 08:28:30PM +0000, Zhanghaibo (Euler) wrote:
> Hello all,
>
> I ran into an abort when restarting the openvswitch service; the core dump shows that ovs_assert() failed in oftable_destroy() in ofproto.c.
>
> The problem is pretty hard to reproduce. Do you have any idea about this? The OVS release is v2.7.0. Any suggestions would be appreciated.
>
> Source code and gdb information are copied below.

I'd first recommend upgrading to the latest on the 2.7 branch, which has over 230 bug fixes since 2.7.0 was released.
