[ovs-dev] [PATCH V2] ofproto-dpif-upcall: Remove the dispatcher thread.

Alex Wang alexw at nicira.com
Tue Feb 18 18:26:19 UTC 2014


Thanks a lot Han for the close review ;D

Please see my comments inline.


On Sat, Feb 15, 2014 at 8:44 AM, Han Zhou <zhouhan at gmail.com> wrote:

> > -add_channel(struct dpif_linux *dpif, odp_port_t port_no, struct nl_sock
> *sock)
> > +add_vport_channels(struct dpif_linux *dpif, odp_port_t port_no,
> > +                   uint32_t **upcall_pids)
> >  {
> >      struct epoll_event event;
> > +    struct dpif_epoll *epolls = dpif->epolls;
> >      uint32_t port_idx = odp_to_u32(port_no);
> > +    uint32_t *pids = NULL;
>
> The variable pids is never used, but freed.
>
>

Thanks, I forgot to delete this.



> +        }
> > +
> > +        if (epoll_ctl(epolls[i].epoll_fd, EPOLL_CTL_ADD,
> nl_sock_fd(sock),
> > +                      &event) < 0) {
> > +            error = errno;
> > +            goto error;
>
> If this epoll_ctl fails, the sock created above is not destroyed in
> error handling.
>
>

Thanks for the close review.  I'll destroy the socket in the error handling path.
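To illustrate the fix Han is asking for, here is a minimal, hypothetical sketch (not the actual OVS code): if epoll_ctl() fails partway through registering a socket with the per-handler epoll sets, the error path rolls back the registrations already made and closes the socket, so nothing leaks.  `add_fd_to_epolls`, `epoll_fds`, and the plain `fd` are stand-ins for the patch's per-handler epoll descriptors and Netlink socket.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical sketch: register 'fd' with each per-handler epoll set.
 * On any epoll_ctl() failure, deregister from the sets already updated
 * and close 'fd' itself, so the error path leaks nothing. */
static int
add_fd_to_epolls(const int *epoll_fds, size_t n, int fd)
{
    struct epoll_event event;
    size_t i;

    event.events = EPOLLIN;
    event.data.fd = fd;

    for (i = 0; i < n; i++) {
        if (epoll_ctl(epoll_fds[i], EPOLL_CTL_ADD, fd, &event) < 0) {
            int error = errno;

            /* Roll back the registrations made so far... */
            while (i-- > 0) {
                epoll_ctl(epoll_fds[i], EPOLL_CTL_DEL, fd, NULL);
            }
            /* ...and destroy the socket, as the review points out. */
            close(fd);
            return error;
        }
    }
    return 0;
}
```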




> ...
>
> > +static void
> > +destroy_all_channels(struct dpif_linux *dpif)
> > +{
> > +    unsigned int i;
> > +
> > +    if (!dpif->epolls) {
> >          return;
> >      }
> >
> > -    epoll_ctl(dpif->epoll_fd, EPOLL_CTL_DEL, nl_sock_fd(ch->sock),
> NULL);
> > -    dpif->event_offset = dpif->n_events = 0;
> > +    for (i = 0; i < dpif->uc_array_size; i++ ) {
> > +        struct dpif_linux_vport vport_request;
> > +        struct dpif_channel *ch = dpif->channels[i];
> > +
> > +        if (!ch->sock) {
>
> The condition here should be changed to "if (!ch)" because ch is now an
> array of channels.
>
>

Thanks, fixed.
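A small hypothetical sketch of the corrected loop: `channels` is a sparse array indexed by port number, so a slot may be NULL where no vport exists.  The NULL check must be on the slot itself before any member access; testing `ch->sock` on a NULL `ch`, as in the original patch, would dereference a null pointer.  The `struct channel` here is a stand-in for the patch's `struct dpif_channel`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a per-port channel entry. */
struct channel {
    int sock;   /* stands in for the per-handler Netlink socket */
};

/* Free every allocated slot, skipping NULL entries, and report how
 * many channels were actually destroyed. */
static size_t
destroy_channels(struct channel **channels, size_t n)
{
    size_t destroyed = 0;

    for (size_t i = 0; i < n; i++) {
        struct channel *ch = channels[i];

        if (!ch) {              /* NULL slot: no vport at this index. */
            continue;
        }
        free(ch);
        channels[i] = NULL;
        destroyed++;
    }
    return destroyed;
}
```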



> > @@ -541,41 +583,42 @@ dpif_linux_port_add__(struct dpif *dpif_, struct
> netdev *netdev,
> >      }
> ...
> > +    } else {
> > +        dpif_linux_vport_init(&request);
> > +        request.cmd = OVS_VPORT_CMD_SET;
> > +        request.dp_ifindex = dpif->dp_ifindex;
> > +        request.port_no = *port_nop;
> > +        request.n_pids = dpif->n_handlers;
> > +        request.upcall_pids = upcall_pids;
> > +        dpif_linux_vport_transact(&request, NULL, NULL);
> > +        free(upcall_pids);
> >      }
> >
> >      return 0;
>
> In this function, you separated vport transaction for creating vports
> and setting channel pids to kernel. But it could lead to problem when
> kernel started sending upcalls to user space when vports are added but
> channel pids not updated. I think it should still be kept as a single
> transaction to avoid race condition.
>
>

I think it is okay if we miss the upcalls for a short period.

The way current OVS calls 'port_add()', ODPP_NONE is given for '*port_nop',
and the datapath densely allocates the port number and notifies it back to
userspace.  So there is no way to specify the channel pids in the same
transaction in this case.  Also, I don't want to separate socket creation
from add_vport_channels().




> ...
>
> >
> >  static uint32_t
> > -dpif_linux_port_get_pid(const struct dpif *dpif_, odp_port_t port_no)
> > +dpif_linux_port_get_pid(const struct dpif *dpif_, odp_port_t port_no,
> > +                        uint32_t hash)
> I suggest to use struct flow instead of hash as the parameter, because
> the handler mapping is dpif implementation specific. it is dpif
> implementation that determines how to utilize flow information for pid
> selection. E.g. dpif-XXX may use different hash algorithm than the
> flow_hash_5tuple().



Yeah, I had been thinking about this as well.  At the time, my thinking was
that "flow.h" is logically at a higher level than dpif and should not be
passed down into dpif.  In fact, the "#include <flow.h>" in "lib/dpif-linux"
is never used.

I'll ask around and then follow up with you.
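For reference, a minimal sketch of the hash-based selection being debated: with one Netlink socket (and hence one upcall pid) per handler thread, a flow hash picks which handler's socket the kernel sends a given port's upcalls to.  The function name and the `pids`/`n_handlers` parameters are illustrative stand-ins for the per-vport `upcall_pids` array; whether the hash or the full flow is the right parameter is exactly the open question above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: map a flow hash to one of the per-handler
 * upcall pids.  Returns 0 (no pid) when there are no handlers,
 * matching the "pid = 0" default in the quoted code. */
static uint32_t
select_upcall_pid(const uint32_t *pids, uint32_t n_handlers, uint32_t hash)
{
    return n_handlers ? pids[hash % n_handlers] : 0;
}
```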





>  >  {
> >      struct dpif_linux *dpif = dpif_linux_cast(dpif_);
> >      uint32_t port_idx = odp_to_u32(port_no);
> >      uint32_t pid = 0;
> >
> > -    ovs_mutex_lock(&dpif->upcall_lock);
> > -    if (dpif->epoll_fd >= 0) {
> > +    fat_rwlock_wrlock(&dpif->upcall_lock);
>
> Should this be fat_rwlock_rdlock? I don't see updates in below critical
> section.
>
>

Yes, you are right, fixed.



> @@ -1274,10 +1322,10 @@ dpif_linux_operate(struct dpif *dpif, struct
> dpif_op **ops, size_t n_ops)
> >  }
> >
> >  /* Synchronizes 'dpif->channels' with the set of vports currently in
> 'dpif' in
> > - * the kernel, by adding a new channel for any kernel vport that lacks
> one and
> > - * deleting any channels that have no backing kernel vports. */
> > + * the kernel, by adding a new set of channels for any kernel vport
> that lacks
> > + * one and deleting any channels that have no backing kernel vports. */
> >  static int
> > -dpif_linux_refresh_channels(struct dpif *dpif_)
> > +dpif_linux_refresh_channels(struct dpif *dpif_, uint32_t n_handlers)
> >  {
> ...
> >
> > -    /* To start with, we need an epoll fd. */
> > -    if (dpif->epoll_fd < 0) {
> > -        dpif->epoll_fd = epoll_create(10);
> > -        if (dpif->epoll_fd < 0) {
> > -            return errno;
> > +    if (dpif->n_handlers != n_handlers) {
> > +        destroy_all_channels(dpif);
> > +        dpif->epolls = xzalloc(n_handlers * sizeof *dpif->epolls);
> > +
> > +        for (i = 0; i < n_handlers; i++) {
> > +            dpif->epolls[i].epoll_fd = epoll_create(dpif->uc_array_size
> ?
> > +                                                    dpif->uc_array_size
> : 10);
> > +            if (dpif->epolls[i].epoll_fd < 0) {
> > +                return errno;
> > +            }
> >          }
> > +        dpif->n_handlers = n_handlers;
> > +    }
>
> Could it be more graceful when changing n_handlers? Just enlarge/shrink
> the channels arrays to minimize the traffic interruption when thread
> number is being changed?
>



There is a tradeoff, and I think the amount of work is similar either way.

Your suggestion would require:
1. 'realloc'ing each channel array, and deleting/creating sockets in
userspace.
2. Using epoll_ctl to deregister/register the sockets.
3. Calling nl_transact() for each vport to update the new array of
upcall_pids in the kernel.

So I still think the current way is cleaner.
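The tear-down-and-rebuild approach being defended can be sketched as follows.  This is a hypothetical simplification, not the patch itself: when the handler count changes, close every per-handler epoll fd and create a fresh array rather than resizing in place.  `rebuild_epolls` and `size_hint` are invented names; `size_hint` mirrors the `uc_array_size ? uc_array_size : 10` hint passed to epoll_create() in the quoted hunk.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical sketch: replace the per-handler epoll set wholesale
 * when n_handlers changes.  On failure, no partial state survives. */
static int
rebuild_epolls(int **epoll_fds, unsigned int *n, unsigned int n_handlers,
               int size_hint)
{
    unsigned int i;
    int *fds;

    for (i = 0; i < *n; i++) {          /* destroy the old set */
        close((*epoll_fds)[i]);
    }
    free(*epoll_fds);
    *epoll_fds = NULL;
    *n = 0;

    fds = calloc(n_handlers, sizeof *fds);
    if (!fds) {
        return ENOMEM;
    }
    for (i = 0; i < n_handlers; i++) {
        /* epoll_create()'s size must be positive; fall back to 10. */
        fds[i] = epoll_create(size_hint > 0 ? size_hint : 10);
        if (fds[i] < 0) {
            int error = errno;

            while (i-- > 0) {
                close(fds[i]);
            }
            free(fds);
            return error;
        }
    }
    *epoll_fds = fds;
    *n = n_handlers;
    return 0;
}
```

The cost of this simplicity is exactly the interruption Han describes: every socket and registration is recreated even when the handler count changes by one.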




> ...
>
> >
> >  static int
> > -dpif_linux_recv_set__(struct dpif *dpif_, bool enable)
> > +dpif_linux_recv_set__(struct dpif *dpif_, bool enable, uint32_t
> n_handlers)
> >  {
> >      struct dpif_linux *dpif = dpif_linux_cast(dpif_);
> >
> > -    if ((dpif->epoll_fd >= 0) == enable) {
> > +    if ((dpif->epolls != NULL) == enable) {
> > +        if (enable && dpif->n_handlers != n_handlers) {
> > +            dpif_linux_refresh_channels(dpif_, n_handlers);
>
> No return error checking. It is a little weird to do the refreshing
> here. Other dpif implementation would not know to do this in the
> interface dpif_recv_set. Please see my next comment.
>
> ...
>
> > diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
> > index 7b3e1eb..a53b3fd 100644
> > --- a/ofproto/ofproto-dpif.c
> > +++ b/ofproto/ofproto-dpif.c
> > @@ -470,7 +470,8 @@ type_run(const char *type)
> >
> >          backer->recv_set_enable = true;
> >
> > -        error = dpif_recv_set(backer->dpif, backer->recv_set_enable);
> > +        error = dpif_recv_set(backer->dpif, backer->recv_set_enable,
> > +                              n_handlers);
> >          if (error) {
> >              VLOG_ERR("Failed to enable receiving packets in dpif.");
> >              return error;
> > @@ -480,6 +481,7 @@ type_run(const char *type)
> >      }
> >
> >      if (backer->recv_set_enable) {
> > +        dpif_recv_set(backer->dpif, backer->recv_set_enable,
> n_handlers);
> >          udpif_set_threads(backer->udpif, n_handlers, n_revalidators);
> >      }
> >
>
> The logic of calling dpif_recv_set() here is somehow misleading. I
> understand that it is only for refreshing dpif->n_handlers, rather than
> enable/disable receiving. But would it be better to introduce a new
> interface in dpif to clearly update dpif->n_handler, which can be
> invoked in udpif_set_threads().
>
>

I'm okay with either.  I will ask around and then follow up with you.


I'll send V3 soon, ;D