[ovs-dev] [PATCH v8] dpif-netlink: distribute polling to discreet handlers
Mark Gray
mark.d.gray at redhat.com
Fri Sep 18 16:31:30 UTC 2020
On 17/09/2020 20:49, Aaron Conole wrote:
> Ilya Maximets <i.maximets at ovn.org> writes:
>
>> On 9/9/20 7:08 PM, Mark Gray wrote:
>>> From: Aaron Conole <aconole at redhat.com>
>>>
>>> Currently, the channel handlers are polled globally. On some
>>> systems, this causes a thundering herd issue where multiple
>>> handler threads become active, only to do no work and immediately
>>> sleep.
>>>
>>> The approach here is to push the netlink socket channels to discreet
>>> handler threads to process, rather than polling on every thread.
>>> This will eliminate the need to wake multiple threads.
>>>
>>> To check:
>>>
>>> ip netns add left
>>> ip netns add right
>>> ip link add center-left type veth peer name left0 netns left
>>> ip link add center-right type veth peer name right0 netns right
>>> ip link set center-left up
>>> ip link set center-right up
>>> ip -n left ip link set left0 up
>>> ip -n left ip addr add 172.31.110.10/24 dev left0
>>> ip -n right ip link set right0 up
>>> ip -n right ip addr add 172.31.110.11/24 dev right0
>>>
>>> ovs-vsctl add-br br0
>>> ovs-vsctl add-port br0 center-right
>>> ovs-vsctl add-port br0 center-left
>>>
>>> # in one terminal
>>> perf record -e sched:sched_wakeup,irq:softirq_entry -ag
>>>
>>> # in a separate terminal
>>> ip netns exec left arping -I left0 -c 1 172.31.110.11
>>>
>>> # in the perf terminal after exiting
>>> perf script
>>>
>>> Look for the number of 'handler' threads which were made active.
>>>
>>> Suggested-by: Ben Pfaff <blp at ovn.org>
>>> Co-authored-by: Mark Gray <mark.d.gray at redhat.com>
>>> Reported-by: David Ahern <dsahern at gmail.com>
>>> Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2019-December/365857.html
>>> Cc: Matteo Croce <technoboy85 at gmail.com>
>>> Cc: Flavio Leitner <fbl at sysclose.org>
>>> Fixes: 69c51582f ("dpif-netlink: don't allocate per thread netlink sockets")
>>> Signed-off-by: Aaron Conole <aconole at redhat.com>
>>> Signed-off-by: Mark Gray <mark.d.gray at redhat.com>
>>> ---
>>>
>>>
>>> v2: Oops - forgot to commit my whitespace cleanups.
>>> v3: one port latency results as per Matteo's comments
>>>
>>> Stock:
>>> min/avg/max/mdev = 21.5/36.5/96.5/1.0 us
>>> With Patch:
>>> min/avg/max/mdev = 5.3/9.7/98.4/0.5 us
>>> v4: Oops - first email did not send
>>> v5: min/avg/max/mdev = 9.3/10.4/33.6/2.2 us
>>> v6: Split out the AUTHORS patch and added a cover letter as
>>> per Ilya's suggestion.
>>> Fixed 0-day issues.
>>> v7: Merged patches as per Flavio's suggestion. This is
>>> no longer a series. Fixed some other small issues.
>>>
>>> lib/dpif-netlink.c | 330 +++++++++++++++++++++++++--------------------
>>> lib/id-pool.c | 10 ++
>>> lib/id-pool.h | 1 +
>>> 3 files changed, 197 insertions(+), 144 deletions(-)
>>>
>>> diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
>>> index 7da4fb54d..a74838506 100644
>>> --- a/lib/dpif-netlink.c
>>> +++ b/lib/dpif-netlink.c
>>> @@ -37,6 +37,7 @@
>>> #include "dpif-provider.h"
>>> #include "fat-rwlock.h"
>>> #include "flow.h"
>>> +#include "id-pool.h"
>>> #include "netdev-linux.h"
>>> #include "netdev-offload.h"
>>> #include "netdev-provider.h"
>>> @@ -160,6 +161,9 @@ static void dpif_netlink_flow_to_dpif_flow(struct dpif_flow *,
>>>
>>> /* One of the dpif channels between the kernel and userspace. */
>>> struct dpif_channel {
>>> + struct hmap_node hmap_node;
>>
>> This needs a comment and all other new structure fileds too.
>> Ex.:
>> struct hmap_node hmap_node; /* In struct dpif_handler's channels hmap. */
>>
>>> + struct dpif_handler *handler;
Ok, updated
>>
>> Why do we need this back-pointer? Maybe it's better to just add 'handler'
>> argument to dpif_channel_del() ?
I have kept this as it is required due to the rework that Flavio has
suggested in another thread.
>>
>>> + uint32_t port_idx;
>>
>> I see a lot of 'port_idx' here and there, but it's not really an index anymore.
>> You removed all the arrays that used these variables as index.
>> Can we just use 'port_no' directly here? I know that there will be conversions
>> from odp to u32, but this should be more clear. If conversions are the issue,
>> we could store u32 version of 'port_no' here.
I think this is a good idea. 'port_idx' is littered around this file in
various places and it caused me an issue it a previous revision because
the naming convention was not consistent. I will just remove all
references to it.
>>
>> And, please, add a comment.
>>
>>> struct nl_sock *sock; /* Netlink socket. */
>>> long long int last_poll; /* Last time this channel was polled. */
>>> };
>>> @@ -183,6 +187,8 @@ struct dpif_handler {
>>> int n_events; /* Num events returned by epoll_wait(). */
>>> int event_offset; /* Offset into 'epoll_events'. */
>>>
>>> + struct hmap channels; /* map of channels for each port. */
>>
>> Comment style: Capital letter.
Ok
>>
>>> +
>>> #ifdef _WIN32
>>> /* Pool of sockets. */
>>> struct dpif_windows_vport_sock *vport_sock_pool;
>>> @@ -201,9 +207,8 @@ struct dpif_netlink {
>>> struct fat_rwlock upcall_lock;
>>> struct dpif_handler *handlers;
>>> uint32_t n_handlers; /* Num of upcall handlers. */
>>> - struct dpif_channel *channels; /* Array of channels for each port. */
>>> - int uc_array_size; /* Size of 'handler->channels' and */
>>> - /* 'handler->epoll_events'. */
>>> +
>>> + struct id_pool *port_ids; /* Set of added port ids */
>>
>> What is the reason of having this pool? Datapath port numbers are unique
>> except for ODPP_NONE, and all the functions you implemented actually
>> iterates over all channels anyway to find one with requested port_idx.
>> So, the only purpose of it is to avoid searching on incorect port number,
>> but this should be a rare case. Can we just remove it?
>>
>> OTOH, if it's necessary to have this id pool, I'd suggest to use hmapx
>> instead as it already has all required infrastructure and doesn't actually
>> care about range of values.
I have replaced this with a hmap of all the channels as suggested in
another thread by Flavio.
>>
>>>
>>> /* Change notification. */
>>> struct nl_sock *port_notifier; /* vport multicast group subscriber. */
>>> @@ -287,6 +292,44 @@ close_nl_sock(struct nl_sock *sock)
>>> #endif
>>> }
>>>
>>> +static size_t
>>> +dpif_handler_channels_count(const struct dpif_handler *handler)
>>> +{
>>> + return hmap_count(&handler->channels);
>>> +}
>>> +
>>> +static struct dpif_channel *
>>> +dpif_handler_channel_find(struct dpif_handler *handler, uint32_t idx)
>>> +{
>>> + struct dpif_channel *channel;
>>> +
>>> + HMAP_FOR_EACH_WITH_HASH (channel, hmap_node, hash_int(idx, 0),
>>> + &handler->channels) {
>>> + if (channel->port_idx == idx) {
>>> + return channel;
>>> + }
>>> + }
>>> +
>>> + return NULL;
>>> +}
>>> +
>>> +static void
>>> +dpif_channel_del(struct dpif_netlink *dpif,
>>> + struct dpif_channel *channel)
>>
>> As suggested above, it might make sense to just pass 'handler' here and
>> avoid having back-pointer inside the channel structure.
I have left this like this as I have reworked based on Flavio's suggestion.
>>
>>> +{
>>> + struct dpif_handler *handler = channel->handler;
>>> +
>>> +#ifndef _WIN32
>>> + epoll_ctl(handler->epoll_fd, EPOLL_CTL_DEL,
>>> + nl_sock_fd(channel->sock), NULL);
>>> + nl_sock_destroy(channel->sock);
>>> +#endif
>>> + hmap_remove(&handler->channels, &channel->hmap_node);
>>> + id_pool_free_id(dpif->port_ids, channel->port_idx);
>>> + handler->event_offset = handler->n_events = 0;
>>> + free(channel);
>>> +}
>>> +
>>> static struct dpif_netlink *
>>> dpif_netlink_cast(const struct dpif *dpif)
>>> {
>>> @@ -385,6 +428,7 @@ open_dpif(const struct dpif_netlink_dp *dp, struct dpif **dpifp)
>>>
>>> dpif->dp_ifindex = dp->dp_ifindex;
>>> dpif->user_features = dp->user_features;
>>> + dpif->port_ids = id_pool_create(0, MAX_PORTS);
>>> *dpifp = &dpif->dpif;
>>>
>>> return 0;
>>> @@ -446,167 +490,151 @@ error:
>>> }
>>> #endif /* _WIN32 */
>>>
>>> -/* Given the port number 'port_idx', extracts the pid of netlink socket
>>> +/* Given the port number 'port_no', extracts the pid of netlink socket
>>> * associated to the port and assigns it to 'upcall_pid'. */
>>> static bool
>>> -vport_get_pid(struct dpif_netlink *dpif, uint32_t port_idx,
>>> +vport_get_pid(struct dpif_netlink *dpif, odp_port_t port_no,
>>> uint32_t *upcall_pid)
>>
>> This function seems redundant as it does almost same thing as
>> dpif_netlink_port_get_pid__(). Can we combine them?
Yes, I guess it could be combined.
>>
>>> {
>>> - /* Since the nl_sock can only be assigned in either all
>>> - * or none "dpif" channels, the following check
>>> - * would suffice. */
>>> - if (!dpif->channels[port_idx].sock) {
>>> + uint32_t port_idx = odp_to_u32(port_no);
>>> + size_t i;
>>> +
>>> + ovs_assert(!WINDOWS || dpif->n_handlers <= 1);
>>> +
>>> + if (!id_pool_has_id(dpif->port_ids, port_idx)) {
>>> return false;
>>> }
>>> - ovs_assert(!WINDOWS || dpif->n_handlers <= 1);
>>>
>>> - *upcall_pid = nl_sock_pid(dpif->channels[port_idx].sock);
>>> + /* The 'port_idx' should only be valid for a single handler. */
>>> + for (i = 0; i < dpif->n_handlers; i++) {
>>> + struct dpif_channel *channel;
>>> +
>>> + channel = dpif_handler_channel_find(&dpif->handlers[i], port_idx);
>>> + if (channel) {
>>> + *upcall_pid = nl_sock_pid(channel->sock);
>>> + return true;
>>> + }
>>> + }
>>>
>>> - return true;
>>> + return false;
>>> }
>>>
>>> static int
>>> vport_add_channel(struct dpif_netlink *dpif, odp_port_t port_no,
>>> struct nl_sock *sock)
>>> {
>>> - struct epoll_event event;
>>> uint32_t port_idx = odp_to_u32(port_no);
>>> + struct dpif_channel *channel;
>>> + struct dpif_handler *handler;
>>> + struct epoll_event event;
>>> size_t i;
>>> - int error;
>>>
>>> - if (dpif->handlers == NULL) {
>>> + if (dpif->handlers == NULL ||
>>> + id_pool_has_id(dpif->port_ids, port_idx)) {
>>> close_nl_sock(sock);
>>> - return 0;
>>> + return EINVAL;
>>> }
>>>
>>> - /* We assume that the datapath densely chooses port numbers, which can
>>> - * therefore be used as an index into 'channels' and 'epoll_events' of
>>> - * 'dpif'. */
>>> - if (port_idx >= dpif->uc_array_size) {
>>> - uint32_t new_size = port_idx + 1;
>>> -
>>> - if (new_size > MAX_PORTS) {
>>> - VLOG_WARN_RL(&error_rl, "%s: datapath port %"PRIu32" too big",
>>> - dpif_name(&dpif->dpif), port_no);
>>> - return EFBIG;
>>> - }
>>> + if (port_idx >= MAX_PORTS) {
>>> + VLOG_WARN_RL(&error_rl, "%s: datapath port %"PRIu32" too big",
>>> + dpif_name(&dpif->dpif), port_no);
>>
>> This looks a bit wierd since we're checking port_idx and printing port_no
>> instead. Might be refined if we will just use port_no everywhere.
Yes, it will be.
>>
>>> + return EFBIG;
>>> + }
>>>
>>> - dpif->channels = xrealloc(dpif->channels,
>>> - new_size * sizeof *dpif->channels);
>>> + handler = &dpif->handlers[0];
>>>
>>> - for (i = dpif->uc_array_size; i < new_size; i++) {
>>> - dpif->channels[i].sock = NULL;
>>> + for (i = 0; i < dpif->n_handlers; i++) {
>>> + if (dpif_handler_channels_count(&dpif->handlers[i]) <
>>> + dpif_handler_channels_count(handler)) {
>>> + handler = &dpif->handlers[i];
>>> }
>>> + }
>>>
>>> - for (i = 0; i < dpif->n_handlers; i++) {
>>> - struct dpif_handler *handler = &dpif->handlers[i];
>>> + channel = xmalloc(sizeof *channel);
>>> + channel->port_idx = port_idx;
>>> + channel->sock = sock;
>>> + channel->last_poll = LLONG_MIN;
>>> + channel->handler = handler;
>>> + hmap_insert(&handler->channels, &channel->hmap_node,
>>> + hash_int(port_idx, 0));
>>>
>>> - handler->epoll_events = xrealloc(handler->epoll_events,
>>> - new_size * sizeof *handler->epoll_events);
>>> -
>>> - }
>>> - dpif->uc_array_size = new_size;
>>> - }
>>> + handler->epoll_events = xrealloc(handler->epoll_events,
>>> + dpif_handler_channels_count(handler) *
>>> + sizeof *handler->epoll_events);
>>>
>>> memset(&event, 0, sizeof event);
>>> event.events = EPOLLIN | EPOLLEXCLUSIVE;
>>> event.data.u32 = port_idx;
>>>
>>> - for (i = 0; i < dpif->n_handlers; i++) {
>>> - struct dpif_handler *handler = &dpif->handlers[i];
>>> -
>>> #ifndef _WIN32
>>> - if (epoll_ctl(handler->epoll_fd, EPOLL_CTL_ADD, nl_sock_fd(sock),
>>> - &event) < 0) {
>>> - error = errno;
>>> - goto error;
>>> - }
>>> -#endif
>>> - }
>>> - dpif->channels[port_idx].sock = sock;
>>> - dpif->channels[port_idx].last_poll = LLONG_MIN;
>>> -
>>> - return 0;
>>> -
>>> -error:
>>> -#ifndef _WIN32
>>> - while (i--) {
>>> - epoll_ctl(dpif->handlers[i].epoll_fd, EPOLL_CTL_DEL,
>>> - nl_sock_fd(sock), NULL);
>>> + if (epoll_ctl(handler->epoll_fd, EPOLL_CTL_ADD, nl_sock_fd(sock),
>>> + &event) < 0) {
>>> + hmap_remove(&handler->channels, &channel->hmap_node);
>>> + free(channel);
>>> + id_pool_free_id(dpif->port_ids, port_idx);
>>> + return errno;
>>> }
>>> #endif
>>> - dpif->channels[port_idx].sock = NULL;
>>>
>>> - return error;
>>> + id_pool_add(dpif->port_ids, port_idx);
>>> + return 0;
>>> }
>>>
>>> -static void
>>> -vport_del_channels(struct dpif_netlink *dpif, odp_port_t port_no)
>>> +static bool
>>> +vport_del_channel(struct dpif_netlink *dpif, odp_port_t port_no)
>>> {
>>> - uint32_t port_idx = odp_to_u32(port_no);
>>> size_t i;
>>>
>>> - if (!dpif->handlers || port_idx >= dpif->uc_array_size
>>> - || !dpif->channels[port_idx].sock) {
>>> - return;
>>> - }
>>> -
>>> for (i = 0; i < dpif->n_handlers; i++) {
>>> - struct dpif_handler *handler = &dpif->handlers[i];
>>> -#ifndef _WIN32
>>> - epoll_ctl(handler->epoll_fd, EPOLL_CTL_DEL,
>>> - nl_sock_fd(dpif->channels[port_idx].sock), NULL);
>>> -#endif
>>> - handler->event_offset = handler->n_events = 0;
>>> + struct dpif_channel *channel;
>>> +
>>> + channel = dpif_handler_channel_find(&dpif->handlers[i],
>>> + odp_to_u32(port_no));
>>> + if (channel) {
>>> + dpif_channel_del(dpif, channel);
>>> + return true;
>>> + }
>>> }
>>> -#ifndef _WIN32
>>> - nl_sock_destroy(dpif->channels[port_idx].sock);
>>> -#endif
>>> - dpif->channels[port_idx].sock = NULL;
>>> +
>>> + return false;
>>> }
>>>
>>> static void
>>> destroy_all_channels(struct dpif_netlink *dpif)
>>> OVS_REQ_WRLOCK(dpif->upcall_lock)
>>> {
>>> - unsigned int i;
>>> + size_t i;
>>>
>>> if (!dpif->handlers) {
>>> return;
>>> }
>>>
>>> - for (i = 0; i < dpif->uc_array_size; i++ ) {
>>> - struct dpif_netlink_vport vport_request;
>>> - uint32_t upcall_pids = 0;
>>> -
>>> - if (!dpif->channels[i].sock) {
>>> - continue;
>>> - }
>>> -
>>> - /* Turn off upcalls. */
>>> - dpif_netlink_vport_init(&vport_request);
>>> - vport_request.cmd = OVS_VPORT_CMD_SET;
>>> - vport_request.dp_ifindex = dpif->dp_ifindex;
>>> - vport_request.port_no = u32_to_odp(i);
>>> - vport_request.n_upcall_pids = 1;
>>> - vport_request.upcall_pids = &upcall_pids;
>>> - dpif_netlink_vport_transact(&vport_request, NULL, NULL);
>>> -
>>> - vport_del_channels(dpif, u32_to_odp(i));
>>> - }
>>> -
>>> for (i = 0; i < dpif->n_handlers; i++) {
>>> struct dpif_handler *handler = &dpif->handlers[i];
>>> + struct dpif_channel *channel, *next;
>>> +
>>> + HMAP_FOR_EACH_SAFE (channel, next, hmap_node, &handler->channels) {
>>> + struct dpif_netlink_vport vport_request;
>>> + uint32_t upcall_pids = 0;
>>
>> Empty line, please.
OK
>>
>>> + /* Turn off upcalls. */
>>> + dpif_netlink_vport_init(&vport_request);
>>> + vport_request.cmd = OVS_VPORT_CMD_SET;
>>> + vport_request.dp_ifindex = dpif->dp_ifindex;
>>> + vport_request.port_no = u32_to_odp(channel->port_idx);
>>> + vport_request.n_upcall_pids = 1;
>>> + vport_request.upcall_pids = &upcall_pids;
>>> + dpif_netlink_vport_transact(&vport_request, NULL, NULL);
>>> + ovs_assert(channel != NULL);
>>> + dpif_channel_del(dpif, channel);
>>> + }
>>>
>>> dpif_netlink_handler_uninit(handler);
>>> free(handler->epoll_events);
>>> }
>>> - free(dpif->channels);
>>> +
>>> free(dpif->handlers);
>>> dpif->handlers = NULL;
>>> - dpif->channels = NULL;
>>> dpif->n_handlers = 0;
>>> - dpif->uc_array_size = 0;
>>> }
>>>
>>> static void
>>> @@ -618,9 +646,11 @@ dpif_netlink_close(struct dpif *dpif_)
>>>
>>> fat_rwlock_wrlock(&dpif->upcall_lock);
>>> destroy_all_channels(dpif);
>>> + id_pool_destroy(dpif->port_ids);
>>> fat_rwlock_unlock(&dpif->upcall_lock);
>>>
>>> fat_rwlock_destroy(&dpif->upcall_lock);
>>> +
>>> free(dpif);
>>> }
>>>
>>> @@ -1003,7 +1033,9 @@ dpif_netlink_port_del__(struct dpif_netlink *dpif, odp_port_t port_no)
>>> #endif
>>> error = dpif_netlink_vport_transact(&vport, NULL, NULL);
>>>
>>> - vport_del_channels(dpif, port_no);
>>> + if (!vport_del_channel(dpif, port_no)) {
>>
>> If packets receiving disabled with dpif_recv_set(), there will be no channels
>> and this check will fail. And if it we will return here we will leave tunnel
>> port in a system which is not a good thing.
>>
>> And we will also leak 'dpif_port' here.
>
> Agreed.
Good catch Ilya. I think the rework based on Flavio's comment will
change this but I have taken it into consideration in the rework.
>
>>> + return EINVAL;
>>> + }
>>>
>>> if (!error && !ovs_tunnels_out_of_tree) {
>>> error = dpif_netlink_rtnl_port_destroy(dpif_port.name, dpif_port.type);
>>> @@ -1084,23 +1116,39 @@ dpif_netlink_port_get_pid__(const struct dpif_netlink *dpif,
>>> odp_port_t port_no)
>>> OVS_REQ_RDLOCK(dpif->upcall_lock)
>>> {
>>> + struct dpif_channel *min_channel = NULL, *found_channel = NULL;
>>> uint32_t port_idx = odp_to_u32(port_no);
>>> uint32_t pid = 0;
>>> + size_t i;
>>> +
>>> + /* The ODPP_NONE "reserved" port number uses the "ovs-system"'s
>>> + * channel, since it is not heavily loaded. */
>>> + uint32_t idx = port_no == ODPP_NONE ? 0 : port_idx;
>>> +
>>> + if (!id_pool_has_id(dpif->port_ids, port_idx)) {
>>> + return 0;
>>> + }
>>> +
>>> + if (dpif->handlers) {
>>> + for (i = 0; i < dpif->n_handlers; i++) {
>>> + if (dpif_handler_channel_find(&dpif->handlers[i], idx)) {
>>> + found_channel = dpif_handler_channel_find(&dpif->handlers[i],
>>> + idx);
>>
>> Looking up twice seems inefficient. Same for the 0 case below.
Agreed, I have reworked this.
>>
>>> + break;
>>> + }
>>>
>>> - if (dpif->handlers && dpif->uc_array_size > 0) {
>>> - /* The ODPP_NONE "reserved" port number uses the "ovs-system"'s
>>> - * channel, since it is not heavily loaded. */
>>> - uint32_t idx = port_idx >= dpif->uc_array_size ? 0 : port_idx;
>>> -
>>> - /* Needs to check in case the socket pointer is changed in between
>>> - * the holding of upcall_lock. A known case happens when the main
>>> - * thread deletes the vport while the handler thread is handling
>>> - * the upcall from that port. */
>>> - if (dpif->channels[idx].sock) {
>>> - pid = nl_sock_pid(dpif->channels[idx].sock);
>>> + if (dpif_handler_channel_find(&dpif->handlers[i], 0)) {
>>> + min_channel = dpif_handler_channel_find(&dpif->handlers[i], 0);
>>> + }
>>> }
>>> }
>>>
>>> + if (found_channel) {
>>> + pid = nl_sock_pid(found_channel->sock);
>>> + } else if (min_channel) {
>>> + pid = nl_sock_pid(min_channel->sock);
>>
>> This might be a question for Aaron mostly.
>>
>> Could you, please, elaborate why we have a behavior change here?
>> Old code just returnd 0 as a pid, but after this change we will return
>> pid of a channel of port 0.
>
> I see what you mean - previously, we only got pid for index 0 when the
> port index was out of range. Now we will get it even when the port
> index is in range, but not found. That shouldn't be - it is my mistake.
I modified this a bit to combine, simplify and I think also make a bit
more efficient. Please give it a review.
>
>>> + }
>>> +
>>> return pid;
>>> }
>>>
>>> @@ -2342,12 +2390,14 @@ dpif_netlink_operate(struct dpif *dpif_, struct dpif_op **ops, size_t n_ops,
>>> static void
>>> dpif_netlink_handler_uninit(struct dpif_handler *handler)
>>> {
>>> + hmap_destroy(&handler->channels);
>>> vport_delete_sock_pool(handler);
>>> }
>>>
>>> static int
>>> dpif_netlink_handler_init(struct dpif_handler *handler)
>>> {
>>> + hmap_init(&handler->channels);
>>> return vport_create_sock_pool(handler);
>>> }
>>> #else
>>> @@ -2355,6 +2405,7 @@ dpif_netlink_handler_init(struct dpif_handler *handler)
>>> static int
>>> dpif_netlink_handler_init(struct dpif_handler *handler)
>>> {
>>> + hmap_init(&handler->channels);
>>> handler->epoll_fd = epoll_create(10);
>>> return handler->epoll_fd < 0 ? errno : 0;
>>> }
>>> @@ -2362,6 +2413,7 @@ dpif_netlink_handler_init(struct dpif_handler *handler)
>>> static void
>>> dpif_netlink_handler_uninit(struct dpif_handler *handler)
>>> {
>>> + hmap_destroy(&handler->channels);
>>> close(handler->epoll_fd);
>>> }
>>> #endif
>>> @@ -2374,9 +2426,7 @@ static int
>>> dpif_netlink_refresh_channels(struct dpif_netlink *dpif, uint32_t n_handlers)
>>> OVS_REQ_WRLOCK(dpif->upcall_lock)
>>> {
>>> - unsigned long int *keep_channels;
>>> struct dpif_netlink_vport vport;
>>> - size_t keep_channels_nbits;
>>> struct nl_dump dump;
>>> uint64_t reply_stub[NL_DUMP_BUFSIZE / 8];
>>> struct ofpbuf buf;
>>> @@ -2412,22 +2462,16 @@ dpif_netlink_refresh_channels(struct dpif_netlink *dpif, uint32_t n_handlers)
>>>
>>> for (i = 0; i < n_handlers; i++) {
>>> struct dpif_handler *handler = &dpif->handlers[i];
>>> -
>>
>> Please, keep.
OK
>>
>>> handler->event_offset = handler->n_events = 0;
>>> }
>>>
>>> - keep_channels_nbits = dpif->uc_array_size;
>>> - keep_channels = bitmap_allocate(keep_channels_nbits);
>>> -
>>> ofpbuf_use_stub(&buf, reply_stub, sizeof reply_stub);
>>> dpif_netlink_port_dump_start__(dpif, &dump);
>>> while (!dpif_netlink_port_dump_next__(dpif, &dump, &vport, &buf)) {
>>> - uint32_t port_no = odp_to_u32(vport.port_no);
>>> uint32_t upcall_pid;
>>> int error;
>>>
>>> - if (port_no >= dpif->uc_array_size
>>> - || !vport_get_pid(dpif, port_no, &upcall_pid)) {
>>> + if (!vport_get_pid(dpif, vport.port_no, &upcall_pid)) {
>>> struct nl_sock *sock;
>>> error = create_nl_sock(dpif, &sock);
>>>
>>> @@ -2475,25 +2519,15 @@ dpif_netlink_refresh_channels(struct dpif_netlink *dpif, uint32_t n_handlers)
>>> }
>>> }
>>>
>>> - if (port_no < keep_channels_nbits) {
>>> - bitmap_set1(keep_channels, port_no);
>>> - }
>>> continue;
>>>
>>> error:
>>> - vport_del_channels(dpif, vport.port_no);
>>> + vport_del_channel(dpif, vport.port_no);
>>> }
>>> +
>>> nl_dump_done(&dump);
>>> ofpbuf_uninit(&buf);
>>>
>>> - /* Discard any saved channels that we didn't reuse. */
>>> - for (i = 0; i < keep_channels_nbits; i++) {
>>> - if (!bitmap_is_set(keep_channels, i)) {
>>> - vport_del_channels(dpif, u32_to_odp(i));
>>> - }
>>> - }
>>> - free(keep_channels);
>>> -
>>> return retval;
>>> }
>>>
>>> @@ -2714,6 +2748,8 @@ dpif_netlink_recv__(struct dpif_netlink *dpif, uint32_t handler_id,
>>> OVS_REQ_RDLOCK(dpif->upcall_lock)
>>> {
>>> struct dpif_handler *handler;
>>> + size_t n_channels;
>>> +
>>> int read_tries = 0;
>>>
>>> if (!dpif->handlers || handler_id >= dpif->n_handlers) {
>>> @@ -2721,14 +2757,15 @@ dpif_netlink_recv__(struct dpif_netlink *dpif, uint32_t handler_id,
>>> }
>>>
>>> handler = &dpif->handlers[handler_id];
>>> - if (handler->event_offset >= handler->n_events) {
>>> + n_channels = dpif_handler_channels_count(handler);
>>> + if (handler->event_offset >= handler->n_events && n_channels) {
>>> int retval;
>>>
>>> handler->event_offset = handler->n_events = 0;
>>>
>>> do {
>>> retval = epoll_wait(handler->epoll_fd, handler->epoll_events,
>>> - dpif->uc_array_size, 0);
>>> + n_channels, 0);
>>> } while (retval < 0 && errno == EINTR);
>>>
>>> if (retval < 0) {
>>> @@ -2741,7 +2778,10 @@ dpif_netlink_recv__(struct dpif_netlink *dpif, uint32_t handler_id,
>>>
>>> while (handler->event_offset < handler->n_events) {
>>> int idx = handler->epoll_events[handler->event_offset].data.u32;
>>> - struct dpif_channel *ch = &dpif->channels[idx];
>>> + struct dpif_channel *ch = dpif_handler_channel_find(handler, idx);
>>> + if (!ch) {
>>> + return EAGAIN;
>>
>> How can it be that handler received event on port that it doesn't have channel for?
>> Maybe assertion will be better here?
>
> Hrrm... I guess the only case I can imagine is if during upcall
> processing we remove a port? I don't know if it can actually happen
> though. I agree, maybe an assert / log would be better.
Agreed, I have added an assert (and another one, based on the rework
Flavio suggested)
>
>>> + }
>>>
>>> handler->event_offset++;
>>>
>>> @@ -2845,12 +2885,14 @@ dpif_netlink_recv_purge__(struct dpif_netlink *dpif)
>>> if (dpif->handlers) {
>>> size_t i;
>>>
>>> - if (!dpif->channels[0].sock) {
>>> - return;
>>> - }
>>> - for (i = 0; i < dpif->uc_array_size; i++ ) {
>>> + for (i = 0; i < dpif->n_handlers; i++) {
>>> + struct dpif_handler *handler = &dpif->handlers[i];
>>> + struct dpif_channel *channel;
>>> +
>>> + HMAP_FOR_EACH (channel, hmap_node, &handler->channels) {
>>> + nl_sock_drain(channel->sock);
>>> + }
>>>
>>> - nl_sock_drain(dpif->channels[i].sock);
>>> }
>>> }
>>> }
>>> diff --git a/lib/id-pool.c b/lib/id-pool.c
>>> index 69910ad08..bef822f6b 100644
>>> --- a/lib/id-pool.c
>>> +++ b/lib/id-pool.c
>>> @@ -93,6 +93,16 @@ id_pool_find(struct id_pool *pool, uint32_t id)
>>> return NULL;
>>> }
>>>
>>> +bool
>>> +id_pool_has_id(struct id_pool *pool, uint32_t id)
>>> +{
>>> + if (!id_pool_find(pool, id)) {
>>> + return false;
>>> + }
>>> +
>>> + return true;
>>> +}
>>> +
>>> void
>>> id_pool_add(struct id_pool *pool, uint32_t id)
>>> {
>>> diff --git a/lib/id-pool.h b/lib/id-pool.h
>>> index 8721f8793..62876e2a5 100644
>>> --- a/lib/id-pool.h
>>> +++ b/lib/id-pool.h
>>> @@ -29,6 +29,7 @@ void id_pool_destroy(struct id_pool *);
>>> bool id_pool_alloc_id(struct id_pool *, uint32_t *id);
>>> void id_pool_free_id(struct id_pool *, uint32_t id);
>>> void id_pool_add(struct id_pool *, uint32_t id);
>>> +bool id_pool_has_id(struct id_pool *, uint32_t id);
>>>
>>> /*
>>> * ID pool.
>>>
>
More information about the dev
mailing list