[ovs-discuss] Event on vhostuser file descriptor 1024 terminates vswitchd thread
Patrik Andersson R
patrik.r.andersson at ericsson.com
Tue Mar 8 04:55:22 UTC 2016
Hi Ciara,
thank you for your comments. Appreciated!
Added some minor notes.
Regards,
Patrik
On 03/03/2016 04:56 PM, Loftus, Ciara wrote:
>> Hello,
> Hi Patrik,
>
> The majority of your concerns seem to relate to the DPDK code itself, so I would suggest raising them on the dev at dpdk.org mailing list, especially questions 2 & 3. Either way, some comments/questions below.
>
> Thanks,
> Ciara
>
>> while doing robustness tests on OVS with the DPDK datapath, I'm seeing
>> vswitchd terminate due to a buffer overrun in DPDK's
>> fdset_event_dispatch().
>> The underlying root cause for this, I'm quite sure, lies in our modified
>> version of OpenStack, or in test cases where some vhostuser ports happen
>> to be left behind when VMs with more than 16 vNICs are deleted using
>> "nova delete".
> How does "nova delete" bring down VMs? Graceful shutdown or another way? Are the ports explicitly deleted after the VM is deleted, with del-port?
Yes, I think it is a graceful shutdown done by the "nova delete". Ports are
deleted explicitly, but not necessarily after the VM shutdown -- which gives
us the problem discussed here:
http://openvswitch.org/pipermail/discuss/2016-February/020271.html
>> Thus, currently, the test leads to an accumulation of vhostuser sockets, and
>> the termination is triggered when more than about 500 sockets are open at the
>> same time (accumulated leftover sockets + sockets for 4 VMs x 24 vNICs/VM).
>>
>> Neither OVS (2.4.0, with corrections from the 2.4 branch) nor DPDK (2.1.0)
>> properly protects against a situation where file descriptor numbers exceed
>> what can be held by the fd_set bit mask, as far as I can determine. The
>> question then is whether this is a bug or whether I'm missing something.
>> I'm well aware that this is an unlikely scenario but I do prefer graceful
>> handling if possible.
>>
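To make the overrun concrete: glibc's fd_set is a fixed bit array of
FD_SETSIZE (1024) bits, and the FD_* macros do no bounds checking. A
minimal stand-alone sketch of the failure mode (illustrative only, not
DPDK code):

    #include <stdio.h>
    #include <sys/select.h>

    int
    main(void)
    {
        fd_set set;                /* on glibc: a fixed 1024-bit mask */

        FD_ZERO(&set);
        printf("FD_SETSIZE = %d\n", FD_SETSIZE);   /* prints 1024 */

        FD_SET(FD_SETSIZE - 1, &set);   /* fd 1023: last valid bit */

        /* FD_SET(FD_SETSIZE, &set) or FD_ISSET(FD_SETSIZE, &set)
         * would touch memory one bit past the end of 'set' --
         * exactly the overrun hit in fdset_event_dispatch(). */

        return 0;
    }
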
>> 1) should there be a limit to the number of vhostuser ports that can be
>> created in OVS or should the limit be dictated by DPDK only?
> I think we first need to figure out what that limit is (if there is one), and then whether it can be avoided.
I came to the conclusion that it is hard to figure out in OVS what a
reasonable limit would be. Instead, DPDK would be the right place to
enforce it.
>> 2) in rte_vhost_driver_register() there seems to be an implicit assumption
>> that the file descriptor number and the vserver_cnt are related, which,
>> in general, does not hold. Can this check be improved?
> This would be a question for the DPDK mailing list.
Yes, I will try to phrase a discussion topic there.
>> 3) should fdset_event_dispatch() protect against events from file
>> descriptors
>> greater than or equal to FD_SETSIZE (=__FD_SETSIZE=1024)?
>>
> Likewise
After careful consideration, I think checking at the "source" is better,
corresponding roughly to 2) above.
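For reference, the kind of guard I have in mind at the source would look
roughly like this. This is only a sketch: it assumes fdset_add() keeps its
DPDK 2.1 signature from fd_man.c, and fdset_add_checked() is a hypothetical
name, not an existing function:

    /* Hypothetical wrapper: refuse descriptors that can never be
     * handled safely by select()/FD_ISSET() later on. */
    static int
    fdset_add_checked(struct fdset *pfdset, int fd,
                      fd_cb rcb, fd_cb wcb, void *dat)
    {
        if (fd < 0 || fd >= FD_SETSIZE) {
            /* The fd cannot be represented in an fd_set; fail the
             * registration here instead of corrupting memory later. */
            return -1;
        }
        return fdset_add(pfdset, fd, rcb, wcb, dat);
    }

Failing the registration would then propagate an error back through
rte_vhost_driver_register() to the OVS port creation, which seems like the
graceful handling I'm after.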
>> What is your opinion?
>>
>>
>> ---
>>
>>
>> In short, what happens is roughly that a vhostuser port is registered in
>> (OVS, netdev-dpdk.c):
>>
>> static int
>> netdev_dpdk_vhost_user_construct(struct netdev *netdev_)
>> {
>>     struct netdev_dpdk *netdev = netdev_dpdk_cast(netdev_);
>>     int err;
>>
>>     ovs_mutex_lock(&dpdk_mutex);
>>     /* Take the name of the vhost-user port and append it to the
>>      * location where the socket is to be created, then register
>>      * the socket. */
>>     snprintf(netdev->vhost_id, sizeof(netdev->vhost_id), "%s/%s",
>>              vhost_sock_dir, netdev_->name);
>>     err = rte_vhost_driver_register(netdev->vhost_id);
>>     ....
>> }
>>
>>
>> The call to rte_vhost_driver_register() ends up in (DPDK,
>> vhost-net-user.c):
>>
>> int
>> rte_vhost_driver_register(const char *path)
>> {
>>     struct vhost_server *vserver;
>>
>>     pthread_mutex_lock(&g_vhost_server.server_mutex);
>>     if (ops == NULL)
>>         ops = get_virtio_net_callbacks();
>>
>>     if (g_vhost_server.vserver_cnt == MAX_VHOST_SERVER) {
>>         RTE_LOG(ERR, VHOST_CONFIG,
>>             "error: the number of servers reaches maximum\n");
>>         pthread_mutex_unlock(&g_vhost_server.server_mutex);
>>         return -1;
>>     }
>>     ...
>> }
>>
>> Here a check is done against MAX_VHOST_SERVER (=1024), and the number of
>> vhostuser ports is not yet at that maximum. But the process can still have
>> more file descriptors open than it has vhostuser sockets.
>>
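That mismatch is easy to reproduce outside of DPDK, since the kernel always
hands out the lowest free descriptor number: a process that already holds
many descriptors gets a high-numbered fd for its very first vhost socket. A
contrived stand-alone sketch (it needs the fd limit raised above 1024 with
"ulimit -n" to actually reach high numbers):

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/select.h>
    #include <sys/socket.h>

    int
    main(void)
    {
        int i, s;

        /* Burn descriptor numbers, as a stand-in for a busy process
         * (other ports, log files, DPDK internals, ...). */
        for (i = 0; i < 1100; i++)
            open("/dev/null", O_RDONLY);

        /* The "first" listen socket now gets an fd well above
         * FD_SETSIZE, even though vserver_cnt would only be 1. */
        s = socket(AF_UNIX, SOCK_STREAM, 0);
        printf("listen fd = %d, FD_SETSIZE = %d\n", s, FD_SETSIZE);
        return 0;
    }
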
>>
>> Then we come to the actual failure, where an event on a file
>> descriptor (fd=1024) leads to a memory access outside of the
>> fd_set bit mask, at (DPDK, fd_man.c, FD_ISSET line marked by "---->"):
>>
>> void
>> fdset_event_dispatch(struct fdset *pfdset)
>> {
>>     fd_set rfds, wfds;
>>     int i, maxfds;
>>     struct fdentry *pfdentry;
>>     int num = MAX_FDS;
>>     fd_cb rcb, wcb;
>>     void *dat;
>>     int fd;
>>     int remove1, remove2;
>>     int ret;
>>
>>     if (pfdset == NULL)
>>         return;
>>
>>     while (1) {
>>         struct timeval tv;
>>         tv.tv_sec = 1;
>>         tv.tv_usec = 0;
>>         FD_ZERO(&rfds);
>>         FD_ZERO(&wfds);
>>         pthread_mutex_lock(&pfdset->fd_mutex);
>>
>>         maxfds = fdset_fill(&rfds, &wfds, pfdset);
>>
>>         pthread_mutex_unlock(&pfdset->fd_mutex);
>>
>>         /*
>>          * When select is blocked, other threads might unregister
>>          * listenfds from and register new listenfds into fdset.
>>          * When select returns, the entries for listenfds in the fdset
>>          * might have been updated. It is ok if there is unwanted call
>>          * for new listenfds.
>>          */
>>         ret = select(maxfds + 1, &rfds, &wfds, NULL, &tv);
>>         if (ret <= 0)
>>             continue;
>>
>>         for (i = 0; i < num; i++) {
>>             remove1 = remove2 = 0;
>>             pthread_mutex_lock(&pfdset->fd_mutex);
>>             pfdentry = &pfdset->fd[i];
>>             fd = pfdentry->fd;
>>             rcb = pfdentry->rcb;
>>             wcb = pfdentry->wcb;
>>             dat = pfdentry->dat;
>>             pfdentry->busy = 1;
>>             pthread_mutex_unlock(&pfdset->fd_mutex);
>>
>> ---->       if (fd >= 0 && FD_ISSET(fd, &rfds) && rcb)
>>                 rcb(fd, dat, &remove1);
>>
>>             if (fd >= 0 && FD_ISSET(fd, &wfds) && wcb)
>>                 wcb(fd, dat, &remove2);
>>             ...
>>         }
>>     }
>> }
>>
>>
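For completeness, the symmetric band-aid on the dispatch side would be a
bounds check before the FD_ISSET() calls quoted above; roughly like this
(again only a sketch against the loop body shown, not a submitted patch):

    /* Inside the for-loop of fdset_event_dispatch(), after busy has
     * been set and the mutex dropped, before testing the fd_sets: */
    if (fd >= FD_SETSIZE) {
        /* Such an fd cannot have been placed in rfds/wfds; skip it
         * rather than index past the end of the bit masks. */
        pthread_mutex_lock(&pfdset->fd_mutex);
        pfdentry->busy = 0;
        pthread_mutex_unlock(&pfdset->fd_mutex);
        continue;
    }

Checking at registration time, as in 2) above, still seems preferable,
since it fails the port creation visibly instead of silently ignoring
events on the offending descriptor.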
>> Regards,
>>
>> Patrik
>> _______________________________________________
>> discuss mailing list
>> discuss at openvswitch.org
>> http://openvswitch.org/mailman/listinfo/discuss