[ovs-discuss] Event on vhostuser file descriptor 1024 terminates vswitchd thread

Patrik Andersson R patrik.r.andersson at ericsson.com
Tue Mar 8 04:55:22 UTC 2016


Hi Ciara,

thank you for your comments. Appreciated!

Added some minor notes.

Regards,

Patrik


On 03/03/2016 04:56 PM, Loftus, Ciara wrote:
>> Hello,
> Hi Patrik,
>
> The majority of your concerns seem to relate to the DPDK code itself, so I would suggest raising them on the dev at dpdk.org mailing list, especially questions 2 & 3. Either way, some comments/questions below.
>
> Thanks,
> Ciara
>
>> while doing robustness tests on OVS with DPDK datapath, I'm experiencing
>> vswitchd terminating due to a buffer overrun in DPDK
>> fdset_event_dispatch().
>> The underlying root cause for this, I'm quite sure, lies in our modified
>> version
>> of OpenStack or tests cases where some vhostuser ports happen to be left
>> behind when VMs with more than 16 vNICs are deleted using "nova delete".
> How does "nova delete" bring down VMs? Graceful shutdown or another way? Are the ports explicitly deleted after the VM is deleted, with del-port?
Yes, I think "nova delete" performs a graceful shutdown. Ports are
deleted explicitly, but not necessarily after the VM shutdown -- which
gives us the problem discussed here:

   http://openvswitch.org/pipermail/discuss/2016-February/020271.html

>> Thus, currently, the test leads to an accumulation of vhostuser sockets, and
>> the termination is triggered when roughly 500 or more sockets are open at the
>> same time (the accumulated leftover sockets + sockets for 4 VMs x 24 vNICs/VM).
>>
>> Neither OVS (2.4.0, with corrections from the 2.4 branch) nor DPDK (2.1.0)
>> properly protects against a situation where file descriptor numbers exceed
>> what can be held by the fd_set bit mask, as far as I can determine. The
>> question then is whether this is a bug or whether I'm missing something?
>> I'm well aware that this is an unlikely scenario but I do prefer graceful
>> handling if possible.
>>
>>    1) should there be a limit to the number of vhostuser ports that can be
>>        created in OVS or should the limit be dictated by DPDK only?
> I think we first need to figure out what that limit is/if there is one and then if it can be avoided.
I came to the conclusion that it is hard to determine in OVS what a
reasonable limit would be. DPDK is the right place to enforce it instead.

>>    2) in rte_vhost_driver_register() there seems to be an implicit
>> assumption
>>        that the file descriptor number and the vserver_cnt are related,
>> which in
>>        general, does not hold. Can this check be improved?
> This would be a question for the DPDK Mailing List
Yes, will try to phrase a discussion topic there.
>>    3) should fdset_event_dispatch() protect against events from file
>> descriptors
>>        greater than or equal to FD_SETSIZE (=__FD_SETSIZE=1024)?
>>
> Likewise
After careful consideration, I think checking at the "source" is better,
corresponding roughly to 2) above.
>> What is your opinion?
>>
>>
>> ---
>>
>>
>> In short, what happens is roughly that a vhostuser port is registered in
>> (OVS, netdev-dpdk.c):
>>
>> static int
>> netdev_dpdk_vhost_user_construct(struct netdev *netdev_)
>> {
>>       struct netdev_dpdk *netdev = netdev_dpdk_cast(netdev_);
>>       int err;
>>
>>       ovs_mutex_lock(&dpdk_mutex);
>>       /* Take the name of the vhost-user port and append it to the
>> location where
>>        * the socket is to be created, then register the socket.
>>        */
>>       snprintf(netdev->vhost_id, sizeof(netdev->vhost_id), "%s/%s",
>>               vhost_sock_dir, netdev_->name);
>>       err = rte_vhost_driver_register(netdev->vhost_id);
>> ....
>> }
>>
>>
>> The call to rte_vhost_driver_register() ends up in (DPDK,
>> vhost-net-user.c):
>>
>> int
>> rte_vhost_driver_register(const char *path)
>> {
>>       struct vhost_server *vserver;
>>
>>       pthread_mutex_lock(&g_vhost_server.server_mutex);
>>       if (ops == NULL)
>>           ops = get_virtio_net_callbacks();
>>
>>       if (g_vhost_server.vserver_cnt == MAX_VHOST_SERVER) {
>>           RTE_LOG(ERR, VHOST_CONFIG,
>>               "error: the number of servers reaches maximum\n");
>>           pthread_mutex_unlock(&g_vhost_server.server_mutex);
>>           return -1;
>>       }
>> ...
>> }
>>
>> Here a check is done against MAX_VHOST_SERVER (=1024), and the
>> number of vhostuser ports is not yet at that maximum. But the process
>> can still have more file descriptors open than it has vhostuser
>> sockets.
>>
>>
>> Then we come to the actual failure, where an event on a file
>> descriptor (fd=1024) leads to a memory access outside of the
>> fd_set bit mask, at (DPDK, fd_man.c, FD_ISSET line shown by "-->"):
>>
>> void
>> fdset_event_dispatch(struct fdset *pfdset)
>> {
>>       fd_set rfds, wfds;
>>       int i, maxfds;
>>       struct fdentry *pfdentry;
>>       int num = MAX_FDS;
>>       fd_cb rcb, wcb;
>>       void *dat;
>>       int fd;
>>       int remove1, remove2;
>>       int ret;
>>
>>       if (pfdset == NULL)
>>           return;
>>
>>       while (1) {
>>           struct timeval tv;
>>           tv.tv_sec = 1;
>>           tv.tv_usec = 0;
>>           FD_ZERO(&rfds);
>>           FD_ZERO(&wfds);
>>           pthread_mutex_lock(&pfdset->fd_mutex);
>>
>>           maxfds = fdset_fill(&rfds, &wfds, pfdset);
>>
>>           pthread_mutex_unlock(&pfdset->fd_mutex);
>>
>>           /*
>>            * When select is blocked, other threads might unregister
>>            * listenfds from and register new listenfds into fdset.
>>            * When select returns, the entries for listenfds in the fdset
>>            * might have been updated. It is ok if there is unwanted call
>>            * for new listenfds.
>>            */
>>           ret = select(maxfds + 1, &rfds, &wfds, NULL, &tv);
>>           if (ret <= 0)
>>               continue;
>>
>>           for (i = 0; i < num; i++) {
>>               remove1 = remove2 = 0;
>>               pthread_mutex_lock(&pfdset->fd_mutex);
>>               pfdentry = &pfdset->fd[i];
>>               fd = pfdentry->fd;
>>               rcb = pfdentry->rcb;
>>               wcb = pfdentry->wcb;
>>               dat = pfdentry->dat;
>>               pfdentry->busy = 1;
>>               pthread_mutex_unlock(&pfdset->fd_mutex);
>>
>> ---->      if (fd >= 0 && FD_ISSET(fd, &rfds) && rcb)
>>                   rcb(fd, dat, &remove1);
>>
>>
>>               if (fd >= 0 && FD_ISSET(fd, &wfds) && wcb)
>>                   wcb(fd, dat, &remove2);
>> ...
>>           }
>>       }
>> }
>>
>>
>> Regards,
>>
>> Patrik
>> _______________________________________________
>> discuss mailing list
>> discuss at openvswitch.org
>> http://openvswitch.org/mailman/listinfo/discuss
