[ovs-dev] [PATCH v5 1/1] netdev-dpdk: add dpdk vhost ports

Pravin Shelar pshelar at nicira.com
Mon Dec 22 18:01:54 UTC 2014


On Sun, Dec 21, 2014 at 9:31 AM, Traynor, Kevin <kevin.traynor at intel.com> wrote:
> Hi,
>
> I'd like to get some feedback about using RCU on the virtio_dev structure as per the comment below. Comments inline.
>
> Thanks,
> Kevin.
>
>> -----Original Message-----
>> From: dev [mailto:dev-bounces at openvswitch.org] On Behalf Of Pravin Shelar
>> Sent: Tuesday, October 21, 2014 12:00 AM
>> To: Tahhan, Maryam
>> Cc: dev at openvswitch.org
>> Subject: Re: [ovs-dev] [PATCH v5 1/1] netdev-dpdk: add dpdk vhost ports
>>
>> On Mon, Sep 29, 2014 at 10:10 AM, maryam.tahhan <maryam.tahhan at intel.com> wrote:
>> > This patch implements the vhost-net offload API.  It adds support for
>> > a new port type to userspace datapath called dpdkvhost. This allows KVM
>> > (QEMU) to offload the servicing of virtio-net devices to it's associated
>> > dpdkvhost port. Instructions for use are in INSTALL.DPDK.
>> >
>> > This has been tested on Intel multi-core platforms and with clients that
>> > have virtio-net interfaces.
>> >
>> > ver 5:
>> >   - rebased against latest master
>> > ver 4:
>> >   - added eventfd_link.h and eventfd_link.c to EXTRA_DIST in utilities/automake.mk
>> >   - rebased with master to work with DPDK 1.7
>> > ver 3:
>> >   - rebased with master
>> > ver 2:
>> >   - rebased with master
>> >
>> > Signed-off-by: maryam.tahhan <maryam.tahhan at intel.com>
>> > ---
>> Thanks for the patch, I have following comments.
>> - NON_PMD_THREAD_TX_QUEUE is not used in the patch.
>> - why do we need to limit MAX_BASENAME_SZ to 12
>> - dev_basename[] should be configurable at run time or it should be
>> vswitchd parameter.
>> - any reason for liming MAX_PKT_BURST to 32?
>> - netdev_dpdk->type should be a enum type with DPDK and VHOST members.
>> I am not sure if you really need type member in netdev_dpdk. you can
>> just define separate device ops for DPDK and VHOST of you was to do
>> special processing for each device.
>> - there is no check on gpa_to_vva() return value.
>> - in function virtio_dev_rx() variables can be defined in local block
>> rather than in function block. this simplifies code reading.
>> - in function virtio_dev_rx() virtio_hdr is always zero, so it can be
>> static variable rather than on stack.
>> - in function virtio_dev_rx() virtio_hdr should be copied before packet data.
>> - in function virtio_dev_rx() can we prefetch next vq->desc before
>> coping packet data?
>> - ovs-mutex is bit heavy weight, can you use rte-spin-lock
>> - can you reverse name of virtio_dev_rx(), virtio_dev_tx(), since this
>> is ovs-netdev code we could use name same as ovs context.
>> - virtio_dev_tx() is always called from PMD thread, so no need to the check.
>> - there is no synchronization for netdev_dpdk->rx_count and tx_count.
>> - destroy_device() can use RCU based mechanism rather than polling packet count.
>> - Currently new_device() assigns post in linear fashion. How does
>> handle multiple bridge case where vhost port might belong to available
>> port on different bridge?
>> - virtio_net_device_ops ops should be set before registering cuse device.
>> - destroy function sets virtio_dev to NULL, so no need to check for
>> remove flag, ofcourse you need to RCUfy it first.
> (Additional comment later added): device destroy can set p->virtio_dev to NULL and call
> ovsrcu_postpone(). All read/write code path can check for NULL before
> using virtio_dev. That should be enough, I do not see any need to
> check tx_count or rx_count.
>
>
> I'm seeing two issues with using RCU for the virtio_dev...
>
> 1. The memory of the virtio_dev is not allocated in netdev-dpdk, but is passed in as part of the new_device() callback from the DPDK vhost library. As soon as the destroy_device() fn returns, I don't think we can have any guarantees about the state of this memory. So for another thread to continue to use it, or for a postpone callback to modify it after the destroy_device() fn has returned would be unsafe. In any of the other examples I see of RCU set/postpone, it looks like the memory has been allocated locally so we can guarantee the integrity of it until all threads have quiesced, at which time we can free it in the postpone callback.
>
> 2. The cuse thread loop is in the fuse library and only occasionally calls the OVS code for new_device()/destroy_device(), so we can't make it periodically quiesce - we can only call ovsrcu_quiesce_start() at the very start of the thread. When I put a call to ovsrcu_postpone() in destroy_device(), I see that the rcu thread is continually waiting for the cuse thread to quiesce and the postpone callback does not get called. I'm thinking that if the ovsrcu_postpone() fn is used in destroy_device() then the cuse thread would have to have a periodic call to quiesce which I don't think is possible?
>
> I think we may have no option but to block in the destroy_device() fn until we know that virtio_dev is not being used anymore by other threads. We can stop threads initiating more operations on it by checking for NULL or REMOVE flag and we can check that in-progress operations are finished by the tx/rx counters.
>
> Let me know if I've misunderstood and there is a way to use the RCUs here.
>

Can you use ovsrcu_synchronize() in destroy_device() to prevent the race?

>> _______________________________________________
>> dev mailing list
>> dev at openvswitch.org
>> http://openvswitch.org/mailman/listinfo/dev



More information about the dev mailing list