[ovs-dev] [RFC] dpif-netdev: only poll enabled vhost queues
Kevin Traynor
ktraynor at redhat.com
Tue Apr 9 14:49:12 UTC 2019
On 09/04/2019 12:41, Ilya Maximets wrote:
> On 09.04.2019 11:34, Ilya Maximets wrote:
>> On 08.04.2019 16:44, David Marchand wrote:
>>> Hello Ilya,
>>>
>>> On Mon, Apr 8, 2019 at 10:27 AM Ilya Maximets <i.maximets at samsung.com <mailto:i.maximets at samsung.com>> wrote:
>>>
>>> On 04.04.2019 22:49, David Marchand wrote:
>>> > We tried to lower the number of rebalances but we don't have a
>>> > satisfying solution at the moment, so this patch rebalances on each
>>> > update.
>>>
>>> Hi.
>>>
>>>     Triggering the reconfiguration on each vring state change is a bad thing.
>>>     This could be abused by the guest to break host networking by infinitely
>>>     disabling/enabling queues. Each reconfiguration leads to removing ports
>>>     from the PMD port caches and reloading them. On rescheduling, all the ports
>>>
>>>
>>> I'd say the reconfiguration itself is not wanted here.
>>> Only rebalancing the queues would be enough.
>>
>> As you correctly mentioned in the commit message, there will be no real port
>> configuration changes.
>>
>> Under 'reconfiguration' I mean datapath reconfiguration and 'rebalancing'
>> is one of its stages.
>>
>>>
>>>
>>>     could be moved to different PMD threads, resulting in EMC/SMC/dpcls
>>>     invalidation and subsequent upcalls/packet reordering.
>>>
>>>
>>> I agree that rebalancing does trigger EMC/SMC/dpcls invalidation when moving queues.
>>> However, EMC/SMC/dpcls are per-PMD, so where would we get packet reordering?
>>
>> An Rx queue could be scheduled to a different PMD thread --> new packets will go to a
>> different Tx queue. It's unlikely, but which packets are sent first will depend on the
>> device/driver. The main issue here is that this happens to other ports, not only to the
>> port we're trying to reconfigure.
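[As a toy model of the reordering concern above (illustrative only, not OVS code): each PMD thread transmits on its own tx queue id per port, so when a rebalance moves an rx queue to another PMD, subsequent packets of the same flows leave through a different tx queue, and the device may drain the two queues in either order.]

```c
#include <assert.h>

/* Toy model: the tx queue used by a flow follows the PMD thread that
 * polls its rx queue, because each PMD owns a distinct tx queue id.
 * Here the pmd id doubles as the tx queue id. */
static int tx_queue_for(const int rxq_to_pmd[], int rxq)
{
    return rxq_to_pmd[rxq];
}
```

Packets already enqueued on the old tx queue can then be transmitted after newer packets enqueued on the new one.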
>>
>>>
>>>
>>>
>>>     The same issues were discussed previously while looking at the possibility
>>>     of vhost-pmd integration (with some test results):
>>> https://mail.openvswitch.org/pipermail/ovs-dev/2016-August/320430.html
>>>
>>>
>>> Thanks for the link, I will test this.
>>>
>>>
>>>
>>> One more reference:
>>> 7f5f2bd0ce43 ("netdev-dpdk: Avoid reconfiguration on reconnection of same vhost device.")
>>>
>>>
>>> Yes, I saw this patch.
>>> Are we safe against guest drivers/applications that play with VIRTIO_NET_F_MQ, swapping it continuously ?
>>
>> Good point. I didn't test that, but it looks like we're not safe here.
>> Kernel and DPDK drivers have F_MQ enabled by default, so it would require
>> some changes/explicit disabling. But I agree that this could produce
>> issues if someone does that.
>>
>> We could probably handle this using a 'max seen qp_num' approach, but I'm
>> not a fan of it. Need to figure out how to do this correctly.
>>
>> In general, I think that we should not allow cases where the guest is able
>> to manipulate the host configuration.
>>
>>>
>>>
>>>
>>>
>>>     Anyway, do you have some numbers for how much time the PMD thread spends
>>>     polling disabled queues? What performance improvement are you able to
>>>     achieve by avoiding that?
>>>
>>>
>>> With a simple PVP setup of mine:
>>> 1c/2t poll two physical ports.
>>> 1c/2t poll four vhost ports with 16 queues each.
>>> Only one queue is enabled on each virtio device attached by the guest.
>>> The first two virtio devices are bound to the virtio kmod.
>>> The last two virtio devices are bound to vfio-pci and used to forward incoming traffic with testpmd.
>>>
>>> The forwarding zeroloss rate goes from 5.2Mpps (polling all 64 vhost queues) to 6.2Mpps (polling only the 4 enabled vhost queues).
>>
>> That's interesting. However, this doesn't look like a realistic scenario.
>> In practice you'd need many more PMD threads to handle so many queues.
>> If you add more threads, a zeroloss test could show even worse results
>> if one of the idle VMs periodically changes its number of queues. Periodic
>> latency spikes will cause queue overruns and subsequent packet drops on
>> hot Rx queues. This could be partially solved by allowing n_rxq to grow only.
>> However, I'd be happy to have a different solution that does not hide the
>> number of queues from the datapath.
>
> What do you think about something like this (not even compile tested):
> ---
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index bd9718824..0cf492a3b 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -591,6 +591,8 @@ struct polled_queue {
> struct dp_netdev_rxq *rxq;
> odp_port_t port_no;
> bool emc_enabled;
> + bool enabled;
> + uint64_t change_seq;
> };
>
> /* Contained by struct dp_netdev_pmd_thread's 'poll_list' member. */
> @@ -5375,6 +5377,9 @@ pmd_load_queues_and_ports(struct dp_netdev_pmd_thread *pmd,
> poll_list[i].rxq = poll->rxq;
> poll_list[i].port_no = poll->rxq->port->port_no;
> poll_list[i].emc_enabled = poll->rxq->port->emc_enabled;
> + poll_list[i].enabled = netdev_rxq_enabled(poll->rxq);
> + poll_list[i].change_seq =
> + netdev_get_change_seq(poll->rxq->port->netdev);
> i++;
> }
>
> @@ -5440,6 +5445,10 @@ reload:
>
> for (i = 0; i < poll_cnt; i++) {
>
> + if (!poll_list[i].enabled) {
> + continue;
> + }
> +
I had a similar thought that doing something like this might be a good
compromise, but I wonder about the possible delays in polling newly
enabled queues.
The state could just be checked directly here,
+ if (!netdev_rxq_enabled(poll_list[i].rxq)) {
+ continue;
+ }
or, if that is too costly, only check when the cached state is disabled, to
catch the disable->enable case early. DPDK will handle the enable->disable
case, so we could avoid the check here and do a slower update when lc > 1024.
+ if (!poll_list[i].enabled) {
+     if (OVS_LIKELY(!netdev_rxq_enabled(poll_list[i].rxq))) {
+         continue;
+     } else {
+         poll_list[i].enabled = true;
+     }
+ }
Of course, if these checks cost almost as much as letting DPDK do them in
the first place, then they are not worth it.
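[The compromise discussed here can be sketched as follows (a standalone model, not the actual dpif-netdev code; `queue_state_query` stands in for `netdev_rxq_enabled()`): keep a cached per-queue 'enabled' flag and only call the possibly costly state query when the cache says "disabled", so a disable->enable transition is caught early while enabled queues pay no extra per-iteration cost.]

```c
#include <stdbool.h>

/* Model of a polled queue: a lazily refreshed cached flag plus a
 * stand-in for the real vring state held by the vhost library. */
struct polled_q {
    bool enabled;       /* cached state, refreshed lazily */
    bool hw_enabled;    /* stand-in for the real vring state */
};

/* Stand-in for netdev_rxq_enabled(). */
static bool queue_state_query(const struct polled_q *q)
{
    return q->hw_enabled;
}

/* Returns true if the queue should be polled this iteration. */
static bool should_poll(struct polled_q *q)
{
    if (!q->enabled) {
        if (!queue_state_query(q)) {
            return false;       /* still disabled: skip cheaply */
        }
        q->enabled = true;      /* disable->enable caught early */
    }
    return true;                /* enable->disable is left to the
                                 * slower periodic refresh */
}
```

The asymmetry is deliberate: polling a just-disabled queue for a while is harmless, but leaving a just-enabled queue unpolled drops traffic.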
> if (poll_list[i].emc_enabled) {
> atomic_read_relaxed(&pmd->dp->emc_insert_min,
> &pmd->ctx.emc_insert_min);
> @@ -5476,6 +5485,15 @@ reload:
> if (reload) {
> break;
> }
> +
> + for (i = 0; i < poll_cnt; i++) {
> + uint64_t current_seq =
> + netdev_get_change_seq(poll_list[i].rxq->port->netdev);
> + if (poll_list[i].change_seq != current_seq) {
> + poll_list[i].change_seq = current_seq;
> + poll_list[i].enabled = netdev_rxq_enabled(poll_list[i].rxq);
> + }
> + }
> }
> pmd_perf_end_iteration(s, rx_packets, tx_packets,
> pmd_perf_metrics_enabled(pmd));
> ---
> ...plus replacing your 'netdev_request_reconfigure(&dev->up)' inside 'vring_state_changed'
> with 'netdev_change_seq_changed(&dev->up)'?
>
> This should help us avoid reconfiguring/rescheduling everything and not waste much
> time on polling. It will also let us react faster to queue state changes.
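[The change-sequence mechanism proposed in the diff above can be modeled in isolation (names here are hypothetical stand-ins for `netdev_change_seq_changed()`/`netdev_get_change_seq()`): the vhost callback bumps a per-device sequence number instead of requesting a full reconfiguration, and the PMD loop refreshes its cached queue state only when the sequence it last saw differs.]

```c
#include <stdbool.h>
#include <stdint.h>

/* Publisher side: the device, updated from the vhost callback. */
struct dev {
    uint64_t change_seq;
    bool q_enabled;
};

/* Subscriber side: a PMD's cached view of one polled queue. */
struct poll_entry {
    uint64_t change_seq;    /* last sequence observed by the PMD */
    bool enabled;           /* cached copy of dev->q_enabled */
};

/* Stand-in for the vring_state_changed callback: record the new
 * state and bump the sequence instead of reconfiguring. */
static void dev_set_queue_state(struct dev *d, bool enabled)
{
    d->q_enabled = enabled;
    d->change_seq++;
}

/* Cheap per-iteration refresh; returns true if the cache was updated. */
static bool poll_entry_refresh(struct poll_entry *p, const struct dev *d)
{
    if (p->change_seq == d->change_seq) {
        return false;           /* nothing changed: no work */
    }
    p->change_seq = d->change_seq;
    p->enabled = d->q_enabled;
    return true;
}
```

The common case (no state change) costs one 64-bit comparison per queue, which is why it can run every iteration without a reload.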
>
> After that, we'll be able to fix reconfiguration on F_MQ manipulations with
> something simple like this:
> ---
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index 91af377e8..1937561cc 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -3492,8 +3528,8 @@ new_device(int vid)
> newnode = dev->socket_id;
> }
>
> - if (dev->requested_n_txq != qp_num
> - || dev->requested_n_rxq != qp_num
> + if (dev->requested_n_txq < qp_num
> + || dev->requested_n_rxq < qp_num
> || dev->requested_socket_id != newnode) {
> dev->requested_socket_id = newnode;
> dev->requested_n_rxq = qp_num;
> ---
> Decreasing 'qp_num' will not cause big issues because we will not poll
> disabled queues.
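[The grow-only policy in the diff above amounts to the following (a sketch with illustrative names, not the real netdev-dpdk fields): a guest toggling VIRTIO_NET_F_MQ can only ratchet the requested queue count upward, so repeated toggling triggers at most one reconfiguration, and shrinking merely leaves queues allocated but unpolled.]

```c
#include <stdbool.h>

/* Stand-in for dev->requested_n_rxq/n_txq. */
static int requested_n_rxq;

/* Apply the queue-pair count reported by a newly attached device.
 * Returns true if a reconfiguration is needed. */
static bool update_requested_qp(int qp_num)
{
    if (qp_num > requested_n_rxq) {
        requested_n_rxq = qp_num;   /* grow: reconfigure once */
        return true;
    }
    return false;                   /* shrink: keep the queues; disabled
                                     * ones are simply not polled */
}
```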
>
> And there could be one separate change to drop the requested values when the socket
> connection is closed (I hope that the guest is not able to control that):
> ---
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index 91af377e8..1937561cc 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -185,13 +185,16 @@ static const struct rte_eth_conf port_conf = {
> */
> static int new_device(int vid);
> static void destroy_device(int vid);
> +static void destroy_connection(int vid);
> static int vring_state_changed(int vid, uint16_t queue_id, int enable);
> static const struct vhost_device_ops virtio_net_device_ops =
> {
> .new_device = new_device,
> .destroy_device = destroy_device,
> .vring_state_changed = vring_state_changed,
> - .features_changed = NULL
> + .features_changed = NULL,
> + .new_connection = NULL,
> + .destroy_connection = destroy_connection
> };
>
> enum { DPDK_RING_SIZE = 256 };
> @@ -3462,6 +3465,39 @@ netdev_dpdk_remap_txqs(struct netdev_dpdk *dev)
> free(enabled_queues);
> }
>
> +/*
> + * vhost-user socket connection is closed.
> + */
> +static void
> +destroy_connection(int vid)
> +{
> + struct netdev_dpdk *dev;
> + char ifname[IF_NAME_SZ];
> +
> + rte_vhost_get_ifname(vid, ifname, sizeof ifname);
> +
> + ovs_mutex_lock(&dpdk_mutex);
> + LIST_FOR_EACH(dev, list_node, &dpdk_list) {
> + ovs_mutex_lock(&dev->mutex);
> + if (nullable_string_is_equal(ifname, dev->vhost_id)) {
> + uint32_t qp_num = NR_QUEUE;
> +
> + /* Restore the number of queue pairs to default. */
> + if (dev->requested_n_txq != qp_num
> + || dev->requested_n_rxq != qp_num) {
> + dev->requested_n_rxq = qp_num;
> + dev->requested_n_txq = qp_num;
> + netdev_request_reconfigure(&dev->up);
> + }
> + ovs_mutex_unlock(&dev->mutex);
> + break;
> + }
> + ovs_mutex_unlock(&dev->mutex);
> + }
> + ovs_mutex_unlock(&dpdk_mutex);
> +}
> +
> +
> /*
> * A new virtio-net device is added to a vhost port.
> */
> ---
>
> Best regards, Ilya Maximets.
>