[ovs-dev] [PATCH v4 2/5] netdev-dpdk: Add netdev_dpdk_vhost_txq_flush function.

Bodireddy, Bhanuprakash bhanuprakash.bodireddy at intel.com
Fri Aug 11 13:11:09 UTC 2017


>On 09.08.2017 15:35, Bodireddy, Bhanuprakash wrote:
>>>>
>>>> +static int
>>>> +netdev_dpdk_vhost_tx_burst(struct netdev_dpdk *dev, int qid)
>>>> +{
>>>> +    struct dpdk_tx_queue *txq = &dev->tx_q[qid];
>>>> +    struct rte_mbuf **cur_pkts = (struct rte_mbuf **) txq->vhost_burst_pkts;
>>>> +
>>>> +    int tx_vid = netdev_dpdk_get_vid(dev);
>>>> +    int tx_qid = qid * VIRTIO_QNUM + VIRTIO_RXQ;
>>>> +    uint32_t sent = 0;
>>>> +    uint32_t retries = 0;
>>>> +    uint32_t sum, total_pkts;
>>>> +
>>>> +    total_pkts = sum = txq->vhost_pkt_cnt;
>>>> +    do {
>>>> +        uint32_t ret;
>>>> +        ret = rte_vhost_enqueue_burst(tx_vid, tx_qid,
>>>> +                                      &cur_pkts[sent], sum);
>>>> +        if (OVS_UNLIKELY(!ret)) {
>>>> +            /* No packets enqueued - do not retry. */
>>>> +            break;
>>>> +        } else {
>>>> +            /* Packets have been sent. */
>>>> +            sent += ret;
>>>> +
>>>> +            /* 'sum' packets have to be retransmitted. */
>>>> +            sum -= ret;
>>>> +        }
>>>> +    } while (sum && (retries++ < VHOST_ENQ_RETRY_NUM));
>>>> +
>>>> +    for (int i = 0; i < total_pkts; i++) {
>>>> +        dp_packet_delete(txq->vhost_burst_pkts[i]);
>>>> +    }
>>>> +
>>>> +    /* Reset pkt count. */
>>>> +    txq->vhost_pkt_cnt = 0;
>>>> +
>>>> +    /* 'sum' refers to packets dropped. */
>>>> +    return sum;
>>>> +}
>>>> +
>>>> +/* Flush the txq if there are any packets available. */
>>>> +static int
>>>> +netdev_dpdk_vhost_txq_flush(struct netdev *netdev, int qid,
>>>> +                            bool concurrent_txq OVS_UNUSED)
>>>> +{
>>>> +    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
>>>> +    struct dpdk_tx_queue *txq;
>>>> +
>>>> +    qid = dev->tx_q[qid % netdev->n_txq].map;
>>>> +
>>>> +    /* The queue may be disabled in the guest, in which case the
>>>> +     * map entry has been set to OVS_VHOST_QUEUE_DISABLED.
>>>> +     */
>>>> +    if (OVS_UNLIKELY(qid < 0)) {
>>>> +        return 0;
>>>> +    }
>>>> +
>>>> +    txq = &dev->tx_q[qid];
>>>> +    /* Increment the drop count and free the memory. */
>>>> +    if (OVS_UNLIKELY(!is_vhost_running(dev) ||
>>>> +                     !(dev->flags & NETDEV_UP))) {
>>>> +
>>>> +        if (txq->vhost_pkt_cnt) {
>>>> +            rte_spinlock_lock(&dev->stats_lock);
>>>> +            dev->stats.tx_dropped += txq->vhost_pkt_cnt;
>>>> +            rte_spinlock_unlock(&dev->stats_lock);
>>>> +
>>>> +            for (int i = 0; i < txq->vhost_pkt_cnt; i++) {
>>>> +                dp_packet_delete(txq->vhost_burst_pkts[i]);
>>>
>>> Spinlock (tx_lock) must be held here to avoid queue and mempool
>breakage.
>>
>> I think you are right. tx_lock might be acquired for freeing the packets.
>
>I think that 'vhost_pkt_cnt' reads and updates also should be protected to
>avoid races.

Following the discussion in the thread https://mail.openvswitch.org/pipermail/ovs-dev/2017-August/337133.html,
we are going to acquire tx_lock to update the map and flush the queue inside vring_state_changed().

That triggers a deadlock in the flushing function, because netdev_dpdk_vhost_txq_flush() acquires the same (non-recursive) lock again, as the snippet below shows.
The same problem applies to freeing the packet memory and protecting updates to vhost_pkt_cnt.

    if (OVS_LIKELY(txq->vhost_pkt_cnt)) {
        rte_spinlock_lock(&dev->tx_q[qid].tx_lock);
        netdev_dpdk_vhost_tx_burst(dev, qid);
        rte_spinlock_unlock(&dev->tx_q[qid].tx_lock);
    }
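
To make the deadlock concrete, here is a minimal standalone sketch (not the actual OVS functions; txq_flush() and vring_state_changed_cb() are illustrative stand-ins). rte_spinlock_t is not recursive, so the second acquisition from the same thread never returns:

-------------------------------------------------------------------------
#include <rte_spinlock.h>

static rte_spinlock_t tx_lock = RTE_SPINLOCK_INITIALIZER;

/* Stands in for netdev_dpdk_vhost_txq_flush(). */
static void
txq_flush(void)
{
    rte_spinlock_lock(&tx_lock);    /* Second acquisition: spins forever. */
    /* ... send and free the queued packets ... */
    rte_spinlock_unlock(&tx_lock);
}

/* Stands in for vring_state_changed() with the proposed locking. */
static void
vring_state_changed_cb(void)
{
    rte_spinlock_lock(&tx_lock);    /* First acquisition. */
    txq_flush();                    /* Deadlocks here. */
    rte_spinlock_unlock(&tx_lock);  /* Never reached. */
}
-------------------------------------------------------------------------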

The problem is triggered when the guest queues are enabled/disabled, and there is a small race window where packets can get enqueued into the queue just after the flush but before the map value is updated in the callback function (vring_state_changed()). Given that, how about this?

Technically, as the queues are disabled there is no point in flushing the packets, so let's free the packets and reset txq->vhost_pkt_cnt in vring_state_changed() itself instead of calling flush().

vring_state_changed():
------------------------------------------------------
rte_spinlock_lock(&dev->tx_q[qid].tx_lock);

mapped_qid = dev->tx_q[qid].map;
if (OVS_UNLIKELY(qid != mapped_qid)) {
    rte_spinlock_lock(&dev->tx_q[mapped_qid].tx_lock);
}

if (enable) {
    dev->tx_q[qid].map = qid;
} else {
    struct dpdk_tx_queue *txq = &dev->tx_q[qid];

    if (txq->vhost_pkt_cnt) {
        rte_spinlock_lock(&dev->stats_lock);
        dev->stats.tx_dropped += txq->vhost_pkt_cnt;
        rte_spinlock_unlock(&dev->stats_lock);

        for (int i = 0; i < txq->vhost_pkt_cnt; i++) {
            dp_packet_delete(txq->vhost_burst_pkts[i]);
        }
        txq->vhost_pkt_cnt = 0;
    }

    dev->tx_q[qid].map = OVS_VHOST_QUEUE_DISABLED;
}
-------------------------------------------------------------------------

Regards,
Bhanuprakash.

>
>> ---------------------------------------------------------------------------
>>     rte_spinlock_lock(&dev->tx_q[qid].tx_lock);
>>     for (int i = 0; i < txq->vhost_pkt_cnt; i++) {
>>              dp_packet_delete(txq->vhost_burst_pkts[i]);
>>     }
>>     rte_spinlock_unlock(&dev->tx_q[qid].tx_lock);
>>
>> - Bhanuprakash
>>

