[ovs-dev] [PATCH 1/9] netdev: Extend rx_recv to pass multiple packets.

Jarno Rajahalme jrajahalme at nicira.com
Wed Mar 19 15:44:37 UTC 2014


On Mar 18, 2014, at 8:19 PM, Pravin Shelar <pshelar at nicira.com> wrote:

> On Tue, Mar 18, 2014 at 3:56 PM, Jarno Rajahalme <jrajahalme at nicira.com> wrote:
>> 
>> On Mar 18, 2014, at 1:53 PM, Pravin <pshelar at nicira.com> wrote:
>> 
>>> DPDK can receive multiple packets but current netdev API does
>>> not allow that.  Following patch allows dpif-netdev receive batch
>>> of packet in a rx_recv() call for any netdev port.  This will be
>>> used by dpdk-netdev.
>>> 
>>> Signed-off-by: Pravin B Shelar <pshelar at nicira.com>
>>> ---
>>> lib/automake.mk       |    1 +
>>> lib/dpif-netdev.c     |   51 ++++++++++++++++++-------------------------------
>>> lib/dpif-netdev.h     |   40 ++++++++++++++++++++++++++++++++++++++
>>> lib/netdev-dummy.c    |   26 +++++++++----------------
>>> lib/netdev-linux.c    |   20 ++++++++++++++++---
>>> lib/netdev-provider.h |   24 +++++------------------
>>> lib/netdev.c          |   24 +++++------------------
>>> lib/netdev.h          |    2 +-
>>> lib/packets.c         |    9 +++++++++
>>> lib/packets.h         |    1 +
>>> 10 files changed, 107 insertions(+), 91 deletions(-)
>>> create mode 100644 lib/dpif-netdev.h
>>> 
>>> diff --git a/lib/automake.mk b/lib/automake.mk
>>> index e22165c..832b7f9 100644
>>> --- a/lib/automake.mk
>>> +++ b/lib/automake.mk
>>> @@ -52,6 +52,7 @@ lib_libopenvswitch_la_SOURCES = \
>>>      lib/dhparams.h \
>>>      lib/dirs.h \
>>>      lib/dpif-netdev.c \
>>> +     lib/dpif-netdev.h \
>>>      lib/dpif-provider.h \
>>>      lib/dpif.c \
>>>      lib/dpif.h \
>>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>>> index 3bbfd2a..d78fb25 100644
>>> --- a/lib/dpif-netdev.c
>>> +++ b/lib/dpif-netdev.c
>>> @@ -34,6 +34,7 @@
>>> #include "classifier.h"
>>> #include "csum.h"
>>> #include "dpif.h"
>>> +#include "dpif-netdev.h"
>>> #include "dpif-provider.h"
>>> #include "dummy.h"
>>> #include "dynamic-string.h"
>>> @@ -68,10 +69,6 @@ VLOG_DEFINE_THIS_MODULE(dpif_netdev);
>>> /* Configuration parameters. */
>>> enum { MAX_FLOWS = 65536 };     /* Maximum number of flows in flow table. */
>>> 
>>> -/* Enough headroom to add a vlan tag, plus an extra 2 bytes to allow IP
>>> - * headers to be aligned on a 4-byte boundary.  */
>>> -enum { DP_NETDEV_HEADROOM = 2 + VLAN_HEADER_LEN };
>>> -
>>> /* Queues. */
>>> enum { N_QUEUES = 2 };          /* Number of queues for dpif_recv(). */
>>> enum { MAX_QUEUE_LEN = 128 };   /* Maximum number of packets per queue. */
>>> @@ -1443,6 +1440,7 @@ dpif_netdev_execute(struct dpif *dpif, struct dpif_execute *execute)
>>> {
>>>    struct dp_netdev *dp = get_dp_netdev(dpif);
>>>    struct pkt_metadata *md = &execute->md;
>>> +    struct ofpbuf *packet;
>>>    struct flow key;
>>> 
>>>    if (execute->packet->size < ETH_HEADER_LEN ||
>>> @@ -1453,8 +1451,9 @@
> (struct dpif *dpif, struct dpif_execute *execute)
>>>    /* Extract flow key. */
>>>    flow_extract(execute->packet, md, &key);
>>> 
>>> +    packet = ofpbuf_clone(execute->packet);
>>>    ovs_rwlock_rdlock(&dp->port_rwlock);
>>> -    dp_netdev_execute_actions(dp, &key, execute->packet, md, execute->actions,
>>> +    dp_netdev_execute_actions(dp, &key, packet, md, execute->actions,
>>>                              execute->actions_len);
>>>    ovs_rwlock_unlock(&dp->port_rwlock);
>>> 
>> 
>> It is not clear to me why the packet should be cloned here, especially since you are already copying the data in the userspace action. Also, it seems this will leak memory in cases where there is no OUTPUT or USERSPACE action, as the dp_execute_action() will not be called in those cases (e.g., a drop flow).
>> 
>> 
> In Model I am following DKDP dpdk-recv generate ofpbuf and send consumes it.
> That means dpdk-send calls ofpbuf_delete() on every buffer dpif-netdev
> sends to it.
> Here in dp_execute_action()  I need to clone ofpbuf so that dpdk-send
> can release it. execute->packet passed can not freed by
> ofpbuf_delete().
> I missed drop case, I will fix the code.
> 

I may be wrong but it seems to me that this generates unnecessary memory allocations/frees for the ofpbufs themselves. On recv interface you could use an array ofpbufs instead of pointers, and have the send to unknit instead of delete. Not using additional indirection in the recv array may also be more cache friendly.

> 
>>> @@ -1586,12 +1585,10 @@ dp_forwarder_main(void *f_)
>>> {
>>>    struct dp_forwarder *f = f_;
>>>    struct dp_netdev *dp = f->dp;
>>> -    struct ofpbuf packet;
>>> 
>>>    f->name = xasprintf("forwarder_%u", ovsthread_id_self());
>>>    set_subprogram_name("%s", f->name);
>>> 
>>> -    ofpbuf_init(&packet, 0);
>>>    while (!latch_is_set(&dp->exit_latch)) {
>>>        bool received_anything;
>>>        int i;
>>> @@ -1605,25 +1602,19 @@ dp_forwarder_main(void *f_)
>>>                if (port->rx
>>>                    && port->node.hash >= f->min_hash
>>>                    && port->node.hash <= f->max_hash) {
>>> -                    int buf_size;
>>> +                    struct ofpbuf *packets[MAX_RX_BATCH];
>>> +                    int count;
>>>                    int error;
>>> -                    int mtu;
>>> -
>>> -                    if (netdev_get_mtu(port->netdev, &mtu)) {
>>> -                        mtu = ETH_PAYLOAD_MAX;
>>> -                    }
>>> -                    buf_size = DP_NETDEV_HEADROOM + VLAN_ETH_HEADER_LEN + mtu;
>>> -
>>> -                    ofpbuf_clear(&packet);
>>> -                    ofpbuf_reserve_with_tailroom(&packet, DP_NETDEV_HEADROOM,
>>> -                                                 buf_size);
>>> 
>>> -                    error = netdev_rx_recv(port->rx, &packet);
>>> +                    error = netdev_rx_recv(port->rx, packets, &count);
>>>                    if (!error) {
>>> +                        int i;
>>>                        struct pkt_metadata md
>>>                            = PKT_METADATA_INITIALIZER(port->port_no);
>>> 
>>> -                        dp_netdev_port_input(dp, &packet, &md);
>>> +                        for (i = 0; i < count; i++) {
>>> +                            dp_netdev_port_input(dp, packets[i], &md);
>>> +                        }
>>>                        received_anything = true;
>>>                    } else if (error != EAGAIN && error != EOPNOTSUPP) {
>>>                        static struct vlog_rate_limit rl
>>> @@ -1659,7 +1650,6 @@ dp_forwarder_main(void *f_)
>>> 
>>>        poll_block();
>>>    }
>>> -    ofpbuf_uninit(&packet);
>>> 
>>>    free(f->name);
>>> 
>>> @@ -1741,6 +1731,7 @@ dp_netdev_port_input(struct dp_netdev *dp, struct ofpbuf *packet,
>>>    } else {
>>>        ovsthread_counter_inc(dp->n_missed, 1);
>>>        dp_netdev_output_userspace(dp, packet, DPIF_UC_MISS, &key, NULL);
>>> +        ofpbuf_delete(packet);
>>>    }
>> 
>> Maybe this should be done in both cases (see below).
>> 
> I want dpdk-send release buffer, therefore I will fix drop case to fix
> the issue.
> 
>>> }
>>> 
>>> @@ -1767,6 +1758,7 @@ dp_netdev_output_userspace(struct dp_netdev *dp, struct ofpbuf *packet,
>>>        if (userdata) {
>>>            buf_size += NLA_ALIGN(userdata->nla_len);
>>>        }
>>> +        buf_size += packet->size;
>>>        ofpbuf_init(buf, buf_size);
>>> 
>>>        /* Put ODP flow. */
>>> @@ -1780,10 +1772,8 @@ dp_netdev_output_userspace(struct dp_netdev *dp, struct ofpbuf *packet,
>>>                                          NLA_ALIGN(userdata->nla_len));
>>>        }
>>> 
>>> -        /* Steal packet data. */
>>> -        ovs_assert(packet->source == OFPBUF_MALLOC);
>>> -        upcall->packet = *packet;
>>> -        ofpbuf_use(packet, NULL, 0);
>>> +        upcall->packet.data = ofpbuf_put(buf, packet->data, packet->size);
>>> +        upcall->packet.size = packet->size;
>>> 
>> 
>> I do not see a reason in this patch why the packet data should be copied here. However, I can guess that the reason for it comes apparent with the later patches?
>> 
> 
> I would like to directly pass packet pointer, but that requires lot of
> changes due to struct dpif_upcall change. since this is slow path I am
> not worried about this case at this point.
> 

The existing code effectively passed the packet pointer, by swapping the contents of the upcall->packet and *packet. I assumed that the upcoming DPDK code would not allow for “DPDK memory” to be passed to the upcall as the rationale for this change. But if such restriction does not exist, then why this change?

>>>        seq_change(dp->queue_seq);
>>> 
>>> @@ -1825,15 +1815,8 @@ dp_execute_cb(void *aux_, struct ofpbuf *packet,
>>> 
>>>        userdata = nl_attr_find_nested(a, OVS_USERSPACE_ATTR_USERDATA);
>>> 
>>> -        /* Make a copy if we are not allowed to steal the packet's data. */
>>> -        if (!may_steal) {
>>> -            packet = ofpbuf_clone_with_headroom(packet, DP_NETDEV_HEADROOM);
>>> -        }
>>>        dp_netdev_output_userspace(aux->dp, packet, DPIF_UC_ACTION, aux->key,
>>>                                   userdata);
>>> -        if (!may_steal) {
>>> -            ofpbuf_uninit(packet);
>>> -        }
>>>        break;
>>>    }
>>>    case OVS_ACTION_ATTR_PUSH_VLAN:
>>> @@ -1846,6 +1829,10 @@ dp_execute_cb(void *aux_, struct ofpbuf *packet,
>>>    case __OVS_ACTION_ATTR_MAX:
>>>        OVS_NOT_REACHED();
>>>    }
>>> +
>>> +    if (may_steal) {
>>> +        ofpbuf_delete(packet);
>>> +    }
>> 
>> You can not rely on this to happen for all the packets. It seems you could delete this and the cloning in dpif_netdev_execute(), and have dp_port_input() do the delete also after execution?
>> 
> right, I need to fix cases like drop.
> As mentioned earlier I want dpdk-send delete the packet. But I see
> this has made code bit complicated I will see if it can be simplified.
> 

I’d like you to consider the “array of ofpbufs” I proposed above…

…

>>> static int
>>> -netdev_linux_rx_recv(struct netdev_rx *rx_, struct ofpbuf *buffer)
>>> +netdev_linux_rx_recv(struct netdev_rx *rx_, struct ofpbuf **packet, int *c)
>>> {
>>>    struct netdev_rx_linux *rx = netdev_rx_linux_cast(rx_);
>>> -    int retval;
>>> +    struct netdev *netdev = rx->up.netdev;
>>> +    struct ofpbuf *buffer;
>>> +    ssize_t retval;
>>> +    int mtu;
>>> +
>>> +    if (netdev_linux_get_mtu__(netdev_linux_cast(netdev), &mtu)) {
>>> +        mtu = ETH_PAYLOAD_MAX;
>>> +    }
>>> +
>>> +    buffer = ofpbuf_new_with_headroom(VLAN_ETH_HEADER_LEN + mtu, DP_NETDEV_HEADROOM);
>>> 
>>>    retval = (rx->is_tap
>>>              ? netdev_linux_rx_recv_tap(rx->fd, buffer)
>>> @@ -995,8 +1006,11 @@ netdev_linux_rx_recv(struct netdev_rx *rx_, struct ofpbuf *buffer)
>>>    if (retval && retval != EAGAIN && retval != EMSGSIZE) {
>>>        VLOG_WARN_RL(&rl, "error receiving Ethernet packet on %s: %s",
>>>                     ovs_strerror(errno), netdev_rx_get_name(rx_));
>>> +    } else {
>>> +        packet_set_size(buffer, buffer->size);
>>> +        packet[0] = buffer;
>>> +        *c = 1;
>> 
>> I don't see the buffer being released in all the other cases (when it's ownership does not transfer to the caller).
>> 
> You mean other than drop packet case?
> 

Yes, the code path where we have just allocated a new ofpbuf, and then call ovs_strerror() above.

  Jarno

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openvswitch.org/pipermail/ovs-dev/attachments/20140319/29a8dd31/attachment-0005.html>


More information about the dev mailing list