[ovs-dev] OVS Netlink zerocopy vs Xen netback zerocopy
zoltan.kiss at citrix.com
Wed Feb 19 15:50:43 UTC 2014
Currently I'm working on a patchset which reintroduces grant mapping
into netback. We used it before the Linux Xen bits were upstreamed, but
we had to change to grant copy as the original solution was
fundamentally not upstreamable. The advantage would be huge, though, as
we could replace copying guest pages by Xen with mapping guest pages
into Dom0.
In parallel I'm working on a grant mapping optimization which makes it
possible to avoid m2p_override for grant mapped pages. It causes lock
contention, and we don't need it if the pages don't go to userspace.
This should be a safe assumption, as those pages stay in kernel space
while switched by OVS, and if they end up on the local port and get
delivered to the Dom0 IP stack, deliver_skb calls skb_orphan_frags,
which swaps out those foreign (i.e. grant mapped from the guest) pages
for local copies and notifies netback through a callback that it can
give the pages back to the guest.
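
For reference, here is roughly what skb_orphan_frags does; this is
paraphrased from include/linux/skbuff.h of that era, not verbatim:

    /* Paraphrased from include/linux/skbuff.h; details may differ
     * between kernel versions. */
    static inline int skb_orphan_frags(struct sk_buff *skb, gfp_t gfp_mask)
    {
            /* Frags can only be foreign if the zerocopy flag is set. */
            if (likely(!(skb_shinfo(skb)->tx_flags & SKBTX_DEV_ZEROCOPY)))
                    return 0;
            /* skb_copy_ubufs() allocates local pages, copies the frag
             * data over and fires the ubuf_info callback, which is how
             * netback learns it can return the grant mapped pages. */
            return skb_copy_ubufs(skb, gfp_mask);
    }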
And after that somewhat long introduction, here comes the main
question: OVS recently introduced Netlink zerocopy, which by my
understanding means that Netlink messages from the kernel are not
copied but mapped to userspace. Such a message can contain a whole
packet if it hasn't matched any flow in the kernel, or if the flow
action says so. As far as I saw, skb_zerocopy clones the frags from the
real packet skb into the Netlink skb. Note that the linear buffer is
local memory in the netback case as well: we copy the beginning of the
packet (max 128 bytes) there, and only the pages in the frags are
foreign.
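
For illustration, the frag handling in skb_zerocopy looks roughly like
this (heavily abbreviated from net/core/skbuff.c, with the
small-payload and headlen-sharing cases left out):

    /* Abbreviated sketch of skb_zerocopy(to, from, len, hlen). The
     * frag descriptors are copied, but they still point at the
     * original pages, which in the netback case are grant mapped
     * (foreign) guest pages. */
    for (i = 0; i < skb_shinfo(from)->nr_frags; i++) {
            if (!len)
                    break;
            skb_shinfo(to)->frags[j] = skb_shinfo(from)->frags[i];
            skb_frag_size_set(&skb_shinfo(to)->frags[j],
                              min_t(int,
                                    skb_frag_size(&skb_shinfo(to)->frags[j]),
                                    len));
            len -= skb_frag_size(&skb_shinfo(to)->frags[j]);
            j++;
    }
    skb_shinfo(to)->nr_frags = j;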
I don't know the Netlink internals that well, in particular how a
packet is forwarded up in this case, but it concerns me: if the pages
in the skb_shinfo(skb)->frags array are still the foreign ones and
userspace wants to touch that data, we are in trouble.
If this is the scenario, I think the best option would be to call
skb_orphan_frags before skb_zerocopy in queue_userspace_packet, so the
frags become local. Fortunately this is a corner case, as it shouldn't
happen very often that the kernel sends up packets bigger than the 128
bytes copied into the linear buffer.
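
To make that concrete, here is a rough, untested sketch of the change I
have in mind (hlen is the header length queue_userspace_packet already
computes; the error path is only illustrative):

    /* Untested sketch for queue_userspace_packet() in
     * net/openvswitch/datapath.c. */
    err = skb_orphan_frags(skb, GFP_ATOMIC);
    if (err)
            goto out;       /* couldn't allocate local pages */
    /* From here on every frag page is local, so sharing them with
     * the Netlink skb mapped to userspace is safe. */
    skb_zerocopy(user_skb, skb, skb->len, hlen);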
What do you think about the solution sketched above? Or do we
need it at all?