[ovs-dev] [PATCH] Extends the existing mirror configuration parameters

Wang, Liang-min liang-min.wang at intel.com
Wed May 19 14:17:33 UTC 2021


> -----Original Message-----
> From: Maxime Coquelin <maxime.coquelin at redhat.com>
> Sent: Wednesday, May 19, 2021 8:50 AM
> To: Wang, Liang-min <liang-min.wang at intel.com>; Miskell, Timothy
> <timothy.miskell at intel.com>; dev at openvswitch.org
> Subject: Re: [PATCH] Extends the existing mirror configuration parameters
> 
> 
> 
> On 5/19/21 1:53 PM, Wang, Liang-min wrote:
> >> -----Original Message-----
> >> From: Maxime Coquelin <maxime.coquelin at redhat.com>
> >> Sent: Wednesday, May 19, 2021 3:56 AM
> >> To: Wang, Liang-min <liang-min.wang at intel.com>; Miskell, Timothy
> >> <timothy.miskell at intel.com>; dev at openvswitch.org
> >> Subject: Re: [PATCH] Extends the existing mirror configuration
> parameters
> >>
> >> Hi Liang-min,
> >>
> >> When replying inline, please do not prefix with ">>" as it is handled as
> >> quoted text. There is no need to prefix.
> >>
> >> On 5/18/21 8:00 PM, Wang, Liang-min wrote:
> >>>> -----Original Message-----
> >>>> From: Maxime Coquelin <maxime.coquelin at redhat.com>
> >>>> Sent: Tuesday, May 18, 2021 12:15 PM
> >>>> To: Miskell, Timothy <timothy.miskell at intel.com>;
> dev at openvswitch.org
> >>>> Cc: Wang, Liang-min <liang-min.wang at intel.com>
> >>>> Subject: Re: [PATCH] Extends the existing mirror configuration
> >> parameters
> >>>>
> >>>> Hi Timothy, Liang-min,
> >>>>
> >>>> Thanks for rebasing the patch.
> >>>> A list of delta against the first RFC could help the reviewers.
> >>>> I notice one change in the right direction is the conversion to Vhost
> >>>> API datapath instead of Vhost PMD.
> >>>>
> >>>>> In this patch we support both Vhost API and Vhost PMD because OVS
> >> supports both
> >>>>> VhostUser and Vdev ports.
> >>>>>
> >>>> Also, I would suggest to have the patch split in several incremental
> >>>> patches to ease the review.
> >>>>
> >>>>> Thank you for suggestion. We will provide incremental patches on
> next
> >> submission
> >>>>>
> >>>> On 5/10/21 6:00 PM, Timothy Miskell wrote:
> >>>>> From: Liang-min Wang <liang-min.wang at intel.com>
> >>>>>
> >>>>> The following parameters are added:
> >>>>>  - mirror-offload: to turn on/off mirror offloading.
> >>>>>  - output-port-name: specify a port, using name string, that is on a
> >> different
> >>>>>    bridge
> >>>>>  - output-src-vlan: output port vlan for each select-src-port.
> >>>>>  - output-dst-vlan: output port vlan for each select-dst-port.
> >>>>>  - flow-src-mac: use src mac address of each select-dst-port for the
> >> header
> >>>>>    scan.
> >>>>>  - flow-dst-mac: use dst mac address of each select-src-port for the
> >> header
> >>>>>    scan.
> >>>>>  - mirror-tunnel-addr: BDF string of the tunnel device.
> >>>>>
> >>>>> ovs-vsctl test change because new mirroring parameters are
> introduced
> >> in
> >>>> this patch
> >>>>
> >>>> It would help to provide examples of usage of these new parameters.
> >>>>
> >>>>> Will add examples in the new patches
> >>>>>
> >>>>> Create a defer procedure call thread to handle all mirror offload
> >> requests.
> >>>>> This is a light-weight thread which remains in sleep-state when there
> is
> >> no
> >>>> new request.
> >>>>> This is created between ovs-vsctl and mirror offloading back end
> >>>>>
> >>>>> Implementing DPDK tx-burst (VIRTIO ingress traffic
> >>>>> mirror) and rx-burst (VIRTIO egress traffic mirror) callbacks.
> >>>>> Each callback  functions implement the following tasks:
> >>>>>  1. Enable per-packet VLAN insertion
> >>>>>    - for port mirroring, all packets are enabled per-packet VLAN
> insertion.
> >>>>>    - for flow mirroring, only packet header matches the required mac
> >>>> address
> >>>>>      are enabled.
> >>>>>  2. Sending the packets to the specified transport port (output-port in
> >>>>>     mirror offload configuration)
> >>>>>    - for port mirroring, all packets are sent to the transport port.
> >>>>>    - for flow mirroring, only matched packets are sent.
> >>>>>  3. Restore each packet attributes (remove DPDK per-packet offload
> >> flag)
> >>>>
> >>>> I will for sure have more questions later, but please find a few
> >>>> comments/questions below:
> >>>>
> >>>>> Signed-off-by: Liang-min Wang <liang-min.wang at intel.com>
> >>>>> Tested-by: Timothy Miskell <Timothy.Miskell at intel.com>
> >>>>> Suggested-by: Munish Mehan <mm6021 at att.com>
> >>>>> ---
> >>>>>  lib/automake.mk            |   2 +
> >>>>>  lib/netdev-dpdk-mirror.c   | 516
> >>>> +++++++++++++++++++++++++++++++++++++
> >>>>>  lib/netdev-dpdk-mirror.h   |  83 ++++++
> >>>>>  lib/netdev-dpdk.c          | 397 ++++++++++++++++++++++++++++
> >>>>>  lib/netdev-provider.h      |  16 ++
> >>>>>  lib/netdev.c               | 386 +++++++++++++++++++++++++++
> >>>>>  lib/netdev.h               |  16 ++
> >>>>>  tests/ovs-vsctl.at         |   2 +
> >>>>>  vswitchd/bridge.c          | 271 ++++++++++++++++++-
> >>>>>  vswitchd/vswitch.ovsschema |  24 +-
> >>>>>  vswitchd/vswitch.xml       |  50 ++++
> >>>>>  11 files changed, 1759 insertions(+), 4 deletions(-)
> >>>>>  create mode 100644 lib/netdev-dpdk-mirror.c
> >>>>>  create mode 100644 lib/netdev-dpdk-mirror.h
> >>>>>
> >>
> >> ...
> >>
> >>>>> +
> >>>>> +/* port/flow mirror traffic processors */
> >>>>> +static inline uint16_t
> >>>>> +netdev_custom_mirror_offload_cb(uint16_t qidx, struct rte_mbuf
> >>>> **pkts,
> >>>>> +    uint16_t nb_pkts, void *user_params)
> >>>>> +{
> >>>>> +    struct mirror_param *data = user_params;
> >>>>> +    uint16_t i, dst_qidx, match_count = 0;
> >>>>> +    uint16_t pkt_trans;
> >>>>> +    uint16_t dst_port_id = data->dst_port_id;
> >>>>> +    uint16_t dst_vlan_id = data->dst_vlan_id;
> >>>>> +    struct rte_mbuf **pkt_buf = &data->pkt_buf[qidx * data-
> >>>>> max_burst_size];
> >>>>> +
> >>>>> +    if (nb_pkts == 0) {
> >>>>> +        return 0;
> >>>>> +    }
> >>>>> +
> >>>>> +    if (nb_pkts > data->max_burst_size) {
> >>>>> +        VLOG_ERR("Per-flow batch size, %d, exceeds maximum limit\n",
> >>>> nb_pkts);
> >>>>> +        return 0;
> >>>>> +    }
> >>>>> +
> >>>>> +    for (i = 0; i < nb_pkts; i++) {
> >>>>> +        if (data->custom_scan(pkts[i], user_params)) {
> >>>>> +            pkt_buf[match_count] = pkts[i];
> >>>>> +            pkt_buf[match_count]->ol_flags |= PKT_TX_VLAN_PKT;
> >>>>
> >>>> Does it work if the packet already has a VLAN inserted?
> >>>>
> >>>>> Good catch. The design is based upon no VLAN insertion offloading is
> >> applied on source traffic.
> >>
> >> That is a significant limitation, I haven not seen it documented in the
> >> series. Also, the IEEE paper mentions using QinQ when the packet has
> >> already a VLAN inserted, but it is not in the series. Is there a reason
> >> QinQ is not possible?
> >>
> >> IIUC, that means the TAP VM would see traffic having no VLAN, whereas
> it
> >> has in reality?
> >>
> >
> > It does support 802.1q (single VLAN). I mis-understood your last email.
> > This design assumes there is no PKT_TX_VLAN_PKT flag set on the source
> mbuf.
> 
> I understood that assumption, and it seems OVS does not use VLAN offload
> currently so it should be OK (even though I would suggest adding a check
> that the flag isn't already set not to mess up with the regular traffic
> if one day it is added).
> 
> My question was:
> If you packet already has a VLAN tag *already* inserted (i.e. not to be
> inserted), won't the mirrored packet lose that information? I don't
> expect QinQ will be setup by the VF driver as
> 

With this design the mirrored traffic won't lose VLAN tag. 
For single VLAN packets, this design will add an outer VLAN tag.
The outer VLAN can be stripped if the destination VNF (vProbe) turn on VLAN stripping.
The existing DPDK library does support QinQ offload. BTW, this change is not visible in
source VNF (no need to change source VNF setting) because VLAN insertion
happens at mirror tunnel device.

> > This assumption is based upon that VIRTIO spec. doesn't support
> > per-packet VLAN insertion offloading.
> 
> I am not sure to understand this comment, but for Vhost PMD we do the
> VLAN insertion in SW:
> https://elixir.bootlin.com/dpdk/latest/source/drivers/net/vhost/rte_eth_vh
> ost.c#L447
> 

> I think this VLAN insertion in Vhost PMD is problematic twice with this
> series:
> 1. The offloaded VLAN insertion would be lost
> 2. rte_vlan_insert() will fail as it needs the mbuf refcount to be 1. It
> means the packet would be mirrored but would never reach its expected
> destination as dropped by the Vhost PMD.
> 

I see your pointes now. I agree with your assessment. We could either 
1.  support no Vhost PMD, or 2. check the offload flag at the run-time.

> >
> >>>>> +            pkt_buf[match_count]->vlan_tci = dst_vlan_id;
> >>>>> +            rte_mbuf_refcnt_update(pkt_buf[match_count], 1);
> >>>>
> >>>>
> >>>>
> >>>>> +            match_count++;
> >>>>> +        }
> >>>>> +    }
> >>>>> +
> >>>>> +    dst_qidx = (data->n_dst_queue > qidx)?qidx:(data->n_dst_queue
> -
> >> 1);
> >>>>
> >>>> Wouldn't it scale better with:
> >>>> dst_qidx = qidx % data->n_dst_queue
> >>>> ?
> >>>>
> >>>>> We tried to avoid using "%" operator. We could add "unlikely" and the
> >> suggested "%" to make improvement
> >>
> >> Not sure adding 'unlikely' is really necessary. The cost of the modulo
> >> operation is nothing compared to all we do in this path.
> >>
> >>>>>
> >>>>> +
> >>>>> +    rte_spinlock_lock(&data->locks[dst_qidx]);
> >>>>> +    pkt_trans = rte_eth_tx_burst(dst_port_id, dst_qidx, pkt_buf,
> >>>> match_count);
> >>>>> +    rte_spinlock_unlock(&data->locks[dst_qidx]);
> >>>>> +
> >>>>> +    for (i = 0; i < match_count; i++) {
> >>>>> +        pkt_buf[i]->ol_flags &= ~PKT_TX_VLAN_PKT;
> >>>>> +    }
> >>>>
> >>>> In order to further reduce the performance impact of mirroring, have
> you
> >>>> envisaged to offload it to dedicated PMD threads?
> >>>>
> >>>>> The mirror-tunnel design is a comprised approach between hardware
> >> TAP and software TAP.
> >>>>> The tunnel itself is designed to have very little impact on source traffic
> >> processing core. From
> >>>>> our benchmark on SR-IOV, L2 forwarding, mirroring, we only observed
> >> 10-20% impact on 64-byte packets, and
> >>>>> we did not observe impact when running traffic with packet size with
> >> 128-byte or above.
> >>
> >> The cover letter mentions 20-30% for 64B packets, and the IEEE paper
> >> seems to indicate a significant impact up to 512B packets (~25%?).
> >>
> >> Maybe the workload is different in these tests, that could explain why
> >> you don't see an impact starting 128B?
> >>
> >
> > I believe the 20-30% drop is for VIRTIO port mirroring not SR-IOV mirroring.
> 
> What you mean by SRIOV mirroring is ingress traffic on one VF is
> mirrored to another VF? Cannot it be done directly in HW?
> 

The SR-IOV performance number provided here because of prior ask for additional
PMD core for scaling.

> Regarding Virtio port mirroring, we can see in the benchmark results you
> presented at OVSCon 20 a significant impact for ingress mirroring for
> all packet size with your VLAN offload solution. I agree this is much
> better than with current mirroring solution, but there should be room
> for improvement.
> 
> > The 512B packet (~25%?) is for default VIRTIO mirroring (OVS default
> > Implementation) not this design.
> 
> No, this is with your mirroring solution. See the green bar in the
> benchmark results in the paper.
> 

I see the disparity here. The 25% is from the 1st IEEE paper where no mirror tunnel
mechanism is devised and there is no visible degradation on 512B packets over
our 2nd IEEE paper (expected to be published on June, https://im2021.ieee-im.org/)

> >>>>>
> >>>>> +
> >>>>> +    while (unlikely (pkt_trans < match_count)) {
> >>>>> +        rte_pktmbuf_free(pkt_buf[pkt_trans]);
> >>>>> +        pkt_trans++;
> >>>>> +    }
> >>>>> +
> >>>>> +    return nb_pkts;
> >>>>> +}
> >>>>> +
> >>
> >> ...
> >>
> >>>>> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
> >>>>> index 4597a215d..fd2049a7f 100644
> >>>>> --- a/vswitchd/vswitch.xml
> >>>>> +++ b/vswitchd/vswitch.xml
> >>>>> @@ -4869,11 +4869,35 @@ ovs-vsctl add-port br0 p0 -- set Interface
> p0
> >>>> type=patch options:peer=p1 \
> >>>>>          selected VLANs.
> >>>>>        </p>
> >>>>>
> >>>>> +      <column name="mirror_tunnel_addr">
> >>>>> +        BDF string of the tunnel device on which mirrored traffic will be
> >>>>> +        transmitted.
> >>>>> +      </column>
> >>>>> +
> >>>>>        <column name="select_all">
> >>>>>          If true, every packet arriving or departing on any port is
> >>>>>          selected for mirroring.
> >>>>>        </column>
> >>>>>
> >>>>> +      <column name="mirror_offload">
> >>>>> +        If true, a hw-assisted port mirroring is configured instead
> >>>>> +        default mirroring.
> >>>>> +      </column>
> >>>>> +
> >>>>> +      <column name="flow_src_mac">
> >>>>> +        The source MAC address(es) for per-flow mirroring. Each MAC
> >>>>> +        address is separate by ','. This parametr is paired with
> >>>>> +        select_dst_port. A '0' MAC address indicates the requested
> mirror
> >>>>> +        is a per-port mirroring, otherwise it's a per-flow mirroring
> >>>>> +      </column>
> >>>>> +
> >>>>> +      <column name="flow_dst_mac">
> >>>>> +        The destination MAC address(es) for per-flow mirroring. Each
> MAC
> >>>>> +        address is separate by ','. This parametr is paired with
> >>>>> +        select_src_port. A '0' MAC address indicates the requested
> mirror
> >>>>> +        is a per-port mirroring, otherwise it's a per-flow mirroring
> >>>>> +      </column>
> >>>>> +
> >>>>>        <column name="select_dst_port">
> >>>>>          Ports on which departing packets are selected for mirroring.
> >>>>>        </column>
> >>>>> @@ -4955,6 +4979,32 @@ ovs-vsctl add-port br0 p0 -- set Interface p0
> >>>> type=patch options:peer=p1 \
> >>>>>          </p>
> >>>>>        </column>
> >>>>>
> >>>>> +      <column name="output_src_vlan">
> >>>>> +        <p>Output VLAN for selected source port packets, if
> >> nonempty.</p>
> >>>>> +        <p>
> >>>>> +          <em>Please note:</em> This is different than
> >>>>> +          <ref column="output-vlan"/> This vlan is used to add an
> additional
> >>>>> +          vlan tag on the mirror traffic, regardless it contains vlan or not.
> >>>>> +          The receive end could choose to filter out this additional vlan.
> >>>>> +          This option is provided so the mirrored traffic could maintain its
> >>>>> +          original vlan informaiton, and this mirror can be used to filter
> >>>>> +          out un-wanted traffic such as in <ref
> column="mirror_offload"/>.
> >>>>> +        </p>
> >>>>> +      </column>
> >>>>> +
> >>>>> +      <column name="output_dst_vlan">
> >>>>> +        <p>Output VLAN for selected destination port packets, if
> >>>> nonempty.</p>
> >>>>> +        <p>
> >>>>> +          <em>Please note:</em> This is different than
> >>>>> +          <ref column="output-vlan"/> This vlan is used to add an
> additional
> >>>>> +          vlan tag on the mirror traffic, regardless it contains vlan or not.
> >>>>> +          The receive end could choose to filter out this additional vlan.
> >>>>> +          This option is provided so the mirrored traffic could maintain its
> >>>>> +          original vlan informaiton, and this mirror cab be used to filter
> >>>>> +          out un-wanted traffic such as in <ref
> column="mirror_offload"/>.
> >>>>> +        </p>
> >>>>> +      </column>
> >>>>> +
> >>>>>        <column name="snaplen">
> >>>>>          <p>Maximum per-packet number of bytes to mirror.</p>
> >>>>>          <p>A mirrored packet with size larger than <ref
> >> column="snaplen"/>
> >>>>>
> >>>
> >>
> >> These parameters are DPDK specific, but nothing mentions it. It would
> >> confuse the OVS-Kernel users.
> >>
> >
> > The current implementation only supports OVS-DPDK.
> 
> Yes, that is my point. It is not mentioned in the documentation that it
> is DPDK specific, and we should not expect the user to dive into the
> code to understand it is not supported with OVS Kernel.
> 

Got it. Will add this caveat in the document. Thanks.

> >> Regards,
> >> Maxime
> >



More information about the dev mailing list