[ovs-dev] [PATCH RFC v2 0/6] dpif-netdev: Manual pinning of RX queues + XPS.

Daniele Di Proietto diproiettod at vmware.com
Thu Jul 7 00:27:28 UTC 2016


Hi Ilya,



Apologies for the delay.

On 20/06/2016 07:22, "Ilya Maximets" <i.maximets at samsung.com> wrote:

>On 11.06.2016 02:53, Daniele Di Proietto wrote:
>> On 02/06/2016 06:55, "Ilya Maximets" <i.maximets at samsung.com> wrote:
>> 
>>> Hi, Daniele.
>>> Thanks for review.
>>>
>>> On 02.06.2016 04:33, Daniele Di Proietto wrote:
>>>> Hi Ilya,
>>>>
>>>> apologies for the delay.
>>>>
>>>> I didn't take an extremely detailed look at this series, but I have
>>>> a few high-level comments.
>>>>
>>>> Thanks for adding a command to configure the rxq affinity.  Have
>>>> you thought about using the database instead?  I think it will
>>>> be easier to use because it survives restarts, and one can batch
>>>> the affinity assignment for multiple ports without explicitly
>>>> calling pmd-reconfigure.  I'm not sure what the best interface
>>>> would look like. Perhaps a string in Interface:other_config that
>>>> maps rxqs to core ids?
>>>>
>>>> I'd prefer to avoid exporting an explicit command like
>>>> dpif-netdev/pmd-reconfigure.  If we use the database we don't have to,
>>>> right?
>>>
>>> I thought about a database solution. Actually, I can't see a big
>>> difference between the database and appctl in this case. For automatic
>>> usage both may be scripted, but for manual pinning the two approaches
>>> are equally uncomfortable.
>>> IMHO, if it goes in the database it shouldn't be a per-'Interface'
>>> mapping string, because one mapping influences other ports
>>> (core isolation). Also, there is an issue with synchronization with
>>> 'pmd-cpu-mask' that would have to be handled manually anyway.
>>> The appctl command could be changed to receive a string of all
>>> mappings and trigger reconfiguration. In that case there would be no
>>> need for an explicit 'dpif-netdev/pmd-reconfigure'.
>> 
>> Do we really need to implement core isolation? I'd prefer a model where,
>> if an interface has an affinity, we enforce it (as far as we can with the
>> current pmd-cpu-mask), and for other interfaces we keep the current model.
>> Probably there are some limitations I'm not seeing with this model.
>
>Generally, core isolation prevents polling of other ports on a PMD thread.
>This is useful to keep a constant polling rate on a performance-critical
>port while adding/deleting other ports. Without isolation we would need
>to pin every port explicitly to achieve the desired level of performance.
>
>> I'd prefer to keep the mapping in the database because it's more in line
>> with the rest of OVS configuration.  The database survives crashes, restarts
>> and reboots.
>
>Ok. How about something like this:
>
>	* Per-port database entry for available core-ids:
>
>	  # ovs-vsctl set interface <iface> \
>                      other_config:pmd-rxq-affinity=<rxq-affinity-list>
>
>	  where:
>	  <rxq-affinity-list> ::= NULL | <non-empty-list>
>	  <non-empty-list> ::= <affinity-pair> |
>	                       <affinity-pair> , <non-empty-list>
>	  <affinity-pair> ::= <queue-id> : <core-id>
>
>	  Example:
>
>	  # ./bin/ovs-vsctl set interface dpdk0 options:n_rxq=4 \
>	                        other_config:pmd-rxq-affinity="0:3,1:7,3:8"
>	      Queue #0 pinned to core 3;
>	      Queue #1 pinned to core 7;
>	      Queue #2 not pinned;
>	      Queue #3 pinned to core 8.

Unless someone has better ideas, this looks good.
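
Just to make the proposed syntax concrete, here is a rough sketch of how
such a <rxq-affinity-list> string could be parsed.  The function and
struct names below are made up for illustration only; they are not taken
from the patch:

    /* Sketch only: parse a string like "0:3,1:7,3:8" into
     * (queue id, core id) pairs. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    struct rxq_affinity {
        unsigned queue_id;
        unsigned core_id;
    };

    /* Returns the number of pairs written to 'pairs' (at most 'max_pairs'),
     * or -1 if the list is malformed.  An empty string yields 0 pairs. */
    static int
    parse_rxq_affinity_list(const char *list, struct rxq_affinity *pairs,
                            size_t max_pairs)
    {
        char *copy = strdup(list);
        char *saveptr = NULL;
        int n = 0;

        for (char *pair = strtok_r(copy, ",", &saveptr); pair;
             pair = strtok_r(NULL, ",", &saveptr)) {
            unsigned queue, core;

            if ((size_t) n >= max_pairs
                || sscanf(pair, "%u:%u", &queue, &core) != 2) {
                free(copy);
                return -1;
            }
            pairs[n].queue_id = queue;
            pairs[n].core_id = core;
            n++;
        }
        free(copy);
        return n;
    }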

>
>	* Configurable mask of isolated PMD threads:
>
>	  # ./bin/ovs-vsctl set Open_vSwitch . \
>	                    other_config:pmd-isol-cpus=<core-list>
>	  Empty means "none".

I still think this looks kind of complicated.  These are the options:

1) Do not deal with isolation.  If some "isolation" is required, the user
   has to assign the rxqs of every port.

2) Automatically isolate cores that have an rxq explicitly assigned to them.

3) Add a pmd-isol-cpus parameter.

4) Add a rxq-default-affinity (the opposite of pmd-isol-cpus).

1) and 2) only add a single configuration parameter, but
1) puts the burden on the controller (as you point out), and 2) is
not too flexible.

3) and 4) are equivalent: they allow more flexibility, but they add two
configuration parameters.

If you think more flexibility is required, we can go for 3) or 4).

Otherwise we can probably try implementing 2).

Are there better ideas?
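
To make 2) a bit more concrete, here is a rough sketch of the scheduling
it would imply (all names are hypothetical, not actual dpif-netdev code):
a core that receives at least one pinned rxq is treated as isolated and
is skipped when the remaining, unpinned rxqs are distributed round-robin.

    #include <stdbool.h>
    #include <stddef.h>

    struct rxq {
        int pinned_core;        /* -1 when the queue has no explicit affinity. */
        size_t assigned_core;   /* Index into the cores array (sketch only). */
        bool assigned;
    };

    struct pmd_core {
        unsigned core_id;
        bool isolated;          /* Set when a pinned rxq lands on this core. */
    };

    static void
    schedule_rxqs(struct rxq *rxqs, size_t n_rxqs,
                  struct pmd_core *cores, size_t n_cores)
    {
        size_t next = 0;

        /* First pass: honor explicit pinning and isolate those cores. */
        for (size_t i = 0; i < n_rxqs; i++) {
            if (rxqs[i].pinned_core < 0) {
                continue;
            }
            for (size_t c = 0; c < n_cores; c++) {
                if (cores[c].core_id == (unsigned) rxqs[i].pinned_core) {
                    cores[c].isolated = true;
                    rxqs[i].assigned_core = c;
                    rxqs[i].assigned = true;
                }
            }
        }

        /* Second pass: round-robin unpinned rxqs over non-isolated cores.
         * If every core is isolated, an unpinned rxq stays unassigned. */
        for (size_t i = 0; i < n_rxqs; i++) {
            if (rxqs[i].pinned_core >= 0) {
                continue;
            }
            for (size_t tries = 0; tries < n_cores; tries++, next++) {
                if (!cores[next % n_cores].isolated) {
                    rxqs[i].assigned_core = next % n_cores;
                    rxqs[i].assigned = true;
                    next++;
                    break;
                }
            }
        }
    }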

>	* A pinned RX queue can be polled only by the PMD thread on the
>	  core to which it is pinned.
>
>	* Only pinned RX queues can be polled by an isolated PMD thread.
>
>>>
>>>> I'm not sure what's the best way to introduce XPS in OVS.  First of all,
>>>> for physical NICs I'd want to keep the current model where possible
>>>> (one queue per core, no locking).  Maybe we can introduce some mapping
>>>> just for vhost-user ports in netdev-dpdk to implement packet steering?
>>>
>>> We can just set the default values for 'n_txq' more accurately:
>>> 'n_cores() + 1' for phy ports and '1' for virtual ones. To avoid locking
>>> on TX to phy ports we may just check that 'n_txq' >= 'n_cores() + 1'.
>>> We can do that because reconfiguration is required to change 'n_txq'
>>> and, in the current implementation of XPS, one queue will not be used
>>> twice while there are unused queues.
>> 
>> Ok, if it can be done without any overhead for phy ports, I'm fine with that.
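
If it helps to visualize the lock-avoidance condition above, a check along
these lines would be enough (hypothetical helper name, not the actual
netdev-dpdk code):

    #include <stdbool.h>

    /* Sketch: with one dedicated tx queue per PMD core plus one for
     * non-PMD threads, transmission needs no lock; with fewer queues
     * they have to be shared and therefore locked. */
    static inline bool
    netdev_tx_needs_locking(int n_txq, int n_pmd_cores)
    {
        return n_txq < n_pmd_cores + 1;
    }
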
>> 
>>>
>>>> Also, have you considered packet reordering issues?  If a 5-tuple is
>>>> processed by one core and we often change the tx queue, we end up
>>>> with the same 5-tuple on two different tx queues.
>>>
>>> To avoid reordering issues there is a timeout mechanism inside XPS.
>>> The current tx queue_id may be changed only if there were no packets
>>> sent to this queue for a significant amount of time (XPS_CYCLES).
>> 
>> Ok, no problem then.
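
One way to read that rule, sketched with hypothetical names (the
XPS_CYCLES value below is only a placeholder, not the one from the patch):

    /* Sketch of the anti-reordering rule: a thread may switch to a new
     * tx queue only if its current queue has been idle for at least
     * XPS_CYCLES cycles; otherwise it keeps using the current queue. */
    #define XPS_CYCLES 1000000ULL       /* Placeholder threshold. */

    struct xps_state {
        int cur_txq;                    /* Currently used tx queue id. */
        unsigned long long last_used;   /* Cycle count of the last send. */
    };

    static int
    xps_select_txq(struct xps_state *s, unsigned long long now,
                   int wanted_txq)
    {
        if (wanted_txq != s->cur_txq && now - s->last_used >= XPS_CYCLES) {
            s->cur_txq = wanted_txq;    /* Old queue drained long enough. */
        }
        s->last_used = now;
        return s->cur_txq;
    }
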
>> 
>>>
>>>> Lastly, I'm not 100% sure we need an "n-txq" parameter.  I think that
>>>> for vhost-user ports, we know the value in new_device() (we should
>>>> do that for rxqs too now that we have netdev_reconfigure).  physical
>>>> NICs txqs should match the cpu cores and avoid locking, where
>>>> possible.
>>>
>>> Actually, we don't need this if the default values are set as described
>>> above and a reconfiguration request is used inside 'new_device()'
>>> (I guess this should be implemented separately).
>>> But it's harmless and, actually, almost free to implement. Maybe
>>> it can be used for some specific cases or devices.
>> 
>> In general, we try to limit the knobs that a user can play with, unless they
>> provide some value.
>> 
>> In this case, if we need something for netdev-dummy, we can implement it
>> explicitly for it and avoid exposing it otherwise.
>
>Ok. I agree. We can introduce something like 'options:dummy_n_txq' for
>dummy interfaces only.
>
>What do you think?

Agreed

Thanks,

Daniele



