[ovs-discuss] [ovs-dev] mbuf pool sizing

Kevin Traynor ktraynor at redhat.com
Thu Jan 25 16:26:20 UTC 2018


On 01/24/2018 10:19 AM, Venkatesan Pradeep wrote:
> Hi Kevin,
> 
> My primary concern is that someone upgrading to OVS 2.9 may find that configurations that previously worked fine no longer do, because the memory dimensioned for OVS may no longer be sufficient. It could be argued that, since the shared mempool method allocates a fixed number of buffers, it may not be enough in all cases, but the fact remains that existing deployments that are working just fine may have issues after upgrading, and that needs to be addressed.
> 
> Even with the per-port allocation scheme we can only make a rough estimate. RxQ buffer sizing is adequate, but TxQ buffer sizing is not, for the following reasons:
> 
> 1)	The estimate should consider the possibility of packets from one port being stuck on all other ports' txqs, so the *worst case* TxQ buffer sizing for stolen packets should really be the sum of (dev->requested_n_txq * dev->requested_txq_size) over every other port. This would bloat up the pool size. Also, when a new port is added or an existing port's queue attributes are changed, every other port's mempool has to be resized, and that may fail. A high value for MIN_NB_MBUF is likely helping to cover the shortfall.
> 2)	Currently, in the case of tx to vhostuser queues, packets are always copied, so in the above calculation we need to consider only physical DPDK ports. I haven't looked closely at the proposed zero-copy change, but I assume that if it is enabled we would have to take the queue size of vhostuser ports into account as well.
> 3) For cloned packets, (dev->requested_n_txq * dev->requested_txq_size) would suffice.
> 4) Tx batching would add a bit more to the estimate.
>

I completely agree with everything you said above and thanks for
pointing out those additional cases.
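
Just to make the worst case from your point 1) concrete, here is a rough
sketch of that estimate. The struct and helper names are mine and purely
illustrative; netdev-dpdk does not track ports this way:

#include <stddef.h>

struct port_cfg {
    unsigned int n_rxq, rxq_size;
    unsigned int n_txq, txq_size;
};

/* Worst-case mbuf demand for 'port' if its packets could sit on every
 * other port's tx queues at the same time.  Illustrative only. */
static unsigned int
worst_case_mbufs(const struct port_cfg *port,
                 const struct port_cfg ports[], size_t n_ports)
{
    unsigned int n = port->n_rxq * port->rxq_size;  /* rx descriptors */
    size_t i;

    for (i = 0; i < n_ports; i++) {
        if (&ports[i] != port) {
            /* packets "stolen" by another port's tx queues */
            n += ports[i].n_txq * ports[i].txq_size;
        }
    }
    return n;
}

Summing over every other port like that is what blows up the pool size,
and as you say it would also mean resizing every other port's mempool
whenever one port's queue attributes change.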

> That said, unless the TxQs are drained slowly in comparison to the rate at which packets are enqueued, the queue occupancy may never be high enough to justify the worst-case allocation estimate, and lots of memory will be wasted.
> 
> The shared mempool does have an advantage, since it allows more efficient sharing of the mbufs, but yes, a one-size-fits-all approach won't work in all cases. Even when different MTUs are involved, if the values are close enough the associated ports will share the memory pools, so we may only need a small number of them. Perhaps making the size configurable, or even having the pools grow dynamically when usage gets high, would be something to consider?
> 

It could be a good solution, but the problem now is that it is too late
for a big change like that in OVS 2.9.
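
For what it's worth, the sharing you mention comes (as far as I recall)
from rounding the buffer size up to an alignment bucket before the
mempool lookup, so MTUs that land in the same bucket end up reusing one
pool. A minimal sketch of the idea; the alignment constant and per-frame
overhead below are illustrative, not the exact values in netdev-dpdk:

#include <stdint.h>

#define MBUF_BUCKET_ALIGN 1024          /* illustrative alignment */
#define ROUND_UP(x, y) ((((x) + (y) - 1) / (y)) * (y))

/* Ports whose MTUs map to the same bucket could share one mempool. */
static uint32_t
mbuf_bucket(uint32_t mtu)
{
    uint32_t frame_len = mtu + 18;      /* Ethernet header + CRC, roughly */

    return ROUND_UP(frame_len, MBUF_BUCKET_ALIGN);
}

For example, mbuf_bucket(1500) == mbuf_bucket(2000) == 2048, so those two
ports could share a pool, while mbuf_bucket(9000) == 9216 would put a
jumbo port in its own pool.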

For the time being, how about just adding in the mbuf core cache
(because we know we will want to use that) and changing MIN_NB_MBUF to
(4096 * 2) to cover the other cases where tx queues, tx batching, etc.
may hold some mbufs? That should allow ~20 ports per socket before the
user would require additional memory.
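
As a rough check, for a minimal port (1 rxq and 1 txq with 2048
descriptors each, and assuming MP_CACHE_SZ works out to DPDK's
RTE_MEMPOOL_CACHE_MAX_SIZE, i.e. 512):

      n_mbufs = 1*2048 + 1*2048 + 1*32 + 1*512 + 4096*2 = 12832

      (4096 * 64) / 12832 ~= 20 ports from the same 256K mbuf budget
      that the old shared mempool used.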

Kevin.

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index ac2e38e..7959e3f 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -96,5 +96,5 @@ static struct vlog_rate_limit rl =
VLOG_RATE_LIMIT_INIT(5, 20);
  * have enough hugepages) we keep halving the number until the allocation
  * succeeds or we reach MIN_NB_MBUF */
-#define MIN_NB_MBUF          (4096 * 4)
+#define MIN_NB_MBUF          (4096 * 2)
 #define MP_CACHE_SZ          RTE_MEMPOOL_CACHE_MAX_SIZE

@@ -532,4 +532,5 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu)
     n_mbufs = dev->requested_n_rxq * dev->requested_rxq_size
               + dev->requested_n_txq * dev->requested_txq_size
               + MIN(RTE_MAX_LCORE, dev->requested_n_rxq) * NETDEV_MAX_BURST
+              + MIN(RTE_MAX_LCORE, dev->requested_n_rxq) * MP_CACHE_SZ
               + MIN_NB_MBUF;



> Regards,
> 
> Pradeep
> 
> 
> -----Original Message-----
> From: Kevin Traynor [mailto:ktraynor at redhat.com] 
> Sent: Wednesday, January 24, 2018 12:15 AM
> To: Venkatesan Pradeep <venkatesan.pradeep at ericsson.com>; ovs-dev at openvswitch.org; ovs-discuss at openvswitch.org; Robert Wojciechowicz <robertx.wojciechowicz at intel.com>; Ian Stokes <ian.stokes at intel.com>; Ilya Maximets <i.maximets at samsung.com>; Kavanagh, Mark B <mark.b.kavanagh at intel.com>
> Subject: Re: [ovs-dev] mbuf pool sizing
> 
> On 01/23/2018 11:42 AM, Kevin Traynor wrote:
>> On 01/17/2018 07:48 PM, Venkatesan Pradeep wrote:
>>> Hi,
>>>
>>> Assuming that all ports use the same MTU, in OVS 2.8 and earlier a
>>> single mempool of 256K buffers (MAX_NB_MBUF = 4096 * 64) is created
>>> and shared by all the ports.
>>>
>>> With the OVS 2.9 mempool patches we have per-port allocation, and the number of mbufs created for each port is based on the following formula (with a lower limit of MIN_NB_MBUF = 4096*4):
>>>        n_mbufs = dev->requested_n_rxq * dev->requested_rxq_size
>>>               + dev->requested_n_txq * dev->requested_txq_size
>>>               + MIN(RTE_MAX_LCORE, dev->requested_n_rxq) * NETDEV_MAX_BURST
>>>               + MIN_NB_MBUF;
>>>
>>> Using the minimal value (1) for n_rxq and n_txq and the default value (2048) for requested_rxq_size and requested_txq_size, the above translates to
>>>       n_mbufs = 1*2048 + 1*2048 + 1*32 + 4096*4  = 20512
>>>
>>> Assuming all ports have the same MTU, this means that approximately 13 ports in OVS 2.9 will consume as much memory as the single mempool shared by all ports in OVS 2.8 (256*1024 / 20512 ~= 12.8).
>>>
>>> When a node is upgraded from OVS 2.8 to OVS 2.9, it is quite possible that the memory set aside for OVS will be insufficient. I'm not sure whether this aspect has been discussed previously, so I wanted to bring it up for discussion.
>>>
>>
>> Hi Pradeep, I don't think it has been discussed. I guess the thinking
>> was that with a giant shared mempool, it was over-provisioning when
>> there were a few ports, and in the case where there were a lot of ports
>> there could be some starvation at run time. It also meant that if you
>> had a mix of different MTUs you had multiple giant shared mempools and
>> could run out of memory very quickly at config or run time as well.
>>
>> So I can see the argument for having a mempool per port, as it is more
>> fine-grained, and if you are going to run short of memory it will at
>> least be at config time. The problem is that if you give each port some
>> over-provisioning and you have a lot of ports, you hit the situation
>> you are seeing.
>>
>> I think some amount of over-provisioning per port is needed, because
>> you don't want to cut it so fine that you run into memory issues at
>> run time when the local mbuf caches on cores run out, or if someone
>> uses dpdk rings to send the mbufs somewhere else for a time. There may
>> be other corner cases too. Perhaps as a compromise the minimum size
>> could be reduced from 4096*4 to 4096*2 or 4096.
>>
>> Thoughts?
>>
> 
> I just sent a compile-tested-only RFC here: https://mail.openvswitch.org/pipermail/ovs-dev/2018-January/343581.html
> 
>> Kevin.
>>
>>> Regards,
>>>
>>> Pradeep
>>>
>>>
>>>
>>
>>
> 


