
I need 64GB to fit an entire dataset in memory for deep learning, but have only 12GB of RAM. Virtual memory is the next-best alternative, and I've learned it can be effectively increased by increasing the pagefile size - but this source suggests doing so would increase system instability.

All other sources state the contrary, noting only lowered SSD lifespan, which isn't a problem for me - but I'd rather not take chances. That said, is there a limit to how much the pagefile size can be increased without causing instability?


Additional info: Win10 OS, 26GB OS-allocated pagefile size (need 52GB + c, c = safe minimum)


PRE-ANSWER: I proceeded as described here, with ~70GB of memory-mapped data; the average data-load speedup is 42-FOLD. I suspect this figure could be bumped to ~130, though I won't work on that now unless someone answers this. Lastly, this is sustainable and won't degrade the SSD, as the usage is 99.9%+ reads. I will post a full answer with details eventually.
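For reference, here is a minimal sketch of one way the memory mapping can be set up, using numpy.memmap; the file name, dtype, and shape are placeholders, not the actual setup:

```python
import numpy as np

# Placeholder file/shape: a large raw array written to disk once.
N_SAMPLES, SAMPLE_DIM = 1_000_000, 4096   # ~16GB of float32, as an example

# mode='r' maps the file read-only; pages are pulled into RAM only when
# touched and can be dropped by the OS under memory pressure, with no writes.
data = np.memmap('dataset.dat', dtype=np.float32, mode='r',
                 shape=(N_SAMPLES, SAMPLE_DIM))

# Slicing touches only the pages backing those rows, so a dataset far
# larger than physical RAM can be iterated batch by batch.
batch = np.asarray(data[:256])   # copies one batch into ordinary RAM
```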

  • Possible duplicate of [Any reason not to disable the Windows pagefile given enough physical RAM?](https://superuser.com/questions/30345/any-reason-not-to-disable-the-windows-pagefile-given-enough-physical-ram) – phuclv Sep 03 '19 at 02:20
  • $100 on more memory would be well spent. Asking a school or business for spare PCs or parts would be good too. In my experience, a large page file doesn't hurt, but doesn't help much either. Best to leave the OS on automatic. – Christopher Hostage Sep 10 '19 at 01:21
  • @ChristopherHostage I would preallocate the page file so you don't get an out of disk space condition while swapping--things tend to puke when that happens. – Loren Pechtel Sep 10 '19 at 02:27

2 Answers


The page file supports swapping (a.k.a. paging): moving 4KB blocks of data in RAM, called pages, out to disk and back.

Code that the CPU is running must live in physical RAM. Also, Windows, like other OSes, uses "unused" RAM to cache disk I/O until it is flushed (and if disk data is just read and re-read, it may stay in "unused" RAM for a long time).

In a multitasking operating system, there may be some code that is owned by tasks that are waiting on some event that hasn't happened recently, like user input. It helps system performance to page this out to a disk file and call it back in when the events happen, so that code that is actually doing something on your computer can leverage the free RAM.

Of course, the operating system can also page out code that is actually doing something but is lower priority, if a sudden request comes in for more memory than the system has free. In most cases this is better than outright denying a program's request for memory, provided the request isn't too far beyond the physical RAM available.

At some point, if you keep allocating memory that isn't there, your program will be competing with basic Windows services and the other programs running on your computer. On top of that, you've used up all the "unused" RAM, so disk I/O won't be cached at all. You will experience a massive decrease in performance that affects every process on the system, including system ones.

The instability described as harmful can come from basic Windows functions becoming unresponsive because they are constantly being swapped between disk and RAM, competing with your machine-learning program and everything else. For example, clicking a desktop icon may take minutes to get a response, so you might think the system is totally frozen when it's really just swapping like crazy and will eventually respond.
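If you want to see how close you are to that state while your job runs, you can poll free RAM and pagefile usage. A minimal sketch, assuming the third-party psutil package (not something the question mentions):

```python
import time

import psutil  # third-party; pip install psutil

# Poll RAM and pagefile (swap) statistics once per second. If 'available'
# RAM keeps shrinking toward zero while pagefile usage climbs, the system
# is entering the heavy-swapping state described above.
for _ in range(10):
    vm = psutil.virtual_memory()
    sw = psutil.swap_memory()
    print(f"RAM available: {vm.available / 2**30:5.1f} GiB | "
          f"pagefile used: {sw.used / 2**30:5.1f} GiB ({sw.percent:.0f}%)")
    time.sleep(1)
```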

LawrenceC
  • In other words, if I (1) keep a good amount of unused RAM, and (2) reserve a good amount of pagefile space for system processes, I'm good to go? Per (1), if RAM utilization according to Task Manager is 8GB/12GB, is 4GB then "unused"? Per (2), any standard number for how much reserve space is sufficient, or does it vary? Response's appreciated – OverLordGoldDragon Sep 02 '19 at 21:47
  • I agree with this answer. However, I want to make it clear that if you need 64 GB of RAM and you only have 12, it is possible your application might not work properly, or at all. If a significant amount of paging is occurring and your deep learning app has some sort of timeouts implemented, you are going to run into problems. More physical RAM would be the solution. – Keltari Sep 02 '19 at 22:42
  • @Keltari The goal of more RAM is only to speed up training - it's not actually "required"; an NVMe PCIe SSD w/ [3.5GB/s](https://amzn.to/2keCXWB) read speed is already around 10% of RAM, which is huge - hence knowing the answer to my comment question matters – OverLordGoldDragon Sep 02 '19 at 23:59
  • @OverLordGoldDragon, increasing your pagefile to increase your "RAM" won't gain you any more speed, if anything it'll make the system slower due to constant swapping to disk. The OS will already dedicate as much physical RAM to your program as it reasonably can, pushing whatever other processes/data it can to your page file. – kicken Sep 03 '19 at 02:04
  • @kicken That is false; accessing paged arrays is much faster than loading them into memory - on my system, by an order of magnitude. – OverLordGoldDragon Sep 03 '19 at 02:14
  • @OverLordGoldDragon, "accessing paged arrays is much faster than loading them into memory - on my system, by an order of magnitude" is the opposite of my experience and the standard literature. I suspect that the test was not valid. – Christopher Hostage Sep 10 '19 at 01:20
  • @ChristopherHostage Ran a full epoch w/ ~70GB memory-mapped data; average data load time speedup: **42-FOLD**. Further, I suspect this figure may be bumped to ~130, though won't spend time on it now - unless someone answers [this](https://superuser.com/questions/1481251/how-to-store-data-in-pagefile) – OverLordGoldDragon Sep 11 '19 at 23:20

It sounds like your program is going to be jumping all over that dataset while it runs. That is going to cause a tremendous amount of swapping. You point out that a fast SSD can be at 10% of RAM speed--but your program might want only 100 bytes of data while the system proceeds to read a full 4096-byte page off the disk. That 10% doesn't mean it merely takes 10x as long to run.

Furthermore, if your program is modifying the data as it works with it, things get far worse--dirty pages get written. If there's much modification of the data, you'll deplete the drive of spare blocks and your write speed can become truly atrocious. (A page must be erased before being written. Last I knew, that was an operation measured in a substantial number of milliseconds, although I'm not finding current data. Normally a drive keeps a supply of empty pages around to handle writes, but when writes come in faster than it can wipe pages, the pool gets depleted and a write must wait until the eraser finishes with a page.)
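If the workload really is read-only, one way to rule out dirty pages entirely is to map the data read-only, so there is never anything to write back. A rough sketch using Python's standard mmap module (the file name is a placeholder):

```python
import mmap

# Placeholder file name. ACCESS_READ maps the file read-only: pages evicted
# under memory pressure are simply discarded, never written back, so no
# dirty pages and no erase-before-write penalty on the SSD.
with open('dataset.dat', 'rb') as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    page = mm[:4096]   # reading copies data out; the mapping stays clean
    # any attempt to write through `mm` fails, since the mapping is read-only
    mm.close()
```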

Loren Pechtel
  • No writes, only reads - the 64GB dataset's divided into 134 subsets, 500MB each, and is read into and out of RAM every 500ms. Any concerns in this case? Also, [relevant issue](https://stackoverflow.com/questions/57862708/persistent-memory-mapping-of-arrays-to-pagefile) - asked differently, _can_ data be stored persistently in pagefile? – OverLordGoldDragon Sep 10 '19 at 02:46
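For concreteness, the read-only, one-subset-at-a-time access pattern described in the comment above might look roughly like the sketch below; the subset count, dtype, and file name are placeholders:

```python
import numpy as np

SUBSETS = 134                            # per the comment above
VALUES_PER_SUBSET = 500 * 2**20 // 4     # ~500MB of float32 per subset

# Read-only memory map over the whole dataset (placeholder file name).
data = np.memmap('dataset.dat', dtype=np.float32, mode='r',
                 shape=(SUBSETS, VALUES_PER_SUBSET))

for epoch in range(3):
    for i in range(SUBSETS):
        subset = np.asarray(data[i])   # pulls one ~500MB subset into RAM
        # ... train on `subset`; pages of earlier subsets get evicted by
        # the OS as new ones are touched, with zero write traffic.
```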