3

Debian 9.4, Linux 4.9

I sometimes compile something that barely fits in RAM, or a rogue process suddenly starts eating memory beyond what's available. When the process goes past the available RAM, Linux starts thrashing the disk even though I have swap disabled entirely (running without swap was an attempt to avoid this). I guess it starts discarding and reloading things like the mmapped parts of the binaries that are currently running?

At this point my X session quickly becomes unresponsive, and all I can do is wait dozens of minutes until the entire X session gets killed and I can log back in.

I tried to search around for solutions, but nothing seems to work. The OOM killer doesn't catch this process, and with `vm.overcommit_memory=2` I can't even log in with GDM and GNOME.

Is there a way to tell Linux not to swap at all? That way I would at least get a chance that the rogue process is killed by a failed malloc, and even if not, at least I wouldn't have to wait while staring at an unresponsive machine.

Or are there any other hints on how to manage this scenario?

  • 2
    I think you're looking at it the wrong way. What you describe is exactly what is expected to happen when more than the available RAM is required and there's no swap to compensate. So, your problem is **not having swap**, rather than Linux swapping despite `swapoff`, which doesn't happen and would be nonsensical. –  Apr 10 '18 at 12:27
  • Swapping algorithms allow the freeing of code pages, which are unmodified, allowing memory for other applications. This is not swapping, since nothing is written out, but the more (unfreeable) data pages that are in memory, the more disc activity for freeing and reading code pages into the dwindling remaining memory, with the obvious impact on performance. – AFH Apr 10 '18 at 13:00
  • 1
    Incidentally, why are you trying to run without swap? It will almost certainly lock up at some point, when there are no free memory pages. You can add a swap _file_ on the fly at any time, as [this link](https://help.ubuntu.com/community/SwapFaq#How_do_I_add_a_swap_file.3F) describes. – AFH Apr 10 '18 at 13:23
  • I'm running without swap already as an attempt to avoid the thrashing and locking up of my machine when I, e.g., forget to quit the browser while the build is running in the background (it runs for 3 hours, and only the end requires much RAM). – attila lendvai Apr 10 '18 at 14:52
  • The way to avoid thrashing and "locking up" is to either buy more RAM, or run less stuff (or smaller stuff) at one time. – Jamie Hanrahan Apr 11 '18 at 08:33
  • To make paging from (and to) disk faster, consider an SSD. Not just for your "swap" space but for the whole OS + apps. – Jamie Hanrahan Apr 11 '18 at 15:14
  • I appreciate your attempt to help, but man, please, assume more from people hanging around on superuser.com! I already have an SSD, and I'm aware that if I upgrade my machine, well, it will be upgraded. Duh! The solution I'm looking for is reasonable behavior from Linux when errors happen. Just replace my compilation example with a program bug that suddenly eats up all the RAM. – attila lendvai Apr 12 '18 at 10:34
  • Is a *rouge process* the opposite of a *green thread?* – tripleee Apr 12 '18 at 11:43
  • Attila, you didn't mention you had an SSD. And answers (and comments) here are not just for the original questioner. The principle I stated about how to avoid thrashing is valid for everyone. By the way, running without swap is generally a poor idea - if the OS is short on RAM it will still be evicting pages from RAM; you are simply forcing it to evict only code pages and other read-only mmap'd files, when there may well be better candidates for eviction among the changed data pages. In other words by limiting the OS's choices you are forcing it to make poorer ones. – Jamie Hanrahan Apr 13 '18 at 19:10
  • 1
    @JamieHanrahan why would adding *more* swap make the situation better? If a process eventually takes up all the memory+swap, ALL candidates for eviction are poor choices and there will still be a massive ~1000x slowdown, and thus it will take `1 + SWAP_SIZE/RAM_SIZE * 1000` times *longer* for the system to kill the process for taking up too much memory. This is exactly the opposite of what the OP is trying to do. – user3338098 Jun 15 '19 at 22:55

2 Answers

2

If you are compiling sources that require almost all the available RAM, if not more, probably the only really effective solution is adding more physical RAM. Having said that, you may try adding a very large amount of swap (say 2x or 3x the RAM) and setting `/proc/sys/vm/swappiness` to a low value, like 1 (note that on kernels 3.5 and later, setting it to 0 makes the kernel avoid swapping almost entirely, which can trigger the OOM killer instead), so that swap is used only when it is really necessary. This should minimize thrashing.
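
For illustration, a minimal sketch of that setup, assuming an ext4 filesystem and a 32 GiB swap file (both the size and the `/swapfile` path are arbitrary examples, not requirements):

```
# Create and enable a 32 GiB swap file (use dd instead of fallocate
# on filesystems where fallocate-backed swap is not supported).
sudo fallocate -l 32G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile

# Prefer keeping pages in RAM; swap only when really needed.
sudo sysctl vm.swappiness=1

# Make both settings persistent across reboots.
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab
echo 'vm.swappiness = 1' | sudo tee /etc/sysctl.d/99-swappiness.conf
```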

  • I agree with this answer. The point here is to let the system swap out the dirty pages of inactive processes, freeing up RAM for the compilation process(es). This way the compilation has a better chance of finishing its job in reasonable time. If this still doesn't help, then you really need more physical RAM. – Johan Myréen Apr 10 '18 at 13:26
  • It's rather unfortunate that there's nothing better than this, but thanks for the hint! I'm running with this, and this way I at least get a larger window of opportunity to intervene when I'm around when it happens. – attila lendvai Apr 12 '18 at 10:44
  • Good! Happy to hear that this was of some help! – Marco Pantaleoni Apr 12 '18 at 11:02
  • You really need a large swap for this to help, though, because a smaller swap gets filled at the first runaway allocation, and then the second time, with the swap now almost full, it behaves similarly to the no-swap scenario. – attila lendvai Apr 14 '18 at 20:22
  • Indeed it seems there is no way other than adding swap. But at least, Linux kernel developers have started talking about this. See https://lkml.org/lkml/2019/8/4/15 – md2k7 Aug 06 '19 at 04:47
2

I don't understand how people can recommend adding more RAM or more swap space. A misbehaving application can eat it all and reproduce the problem.

This kind of freeze is a serious architectural problem in the Linux kernel. The only way to recover once the freezing happens is to force the OOM killer with the magic SysRq key (Alt+SysRq+F). The kernel log will tell you afterwards what was killed and why.
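
Note that the SysRq combination only works if the SysRq interface is enabled; a quick sketch of checking and enabling it (the value 1 enables all SysRq functions; some distributions default to a more restrictive bitmask):

```
# Check the current SysRq setting (1 = all functions enabled).
cat /proc/sys/kernel/sysrq

# Enable it for the running system.
sudo sysctl kernel.sysrq=1

# Make it persistent across reboots.
echo 'kernel.sysrq = 1' | sudo tee /etc/sysctl.d/99-sysrq.conf

# Without a keyboard, the same OOM kill can be triggered with:
#   echo f | sudo tee /proc/sysrq-trigger
```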

Several projects are trying to prevent these freezes from userspace. See earlyoom, for example.
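
A minimal sketch of trying earlyoom, assuming a distribution that ships it as a package with a systemd unit (on Debian 9 it may not be packaged yet, in which case it has to be built from the project's GitHub repository):

```
# Install and enable the earlyoom daemon, which kills the largest
# process before the system runs completely out of memory and thrashes.
sudo apt install earlyoom
sudo systemctl enable --now earlyoom

# Watch its decisions in the journal.
journalctl -u earlyoom -f
```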