I want to install Slurm to properly manage my DIY cluster, which I want to use for parallel simulations (HPC). I have 3 nodes (1 master and 2 compute nodes), all running Ubuntu Server 20.04.

I followed the instructions from nekodaemon.com (I can't access the website right now), in the chapter "Slurm Quick installation for cluster on Ubuntu 20.04", but I removed the last line they say to add on the compute node:

CgroupMountpoint=/sys/fs/cgroup

because it caused the following error when starting slurmd:

Process: 46877 ExecStart=/usr/sbin/slurmd $SLURMD_OPTIONS (code=exited, status=1/FAILURE)

May 02 10:15:54 ben1 systemd[1]: Starting Slurm node daemon...
May 02 10:15:54 ben1 slurmd[46877]: error: _parse_next_key: Parsing error at unrecognized key: CgroupMountpoint
May 02 10:15:54 ben1 slurmd[46877]: error: Parse error in file /etc/slurm-llnl/slurm.conf line 149: "CgroupMountpoint=/sys/fs/cgroup"
May 02 10:15:54 ben1 slurmd[46877]: fatal: Unable to process configuration file
May 02 10:15:54 ben1 systemd[1]: slurmd.service: Control process exited, code=exited, status=1/FAILURE
May 02 10:15:54 ben1 systemd[1]: slurmd.service: Failed with result 'exit-code'.
May 02 10:15:54 ben1 systemd[1]: Failed to start Slurm node daemon.
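The parse error above names slurm.conf line 149, so it may help to check which configuration file actually contains the key. As far as I know, CgroupMountpoint is documented as a cgroup.conf option rather than a slurm.conf one, which would be consistent with slurmd rejecting it there. A minimal sketch of such a check follows; the helper name find_conf_key is hypothetical, and the cgroup.conf path is an assumption based on the slurm.conf location in the log:

```shell
#!/bin/sh
# Print "file:line:text" for every occurrence of a key in the given config files.
# find_conf_key is a hypothetical helper name, not a Slurm tool.
find_conf_key() {
    key=$1
    shift
    # -H always prints the file name, -n the line number;
    # the pattern only matches "key=" at the start of a line (ignoring indentation).
    grep -Hn -E "^[[:space:]]*${key}=" "$@"
}

# Usage on the compute node (the cgroup.conf path is an assumption):
#   find_conf_key CgroupMountpoint /etc/slurm-llnl/slurm.conf /etc/slurm-llnl/cgroup.conf
```

If the key only shows up in slurm.conf, that matches the error in the log.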

After removing that line, I was able to start munge and slurm on the master node, but not on the compute node. There, I run:

sudo systemctl start slurmd 

I get:

Job for slurmd.service failed because the control process exited with error code.
See "systemctl status slurmd.service" and "journalctl -xe" for details.

Then I run journalctl -xe and I get:

The job identifier is 22481 and the job result is failed.
May 02 10:48:48 ben1 sudo[47959]: pam_unix(sudo:session): session closed for user root
May 02 10:49:04 ben1 multipath[47985]: sdc: can't store path info
May 02 10:49:04 ben1 multipathd[771]: sdc: spurious uevent, path not found
May 02 10:49:04 ben1 multipathd[771]: uevent trigger error
May 02 10:49:05 ben1 multipath[47992]: sdc: can't store path info
May 02 10:49:06 ben1 multipathd[771]: sdc: spurious uevent, path not found
May 02 10:49:06 ben1 multipathd[771]: uevent trigger error
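The journalctl -xe excerpt above contains only sudo and multipath messages and no line from slurmd itself, so the actual failure reason is not visible in it. A minimal sketch for narrowing the journal down to slurmd messages follows; the helper name filter_slurmd is hypothetical, and the pattern assumes the "host slurmd[PID]: message" layout seen in the earlier log:

```shell
#!/bin/sh
# Keep only journal lines written by the slurmd process.
# filter_slurmd is a hypothetical helper name, not part of Slurm or systemd.
filter_slurmd() {
    # Matches e.g. "May 02 10:15:54 ben1 slurmd[46877]: fatal: ..."
    grep -E ' slurmd\[[0-9]+\]: '
}

# Usage on the compute node (needs access to the systemd journal):
#   journalctl -xe --no-pager | filter_slurmd
# journalctl can also filter by unit directly:
#   journalctl -u slurmd.service --no-pager
```

Running the daemon in the foreground with sudo slurmd -D -vvv should also print the startup error straight to the terminal.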