Highest Voted 'slurm' Questions - Super User Stack Exchange

13

votes

2 answers

How can I find out how long my slurm job took to execute?

One idea I have for finding out how long my Slurm job is taking is to use squeue --job. But how do I find out how long my job took once it has completed?

linux slurm

asked Nov 02 '14 at 00:22

demongolem

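For the question above about job run time, a minimal sketch follows, assuming job accounting is enabled on the cluster so that `sacct` can report the `Elapsed` field for a finished job. The helper name `elapsed_to_seconds` and the `$jobid` variable are hypothetical and introduced only for illustration; the helper converts an `Elapsed` value of the form `[DD-][HH:]MM:SS` into seconds so that durations can be compared or summed.

```shell
#!/bin/sh
# On a cluster with accounting enabled, the value would typically come from
# something like (jobid is a placeholder, not taken from the question):
#   sacct -j "$jobid" -X --noheader --format=Elapsed

# Convert a Slurm Elapsed value ([DD-][HH:]MM:SS) into whole seconds.
# The body runs in a subshell so the IFS change does not leak to the caller.
elapsed_to_seconds() (
    t=$1
    days=0
    case $t in
        *-*) days=${t%%-*}; t=${t#*-} ;;   # split off the optional day count
    esac
    IFS=:
    set -- $t                               # split the rest on ':'
    case $# in
        3) h=$1; m=$2; s=$3 ;;
        2) h=0;  m=$1; s=$2 ;;
        *) echo "unrecognized Elapsed value: $t" >&2; exit 1 ;;
    esac
    # Strip one leading zero so values like "08" are not parsed as octal.
    days=${days#0}; h=${h#0}; m=${m#0}; s=${s#0}
    echo $(( ${days:-0} * 86400 + ${h:-0} * 3600 + ${m:-0} * 60 + ${s:-0} ))
)

elapsed_to_seconds "1-02:03:04"
```

Whether `sacct` returns anything depends on how accounting storage is configured at a given site, so the command in the comment is a starting point rather than a guaranteed answer.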

5

votes

1 answer

remove slurm sacct command double entries: "extern"

Jobs currently running show two entries, one of which has an .extern suffix. Completed (or failed) jobs also have a third entry: .batch. Is there a way to remove these from (or not show them in) the sacct output? What are these entries?

cluster parallel-processing slurm

asked Nov 17 '16 at 21:39

DilithiumMatrix


4

votes

1 answer

Slurm initialization fails in a Raspberry Pi cluster with Raspbian 9.4

I am trying to set up Slurm in a Raspberry Pi cluster with Raspbian 9.4. I am able to start slurmctld, but when I try to launch slurmd I get the following output: pi@node1:~ $ slurmd -Dvvvc slurmd: debug: Log file re-opened slurmd: error: Domain…

raspberry-pi raspbian slurm

asked Jul 16 '18 at 11:30

Bub Espinja


3

votes

2 answers

How to cancel a job that is in the completing (CG) state?

I submitted some jobs as usual using sbatch and later canceled some of them using scancel. However, they are stuck in state CG and I cannot remove them from my list. Is there any way to get rid of those CG jobs? Sadly, I'm not the administrator of…

cluster slurm

asked Jun 15 '19 at 14:09

Iago Carvalho


2

votes

1 answer

Slurm on AWS returns slurmstepd: error: execve(): : No such file or directory

I have installed a Burstable and Event-driven HPC Cluster on AWS Using Slurm according to this tutorial. With this installation I can burst instances and run jobs in the Slurm environment on EC2. After running: #!/bin/bash #SBATCH --nodes=2 #SBATCH…

amazon-web-services amazon-ec2 slurm

asked Jun 14 '19 at 12:59

Serialchiller


2

votes

1 answer

How to use Slurm to request only one core instead of a node or socket?

I wrote Perl scripts to analyze my simulation data. This is not a concurrent program. The cluster has eight nodes. Each node has 2 sockets with 10 cores each. I want to submit my job using Slurm and only request one core to…

linux cpu slurm

asked Feb 12 '19 at 03:17

Leon


2

votes

0 answers

How to use SLURM's --dependency=expand: correctly

I have 1 slurm job unfinished out of 5 that's been running 19 hours and I'm concerned that it will hit walltime before it finishes. I'm not the admin and it's the weekend, so I would like to try using this feature I discovered recently shown in…

slurm

asked Nov 03 '18 at 12:32

hepcat72


1

vote

0 answers

How to make a host file in SLURM with $SLURM_JOB_NODELIST

I have access to an HPC cluster with 40 cores on each node. I have a batch file to run a total of 35 codes, which are in separate folders. Each is an OpenMP code that requires 4 cores. So how do I allocate resources such that each code gets 4…

bash hpc slurm

asked May 29 '21 at 13:50

Libin Varghese


1

vote

1 answer

slurmd: Invalid job credential

I'm having some problems with a test configuration of Slurm on my laptop. I'm trying to run four slurmd instances on one machine, which is also the same machine as slurmctld runs on. I have a local munged running as user munge. slurmd and slurmctld…

hpc slurm

asked Oct 25 '19 at 13:45

lukas


1

vote

0 answers

Slurm - GPU enforcement with cgroups

I am running Slurm 19.05 on a single machine (Ubuntu 18.04) for scheduling GPU tasks. However, I am having trouble setting up GPU enforcement with cgroups. If I set ConstrainDevice=yes in my cgroup.conf file, TensorFlow is not able to access my…

gpu slurm

asked Sep 10 '19 at 07:06

Jonas


1

vote

1 answer

Ubuntu 18.10 and modify installed package - OpenMPI

I've installed openmpi-bin (OpenMPI 3.1) on Ubuntu 18.10. I also run Slurm on the same machine and would like to recompile or reconfigure my installation of OpenMPI to enable its Slurm support. If one installs OpenMPI from source, there is a setting…

apt slurm ubuntu-18.10

asked Jun 13 '19 at 18:01

Paer


1

vote

0 answers

job state=failed reason=nonzero exit code, SLURM

I'm new to Slurm and have been trying to run a simple job. I'm running Slurm on top of a VM. Here's my…

linux virtual-machine centos slurm

asked May 04 '19 at 17:33

Ash Bougui


1

vote

0 answers

Ansys parallel job on Slurm cluster stuck without error or exit message

I am working on a Slurm cluster, executing Ansys (V18.2) jobs in parallel. Large jobs (meaning large solver files) often get stuck without an error or exit message; the jobs keep running until the timeout is reached. Due to large job size,…

cpu batch parallel-processing slurm

asked Apr 10 '19 at 06:39

Anatol


1

vote

1 answer

Ansys Remote Solver with SLURM cluster

I am trying to connect Ansys running on CentOS 7 to our HPC cluster, which uses SLURM as a scheduler. I have looked into all the configuration files I could think of. I even wrote my custom hps_commands_SLURM.xml file. I get the…

slurm

asked Feb 28 '18 at 16:19

Shahan M


1

vote

1 answer

SLURM configuration: cons_res with CR_Core either cannot allocate resource or jobs end up in CG status

I am new to SLURM. I am trying to configure Slurm on a new cluster. I have 4 nodes, each with 14 cores. I want to share nodes in such a way that every core can run independently (i.e., node01 can have 14 independent serial jobs going on at the same…

linux cluster hpc slurm

asked Mar 04 '17 at 18:19

Somesh


Questions tagged [slurm]