
I am new to Slurm and am trying to configure it on a new cluster.

I have 4 nodes, each with 14 cores. I want to share nodes so that every core can run independently (i.e., node001 can run 14 independent serial jobs at the same time), but no core should ever run more than one job. Going through the documentation, I figured I need to set

SelectType              = select/cons_res
SelectTypeParameters    = CR_CORE
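
For context, the node lines in my slurm.conf look roughly like the following (the node and partition names match my cluster, but the 2-socket x 7-core split is only a guess at the layout; the authoritative values come from running slurmd -C on each node):

NodeName=node[001-004] Sockets=2 CoresPerSocket=7 ThreadsPerCore=1 State=UNKNOWN
PartitionName=defq Nodes=node[001-004] Default=YES MaxTime=INFINITE State=UP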

So I set that in slurm.conf and restarted slurmctld. But now when I submit a job, I either get an error that the requested node configuration is not available, or the job ends up stuck in the CG (completing) state.
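
(By "restarted slurmctld" I mean a plain service restart on the head node, something like the lines below; whether slurmd on the compute nodes also needs a restart after changing SelectType is one of the things I am not sure about. The systemctl and pdsh commands are just what I use on this systemd-based install.)

systemctl restart slurmctld                       # on the head node
pdsh -w node[001-004] systemctl restart slurmd    # on the compute nodes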

Example 1:

 [sr@clstr mpitests]$ cat newHello.slrm 
#!/bin/sh
#SBATCH --time=00:01:00
#SBATCH -N 1
#SBATCH --ntasks=4
#SBATCH --ntasks-per-node=4

module add shared openmpi/gcc/64 slurm

module load somesh/scripts/1.0

mpirun helloMPIf90

Leads to:

[sr@clstr mpitests]$ sbatch -v newHello.slrm 
sbatch: defined options for program `sbatch'
sbatch: ----------------- ---------------------
sbatch: user              : `sr'
sbatch: uid               : 1003
sbatch: gid               : 1003
sbatch: cwd               : /home/sr/clusterTests/mpitests
sbatch: ntasks            : 4 (set)
sbatch: nodes             : 1-1
sbatch: jobid             : 4294967294 (default)
sbatch: partition         : default
sbatch: profile           : `NotSet'
sbatch: job name          : `newHello.slrm'
sbatch: reservation       : `(null)'
sbatch: wckey             : `(null)'
sbatch: distribution      : unknown
sbatch: verbose           : 1
sbatch: immediate         : false
sbatch: overcommit        : false
sbatch: time_limit        : 1
sbatch: nice              : -2
sbatch: account           : (null)
sbatch: comment           : (null)
sbatch: dependency        : (null)
sbatch: qos               : (null)
sbatch: constraints       : 
sbatch: geometry          : (null)
sbatch: reboot            : yes
sbatch: rotate            : no
sbatch: network           : (null)
sbatch: array             : N/A
sbatch: cpu_freq_min      : 4294967294
sbatch: cpu_freq_max      : 4294967294
sbatch: cpu_freq_gov      : 4294967294
sbatch: mail_type         : NONE
sbatch: mail_user         : (null)
sbatch: sockets-per-node  : -2
sbatch: cores-per-socket  : -2
sbatch: threads-per-core  : -2
sbatch: ntasks-per-node   : 4
sbatch: ntasks-per-socket : -2
sbatch: ntasks-per-core   : -2
sbatch: mem_bind          : default
sbatch: plane_size        : 4294967294
sbatch: propagate         : NONE
sbatch: switches          : -1
sbatch: wait-for-switches : -1
sbatch: core-spec         : NA
sbatch: burst_buffer      : `(null)'
sbatch: remote command    : `/home/sr/clusterTests/mpitests/newHello.slrm'
sbatch: power             : 
sbatch: wait              : yes
sbatch: Consumable Resources (CR) Node Selection plugin loaded with argument 4
sbatch: Cray node selection plugin loaded
sbatch: Linear node selection plugin loaded with argument 4
sbatch: Serial Job Resource Selection plugin loaded with argument 4
sbatch: error: Batch job submission failed: Requested node configuration is not available

Example 2:

[sr@clstr mpitests]$ cat newHello.slrm 
#!/bin/sh
#SBATCH --time=00:01:00
#SBATCH -N 1
#SBATCH --ntasks=1
#SBATCH --ntasks-per-node=1

module add shared openmpi/gcc/64 slurm

module load somesh/scripts/1.0

helloMPIf90

Leads to:

[sr@clstr mpitests]$ sbatch -v newHello.slrm 
sbatch: defined options for program `sbatch'
sbatch: ----------------- ---------------------
sbatch: user              : `sr'
sbatch: uid               : 1003
sbatch: gid               : 1003
sbatch: cwd               : /home/sr/clusterTests/mpitests
sbatch: ntasks            : 1 (set)
sbatch: nodes             : 1-1
sbatch: jobid             : 4294967294 (default)
sbatch: partition         : default
sbatch: profile           : `NotSet'
sbatch: job name          : `newHello.slrm'
sbatch: reservation       : `(null)'
sbatch: wckey             : `(null)'
sbatch: distribution      : unknown
sbatch: verbose           : 1
sbatch: immediate         : false
sbatch: overcommit        : false
sbatch: time_limit        : 1
sbatch: nice              : -2
sbatch: account           : (null)
sbatch: comment           : (null)
sbatch: dependency        : (null)
sbatch: qos               : (null)
sbatch: constraints       : 
sbatch: geometry          : (null)
sbatch: reboot            : yes
sbatch: rotate            : no
sbatch: network           : (null)
sbatch: array             : N/A
sbatch: cpu_freq_min      : 4294967294
sbatch: cpu_freq_max      : 4294967294
sbatch: cpu_freq_gov      : 4294967294
sbatch: mail_type         : NONE
sbatch: mail_user         : (null)
sbatch: sockets-per-node  : -2
sbatch: cores-per-socket  : -2
sbatch: threads-per-core  : -2
sbatch: ntasks-per-node   : 1
sbatch: ntasks-per-socket : -2
sbatch: ntasks-per-core   : -2
sbatch: mem_bind          : default
sbatch: plane_size        : 4294967294
sbatch: propagate         : NONE
sbatch: switches          : -1
sbatch: wait-for-switches : -1
sbatch: core-spec         : NA
sbatch: burst_buffer      : `(null)'
sbatch: remote command    : `/home/sr/clusterTests/mpitests/newHello.slrm'
sbatch: power             : 
sbatch: wait              : yes
sbatch: Consumable Resources (CR) Node Selection plugin loaded with argument 4
sbatch: Cray node selection plugin loaded
sbatch: Linear node selection plugin loaded with argument 4
sbatch: Serial Job Resource Selection plugin loaded with argument 4
Submitted batch job 108

[sr@clstr mpitests]$ squeue 
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
               108      defq newHello     sr CG       0:01      1 node001

[sr@clstr mpitests]$ scontrol show job=108
JobId=108 JobName=newHello.slrm
   UserId=sr(1003) GroupId=sr(1003) MCS_label=N/A
   Priority=4294901756 Nice=0 Account=(null) QOS=normal
   JobState=COMPLETING Reason=NonZeroExitCode Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=1:0
   RunTime=00:00:01 TimeLimit=00:01:00 TimeMin=N/A
   SubmitTime=2017-03-03T18:25:51 EligibleTime=2017-03-03T18:25:51
   StartTime=2017-03-03T18:26:01 EndTime=2017-03-03T18:26:02 Deadline=N/A
   PreemptTime=None SuspendTime=None SecsPreSuspend=0
   Partition=defq AllocNode:Sid=clstr:20260
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=node001
   BatchHost=node001
   NumNodes=1 NumCPUs=1 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=1,node=1
   Socks/Node=* NtasksPerN:B:S:C=1:0:*:* CoreSpec=*
   MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0
   Features=(null) Gres=(null) Reservation=(null)
   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=/home/sr/clusterTests/mpitests/newHello.slrm
   WorkDir=/home/sr/clusterTests/mpitests
   StdErr=/home/sr/clusterTests/mpitests/slurm-108.out
   StdIn=/dev/null
   StdOut=/home/sr/clusterTests/mpitests/slurm-108.out
   Power=

In the case of the second example, the job stays in the CG state until I reset the node.
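
(By "reset the node" I mean forcing its state by hand along these lines, or rebooting node001 outright when that does not help:)

scontrol update NodeName=node001 State=DOWN Reason="stuck in CG"
scontrol update NodeName=node001 State=RESUME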

If I revert slurm.conf to SelectType=select/linear, everything behaves normally as it should.

I am at a loss as to where I am making a mistake. Is it the Slurm configuration, my job submission script, or something else entirely?

If anyone can point me in the right direction, that would be very helpful.

[Note: I originally posted this on Stack Overflow, but realized Super User may be a better forum.]

Somesh

1 Answer


It seems I just needed to reboot the entire cluster! Jobs now behave as they should with cons_res.

It probably had to do with file system issues, as suggested in the Slurm documentation.
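
In case it is useful to anyone else, after the reboot I checked that the plugin change had actually taken effect with something along these lines:

scontrol show config | grep -i select     # should now report select/cons_res and CR_CORE
sinfo -N -o "%N %c %C"                    # per-node CPU count and allocated/idle/other/total CPUs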

Somesh