I'm having some trouble with a specific node. Until I resolve it, I don't want any jobs to run on it. How can I temporarily take this node out of the node "pool"?
4 Answers
6
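To disable all queue instances on the node (the wildcard is quoted so the shell does not try to expand it):
qmod -d '*@node_name'
To re-enable them:
qmod -e '*@node_name'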
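If you toggle nodes often, the two commands above can be wrapped in a small shell function. This is only a sketch: the function name `sge_node` and the `DRY_RUN` variable are invented here and are not part of SGE. With `DRY_RUN=1` it prints the `qmod` command instead of running it, which is handy for checking what would happen first.

```shell
# Hypothetical helper (not part of SGE): sge_node disable|enable <host>
# With DRY_RUN=1 it only prints the qmod command instead of running it.
sge_node() {
    action=$1
    host=$2
    case "$action" in
        disable) flag=-d ;;
        enable)  flag=-e ;;
        *) echo "usage: sge_node disable|enable <host>" >&2; return 2 ;;
    esac
    if [ -z "$host" ]; then
        echo "usage: sge_node disable|enable <host>" >&2
        return 2
    fi
    if [ "${DRY_RUN:-0}" = 1 ]; then
        # Double quotes keep the wildcard literal in the printed command.
        echo "qmod $flag *@$host"
    else
        # Quoted so the shell hands the wildcard to qmod unexpanded.
        qmod "$flag" "*@$host"
    fi
}
```

Typical use would be `sge_node disable node_name` while you debug the node, and `sge_node enable node_name` once it is healthy again.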
Kevin Panko
user322498
-
Why is this downvoted? – Albert Jan 26 '15 at 11:12
-
As advice, I had an issue getting wildcard queue names to work. I ran `qstat -f`, found the queue on the host I wanted to disable, and used that as the argument after `-d` in `qmod -d`. – Devin Aug 17 '15 at 16:05
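The workaround of looking up queue names by hand can be scripted. A rough sketch, assuming `qstat -f` prints one line per queue instance with the `queue@host` name in the first column (check this against your own output); `queues_on_host` is a hypothetical helper name, not an SGE command.

```shell
# Hypothetical helper: read `qstat -f` output on stdin and print the
# queue instances (first column, queue@host) that live on the given host.
# The host may be given either short or fully qualified.
queues_on_host() {
    awk -v host="$1" '
        {
            n = split($1, parts, "@")
            if (n != 2) next                    # not a queue@host line
            short = parts[2]
            sub(/\..*/, "", short)              # strip the domain part
            if (parts[2] == host || short == host) print $1
        }'
}

# Usage (needs a live SGE cluster, so it is left commented out):
# qstat -f | queues_on_host node_name | xargs -n 1 qmod -d
```

Passing the explicit queue instance names to `qmod -d` avoids wildcards altogether, at the cost of one extra `qstat` call.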
2
If you're running SGE 6.1 or later, here's the best way. Create a new host group called @disabled:
qconf -ahgrp @disabled
Create a new resource quota set with qconf -arqs (it opens an editor) and give it this rule:
limit hosts @disabled to slots=0
Now, to disable a host, just add it to the host group
qconf -aattr hostgroup hostlist MYHOST @disabled
To reenable the host, remove it from the host group
qconf -dattr hostgroup hostlist MYHOST @disabled
This process will stop new jobs from being scheduled to the machine and allow the currently running jobs to complete.
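For the resource quota step, a non-interactive variant is to write the rule set to a file and load it with `qconf -Arqs`. The sketch below only prints the rule set; the helper name `disabled_rqs`, the rule set name `disabled_hosts`, the description, and the temporary file path are placeholders chosen for illustration.

```shell
# Print a resource quota set that gives the hosts in the group zero slots.
# The rule set name and description are placeholders chosen for this sketch.
disabled_rqs() {
    group=${1:-@disabled}
    cat <<EOF
{
   name         disabled_hosts
   description  "hosts temporarily closed to new jobs"
   enabled      TRUE
   limit        hosts $group to slots=0
}
EOF
}

# Usage on a cluster (commented out because it changes the configuration):
# disabled_rqs @disabled > /tmp/disabled_hosts.rqs
# qconf -Arqs /tmp/disabled_hosts.rqs
```

Keeping the rule set in a file also makes it easy to review or version the exact text that was loaded.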
Kevin Panko
Daniel Templeton
-
This does not seem to work. Jobs still get executed on the problematic node. What can go wrong here? I can see it was added to @disabled (using qconf -mhgrp @disabled) and I have enabled the quota set. – David B Dec 04 '10 at 12:16
-
By the way, the resource quota set looks like this: `{ name disabled_hosts description created by me enabled TRUE limit hosts @disabled to slots=0 }` – David B Dec 04 '10 at 12:19
-
By the way, this did work: `{ name disabled_hosts description created by me enabled TRUE limit hosts {my_bad_host} to slots=0 }`, so I guess it has something to do with @disabled. – David B Dec 04 '10 at 12:30
0
gridsuspend - Suspends one or more hosts from executing grid jobs. Example: gridsuspend -s -r "reason comment here" <host_name> 1d
0
Without knowing your SGE version, I cannot say for certain that this will achieve the desired outcome. However, qconf -de foo will delete the execution host foo, and qconf -ae foo will then add the host foo back to the execution host list.
Tok
-
This also doesn't seem to work. Jobs still get executed on the problematic node. – David B Dec 04 '10 at 12:17