Questions tagged [slurm]
27
questions
0
votes
0
answers
562
views
Resolving Slurm cgroups Plugin Errors on Ubuntu 22.04 Nodes
I'm working with Slurm and facing issues specifically with the cgroups plugin on Ubuntu 22.04 nodes. Our team is relatively new to Slurm, and we've been trying to optimize our resource management for ...
1
vote
1
answer
1k
views
Error code 140 in command running through Nextflow on SLURM
[Note: question heavily edited to correspond to the actual problem]
I'm trying to debug a command that fails only in specific conditions. The failure is with exit code 140, but I have no other ...
0
votes
0
answers
91
views
Dynamically checking and allocating SLURM nodes within a python script
I have a computationally expensive simulation function that I want to distribute across a multi-node cluster. The code looks something like this:
input_tasks = [input_0, input_1, ..., input_n]
for i ...
0
votes
1
answer
2k
views
How to sync UIDs and GIDs across multiple machines with minimal impact on users' experience?
I have two workstations, WS 1 and WS 2, and a server, S, all running Ubuntu 22.04. These machines were previously managed independently, so users could have accounts on some or all of them, and ...
1
vote
0
answers
2k
views
How to make a host file in SLURM with $SLURM_JOB_NODELIST
I have access to an HPC cluster with 40 cores on each node. I have a batch file to run a total of 35 codes, which are in separate folders. Each code is an OpenMP code that requires 4 cores. So how do I ...
0
votes
1
answer
887
views
Common home folder for slurm cluster user on nodes and front end
I am trying to put together a SLURM cluster with an Odroid XU4 front end (Ubuntu 20.04-5.4 mate), Odroid MC1 nodes (12 nodes total: Ubuntu 20.04.1-5.4-minimal), and an Odroid HC1 NFS server (...
0
votes
1
answer
67
views
How can I pass two arguments to `--mail-type` option of `salloc`?
I would like to pass two arguments to an option of a shell command, specifically salloc. I can do either of the following:
salloc -n 1 -t 24:00:00 --mail-type=BEGIN
salloc -n 1 -t 24:00:...
0
votes
1
answer
91
views
Linux Mint "slurm" appears on login screen
Recently the text slurm appeared above my login name on my login screen. What could be the reason? How can it be removed?
I use Linux Mint version 19.1 'Tessa' with its Cinnamon desktop environment.
...
0
votes
1
answer
2k
views
SLURM setting nodes to drain due to low socket-core-thread-cpu count
I have SLURM set up with a couple of workstations. There are different kinds, but let's take one with a CPU which has 4 cores and no additional SMT, so 4 threads in total. lscpu shows me the following:...
1
vote
1
answer
1k
views
slurmd: Invalid job credential
I'm having some problems with a test configuration of Slurm on my laptop. I'm trying to run four slurmd instances on one machine, which is also the same machine as slurmctld runs on. I have a local ...
1
vote
0
answers
913
views
Slurm - GPU enforcement with cgroups
I am running Slurm 19.05 on a single machine (Ubuntu 18.04) for scheduling GPU tasks. However, I am having trouble setting up GPU enforcement with cgroups.
If I set ConstrainDevice=yes in my cgroup....
0
votes
1
answer
402
views
Slurm nodes on AWS set to drain at boot
I am working to configure slurm on an AWS cluster created with CloudFormation. At boot time some of the nodes get set to a "drain" state, with the stated reason being "Low socketcorethread count". ...
3
votes
2
answers
11k
views
How to cancel a job that is in the completing (CG) state?
I submitted some jobs using sbatch as usual and later canceled some of them using scancel. However, they are in state CG and I cannot remove the jobs from my list.
Is there any way to get rid of ...
2
votes
1
answer
10k
views
Slurm on AWS returns slurmstepd: error: execve(): : No such file or directory
I have installed a Burstable and Event-driven HPC Cluster on AWS Using Slurm according to this tutorial.
With this installation I can burst instances and run jobs in the Slurm environment on EC2. ...
1
vote
1
answer
232
views
Ubuntu 18.10 and modify installed package - OpenMPI
I've installed openmpi-bin (OpenMPI 3.1) on Ubuntu 18.10. I also run Slurm on the same machine and would like to recompile or reconfigure my installation of OpenMPI to support Slurm.
If one ...