“How to ban users from login to bigMem0”
Follow: https://slurm.schedmd.com/pam_slurm_adopt.html
For understanding of PAM, check man pam.d
and https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/system-level_authentication_guide/pluggable_authentication_modules#About_PAM.
Additional references: https://www.suse.com/c/deploying-slurm-pam-modules-on-sle-compute-nodes/
Detailed steps
- In your Slurm build directory, navigate to
slurm/contribs/pam_slurm_adopt/
and run
make
sudo make install
This will place pam_slurm_adopt.a
, pam_slurm_adopt.la
, and pam_slurm_adopt.so
in the PAM dir you specified in ./configure
. In out example, it’s /lib/slurm/pam/
.
Create soft links of the above three in /lib64/security/
.
sudo ln -s /lib/slurm/pam/pam_slurm_adopt.so /lib64/security/pam_slurm_adopt.so
sudo ln -s /lib/slurm/pam/pam_slurm_adopt.a /lib64/security/pam_slurm_adopt.a
sudo ln -s /lib/slurm/pam/pam_slurm_adopt.la /lib64/security/pam_slurm_adopt.la
DEBUG History ldd /lib64/security/pam_slurm_adopt.so
returns libslurm.so.38 => not found
. Using whereis libslurm.so
finds that libslurm.so
is in /usr/lib
. So, sudo ldconfig /usr/lib/
, and then ldd /lib64/security/pam_slurm_adopt.so
works fine. (Otherwise, pam_slurm_adopt
will not work.)
- Add the following lines to
/etc/pam.d/sshd
at the end ofaccount
section (beforepassword
section):
-account sufficient pam_access.so
-account required pam_slurm_adopt.so
And edit the pam_access configuration file (/etc/security/access.conf
):
+:root:cron crond :0 tty1 tty2 tty3 tty4 tty5 tty6
+:root:127.0.0.1
+:root:192.168.2.
+:wheel:ALL
+:admin:ALL
-:Student:ALL
-:Faculty:ALL
-:Collaborator:ALL
Note the “-“ before the account entry for pam_slurm_adopt. It allows PAM to fail gracefully if the pam_slurm_adopt.so file is not found. If Slurm is on a shared filesystem, such as NFS, then this is suggested to avoid being locked out of a node while the shared filesystem is mounting or down.
- The pam_systemd module will conflict with pam_slurm_adopt, so you need to disable it in all files that are included in sshd or system-auth (e.g. password-auth, common-session, etc.). Find all occurence via
sudo grep -rn /etc/pam.d -e 'pam.local'
. From the example above, the following two lines have been commented out in the included password-auth file:#account sufficient pam_localuser.so #-session optional pam_systemd.so
Note: This may involve editing a file that is auto-generated. Do not run the config script that generates the file or your changes will be erased.
- Stop and mask systemd-logind.
sudo systemctl stop systemd-logind
sudo systemctl mask systemd-logind
Based on this answer, use getenforce
to check the working mode of SELinux. If it is enforcing(1), set it to permissiver(0).
sudo setenforce 0
All the above steps need no restarting of PAM or anything.
slurm.conf
must includes PrologFlags=contain
. If you modify slurm.conf
you must do sudo systemctl reload slurmctld
& sudo systemctl reload slurmd
.
The working mechanism
(written by yjzhang, very colloquial (informal), very likely wrong in details so please read the following with a grain of salt.)
How does pam_slurm_adopt
work?
First, it bans all the ssh login of ordinary users (Student, Collaborator and Faculty groups) in the above access.conf
. But it must allow the ssh logins when the user has active jobs on the node! How to identify this?
Let’s have a look at the job launch process of slurm. At job allocation time (prolog), slurm uses the ProcTrack
plugin (it is cgroup
in our setting) to create a job container on all allocated compute nodes. And this container is used for user processes outside Slurm control! pam_slurm_adopt
places processes launched through a direct user login into this container. This is called an “extern” step in the job into which ssh-launched processes will be adopted. That’s why in the official guide it says, the user’s connection is “adopted” into the “external” step of the job.
In slurm.conf
, PrologFlags=contain
must be in place before using this module. The module bases its checks on local steps that have already been launched. If the user has no steps on the node, such as the extern step, the module will assume that the user has no jobs allocated to the node. Depending on your configuration of the PAM module, you might accidentally deny all user ssh attempts without PrologFlags=contain
.
Other Notices
Included files such as common-account should normally be included before pam_slurm_adopt.
pam_slurm_adopt
must be used with the task/cgroup task plugin and either the proctrack/cgroup or the proctrack/cray_aries proctrack plugin. (Explained before)
The UsePAM
option in slurm.conf
is not related to pam_slurm_adopt
.
Verify that UsePAM is set to On in /etc/ssh/sshd_config
(it should be on by default).