Skip to main content Link Menu Expand (external link) Document Search Copy Copied

“How to ban users from login to bigMem0”

Follow: https://slurm.schedmd.com/pam_slurm_adopt.html

For understanding of PAM, check man pam.d and https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/system-level_authentication_guide/pluggable_authentication_modules#About_PAM.

Additional references: https://www.suse.com/c/deploying-slurm-pam-modules-on-sle-compute-nodes/

Detailed steps

  1. In your Slurm build directory, navigate to slurm/contribs/pam_slurm_adopt/ and run
make
sudo make install

This will place pam_slurm_adopt.a, pam_slurm_adopt.la, and pam_slurm_adopt.so in the PAM dir you specified in ./configure. In out example, it’s /lib/slurm/pam/.

Create soft links of the above three in /lib64/security/.

sudo ln -s /lib/slurm/pam/pam_slurm_adopt.so /lib64/security/pam_slurm_adopt.so
sudo ln -s /lib/slurm/pam/pam_slurm_adopt.a /lib64/security/pam_slurm_adopt.a
sudo ln -s /lib/slurm/pam/pam_slurm_adopt.la /lib64/security/pam_slurm_adopt.la

DEBUG History ldd /lib64/security/pam_slurm_adopt.so returns libslurm.so.38 => not found. Using whereis libslurm.so finds that libslurm.so is in /usr/lib. So, sudo ldconfig /usr/lib/, and then ldd /lib64/security/pam_slurm_adopt.so works fine. (Otherwise, pam_slurm_adopt will not work.)

  1. Add the following lines to /etc/pam.d/sshd at the end of account section (before password section):
-account    sufficient      pam_access.so
-account    required        pam_slurm_adopt.so

And edit the pam_access configuration file (/etc/security/access.conf):

+:root:cron crond :0 tty1 tty2 tty3 tty4 tty5 tty6
+:root:127.0.0.1
+:root:192.168.2.
+:wheel:ALL
+:admin:ALL
-:Student:ALL
-:Faculty:ALL
-:Collaborator:ALL

Note the “-“ before the account entry for pam_slurm_adopt. It allows PAM to fail gracefully if the pam_slurm_adopt.so file is not found. If Slurm is on a shared filesystem, such as NFS, then this is suggested to avoid being locked out of a node while the shared filesystem is mounting or down.

  1. The pam_systemd module will conflict with pam_slurm_adopt, so you need to disable it in all files that are included in sshd or system-auth (e.g. password-auth, common-session, etc.). Find all occurence via sudo grep -rn /etc/pam.d -e 'pam.local'. From the example above, the following two lines have been commented out in the included password-auth file:
    #account    sufficient    pam_localuser.so
    #-session   optional      pam_systemd.so
    

    Note: This may involve editing a file that is auto-generated. Do not run the config script that generates the file or your changes will be erased.

  2. Stop and mask systemd-logind.
sudo systemctl stop systemd-logind
sudo systemctl mask systemd-logind

Based on this answer, use getenforce to check the working mode of SELinux. If it is enforcing(1), set it to permissiver(0).

sudo setenforce 0

All the above steps need no restarting of PAM or anything.

slurm.conf must includes PrologFlags=contain. If you modify slurm.conf you must do sudo systemctl reload slurmctld & sudo systemctl reload slurmd.

The working mechanism

(written by yjzhang, very colloquial (informal), very likely wrong in details so please read the following with a grain of salt.)

How does pam_slurm_adopt work?

First, it bans all the ssh login of ordinary users (Student, Collaborator and Faculty groups) in the above access.conf. But it must allow the ssh logins when the user has active jobs on the node! How to identify this?

Let’s have a look at the job launch process of slurm. At job allocation time (prolog), slurm uses the ProcTrack plugin (it is cgroup in our setting) to create a job container on all allocated compute nodes. And this container is used for user processes outside Slurm control! pam_slurm_adopt places processes launched through a direct user login into this container. This is called an “extern” step in the job into which ssh-launched processes will be adopted. That’s why in the official guide it says, the user’s connection is “adopted” into the “external” step of the job.

In slurm.conf, PrologFlags=contain must be in place before using this module. The module bases its checks on local steps that have already been launched. If the user has no steps on the node, such as the extern step, the module will assume that the user has no jobs allocated to the node. Depending on your configuration of the PAM module, you might accidentally deny all user ssh attempts without PrologFlags=contain.

Other Notices

Included files such as common-account should normally be included before pam_slurm_adopt.

pam_slurm_adopt must be used with the task/cgroup task plugin and either the proctrack/cgroup or the proctrack/cray_aries proctrack plugin. (Explained before)

The UsePAM option in slurm.conf is not related to pam_slurm_adopt.

Verify that UsePAM is set to On in /etc/ssh/sshd_config (it should be on by default).


Copyright © 2020-2024 Advancedsolver Admin Team.