Mount namespaces handbook¶
This handbook describes the use of mount namespaces and polyinstantiated directories in an HPC context.
There are currently two main use cases:
- User isolation
Users should not be able to use local storage, such as /tmp or /dev/shm, to share files with each other. Local files should be visible only to their owners, and file sharing between users should go through shared filesystems such as Lustre or NFS.
- Per-user mount rights
In some cases, users of one group (a so-called container) are allowed to access the files of another group. This kind of access is hierarchical and read-only.
Given two groups A and B, B's users must get read-only access to the files of A's users, while A's users must not be able to access B's files.
Implementation¶
To implement these use cases, we rely on the pam_namespace PAM module, which is the most widely used piece of software for this purpose.
The pam_namespace PAM module sets up a private namespace for a session with polyinstantiated directories. A polyinstantiated directory provides a different instance of itself based on user name.
The pam_namespace module disassociates the session namespace from the parent namespace. Any mounts/unmounts performed in the parent namespace, such as mounting of devices, are not reflected in the session namespace. Only original mount points are reflected.
There are two ways of doing this with pam_namespace:
- Describing polyinstantiated directories and the base storage device to use.
- Using a script to manually populate the mount namespace from a given base directory.
Basic usage¶
After adding pam_namespace to the service's PAM configuration, the namespace configuration is done in files under /etc/security/namespace.d/.
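As an illustration, enabling the module for a service usually comes down to one session line in that service's PAM file. The file name below (/etc/pam.d/sshd) is an assumption; the line belongs in whichever services should get private namespaces.

```
# /etc/pam.d/sshd (assumed service file)
session    required    pam_namespace.so
```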
A basic usage is simply:
# POLYDIR INSTANCE_PREFIX METHOD UIDS
/tmp /tmp/poly-inst user root
This configuration tells pam_namespace that, for each user other than root (4th field), /tmp (1st field) is remounted, using the username (3rd field) to name a unique directory under the instance prefix (2nd field).
In other words, a private /tmp/poly-inst/$USER is bind-mounted onto /tmp for each non-root user.
To ensure that isolation is complete, pam_namespace requires that the instance prefix directory is not accessible to anyone (i.e. owned by root, with mode 000).
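The requirement above can be met with a few commands. The following is a minimal sketch, not part of the original handbook: the function name prepare_instance_prefix is hypothetical, and ownership is only changed when the script runs as root so that it can be tried without privileges.

```shell
#!/bin/sh
# prepare_instance_prefix DIR (hypothetical helper)
# Creates DIR and removes all permissions from it (mode 000).
# Ownership is set to root:root only when running as root.
prepare_instance_prefix() {
    dir="$1"
    mkdir -p "$dir" || return 1
    if [ "$(id -u)" -eq 0 ]; then
        chown root:root "$dir" || return 1
    fi
    chmod 000 "$dir"
}
```

On a real node this would be run as root, for example as `prepare_instance_prefix /tmp/poly-inst`, before the first user session is opened.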
The complete format is documented in the namespace.conf(5) man page.
A side effect is that even system daemons cannot use files in a user's /tmp.
Kerberos is one example: if Kerberos is configured to store the credential cache in /tmp, the cache may be written to the wrong /tmp when the user logs in.
More generally, polyinstantiated directories mean that system daemons cannot share files with users through those directories. For Kerberos, using KCM credential caches is one way to work around this. See the Kerberos handbook for details.
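As an illustration of the KCM approach, a krb5.conf fragment along these lines selects a KCM credential cache by default. This is a sketch that assumes MIT Kerberos with a KCM daemon (for example the one shipped with SSSD) already running; the actual site configuration may differ.

```
[libdefaults]
    default_ccache_name = KCM:
```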
This basic usage implements the first use case: user isolation.
Customized usage¶
A more advanced use of pam_namespace relies on custom scripts to set up the user's namespace. Ocean delivers a script that does most of the heavy lifting.
You can enable it with the following configuration:
# POLYDIR INSTANCE_PREFIX METHOD UIDS
/ccc none tmpfs:create=0700,root,root:mntopts=size=1M:iscript=ccc.setup root
This configuration mounts an empty /ccc as a tmpfs. The tmpfs is set up with options (separated by colons, with comma-separated values) that set the permissions and size of /ccc.
The iscript option names a script to be executed on namespace initialization. The script location is relative to /etc/security/namespace.d.
pam_namespace passes 4 arguments to the script:
- The polydir (i.e. /ccc)
- The instance path (i.e. tmpfs here, /tmp/poly-inst/$USER for the previous use case)
- A boolean indicating whether the instance path was newly created. Always true for tmpfs.
- The username
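To make the argument order concrete, here is a minimal skeleton of an initialization script. It is not Ocean's ccc.setup: the function name describe_args is hypothetical, and the sketch assumes the flag is passed as 1 for a newly created instance and 0 otherwise, as described in namespace.conf(5).

```shell
#!/bin/sh
# describe_args POLYDIR INSTANCE NEWLY_CREATED USER (hypothetical helper)
# Prints a one-line summary of the four arguments received from pam_namespace.
describe_args() {
    polydir="$1"        # polyinstantiated directory, e.g. /ccc
    instance="$2"       # instance path, or "tmpfs" for the tmpfs method
    newly_created="$3"  # assumed: 1 if the instance was just created, else 0
    user="$4"           # name of the user opening the session
    if [ "$newly_created" = "1" ]; then
        state="new"
    else
        state="existing"
    fi
    printf '%s: %s instance %s for %s\n' "$polydir" "$state" "$instance" "$user"
}
```

A real script would typically start with something like `describe_args "$@"` for logging, then populate the polydir.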
Ocean's ccc.setup script uses another configuration file whose format is similar to /etc/fstab. This file is located at /etc/fstab_user and uses the following format:
# SOURCE DESTINATION KIND OPTS GROUP_FILTER
/run/mount/private/A A bind defaults,rw A
/run/mount/private/B B bind defaults,rw B
/run/mount/private/A A bind defaults,ro B
A C symlink defaults A
B C symlink defaults B
Each line describes a new directory or symbolic link (depending on the 3rd field) created relative to the instance path. Its name is given by the 2nd field.
For bind mounts, the 1st field is the source path to bind. For symbolic links, the 1st field is the target of the link.
The 4th field is only used for bind mounts and gives the mount options for the bind mount. In most cases, only ro or rw is used; defaults is a keyword meaning no additional options.
The 5th field is a group filter for the mount or link on that line. The filter can be negated by prepending a ~, and multiple groups are comma-separated.
The example configuration gives the following result for group A's users:
- /ccc/A: bind mount (read/write) of /run/mount/private/A
- /ccc/C: symbolic link to /ccc/A
And for B’s users:
- /ccc/A: bind mount (read-only) of /run/mount/private/A
- /ccc/B: bind mount (read/write) of /run/mount/private/B
- /ccc/C: symbolic link to /ccc/B
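The selection of entries by group filter can be sketched in a few lines of shell. This is an illustration only, not Ocean's ccc.setup: the function names filter_matches and list_entries are hypothetical, and the sketch assumes that a filter matches when the user belongs to any listed group and that a leading ~ inverts the result for the whole list.

```shell
#!/bin/sh
# filter_matches FILTER GROUP... (hypothetical helper)
# Succeeds when one of the user's GROUPs appears in the comma-separated FILTER.
# A leading "~" inverts the result (assumed to apply to the whole list).
filter_matches() {
    filter="$1"; shift
    negate=0
    case "$filter" in
        '~'*) negate=1; filter="${filter#?}" ;;
    esac
    matched=0
    old_ifs="$IFS"; IFS=','
    for wanted in $filter; do
        for have in "$@"; do
            if [ "$wanted" = "$have" ]; then matched=1; fi
        done
    done
    IFS="$old_ifs"
    [ "$matched" -ne "$negate" ]
}

# list_entries GROUP... < fstab_user (hypothetical helper)
# Reads the five-field format on stdin and prints the entries that apply.
list_entries() {
    while read -r src dest kind opts group_filter; do
        # Skip blank lines and comments.
        case "$src" in ''|'#'*) continue ;; esac
        if filter_matches "$group_filter" "$@"; then
            printf '%s %s -> %s (%s)\n' "$kind" "$dest" "$src" "$opts"
        fi
    done
}
```

For a real user, the group list could come from `id -Gn "$user"`, for example `list_entries $(id -Gn "$user") < /etc/fstab_user`. The sketch only prints the selected entries; creating the bind mounts and links is left out.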
Debug¶
To list the mount namespaces currently in use, run the lsns command. It lists the namespaces and their associated processes.
To enter an existing mount namespace, use the nsenter command:
# nsenter -t %PID -m
# [ Shell inside namespace.. ]
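Before entering a namespace, it can help to check whether a process really runs in a private one. The sketch below compares the namespace links under /proc; the function names mnt_ns_of and same_mnt_ns are hypothetical, and reading the link of another user's process generally requires root.

```shell
#!/bin/sh
# mnt_ns_of PID (hypothetical helper)
# Prints the mount namespace identifier of PID, e.g. "mnt:[4026531840]".
mnt_ns_of() {
    readlink "/proc/$1/ns/mnt"
}

# same_mnt_ns PID1 PID2 (hypothetical helper)
# Succeeds when both processes share the same mount namespace.
# Returns 2 when one of the links cannot be read.
same_mnt_ns() {
    first=$(mnt_ns_of "$1") || return 2
    second=$(mnt_ns_of "$2") || return 2
    [ "$first" = "$second" ]
}
```

For instance, `same_mnt_ns 1 "$PID"` failing with status 1 indicates that the process does not share the mount namespace of PID 1, which is what is expected for a session set up by pam_namespace.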
To debug the script's execution, you can use the execsnoop tool from the bcc-tools package: /usr/share/bcc/tools/execsnoop.