Configuring Unprivileged LXC containers in Debian Jessie
The gradual maturing of Linux Control Groups and in-kernel namespaces (e.g. net, user, mount, IPC) has enabled powerful OS-level virtualisation utilities such as Docker and LXC to offer a lightweight, high-performing alternative to typical hardware-based virtualisation (e.g. KVM).
With OS-level virtualisation the host's kernel handles all system calls generated from OS-level VMs, resulting in less resource overhead when compared to the hardware virtualisation approach, which requires an intermediary hypervisor (e.g. KVM, Hyper-V) to run a separate guest kernel on virtualised hardware for each VM. The isolation of OS-level VMs is performed entirely in the kernel (in software) through the use of namespaces; this eliminates the need for any particular hardware extensions (e.g. svm, vmx) that are presently necessary for accelerated hardware-based virtualisation.
While the nature of these characteristics enables greater utilisation of hardware resources when deploying Linux-only applications (even across differing hardware platforms), there are notable drawbacks pertaining to VM migration inflexibility (unless hosts run the same kernel), Linux-only support, and comparably weaker security. The last of these has been substantially improved with the advent of unprivileged containers in LXC 1.0.
Intro
During my time at the North America LinuxCon event in Seattle I attended a wide range of technical talks ranging from Linux kernel performance tuning to the potential applications of the BPF (Berkeley Packet Filter) in-kernel virtual machine. One particular topic of interest that cropped up numerous times across the majority of keynotes and technical talks was Containers (OS-level virtualisation) and the variety of tools that make the most effective use of them in an enterprise environment (e.g. Docker, Kubernetes). Seeing as they were mentioned on so many occasions I felt it beneficial to focus my time and effort on learning the underlying tools/technologies to a greater extent before getting my hands dirty with applications like Docker.
Having configured LXC within Debian Wheezy on my server a few years back I was already aware of the basic concepts relating to Linux kernel namespaces and control groups from an LXC perspective. One important aspect I learnt about LXC early on was that if a container run as the root user (as was the norm initially) was compromised (e.g. via a buggy syscall) then the underlying host was entirely at the mercy of the malicious, root-privilege-wielding attacker - yikes!
Various security "wrappers" were, and still are, available to help reduce the attack surface (e.g. AppArmor, SELinux, seccomp, grsec). In spite of these various protection mechanisms Stéphane Graber, an upstream maintainer of LXC and Canonical employee, states that the implementation of user namespaces required for enabling unprivileged containers in LXC is "[...] probably the only way of making a container actually safe".
Stéphane continues to explain that in an unprivileged container "LXC is no longer running as root so even if an attacker manages to escape the container, he’d find himself having the privileges of a regular user on the host." (source).
In an effort to learn more about the LXC userspace tools (/usr/bin/lxc-*) I migrated some of my existing hardware-based VMs to privileged LXC containers (VMs), consequently benefiting from a notable improvement in performance whilst reducing the VM's overall memory footprint. My own curiosity resulted in me exploring ("googling") the possibility of running GUI applications such as Iceweasel or Skype within an LXC container.
Ultimately, this guide stems from the obvious prerequisite of Stéphane Graber's guide to running GUI applications within unprivileged containers: a working unprivileged container setup.
While modern versions of Ubuntu (14.04 upwards) are shipping a working unprivileged LXC setup out of the box the Debian Jessie 8.2 offering is sadly lacking in comparison. Having found no thorough setup guides for Debian I thought I would share my step-by-step solution to hopefully save someone from facing the same struggles I did when configuring unprivileged containers on a Debian Jessie 8.2 host.
As usual I have included a TL;DR section at the bottom of this page for those wanting to skip the reasoning of the commands used at each stage.
Packages Used
lxc: 1:1.0.6-6+deb8u1
cgroup-tools: 0.41-6
uidmap: 1:4.2-3
linux-image-3.16.0-4-amd64: 3.16.7-ckt11-1+deb8u3
systemd: 215-17+deb8u2
Note: This guide assumes you are using the Debian supplied kernel provided by the linux-image-3.16.0-4-amd64 package. If you are using a custom kernel please check you have Control Group and namespace support (run lxc-checkconfig) before reading on.
1. Establishing LXC Unprivileged container path equivalents
The tools required for creating and configuring LXC unprivileged containers do not automatically create the user specific directory/file layout required by user owned containers. The following system-wide to per-user basis LXC configuration layout mappings have been sourced from Stéphane Graber's blog (here):
/etc/lxc/lxc.conf => ~/.config/lxc/lxc.conf
/etc/lxc/default.conf => ~/.config/lxc/default.conf
/var/lib/lxc => ~/.local/share/lxc
/var/lib/lxcsnaps => ~/.local/share/lxcsnaps
/var/cache/lxc => ~/.cache/lxc
1. Let's recreate this layout in the order above as follows:
mkdir -p ~/.config/lxc
touch ~/.config/lxc/{lxc,default}.conf
mkdir -p ~/.local/share/{lxc,lxcsnaps}
mkdir -p ~/.cache/lxc
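The same layout can be wrapped in a small function so that it can be tried against a scratch directory before touching a real home directory. This is only a sketch: make_lxc_layout is a hypothetical helper name of my own, not something shipped with LXC.

```shell
# Hypothetical helper (not part of LXC): create the per-user LXC layout
# beneath the directory given as $1 (defaults to $HOME).
make_lxc_layout() (
    home=${1:-$HOME}
    mkdir -p "$home/.config/lxc" \
             "$home/.local/share/lxc" \
             "$home/.local/share/lxcsnaps" \
             "$home/.cache/lxc" || exit 1
    # touch creates the files if missing and leaves existing contents alone
    touch "$home/.config/lxc/lxc.conf" "$home/.config/lxc/default.conf"
)
```

Calling make_lxc_layout with no argument acts on $HOME; passing a temporary directory instead makes it easy to inspect the result first.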
2. Enabling kernel features and extending user UIDs & GIDs
The Debian maintained 3.16.0-4 Linux kernel used by Jessie 8.2 ships with a crucial kernel feature required for cloning user namespaces disabled by default.
According to the Debian bug report #712870 (here) user namespaces have been enabled in the Debian tailored Linux kernels (3.12-1~exp1 upwards) since late November 2013 but with the caveat: "Restrict creation of user namespaces to root (CAP_SYS_ADMIN) by default (sysctl:kernel.unprivileged_userns_clone)".
To allow an unprivileged user to create a user namespace we need to enable that feature in the kernel.
2. Create a sysctl configuration file /etc/sysctl.d/80-lxc-userns.conf for enabling the required unprivileged_userns_clone flag at boot:
kernel.unprivileged_userns_clone=1
3. Reload sysctl so it takes into account the newly created /etc/sysctl.d/80-lxc-userns.conf configuration file:
sudo sysctl --system
4. Check that the unprivileged_userns_clone flag has been set for the running session:
cat /proc/sys/kernel/unprivileged_userns_clone
If sysctl has done its job correctly the value returned from the above command should be '1'.
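For use in scripts, the check can be turned into a function that returns an exit status instead of printing a value. A minimal sketch, assuming the Debian-specific flag path shown above; userns_clone_enabled is a hypothetical name.

```shell
# Hypothetical helper: report whether unprivileged user namespaces are enabled.
# $1 is the flag file; it defaults to the Debian-specific sysctl path.
userns_clone_enabled() (
    flag=${1:-/proc/sys/kernel/unprivileged_userns_clone}
    # Exit status 2 means the flag file does not exist or cannot be read
    [ -r "$flag" ] || { echo "missing: $flag" >&2; exit 2; }
    # Exit status 0 only when the flag holds exactly "1"
    [ "$(cat "$flag")" = "1" ]
)
```

An exit status of 2 (file missing) usually means the running kernel does not carry the Debian patch that adds this sysctl.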
5. Install the uidmap package required for allowing unprivileged users to create UID and GID mappings within user namespaces:
sudo apt-get install uidmap -y
6. Create a "subid" (subordinate id) range for both user UIDs and GIDs which will serve as a mapping inside unprivileged containers:
sudo usermod --add-subuids 100000-165536 $USER
sudo usermod --add-subgids 100000-165536 $USER
These subids will persist between system reboots as usermod will have written them to /etc/subuid and /etc/subgid respectively.
7. Configure the user specific LXC default configuration file (~/.config/lxc/default.conf) to use the subordinate UID and GID range for all unprivileged containers. Each line below maps 65536 IDs, so processes inside the unprivileged container have their respective UIDs and GIDs remapped from the typical 0-65535 range to the 100000-165535 range on the host. The host is aware that these "subid" ranges (stored in /etc/subuid & /etc/subgid) are owned by the user ($USER) and consequently enforces preexisting user restrictions upon them.
lxc.id_map = u 0 100000 65536
lxc.id_map = g 0 100000 65536
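To make the arithmetic behind these mappings concrete, here is a small sketch that translates a container ID into the host ID it corresponds to. host_id is a hypothetical helper, and it assumes a mapping that starts at container ID 0 as in the two lines above.

```shell
# Hypothetical helper: translate a container UID/GID to the host ID it maps to,
# given an "lxc.id_map = u 0 <host_start> <count>" style mapping.
host_id() (
    host_start=$1 count=$2 container_id=$3
    # Only container IDs 0 .. count-1 are covered by the mapping
    if [ "$container_id" -lt 0 ] || [ "$container_id" -ge "$count" ]; then
        echo "ID $container_id is outside the mapped range" >&2
        exit 1
    fi
    echo $((host_start + container_id))
)
```

With the values used in this guide, host_id 100000 65536 0 prints 100000 (root inside the container) and host_id 100000 65536 1000 prints 101000, which is why files created by the container show up on the host owned by such high-numbered IDs.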
3. Configuring cgroup Tools
My own investigations into configuring unprivileged containers in Debian Jessie led to the unearthing of two daemons that provide high-level control for dynamically manipulating control groups on the fly:
cgmanager: part of the 'cgmanager' package that enables applications and users to configure cgroups via D-Bus requests.
cgrulesengd: part of the 'cgroup-tools' package that detects when a process changes its "effective" UID or GID and inspects a list of rules to determine what to do with the process; e.g. move the process to a preconfigured control group.
Both offerings provided an operable environment for deploying working unprivileged containers but I found the latter to be more suited for my needs.
The package 'cgroup-tools' contains several userspace command-line utilities and the system daemon, cgrulesengd, that interface with the libcgroup library for manipulating, controlling, administrating and monitoring cgroups of various controllers (e.g. blkio, cpu).
Examining the 'cgroup-tools' package's /usr/share/doc/cgroup-tools/TODO.Debian reveals: "* come up with an failsafe, upgrade-proof and admin friendly initscript". It looks like we need to configure the daemon's startup/shutdown behaviour ourselves; see it as an opportunity to learn more about the 'cgroup-tools' utilities and how they work together!
The /usr/share/doc/cgroup-tools/examples/ directory contains various template configuration files that are utilised later on in this guide.
8. Install the 'cgroup-tools' package:
sudo apt-get install -y cgroup-tools
9. Create the /etc/sysconfig directory and copy into it the provided example Control Group Rules Engine Daemon configuration file /usr/share/doc/cgroup-tools/examples/cgred.conf, which will later be used by the cgrulesengd daemon and can remain as is throughout this guide:
sudo mkdir /etc/sysconfig
sudo cp /usr/share/doc/cgroup-tools/examples/cgred.conf /etc/sysconfig/cgred.conf
10. With your favorite text editor create the /etc/cgconfig.conf file. This configuration file serves as a structured means of declaring control group parameters and mount points. Further information for particular sections of the configuration file can be found with: man cgconfig.conf. A basic working example of this file for enabling unprivileged containers is as follows:
group username_here {
    perm {
        task {
            uid = username_here;
            gid = username_here;
        }
        admin {
            uid = username_here;
            gid = username_here;
        }
    }
    # All controllers available in 3.16.0-4
    # Listed by running: cat /proc/cgroups
    cpu {}
    blkio {}
    cpuacct {}
    cpuset {
        cgroup.clone_children = 1;
        cpuset.mems = 0;
        cpuset.cpus = 0-3;
    }
    devices {}
    freezer {}
    perf_event {}
    net_cls {}
    net_prio {}
    # The memory controller is not enabled by default in Debian Jessie despite being enabled in the kernel
    # If you enable it, uncomment the following
    # memory { memory.use_hierarchy = 1; }
}
Quite a bit is going on in the configuration file above so I'll try to explain what is happening in a succinct manner (if possible!).
group username_here {: We create a control group called "username_here" (it can be named anything that adheres to directory naming conventions) that encompasses the permissions stanza and its task and admin child stanzas.
Notice the kernel subsystem controllers further down the file: "cpu {}", "blkio {}", etc. By including these we tell the 'cgroup-tools' utilities (and consequently libcgroup) that we wish our control group to be governed by these subsystem controllers.
This results in the creation of a "username_here" directory within the mounted cgroups virtual filesystem underneath each declared subsystem: /sys/fs/cgroup/[subsystem]/username_here.
perm {: Permissions to use and alter the control group are assigned to a UID and GID in the task (task {) and admin (admin {) child stanzas respectively.
The task UID/GID owns the /sys/fs/cgroup/[subsystem]/username_here/tasks file, which is itself a simple list containing the PIDs of all processes currently running within that control group.
The admin UID/GID owns the remaining files within the control group.
Finally, let's examine the three critical options present in the "cpuset" stanza:
cgroup.clone_children = 1;: This option is nicely summarised by the official Cgroups kernel documentation (here) as: "This flag only affects the cpuset controller. If the clone_children flag is enabled (1) in a cgroup, a new cpuset cgroup will copy its configuration from the parent during initialization."
cpuset.mems = 0;: Specifies the memory nodes, from a NUMA perspective, that processes under this control group can access. This is only meaningful on systems using NUMA, where memory (RAM) nodes are assigned to particular processors. The number of NUMA nodes can be determined by outputting the kernel's "buddyinfo" details: cat /proc/buddyinfo.
This option is mandatory for the cpuset subsystem to function correctly.
cpuset.cpus = 0-3;: Specifies the CPUs that tasks within the control group are allowed to execute upon. The number of available CPUs can be listed via the 'lscpu' command: lscpu | grep ^On-line.
This option is mandatory for the cpuset subsystem to function correctly.
Again, further information (e.g. file/directory permissions) in addition to some realistic examples can be found in the man page: man cgconfig.conf. The aim here is to create a minimum base for successfully deploying unprivileged LXC containers.
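The "0-3" in the example only suits a four-CPU host, so the value needs adjusting elsewhere. As a rough sketch, the range can be derived from a CPU count; cpus_range is a hypothetical helper and it assumes the CPUs are numbered contiguously from 0 and are all online.

```shell
# Hypothetical helper: build the "cpuset.cpus" value for a host with $1 CPUs.
# Assumes CPUs are numbered 0..N-1 with no gaps.
cpus_range() (
    n=$1
    # Reject anything that is not a positive integer
    [ "$n" -ge 1 ] 2>/dev/null || { echo "need a positive CPU count" >&2; exit 1; }
    if [ "$n" -eq 1 ]; then
        echo 0
    else
        echo "0-$((n - 1))"
    fi
)
```

For example, cpus_range 4 prints 0-3, matching the configuration above, and cpus_range "$(nproc)" would use the count reported for the current host.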
11. With your favorite text editor create the /etc/cgrules.conf file. This configuration file is read by the cgrulesengd daemon (and consequently libcgroup) in order to determine which processes a rule applies to and the control group they are destined for:
# <user>:<process_name> <controllers> <destination>
username_here * username_here
The configuration layout above can be interpreted as: "All processes started by username_here, for all listed controllers (in /etc/cgconfig.conf), belong to the control group called username_here".
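To sanity-check a rules file before starting the daemon, the first matching rule for a user can be looked up with awk. This is a simplified sketch and not how cgrulesengd itself parses the file: cgrule_destination is a hypothetical helper that handles only plain user names and "*" in the first field.

```shell
# Hypothetical helper: print the destination group of the first rule in a
# cgrules.conf-style file ($2) whose user field matches $1.
# Simplification: only plain user names and "*" are handled; group entries
# and continuation lines of the real format are not.
cgrule_destination() (
    user=$1 rules=${2:-/etc/cgrules.conf}
    awk -v u="$user" '
        /^[ \t]*(#|$)/ { next }            # skip comments and blank lines
        {
            split($1, who, ":")            # drop an optional :process suffix
            if (who[1] == u || who[1] == "*") { print $3; found = 1; exit }
        }
        END { exit !found }                # non-zero status when nothing matched
    ' "$rules"
)
```

Run against the example file above, cgrule_destination username_here prints username_here; a user with no matching rule produces no output and a non-zero exit status.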
12. While the cgrulesengd daemon handles the automatic distribution of processes to their corresponding control groups (and consequential cloning of parent control group files) based on the rules present in /etc/cgrules.conf, the creation of the control group directory "username_here" for each kernel subsystem controller (e.g. /sys/fs/cgroup/{cpu,blkio,devices,...}/username_here/) is performed by the cgconfigparser utility.
The cgconfigparser utility must successfully read the /etc/cgconfig.conf file and populate the necessary controller directories BEFORE the cgrulesengd daemon is started. To achieve this I copied the systemd service file cgconfig.service found in the Fedora 20 GNU/Linux distribution (here) and saved it under /lib/systemd/system/cgconfig.service:
[Unit]
Description=Control Group configuration service
# The service should be able to start as soon as possible,
# before any 'normal' services:
DefaultDependencies=no
Conflicts=shutdown.target
Before=basic.target shutdown.target
[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/sbin/cgconfigparser -l /etc/cgconfig.conf -s 1664
ExecStop=/usr/sbin/cgclear -l /etc/cgconfig.conf -e
[Install]
WantedBy=sysinit.target
Note: Be careful when using the /usr/sbin/cgclear utility manually: if you omit the "load configuration file" flag (-l) you will remove the preexisting systemd control group hierarchy from the running session. Having only just started to get to grips with systemd I'm not sure how to go about reverting this operation without restarting the host.
To avoid this, ensure you always use the "load configuration file" flag (-l) to remove just the control groups configured, or follow the instructions presented in /usr/share/doc/cgroup-tools/README_systemd for compiling libcgroup with the option to purposefully ignore the 'name=systemd' hierarchy.
13. To ensure the cgconfigparser utility is run at boot time we need to enable the service file:
sudo systemctl enable cgconfig
14. We can now start the "oneshot" cgconfig.service systemd service that will create the required control group directories under the previously specified subsystem controllers (e.g. /sys/fs/cgroup/{cpu,blkio,devices,...}/username_here/):
sudo systemctl start cgconfig
15. Confirm that the "username_here" control group has been created: lscgroup
You should see something similar to the following:
cpu,cpuacct:/
cpu,cpuacct:/username_here
devices:/
devices:/username_here
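Since the group should appear under every controller declared in /etc/cgconfig.conf, it can help to list exactly which controllers carry it. A minimal sketch that filters lscgroup-style output on stdin; cgroup_controllers is a hypothetical helper, and the "controllers:/path" line format is taken from the sample output above.

```shell
# Hypothetical helper: read lscgroup-style lines ("controllers:/path") on stdin
# and print the controller field of every line whose path is /$1.
# Exits non-zero if the group is not found under any controller.
cgroup_controllers() (
    group=$1
    awk -F: -v g="/$group" '
        $2 == g { print $1; found = 1 }
        END { exit !found }
    '
)
# Example: lscgroup | cgroup_controllers username_here
```

Any controller from the configuration file that is missing from the printed list is a hint that cgconfigparser could not create the group there.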
16. With your favorite text editor create a systemd service file /lib/systemd/system/cgred.service for the cgrulesengd daemon. Again, I have used the Fedora 20 GNU/Linux distribution's cgred.service file as a template (here):
[Unit]
Description=CGroups Rules Engine Daemon
After=syslog.target
[Service]
Type=forking
EnvironmentFile=-/etc/sysconfig/cgred.conf
ExecStart=/usr/sbin/cgrulesengd $OPTIONS
[Install]
WantedBy=multi-user.target
17. Let's enable cgred.service to ensure that the cgrulesengd daemon starts at boot from now on:
sudo systemctl enable cgred
18. With our newly created cgred service file let's start the cgrulesengd daemon in our currently running session:
sudo systemctl start cgred
19. Check that new processes are being moved to the user defined control group "username_here" by outputting the contents of the /sys/fs/cgroup/[subsystem]/username_here/tasks file:
cat /sys/fs/cgroup/[subsystem]/username_here/tasks
You should see a list of PID values, assuming the cgrulesengd daemon is running correctly and has migrated the existing $USER processes to the appropriate control group within each subsystem controller.
If nothing has happened try invoking a new process (e.g. open up an interactive application) and then check the tasks file of any of the subsystems (the subsystem does not matter as we told the cgrulesengd daemon to use the control group "username_here" for all available subsystems in the /etc/cgconfig.conf file).
If tasks are still not showing up then stop the cgrulesengd daemon with sudo systemctl stop cgred and invoke it manually in the foreground with high verbosity and logging to syslog (/var/log/syslog on Debian):
sudo cgrulesengd -n -vvv -s
Examine the output of the daemon on a live basis by "tailing" the end of the syslog file: sudo tail -f /var/log/syslog
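Rather than scanning the tasks file by eye, a specific PID can be looked up directly. A small sketch; pid_in_tasks is a hypothetical helper and the path in the example comment is only illustrative.

```shell
# Hypothetical helper: succeed if PID $1 is listed in the tasks file $2.
pid_in_tasks() (
    pid=$1 tasks=$2
    # -x matches whole lines only, so PID 12 does not match a line "123"
    grep -qxF -- "$pid" "$tasks"
)
# Example (illustrative path): is the current shell in the group?
#   pid_in_tasks "$$" /sys/fs/cgroup/cpu/username_here/tasks && echo "moved"
```

Checking the PID of a freshly started process this way gives a quick yes/no answer on whether cgrulesengd is applying the rule.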
4. Creating an unprivileged container
Assuming all previous steps have been successfully followed (without any errors!) we have now configured the host environment sufficiently to support the creation and execution of unprivileged containers!
The various limitations imposed by user namespaces (e.g. disallowed loop/filesystem mounts and forbidden usage of mknod) mean that traditional LXC templates (located at: /usr/share/lxc/templates/lxc-) used for creating privileged containers will (most likely) fail when invoked by an unprivileged user. With different GNU/Linux distributions each having their own particular LXC-specific bootstrapping mechanism (as dictated by their corresponding template script), Stéphane Graber constructed a new template, "download", which alleviates the requirement of understanding distro-specific bootstrapping mechanisms by permitting the download of various daily pre-built GNU/Linux distribution rootfs that have been specially configured to operate within a restricted unprivileged container.
Opening up the "download" LXC template script (/usr/share/lxc/templates/lxc-download) reveals that the daily pre-built rootfs are downloaded (with GPG verification where possible) from the webserver images.linuxcontainers.org (as declared by the DOWNLOAD_SERVER Bash variable). Beyond this, the "download" template selects the corresponding menu metadata list images.linuxcontainers.org/meta/1.0/index-$DOWNLOAD_COMPAT_LEVEL, allowing the user to download either unprivileged or privileged container rootfs (yes, the webserver offers both!) depending on the execution environment the template determines it is in; for example, if the script was executed by root it lists the corresponding privileged container rootfs offerings.
By default the "download" template included with the 'lxc' package (1:1.0.6-6+deb8u1) for Debian Jessie 8.2 has a variable set, DOWNLOAD_COMPAT_LEVEL=1, that results in a reduced offering of pre-built GNU/Linux distributions (i.e. older releases of distributions - no Jessie!) suitable for the restricted unprivileged container environment. This variable results in the script downloading and parsing the metadata menu file https://images.linuxcontainers.org/meta/1.0/index-user.1 as opposed to the "fuller" https://images.linuxcontainers.org/meta/1.0/index-user metadata menu file.
By changing the DOWNLOAD_COMPAT_LEVEL variable and setting it to an empty value (i.e. DOWNLOAD_COMPAT_LEVEL=) the "download" template will obtain the more complete GNU/Linux distribution list for the unprivileged container environment.
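The effect of the variable on the chosen metadata file can be illustrated with a short sketch. This mirrors the two URLs quoted above rather than reproducing the template's own code; index_url is a hypothetical helper, and the mode name "system" for privileged containers is an assumption on my part.

```shell
# Sketch of the URL selection described above (not the template's own code).
# $1 is the mode ("user" for unprivileged; "system" is assumed for privileged),
# $2 is the compat level and may be empty.
index_url() (
    mode=$1 compat=$2
    url="https://images.linuxcontainers.org/meta/1.0/index-$mode"
    # A non-empty compat level is appended as a ".N" suffix
    [ -n "$compat" ] && url="$url.$compat"
    echo "$url"
)
```

Here index_url user 1 prints the reduced list's URL ending in index-user.1, while index_url user "" prints the fuller one ending in index-user.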
Errors Ahead!: Having personally tried editing the DOWNLOAD_COMPAT_LEVEL variable within Debian Jessie 8.2 in an effort to download a Debian Jessie rootfs suitable for the unprivileged container environment, I was faced with the single line error: Failed to mount cgroup at /sys/fs/cgroup/systemd: Operation not permitted. Several other individuals attempting a Debian Jessie deployment in an unprivileged container environment have also encountered this issue and Thomas Dalichow has narrowed down the problem (details) to the limitations imposed by the older systemd version running by default on Debian Jessie 8.2 (215-17+deb8u2).
Thomas did mention, however, that the version of systemd released by the testing branch of Debian ("Stretch"), which at his time of posting was 220-5 (it is currently at 226-3), worked correctly with the various unprivileged container rootfs that were linked to by the https://images.linuxcontainers.org/meta/1.0/index-user metadata menu file.
20. Let's start by creating an unprivileged Debian "Wheezy" amd64 container via the provided "download" template:
lxc-create --name my_unprivileged_container --template download -- --dist debian --release wheezy --arch amd64
Note: Omitting one or more of the three platform options ('--dist', '--release', or '--arch') will invoke a basic interactive mode that lists the available images, using any of the supplied flags as a simple filter. Passing no platform flags will result in all the distribution platforms being listed before you are prompted interactively.
21. LXC should now be downloading the rootfs of the Debian "Wheezy" amd64 based unprivileged container and preparing it for use. Once downloaded, this particular deployment (Debian "Wheezy" amd64) can be reused much more quickly for another unprivileged container instance as the special rootfs has been stored under ~/.cache/lxc/download/debian/wheezy/amd64.
22. As there are no preconfigured users and root has not been assigned a password for security purposes we will need to start the unprivileged container as a daemon:
lxc-start --name my_unprivileged_container --daemon
23. Instead of accessing the container through a console interface we communicate with it via the lxc-attach command. Let's do this now to set the root user's password so we can access the container through the traditional lxc-console console.
lxc-attach --name my_unprivileged_container -- passwd
24. Check that you can login to the unprivileged container as the root user via the console:
lxc-console --name my_unprivileged_container
Note: Press "ctrl-a, q" to exit the unprivileged container console prompt and return back to the host's console prompt.
If all has worked and you are logged into your unprivileged container as the root user then congratulations, your environment has been configured correctly!
If you are facing issues at this stage try running the container without the --daemon flag and examine the output produced. The links I found helpful, listed in the "credits" section at the bottom of this page, may help to resolve your particular issue.
5. Governing network access
Thankfully the process for configuring network access for unprivileged containers is (as far as I am currently aware!) identical to that for privileged containers with the exception of one particular aspect.
Unlike with privileged containers, a dedicated system-wide configuration file (/etc/lxc/lxc-usernet) is required to limit the number of "veth" pairs that a user can create as well as the network bridges that the user can connect to.
25. With your favorite text editor create the system-wide LXC configuration file /etc/lxc/lxc-usernet for governing network access for unprivileged containers. For example, let's say we limit user "username_here" to only access the bridge "lxcbr0" with a maximum of 2 "veth" pairs.
# <user> <link_type> <bridge> <#_of_links>
username_here veth lxcbr0 2
Note: Omit the "field" headers line when writing the /etc/lxc/lxc-usernet configuration file. It simply serves as an explanation for the example configuration.
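To confirm what a given user is allowed before starting a container, the file can be queried with awk. A minimal sketch, assuming the four-field layout shown above; veth_quota is a hypothetical helper and it simply reports the first matching line.

```shell
# Hypothetical helper: print how many veth devices user $1 may attach to
# bridge $2 according to an lxc-usernet style file ($3).
veth_quota() (
    user=$1 bridge=$2 file=${3:-/etc/lxc/lxc-usernet}
    awk -v u="$user" -v b="$bridge" '
        /^[ \t]*(#|$)/ { next }        # ignore blank lines and lines starting with "#"
        $1 == u && $2 == "veth" && $3 == b { print $4; found = 1; exit }
        END { exit !found }            # non-zero status when no entry matched
    ' "$file"
)
```

With the example entry above, veth_quota username_here lxcbr0 prints 2; a non-zero exit status means the user has no entry for that bridge and the container's network setup should be expected to fail.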
26. If you have not already stopped the unprivileged container targeted for network access then do so now:
lxc-stop --name my_unprivileged_container
27. With your favorite text editor edit the targeted unprivileged container's configuration file (e.g. ~/.local/share/lxc/my_unprivileged_container/config) and append the following network configuration:
lxc.network.type = veth
lxc.network.link = lxcbr0
lxc.network.flags = up
This minimal network configuration will provide the container with network access to the bridge lxcbr0.
28. Finally let's start the container and check that it has internet access by updating APT (assuming the network that bridge lxcbr0 is connected to has DHCP preconfigured, external internet connectivity, and a functional DNS server):
lxc-start --name my_unprivileged_container --daemon
lxc-attach --name my_unprivileged_container -- apt-get update
TL;DR
Creating the user-specific LXC directory & file layout
mkdir -p ~/.config/lxc
touch ~/.config/lxc/{lxc,default}.conf
mkdir -p ~/.local/share/{lxc,lxcsnaps}
mkdir -p ~/.cache/lxc
Kernel capabilities & parameter alteration
# Check running kernel supports LXC (namespaces & control groups)
lxc-checkconfig
# Enable "unprivileged_userns_clone" between restarts
echo "kernel.unprivileged_userns_clone=1" | sudo tee -a /etc/sysctl.d/80-lxc-userns.conf
# Reload sysctl rules for the current session
sudo sysctl --system
Subordinate UID & GID mappings
# Create a set of Subuids & Subgids for the current user
sudo usermod --add-subuids 100000-165536 $USER
sudo usermod --add-subgids 100000-165536 $USER
# Set the user-specific LXC configuration file to use this Subuid & Subgid range ~ (~/.config/lxc/default.conf)
lxc.id_map = u 0 100000 65536
lxc.id_map = g 0 100000 65536
Configuring 'cgroup-tools' utilities & daemons
# Install the 'cgroup-tools' package
sudo apt-get install cgroup-tools
# Copy the provided example 'cgred.conf' file to '/etc/sysconfig'
sudo mkdir /etc/sysconfig
sudo cp /usr/share/doc/cgroup-tools/examples/cgred.conf /etc/sysconfig/cgred.conf
# Copy the provided example 'cgconfig.conf' file to '/etc'
sudo cp /usr/share/doc/cgroup-tools/examples/cgconfig.conf /etc/cgconfig.conf
# Edit the '/etc/cgconfig.conf' to establish a minimal working baseline
*** SEE STEP 10. IN FULL GUIDE ***
# Create the '/etc/cgrules.conf' file and populate it with a minimal working baseline
*** SEE STEP 11. IN FULL GUIDE ***
# Create a systemd startup service file for the 'cgconfigparser' utility ~ (/lib/systemd/system/cgconfig.service)
*** SEE STEP 12. IN FULL GUIDE ***
# Register the systemd startup service file, enable it for starting at boot, and start it for current session
sudo systemctl enable cgconfig
sudo systemctl start cgconfig
# Create a systemd startup service file for the 'cgrulesengd' daemon ~ (/lib/systemd/system/cgred.service)
*** SEE STEP 16. IN FULL GUIDE ***
# Register the systemd startup service file, enable it for starting at boot, and start it for current session
sudo systemctl enable cgred
sudo systemctl start cgred
# Confirm both 'cgrulesengd' and 'cgconfigparser' are operating correctly:
cat /proc/self/cgroup
Run an unprivileged container
# Download a custom configured rootfs for unprivileged container usage (will prompt for distro, release, and arch)
lxc-create --name unprivileged_container --template download
# Run the unprivileged container
lxc-start --name unprivileged_container --daemon
# Set the root password to enable typical console login
lxc-attach --name unprivileged_container -- passwd
# Login to the unprivileged container as the root user
lxc-console --name unprivileged_container
Governing network access
# Create a configuration file for governing user network access ~ '/etc/lxc/lxc-usernet'
# (Limits the number of "veth" pairs & bridge connection authorisation) - See Step 25
username veth br0 2
# Edit the configuration file for the network targeted unprivileged container ~ '$HOME/.local/share/lxc/unprivileged_container/config'
lxc.network.type = veth
lxc.network.link = br0
lxc.network.flags = up
# Start the unprivileged container with network support
lxc-start --name unprivileged_container --daemon
# See if the unprivileged container has been given an IP address (assuming DHCP on attached bridged network)
lxc-attach --name unprivileged_container -- ip addr
Final Words
The configuration presented here will place all user owned processes into their own control group but it will not apply any sort of hardware resource restrictions (as evident from the /etc/cgconfig.conf file). This means that sub-control groups can still be employed (with varying, granular resource restrictions if desired); they will simply be nested under the "username_here" group (e.g. /sys/fs/cgroup/[subsystem]/username_here/subgroup_name/).
By creating the cgconfig.service and cgred.service systemd service files the environment for enabling the creation and execution of unprivileged containers will persist between reboots.
This blog post was written via an Iceweasel browser running inside an unprivileged container on: Linux Debian-Jessie 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt11-1+deb8u3 (2015-08-04) x86_64 GNU/Linux.
I intend to write a follow-up post in the near future for running various GUI applications within unprivileged containers on Debian Jessie 8.2.
If you've spotted any mistakes/typos I've made or you'd like to comment/question any aspect feel free to leave a response in the comment box below.
Credits
A wide range of web sources were used to understand how to go about configuring unprivileged containers in Debian Jessie 8.2. If my troubleshooting tips have not resolved any issues you faced from following this guide (sorry!) then these links/man pages may serve to remedy your situation:
- https://www.stgraber.org/2014/01/17/lxc-1-0-unprivileged-containers/
- http://unix.stackexchange.com/questions/170998/how-to-create-user-cgroups-with-systemd
- https://wiki.archlinux.org/index.php/Cgroups
- https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/ch-Using_Control_Groups.html
- man cgconfig.conf
- man cgrules.conf
- man cgconfigparser
- man cgrulesengd