FreeBSD/Jails

Jails are a really neat feature of FreeBSD that allows a certain amount of virtualization without requiring expensive emulation. I seem to have worked out a rather sleek configuration, but it took a while, and concise how-to documentation is lacking, so here's what I do.

Jail Caveats
Normally, a jail has additional security restrictions that do not apply to a typical host or VM. A hardware host or VM configures its own IP, but a jail is restricted to the IP given by the parent host. This means jails typically have tighter network security than VMs or rented hardware. However, to enforce this, jails are not allowed to use raw sockets or BPF by default, since those allow spoofing and sniffing traffic.

 * `freebsd-update upgrade` is currently broken in jails, because it looks at `uname -r` (which reports the host's kernel version) instead of `freebsd-version`. This is particularly problematic because you must upgrade the host first, and reboot during that upgrade, or risk compatibility problems between the new userland and old kernel. Strangely, this has been broken for a long time, and there is mailing list discussion about it, but no motion to fix it in spite of the triviality. The suggested workaround is to override the uname output with an environment variable (see man uname). Try: `UNAME_r=$(freebsd-version) freebsd-update upgrade -r 10.3`
 * To ping from inside a jail, you need to allow raw sockets. This allows the jail to circumvent the normal restrictions for jailed network IO.
 * To tcpdump (even on lo) or run dhcpd inside a jail, you need to expose /dev/bpf*. This allows the jail to circumvent the normal restrictions for jailed network IO.
 * To run VirtualBox inside a jail, you need to expose /dev/vbox*. This may not be considered secure; I'm not sure.
 * If you need System V IPC facilities in a jail, you need to set allow.sysvipc. For example, PostgreSQL requires this. These IPCs are currently shared, so if you run more than one PostgreSQL server in jails, each must run under a unique UID, even across jails. Obviously this implies insecure trust between jails. I suspect that this will change eventually.
 * Lots of packages (like bash) want fdescfs on /dev/fd, so you probably want this. But the built-in feature to mount /dev happens after other mounts, so mounting /dev/fd will fail because the mountpoint won't yet exist. The solution is to set mount.nodevfs and list /dev explicitly. I guess this is because the jail's root directory must be mounted before /dev, and the logic to sort out the correct order was too complicated. Perhaps this will be improved some day.

Host configuration
Each jail needs a unique IP and hostname. Typically one would assign aliases with ifconfig on the main NIC, but you can also assign private IPs on lo0 and NAT them with PF. Each also needs a unique root directory. I suggest creating all the jail roots patterned after the hostname, for example, /j/$hostname. This simplifies jail.conf. Set jail_enable=YES in /etc/rc.conf.
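
As a rough sketch of the host side, the helper below prints the rc.conf lines for one alias per jail. The function name rc_conf_lines, the interface name em0, and the /32 netmask are assumptions for illustration, so adjust them for your network before appending the output to /etc/rc.conf.

```shell
# Print rc.conf lines that enable jails and add one alias IP per jail.
# Usage: rc_conf_lines NIC IP...
# (rc_conf_lines is a hypothetical helper, not a FreeBSD tool.)
rc_conf_lines() {
    nic=$1
    shift
    printf '%s\n' 'jail_enable="YES"'
    i=0
    for ip in "$@"; do
        # Assumes each alias shares the primary address's subnet, hence /32.
        printf '%s\n' "ifconfig_${nic}_alias${i}=\"inet ${ip} netmask 255.255.255.255\""
        i=$((i + 1))
    done
}

# em0 is a placeholder interface name.
rc_conf_lines em0 10.0.16.0 10.0.16.1 10.0.16.2
```

Printing the lines rather than writing them lets you review the result before it touches the real rc.conf.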

/etc/jail.conf
With a little clever thinking, you can generally avoid nearly all redundancy in jail.conf. Here's a nice example with some settings one would commonly want to consider. Please read about each setting though; your security requirements may vary.

    allow.set_hostname = 0;
    exec.clean;
    exec.consolelog = "/var/log/jail_${name}_console.log";
    enforce_statfs = 1;
    host.hostname = "$name.example.com";
    path = "/j/$name";
    #allow.raw_sockets = 1;

    exec.start = "/bin/sh /etc/rc";
    exec.stop = "/bin/sh /etc/rc.shutdown";
    #exec.prestart = "/usr/bin/true";
    #exec.poststop = "/usr/bin/true";

    devfs_ruleset = 4;
    mount.nodevfs;
    mount += "dev /j/$name/dev devfs rw,ruleset=$devfs_ruleset";
    mount += "fdesc /j/$name/dev/fd fdescfs rw", "proc /j/$name/proc procfs rw";

    mount += "/home /j/$name/home nullfs rw";
    # This will share /home with the host.

    jail0 {
        ip4.addr = 10.0.16.0;
        #allow.sysvipc = 1;
    }
    jail1 {
        ip4.addr = 10.0.16.1;
        #allow.sysvipc = 1;
    }
    jail2 {
        ip4.addr = 10.0.16.2;
        #allow.sysvipc = 1;
    }
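
Because the shared settings sit at the top of the file, each additional jail needs only a few lines. A minimal sketch of generating such a section, assuming the layout shown here; jail_stanza is a hypothetical helper, not part of FreeBSD:

```shell
# Print a per-jail section for /etc/jail.conf.
# Usage: jail_stanza NAME IP
jail_stanza() {
    printf '%s {\n    ip4.addr = %s;\n}\n' "$1" "$2"
}

# Review the output, then append it to /etc/jail.conf by hand.
jail_stanza jail3 10.0.16.3
```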

/etc/devfs.rules
The devfs ruleset 4 is reasonable for most purposes. But if you want to expose other device nodes to your jails, such as for running VirtualBox or tcpdump, you'll need to create another ruleset. The default rulesets are in /etc/defaults/devfs.rules; don't modify this file because you'll lose your changes during upgrades. Instead create a new file, /etc/devfs.rules, and put the following in it. Although these files allow you to name the rulesets, unfortunately the names aren't accessible outside of these files, so the numbers are more important.

Caveat: inclusions are not recursive. So even though ruleset 4 includes 1, 2, and 3, including 4 will not include them.

    [devfsrules_loosejail=100]
    add include 1
    add include 2
    add include 3
    add include 4
    add path bpf* unhide
    add path vbox* unhide mode 0660 group 920

Then you'll want to change devfs_ruleset in /etc/jail.conf. Change the one shown above to affect all jails, or add "devfs_ruleset = 100;" in a jail-specific section to affect only that jail.
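
After editing /etc/devfs.rules, the host has to reload its rulesets before a jail can use the new one. A hedged sketch: reload_devfs_rules is a hypothetical wrapper, and it assumes the stock devfs rc script and the devfs(8) rule subcommand. Run it as root on the host.

```shell
# Reload the devfs rulesets and show the one given by number.
# Usage: reload_devfs_rules RULESET_NUMBER
reload_devfs_rules() {
    ruleset=$1
    # Re-read /etc/defaults/devfs.rules and /etc/devfs.rules.
    service devfs restart || return 1
    # Print the rules now loaded under that number, to confirm the edit took.
    devfs rule -s "$ruleset" show
}

# Example: reload_devfs_rules 100
```

A jail that is already running will likely need a restart before its /dev reflects the new ruleset.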

Preparing each jail
The features of ZFS are particularly nice for jails. If you create a filesystem for each jail, you can snapshot and roll back each one separately. For example, if you botch an upgrade, you can roll back just that one jail without affecting the others. Also, you can create one jail from a snapshot of another using zfs clone. So start by creating a filesystem mounted at /j/foo with zfs create; note that zfs create takes a dataset name rather than a path, so the exact argument depends on your pool layout.
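
The snapshot, rollback, and clone operations might be wrapped as follows. This is only a sketch: the function names are hypothetical, and the dataset names in the comments (zroot/j/foo and so on) are placeholders for whatever your pool actually uses.

```shell
# Snapshot a jail's dataset, e.g. before an upgrade.
jail_snapshot() { zfs snapshot "$1@$2"; }

# Roll one jail back to its most recent snapshot without touching the others.
# (Rolling back past newer snapshots needs "zfs rollback -r", which destroys them.)
jail_rollback() { zfs rollback "$1@$2"; }

# Create a new jail's dataset from another jail's snapshot.
jail_clone() { zfs clone "$1@$2" "$3"; }

# Example (hypothetical dataset names):
#   jail_snapshot zroot/j/foo pre-upgrade
#   jail_rollback zroot/j/foo pre-upgrade
#   jail_clone    zroot/j/foo pre-upgrade zroot/j/bar
```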

The easiest way to prepare a new jail is to simply extract the FreeBSD distfiles. Get them from /usr/freebsd-dist on the installation media, or download them from ftp.freebsd.org. They have names like "{base,doc,games,kernel,lib32}.txz".
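
Extracting the distfiles into the jail root can be sketched like this. extract_dist is a hypothetical helper, and the paths in the example are assumptions taken from the text above.

```shell
# Extract one or more distribution sets into a jail's root directory.
# Usage: extract_dist JAIL_ROOT DIST_DIR SET...
extract_dist() {
    root=$1
    distdir=$2
    shift 2
    for dist in "$@"; do
        # -p preserves file permissions, which the base system relies on.
        tar -xpf "$distdir/$dist.txz" -C "$root" || return 1
    done
}

# Example: extract_dist /j/foo /usr/freebsd-dist base
```

For many jails base alone is probably enough, since a jail runs on the host's kernel and has no use for kernel.txz.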

Then add a section for the new jail to /etc/jail.conf, and run "service jail start foo" to start it. That's it.

Controlling operations

 * Start the new jail with service jail start jail1.
 * Start all unstarted jails with service jail start. This will skip jails that are already started.
 * jls will list jails
 * jexec jailid /bin/sh will give you a shell in the new jail
 * The jail's console log goes in the host's /var/log (as jail_<name>_console.log) if you've enabled exec.consolelog as suggested above.
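
Since jexec is shown here with a jail ID, a small wrapper can look the ID up by name first. A sketch only: jail_shell is a hypothetical helper, and it assumes jls accepts -j with a jail name followed by the jid parameter.

```shell
# Open a shell in a jail, given its name from /etc/jail.conf.
# Usage: jail_shell NAME
jail_shell() {
    # Ask jls for just the numeric jail ID of the named jail.
    jid=$(jls -j "$1" jid) || return 1
    jexec "$jid" /bin/sh
}

# Example: jail_shell jail1
```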