Using simple seccomp filters
Introduction
The Linux kernel (starting in version 3.5) supports
"seccomp filter" (or "mode 2 seccomp"). Ubuntu 12.04 LTS had it backported to its 3.2 kernel,
and Chrome OS has been using it (in various forms) for a while.
This document is designed as a quick-start guide
for software authors who want to take advantage of this security feature.
In the simplest terms, it allows a program to declare ahead of time
which system calls it expects to use, so that if an attacker gains
arbitrary code execution, they cannot poke at any unexpected system calls.
The full seccomp filter documentation can be found in the Linux kernel source tree.
The seccomp filter system uses the Berkeley Packet Filter (BPF) system. Combined
with argument checking and the many possible filter return values (kill, trap, trace, errno),
this allows for extensive logic. This document seeks to show only the minimal
case of defining a syscall whitelist: everything not added to this filter
causes the program to be killed.
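The return values listed above are not limited to killing the process. As an illustration, here is a minimal sketch of our own (written with the raw BPF_STMT/BPF_JUMP macros from <linux/filter.h>, not the seccomp-bpf.h helpers used later) of a filter that makes getpid() fail with EPERM while allowing every other syscall. It runs in a forked child so the filter never constrains the caller. The helper name getpid_errno_under_filter() is hypothetical, and the ARCH_NR table only covers x86_64, i386 and aarch64.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

/* Assumption: only these architectures are handled in this sketch. */
#if defined(__x86_64__)
# define ARCH_NR AUDIT_ARCH_X86_64
#elif defined(__i386__)
# define ARCH_NR AUDIT_ARCH_I386
#elif defined(__aarch64__)
# define ARCH_NR AUDIT_ARCH_AARCH64
#else
# error "add the AUDIT_ARCH_* value for this architecture"
#endif

/* Hypothetical helper: forks a child, installs the filter there, calls
 * getpid(), and returns the child's exit code: the errno getpid() saw
 * (0 if it succeeded), 255 if the filter could not be installed, or -1
 * if fork/waitpid failed. */
int getpid_errno_under_filter(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        struct sock_filter filter[] = {
            /* Kill the process if the syscall ABI is not the expected one. */
            BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
                     offsetof(struct seccomp_data, arch)),
            BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, ARCH_NR, 1, 0),
            BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
            /* getpid() fails with EPERM instead of running... */
            BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
                     offsetof(struct seccomp_data, nr)),
            BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_getpid, 0, 1),
            BPF_STMT(BPF_RET | BPF_K,
                     SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
            /* ...and every other syscall is allowed. */
            BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
        };
        struct sock_fprog prog = {
            .len = (unsigned short)(sizeof(filter) / sizeof(filter[0])),
            .filter = filter,
        };
        if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) ||
            prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog))
            _exit(255);
        long ret = syscall(SYS_getpid);
        _exit(ret == -1 ? errno : 0);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

On a kernel with seccomp filter support, getpid_errno_under_filter() should return EPERM; the filter's data field is what the syscall returns as a negative errno.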
To determine which seccomp features are available at runtime, please
see the
seccomp autodetection examples.
Since it is not always obvious which syscalls are being called by
the various libraries a program might use, this document also includes
example code that provides a helper to assist in discovering unwhitelisted
syscalls during filter development.
Example Program
First, we start with an example program that reads stdin, writes to stdout, sleeps,
and exits. We want to make sure it never calls "fork", so we've added a fork at the end;
once the seccomp filter is added, this lets us verify that it is working.
/*
* seccomp example with syscall reporting
*
* Copyright (c) 2012 The Chromium OS Authors <chromium-os-dev@chromium.org>
* Authors:
* Kees Cook <keescook@chromium.org>
* Will Drewry <wad@chromium.org>
*
* Use of this source code is governed by a BSD-style license that can be
* found in the LICENSE file.
*/
#define _GNU_SOURCE 1
#include <stdio.h>
#include <stddef.h>
#include <stdlib.h>
#include <unistd.h>
#include "config.h"
int main(int argc, char *argv[])
{
char buf[1024];
printf("Type stuff here: ");
fflush(NULL);
buf[0] = '\0';
fgets(buf, sizeof(buf), stdin);
printf("You typed: %s", buf);
printf("And now we fork, which should do quite the opposite ...\n");
fflush(NULL);
sleep(1);
fork();
printf("You should not see this because I'm dead.\n");
return 0;
}
When we build and run this now, we get:
$ autoconf
$ ./configure
checking for gcc... gcc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables...
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ISO C89... none needed
configure: creating ./config.status
config.status: creating config.h
$ make
gcc -Wall -c -o example.o example.c
gcc example.o -o example
$ ./example
Type stuff here: asdf
You typed: asdf
And now we fork, which should do quite the opposite ...
You should not see this because I'm dead.
You should not see this because I'm dead.
Everything is working, even the "fork" we want to eliminate (the last line is printed
twice: once by the parent and once by the child).
Adding basic seccomp filtering
Next, we include the fancy "seccomp-bpf.h" header. This step also updates
an example "configure.ac" to check for the new "linux/seccomp.h" include, since
"seccomp-bpf.h" would like to use it. Then we build our initial list of basic
system calls we expect (signal handling, read, write, exit).
A simple seccomp BPF filter starts by verifying the architecture (since syscall
numbers are tied to architecture), then loads the syscall number and compares
it against the whitelist. If no match is found, it kills the process:
--- step-1/example.c 2012-03-22 21:43:10.845732543 -0700
+++ step-2/example.c 2012-03-22 21:50:56.373304922 -0700
@@ -16,11 +16,54 @@
#include <unistd.h>
#include "config.h"
+#include "seccomp-bpf.h"
+
+static int install_syscall_filter(void)
+{
+ struct sock_filter filter[] = {
+ /* Validate architecture. */
+ VALIDATE_ARCHITECTURE,
+ /* Grab the system call number. */
+ EXAMINE_SYSCALL,
+ /* List allowed syscalls. */
+ ALLOW_SYSCALL(rt_sigreturn),
+#ifdef __NR_sigreturn
+ ALLOW_SYSCALL(sigreturn),
+#endif
+ ALLOW_SYSCALL(exit_group),
+ ALLOW_SYSCALL(exit),
+ ALLOW_SYSCALL(read),
+ ALLOW_SYSCALL(write),
+ KILL_PROCESS,
+ };
+ struct sock_fprog prog = {
+ .len = (unsigned short)(sizeof(filter)/sizeof(filter[0])),
+ .filter = filter,
+ };
+
+ if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)) {
+ perror("prctl(NO_NEW_PRIVS)");
+ goto failed;
+ }
+ if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog)) {
+ perror("prctl(SECCOMP)");
+ goto failed;
+ }
+ return 0;
+
+failed:
+ if (errno == EINVAL)
+ fprintf(stderr, "SECCOMP_FILTER is not available. :(\n");
+ return 1;
+}
int main(int argc, char *argv[])
{
char buf[1024];
+ if (install_syscall_filter())
+ return 1;
+
printf("Type stuff here: ");
fflush(NULL);
buf[0] = '\0';
--- step-1/configure.ac 2012-03-22 21:40:51.651435417 -0700
+++ step-2/configure.ac 2012-03-22 21:44:19.438868163 -0700
@@ -2,4 +2,5 @@
AC_PREREQ([2.59])
AC_CONFIG_HEADERS([config.h])
AC_PROG_CC
+AC_CHECK_HEADERS([linux/seccomp.h])
AC_OUTPUT
While this gets us to a nice starting place, it's not obvious which syscalls are still
missing when we run the program, since it just blows up instead:
$ ./configure
...
checking for linux/seccomp.h... yes
configure: creating ./config.status
config.status: creating config.h
$ make
gcc -Wall -c -o example.o example.c
gcc example.o -o example
$ ./example
Bad system call
$ echo $?
159
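To make the filter flow and that exit status concrete, here is a hand-written sketch of a whitelist equivalent to the one above: validate the architecture, load the syscall number, allow a short list, kill everything else. We assume seccomp-bpf.h's VALIDATE_ARCHITECTURE, EXAMINE_SYSCALL, ALLOW_SYSCALL and KILL_PROCESS macros expand to roughly these instructions; ALLOW_NR, ARCH_NR and whitelist_child_status() are hypothetical names used only here. The filtered code runs in a forked child that then calls getpid(), which is not on the list.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/resource.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

/* Assumption: only these architectures are handled in this sketch. */
#if defined(__x86_64__)
# define ARCH_NR AUDIT_ARCH_X86_64
#elif defined(__i386__)
# define ARCH_NR AUDIT_ARCH_I386
#elif defined(__aarch64__)
# define ARCH_NR AUDIT_ARCH_AARCH64
#else
# error "add the AUDIT_ARCH_* value for this architecture"
#endif

/* One whitelist entry: if the loaded number matches, allow the call;
 * otherwise skip the return and test the next entry. */
#define ALLOW_NR(nr) \
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, (nr), 0, 1), \
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW)

/* Installs the whitelist in a child, which then calls getpid().
 * Returns the child's status the way a shell reports it in $?. */
int whitelist_child_status(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Newer kernels may dump core on a seccomp kill; suppress it. */
        struct rlimit no_core = { 0, 0 };
        setrlimit(RLIMIT_CORE, &no_core);

        struct sock_filter filter[] = {
            BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
                     offsetof(struct seccomp_data, arch)),
            BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, ARCH_NR, 1, 0),
            BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
            BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
                     offsetof(struct seccomp_data, nr)),
            ALLOW_NR(__NR_rt_sigreturn),
#ifdef __NR_sigreturn
            ALLOW_NR(__NR_sigreturn),
#endif
            ALLOW_NR(__NR_exit_group),
            ALLOW_NR(__NR_exit),
            ALLOW_NR(__NR_read),
            ALLOW_NR(__NR_write),
            BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
        };
        struct sock_fprog prog = {
            .len = (unsigned short)(sizeof(filter) / sizeof(filter[0])),
            .filter = filter,
        };
        if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) ||
            prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog))
            _exit(1);
        syscall(SYS_getpid); /* not whitelisted: the kernel kills us here */
        _exit(0);            /* only reached if the filter did not fire */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (WIFSIGNALED(status))
        return 128 + WTERMSIG(status);
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A process killed by a signal is reported by the shell as 128 plus the signal number. On x86, SIGSYS ("Bad system call") is signal 31, so this sketch should return 159 there, matching the "echo $?" output above.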
Adding syscall reporting
Now we can use one of the extra features of seccomp filter to temporarily catch
the failed syscall and report it, instead of immediately exiting. The intention is to
remove this at the end, since once we've finished our syscall list, we won't need to
change it (unless the program or its libraries change, in which case we can do this
again).
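The reporter's source is not reproduced here, but the kernel mechanism a reporter like this can build on is small. In this sketch (our assumption about the general approach, not the actual syscall-reporter.c), a filter returns SECCOMP_RET_TRAP for one syscall: the kernel skips the call and delivers SIGSYS, and an SA_SIGINFO handler reads the trapped syscall number from siginfo_t's si_syscall field (exposed by current glibc). The helper trap_reports_getpid() and the variable trapped_nr are hypothetical names.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

/* Assumption: only these architectures are handled in this sketch. */
#if defined(__x86_64__)
# define ARCH_NR AUDIT_ARCH_X86_64
#elif defined(__i386__)
# define ARCH_NR AUDIT_ARCH_I386
#elif defined(__aarch64__)
# define ARCH_NR AUDIT_ARCH_AARCH64
#else
# error "add the AUDIT_ARCH_* value for this architecture"
#endif

/* Syscall number seen by the SIGSYS handler, or -1 if none was trapped. */
static volatile sig_atomic_t trapped_nr = -1;

/* The kernel skipped the syscall and raised SIGSYS; record which one. */
static void on_sigsys(int sig, siginfo_t *info, void *ucontext)
{
    (void)sig;
    (void)ucontext;
    trapped_nr = info->si_syscall;
}

/* In a child, traps getpid() and allows everything else (including the
 * rt_sigreturn the handler needs in order to return). Returns 0 if the
 * handler saw getpid's number, nonzero on any failure. */
int trap_reports_getpid(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        struct sigaction sa;
        memset(&sa, 0, sizeof(sa));
        sa.sa_sigaction = on_sigsys;
        sa.sa_flags = SA_SIGINFO;
        if (sigaction(SIGSYS, &sa, NULL))
            _exit(2);

        struct sock_filter filter[] = {
            BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
                     offsetof(struct seccomp_data, arch)),
            BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, ARCH_NR, 1, 0),
            BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
            BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
                     offsetof(struct seccomp_data, nr)),
            BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_getpid, 0, 1),
            BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_TRAP),
            BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
        };
        struct sock_fprog prog = {
            .len = (unsigned short)(sizeof(filter) / sizeof(filter[0])),
            .filter = filter,
        };
        if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) ||
            prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog))
            _exit(3);
        syscall(SYS_getpid); /* trapped: SIGSYS is delivered instead */
        _exit(trapped_nr == SYS_getpid ? 0 : 1);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

A real reporter would turn trapped_nr into a name and print it, as the "Looks like you need syscall ... too!" messages below do. Code in a signal handler should stick to async-signal-safe calls such as write().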
Here, we add the "syscall-reporter.mk" Makefile include and the "syscall-reporter.o"
object (built from "syscall-reporter.c") to the Makefile, and then add the
"syscall-reporter.h" include and a call to "install_syscall_reporter" to the program.
--- step-2/example.c 2012-03-22 21:50:56.373304922 -0700
+++ step-3/example.c 2012-03-22 21:51:04.377433872 -0700
@@ -17,6 +17,7 @@
#include "config.h"
#include "seccomp-bpf.h"
+#include "syscall-reporter.h"
static int install_syscall_filter(void)
{
@@ -34,6 +35,7 @@
ALLOW_SYSCALL(exit),
ALLOW_SYSCALL(read),
ALLOW_SYSCALL(write),
+ /* Add more syscalls here. */
KILL_PROCESS,
};
struct sock_fprog prog = {
@@ -61,6 +63,8 @@
{
char buf[1024];
+ if (install_syscall_reporter())
+ return 1;
if (install_syscall_filter())
return 1;
--- step-2/Makefile 2012-03-22 19:41:02.510347542 -0700
+++ step-3/Makefile 2012-03-22 19:41:33.706847395 -0700
@@ -3,7 +3,9 @@
all: example
-example: example.o
+include syscall-reporter.mk
+
+example: example.o syscall-reporter.o
.PHONY: clean
clean:
Now, when we run it, we can see each missing syscall, and progressively add them
until we reach the fork (which is implemented via the "clone" syscall):
$ make
gcc -Wall -c -o example.o example.c
In file included from example.c:20:0:
syscall-reporter.h:21:2: warning: #warning "You've included the syscall reporter. Do not use in production!" [-Wcpp]
echo "static const char *syscall_names[] = {" > syscall-names.h ;\
echo "#include <syscall.h>" | cpp -dM | grep '^#define __NR_' | \
LC_ALL=C sed -r -n -e 's/^\#define[ \t]+__NR_([a-z0-9_]+)[ \t]+([0-9]+)(.*)/ [\2] = "\1",/p' >> syscall-names.h;\
echo "};" >> syscall-names.h
gcc -Wall -c -o syscall-reporter.o syscall-reporter.c
In file included from syscall-reporter.c:12:0:
syscall-reporter.h:21:2: warning: #warning "You've included the syscall reporter. Do not use in production!" [-Wcpp]
gcc example.o syscall-reporter.o -o example
$ ./example
Looks like you need syscall fstat(5) too!
$ vi example.c
...
$ make
gcc -Wall -c -o example.o example.c
gcc example.o syscall-reporter.o -o example
$ ./example
Looks like you need syscall mmap(9) too!
$ vi example.c
...
$ make
gcc -Wall -c -o example.o example.c
gcc example.o syscall-reporter.o -o example
$ ./example
Type stuff here: asdf
You typed: asdf
And now we fork, which should do quite the opposite ...
Looks like you need syscall rt_sigprocmask(14) too!
$ ...
Testing is done
This continues until we hit the report of the "clone" use, and we know we're done:
--- step-3/example.c 2012-03-22 21:51:04.377433872 -0700
+++ step-4/example.c 2012-03-22 21:51:13.577583466 -0700
@@ -36,6 +36,11 @@
ALLOW_SYSCALL(read),
ALLOW_SYSCALL(write),
/* Add more syscalls here. */
+ ALLOW_SYSCALL(fstat),
+ ALLOW_SYSCALL(mmap),
+ ALLOW_SYSCALL(rt_sigprocmask),
+ ALLOW_SYSCALL(rt_sigaction),
+ ALLOW_SYSCALL(nanosleep),
KILL_PROCESS,
};
struct sock_fprog prog = {
$ ./example
Type stuff here: asdf
You typed: asdf
And now we fork, which should do quite the opposite ...
Looks like you need syscall clone(56) too!
Ready for prime-time
Now that we're done, we can remove the syscall reporter again, and see that the
program correctly dies when it hits the fork. (To be really done, the fork should
be removed too!)
--- step-4/example.c 2012-03-22 21:51:13.577583466 -0700
+++ step-5/example.c 2012-03-22 21:51:21.785717260 -0700
@@ -17,7 +17,6 @@
#include "config.h"
#include "seccomp-bpf.h"
-#include "syscall-reporter.h"
static int install_syscall_filter(void)
{
@@ -35,7 +34,6 @@
ALLOW_SYSCALL(exit),
ALLOW_SYSCALL(read),
ALLOW_SYSCALL(write),
- /* Add more syscalls here. */
ALLOW_SYSCALL(fstat),
ALLOW_SYSCALL(mmap),
ALLOW_SYSCALL(rt_sigprocmask),
@@ -68,8 +66,6 @@
{
char buf[1024];
- if (install_syscall_reporter())
- return 1;
if (install_syscall_filter())
return 1;
--- step-4/Makefile 2012-03-22 19:55:27.056164102 -0700
+++ step-5/Makefile 2012-03-22 19:55:33.680270186 -0700
@@ -3,9 +3,7 @@
all: example
-include syscall-reporter.mk
-
-example: example.o syscall-reporter.o
+example: example.o
.PHONY: clean
clean:
$ ./example
Type stuff here: asdf
You typed: asdf
And now we fork, which should do quite the opposite ...
Bad system call
$ echo $?
159
Conclusion
Ta-da! That's it: you've now got a seccomp filter built into your program. To make this
even more portable, you can ignore the "prctl" failures when seccomp is not available,
warn the user instead of dying, or put the entire thing behind a "#ifdef HAVE_LINUX_SECCOMP_H"
test.
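A sketch combining the "warn but don't die" and "#ifdef HAVE_LINUX_SECCOMP_H" options, assuming the HAVE_LINUX_SECCOMP_H macro that the configure.ac check above writes into config.h. The helper maybe_install_filter() and its required flag are hypothetical; the EINVAL test mirrors the one in install_syscall_filter().

```c
#include <errno.h>
#include <stdio.h>
#include <sys/prctl.h>
#include <linux/filter.h>
#ifdef HAVE_LINUX_SECCOMP_H
# include <linux/seccomp.h>
#endif

/* Hypothetical helper. Returns 0 if the filter is active, or if seccomp is
 * unavailable and the caller allows running without it; returns -1 when
 * the filter is required but could not be installed. */
int maybe_install_filter(const struct sock_fprog *prog, int required)
{
#ifdef HAVE_LINUX_SECCOMP_H
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) == 0 &&
        prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, prog) == 0)
        return 0;
    /* EINVAL: this kernel lacks NO_NEW_PRIVS or SECCOMP_FILTER. */
    if (errno == EINVAL && !required) {
        fprintf(stderr, "warning: seccomp filter unavailable, running unconfined\n");
        return 0;
    }
    perror("prctl");
    return -1;
#else
    /* Built without <linux/seccomp.h>: nothing to install. */
    (void)prog;
    if (required) {
        fprintf(stderr, "error: built without seccomp support\n");
        return -1;
    }
    fprintf(stderr, "warning: built without seccomp support, running unconfined\n");
    return 0;
#endif
}
```

Programs that must never run unconfined would pass a nonzero required flag; everything else degrades to a warning on older kernels or build hosts.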
For more complex, or dynamic, BPF constructions, you'll probably want to take a look
at libseccomp. For a stand-alone filtering tool, check out minijail.
Thanks for reading! --
Kees Cook, Mar-Nov 2012.