Runner Pools

Important information about runners and how they execute pipelines.

Runner Pools orchestrate Runners, the virtual machines or containers that execute pipelines.

Runner Pool Architecture

A Runner Pool is a singleton service, which can be deployed in a high-availability setup, that manages a pool of Runners through AWS CloudFormation or the Kubernetes API. Its communication with Sophos Factory is outbound only, which enables secure self-hosted setups.

When provisioned, a Runner Pool first authenticates with Sophos Factory to determine whether it is the leader or an inactive follower. Once a Runner Pool reaches Active status, it begins provisioning Runners according to its scaling settings and manages all lifecycle events of those Runners.

Runner Pool Statuses

Runner Pools can have the following statuses:

Bootup: The Runner Pool is undergoing its bootup sequence and cannot provision Runners.
Active: The Runner Pool is online, conducting health checks on Runners, and provisioning new ones based on the Run queue.
Inactive: The Runner Pool is polling to become the active leader and is not conducting any operations against Runners.
Tainted: The Runner Pool has been marked as tainted and will be rotated with a fresh container.
Error: The Runner Pool encountered an error. It will attempt to recover to Active or Inactive until an error threshold is reached, after which it is marked Tainted and removed.
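The status lifecycle above can be modeled as a small state machine. This is a minimal sketch, not Sophos Factory's implementation: the class and method names are hypothetical, and the error threshold is an assumed parameter because its real value is not documented here.

```python
from enum import Enum


class PoolStatus(Enum):
    BOOTUP = "bootup"
    ACTIVE = "active"
    INACTIVE = "inactive"
    TAINTED = "tainted"
    ERROR = "error"


class RunnerPoolState:
    """Hypothetical model of the Runner Pool status lifecycle."""

    def __init__(self, error_threshold: int = 3) -> None:
        # The default threshold is an assumed value for illustration only.
        self.error_threshold = error_threshold
        self.errors = 0
        self.status = PoolStatus.BOOTUP

    def on_leader_check(self, is_leader: bool) -> None:
        # After authenticating, the pool becomes the active leader or an
        # inactive follower; an erroring pool recovers the same way.
        if self.status is not PoolStatus.TAINTED:
            self.status = PoolStatus.ACTIVE if is_leader else PoolStatus.INACTIVE

    def on_error(self) -> None:
        if self.status is PoolStatus.TAINTED:
            return
        self.errors += 1
        # Reaching the error threshold taints the pool so it can be rotated.
        if self.errors >= self.error_threshold:
            self.status = PoolStatus.TAINTED
        else:
            self.status = PoolStatus.ERROR

    def can_provision_runners(self) -> bool:
        # Only the active leader provisions Runners.
        return self.status is PoolStatus.ACTIVE
```

In this sketch a Tainted pool never leaves that state, reflecting that it is rotated out rather than recovered.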

Runner Health Checks

When a Runner is marked for deletion, it is first marked Tainted and then, provided it is not actively processing a Run, removed after a short delay. The delay prevents the Runner from acquiring a Run while it is being deleted.

A Runner Pool will conduct the following health checks against a Runner:

Heartbeat: Should the Runner stop communicating with Sophos Factory for 5 minutes or longer, it will be marked for removal.
Unhealthy VM/Container: Should the virtual machine or container enter an unhealthy state, the Runner will be marked for removal.
Old Runner Agent: Should a Runner's agent be older than the version specified for the Runner Pool, the Runner will automatically be rotated and replaced with one running the updated version.
Age of Runner: Should a Runner be older than 48 hours, it will be marked for removal.
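The four health checks can be expressed as one evaluation function. This is a minimal sketch under stated assumptions: the `RunnerInfo` fields are hypothetical, and agent versions are assumed to be comparable as integer tuples such as `(1, 3, 0)`. Only the 5-minute heartbeat limit and the 48-hour age limit come from the text above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Tuple

HEARTBEAT_LIMIT = timedelta(minutes=5)  # "5 minutes or longer"
MAX_RUNNER_AGE = timedelta(hours=48)    # "older than 48 hours"


@dataclass
class RunnerInfo:
    # Hypothetical fields; the real Runner record is not documented here.
    last_heartbeat: datetime
    created_at: datetime
    healthy: bool                      # VM/container health
    agent_version: Tuple[int, ...]


def failed_health_checks(runner: RunnerInfo,
                         pool_agent_version: Tuple[int, ...],
                         now: datetime) -> List[str]:
    """Return the names of the checks the Runner fails (empty if healthy)."""
    reasons = []
    if now - runner.last_heartbeat >= HEARTBEAT_LIMIT:
        reasons.append("heartbeat")
    if not runner.healthy:
        reasons.append("unhealthy")
    if runner.agent_version < pool_agent_version:  # tuple comparison
        reasons.append("old_agent")
    if now - runner.created_at > MAX_RUNNER_AGE:
        reasons.append("age")
    return reasons
```

Returning every failed check, rather than stopping at the first, makes it easy to log why a Runner was marked for removal.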

The following timeouts are used in conjunction with the health checks above:

Create Timeout: Should a Runner take longer than this to be created, it will be removed.
Run Timeout: Should a Run take longer than this, the Run will be failed and the Runner will be removed.
Idle Timeout: Should a Runner be idle longer than this, it will be removed.
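Each timeout applies to a different Runner status. The sketch below maps the Provisioning, Running, and Idle statuses (described under Viewing the Status of a Runner) to the corresponding timeout. It is an illustration only: the function and action names are hypothetical, and the timeout values are whatever the pool is configured with, not documented defaults.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Timeouts:
    # Values in seconds; configured per Runner Pool (illustrative here).
    create: float
    run: float
    idle: float


def timeout_action(status: str, seconds_in_status: float,
                   t: Timeouts) -> Optional[str]:
    """Return the action for a Runner that exceeded a timeout, or None."""
    if status == "provisioning" and seconds_in_status > t.create:
        return "remove_runner"
    if status == "running" and seconds_in_status > t.run:
        # A Run timeout fails the Run and also removes the Runner.
        return "fail_run_and_remove_runner"
    if status == "idle" and seconds_in_status > t.idle:
        return "remove_runner"
    return None
```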

Hosting Options

Sophos Factory provides cloud-hosted and self-hosted Runner Pools. The two are functionally equivalent; however, only cloud-hosted Runner Pools are fully managed by Sophos Factory in a highly secure environment.

Cloud-Hosted: Sophos Factory provides CentOS 8 based VMs.
Self-Hosted: Highly customizable and configurable, with the capability to orchestrate VM-based as well as container-based Runners.

Adding a New Runner Pool

To add a new Runner Pool, visit the Account Settings page, select Runner Pools in the left navigation bar, and then click the New Runner Pool button.

You can choose from cloud-hosted and self-hosted Runner Pools.

Configuring AutoScaling

We currently provide queue-depth-based scaling, in which the Runner Pool provisions Runners based on the number of Runs in the queue and the total count of healthy Runners. The Queue Depth Threshold setting defines the number of Runs that can be queued per Runner; increasing this value makes scaling less aggressive.

Queue Depth Threshold × Runner Count = total queue depth (for example, 10 Runs × 10 Runners = a queue depth of 100 Runs)
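The arithmetic above, plus its inverse, can be checked with a few lines of code. The inverse (the smallest Runner count whose capacity covers the queue) is an assumption about how the threshold drives scaling, not a documented formula.

```python
def queue_capacity(threshold: int, runner_count: int) -> int:
    # Queue Depth Threshold x Runner Count = total queue depth.
    return threshold * runner_count


def runners_needed(queued_runs: int, threshold: int) -> int:
    # Assumed inverse: ceiling division gives the smallest Runner count
    # whose capacity covers the queued Runs (integer math, no floats).
    return (queued_runs + threshold - 1) // threshold
```

With 100 queued Runs, a threshold of 10 calls for 10 Runners while a threshold of 20 calls for 5, which is why raising the threshold makes scaling less aggressive.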

Scaling Configuration Options:

Minimum Pool Size: The minimum number of Runners that will be provisioned.
Maximum Pool Size: The maximum number of Runners that can be provisioned.
Queue Depth Threshold: The queue depth allowed per Runner in the pool. This is used to calculate scale-up and scale-down events.
Scale Up Cooldown: The number of milliseconds to wait between scale-up events.
Scale Down Cooldown: The number of milliseconds to wait between scale-down events.
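The sketch below combines the five options into one scaling decision: compute a target from the queue, clamp it to the minimum and maximum pool size, and apply it only when the relevant cooldown has elapsed. The class names and the ceiling-division target are assumptions for illustration; Sophos Factory's actual scaling algorithm may differ.

```python
from dataclasses import dataclass


@dataclass
class ScalingConfig:
    min_size: int
    max_size: int
    queue_depth_threshold: int
    scale_up_cooldown_ms: int
    scale_down_cooldown_ms: int


@dataclass
class Scaler:
    config: ScalingConfig
    last_scale_up_ms: float = float("-inf")    # no scale-up yet
    last_scale_down_ms: float = float("-inf")  # no scale-down yet

    def desired_size(self, queued_runs: int) -> int:
        c = self.config
        needed = (queued_runs + c.queue_depth_threshold - 1) // c.queue_depth_threshold
        return max(c.min_size, min(c.max_size, needed))  # clamp to pool limits

    def decide(self, queued_runs: int, healthy_runners: int, now_ms: float) -> int:
        """Return the new target Runner count, honoring both cooldowns."""
        target = self.desired_size(queued_runs)
        if target > healthy_runners:
            if now_ms - self.last_scale_up_ms >= self.config.scale_up_cooldown_ms:
                self.last_scale_up_ms = now_ms
                return target
        elif target < healthy_runners:
            if now_ms - self.last_scale_down_ms >= self.config.scale_down_cooldown_ms:
                self.last_scale_down_ms = now_ms
                return target
        return healthy_runners  # cooldown active or already at target
```

Keeping separate cooldowns lets a pool scale up quickly under load while scaling down more conservatively.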

Runner Architecture

Runners that are provisioned by a Runner Pool communicate with Sophos Factory by first authenticating and then polling for available runs. This communication happens outbound only. When running a self-hosted runner in your own environment, no inbound connections are required.

When a Run is available, the Runner attempts to acquire it. If successful, the Runner downloads the pipeline to execute, along with any associated data such as variables.
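The poll, acquire, and download cycle can be sketched against an in-memory queue. `FakeRunQueue` is a hypothetical stand-in, not the Sophos Factory API; the point it illustrates is that acquisition is atomic, so when several Runners see the same Run, only one of them receives it.

```python
import threading
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Run:
    run_id: str
    pipeline: str    # the pipeline definition the Runner downloads
    variables: dict  # associated data such as variables


class FakeRunQueue:
    """In-memory stand-in for the Sophos Factory run queue (hypothetical)."""

    def __init__(self, runs: List[Run]) -> None:
        self._runs = list(runs)
        self._lock = threading.Lock()

    def poll(self) -> Optional[str]:
        # Outbound request: "is there a Run available?"
        with self._lock:
            return self._runs[0].run_id if self._runs else None

    def acquire(self, run_id: str) -> Optional[Run]:
        # Atomic claim: only one Runner can acquire a given Run.
        with self._lock:
            for i, run in enumerate(self._runs):
                if run.run_id == run_id:
                    return self._runs.pop(i)
        return None


def poll_once(queue: FakeRunQueue) -> Optional[Run]:
    """One poll -> acquire -> download cycle; None means stay Idle."""
    run_id = queue.poll()
    if run_id is None:
        return None
    # acquire() returns None if another Runner claimed the Run first.
    return queue.acquire(run_id)
```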

Currently, only Linux runners are supported.

Customizing a Runner

Runner Pools provide a Startup Script setting, where you can enter arbitrary shell scripts that are executed only once, when the Runner starts up.

To add or edit this script, click the Edit button in the Runner Pool Settings section of your Runner Pool's page.

If you’re using a self-hosted runner, you can configure your runner environment as you like.

Installing Tools

If you need to install tools or dependencies that your pipelines use, we strongly recommend installing them as part of your pipeline steps, either manually or with the built-in tool installer step modules. This ensures that your pipeline always uses the same versions of those tools.

Viewing the Status of a Runner

Runners can have the following statuses:

Provisioning: The Runner is undergoing its bootup sequence and cannot acquire or process a Run.
Idle: The Runner is online and polling for a Run.
Running: The Runner is processing a Run.
Tainted: The Runner has been marked for removal, either because it failed a health check or because the pool is being scaled down. If the Runner is currently processing a Run, removal waits until a health check fails, the Run times out, or the Run completes.
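The removal rule for a Tainted Runner, including the short delay described under Runner Health Checks, can be written as a single predicate. This is a sketch: the field names are hypothetical, and the 30-second grace period is an assumed value, since the text only says "a short delay".

```python
from dataclasses import dataclass


@dataclass
class TaintedRunner:
    seconds_since_tainted: float
    had_active_run: bool               # was processing a Run when tainted
    health_check_failed: bool = False
    run_timed_out: bool = False
    run_completed: bool = False


def ready_for_removal(r: TaintedRunner, grace_seconds: float = 30.0) -> bool:
    # grace_seconds is hypothetical; the real delay is not documented here.
    if r.seconds_since_tainted < grace_seconds:
        return False
    if not r.had_active_run:
        return True
    # A busy Runner waits for a failed health check, a Run timeout,
    # or the Run completing.
    return r.health_check_failed or r.run_timed_out or r.run_completed
```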

Working With Files and Paths

Runners store data for pipeline runs in a directory called the workspace directory, which defaults to /workspace. You can retrieve this path using the automatic environment variable env.WORKSPACE_PATH.

For each pipeline run, a temporary directory is created within the workspace called the run directory. The run directory is deleted immediately after a run finishes, so it is a useful location to store temporary files. You can retrieve this path using the automatic environment variable env.RUN_PATH.

For more on the built-in environment variables, see Built-in Environment Variables in the expression reference.

For script step modules and other modules that run as a subprocess, the working directory is the run directory. Any file path properties are resolved relative to this directory.
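Inside a Python script step, the same paths can be read from the process environment. This sketch assumes the variables are exported under the names used in expressions (`RUN_PATH`, `WORKSPACE_PATH`); the helper names are hypothetical. It falls back to the current working directory, which is the run directory for subprocess steps.

```python
import os
from pathlib import Path


def run_scratch_path(name: str) -> Path:
    """Path for a temporary file in the current run directory."""
    # The run directory is deleted when the run finishes, so files
    # written here need no manual cleanup.
    run_dir = Path(os.environ.get("RUN_PATH", os.getcwd()))
    return run_dir / name


def workspace_path() -> Path:
    # The workspace directory defaults to /workspace.
    return Path(os.environ.get("WORKSPACE_PATH", "/workspace"))
```

For example, `run_scratch_path("report.json").write_text("{}")` writes a throwaway file that disappears with the run directory.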

Accessing Metadata About a Pipeline Run

It’s often useful to retrieve information about the pipeline run itself during execution, such as the pipeline ID or name. Environment variables can be used to access this information. They are automatically set for all steps that run as a subprocess, including script modules.

These values can be retrieved from any expression directly by using the env.* syntax. For example, to retrieve the agent version in an expression, use env.AGENT_VERSION.
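In a Python script step, the counterpart of the expression env.AGENT_VERSION is the process environment variable of the same name, assuming the names match as they do for the path variables. The helper below is a hypothetical sketch; for other metadata such as the pipeline ID or name, look up the exact variable names in the reference rather than guessing them.

```python
import os
from typing import Dict, Iterable, Optional


def run_metadata(names: Iterable[str] = ("AGENT_VERSION",)) -> Dict[str, Optional[str]]:
    """Read built-in variables from the environment of a script step."""
    # Missing variables map to None instead of raising KeyError.
    return {name: os.environ.get(name) for name in names}
```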

The full list of built-in environment variables can be found in Built-in Environment Variables in the expression reference.

Self-Hosted Runner Pools

Configuring and using self-hosted Runner Pools.