
Distribute Task Execution (DTE)

Lerna speeds up your average CI time with caching and the --since flag. But neither of these features helps in the worst-case scenario. When something at the core of your repo has been modified and every task needs to run in CI, the only way to improve performance is to add more agent jobs and parallelize the tasks efficiently.

The most obvious way to parallelize tasks is to split them up by type: running all tests on one job, all builds on another, and all lint tasks on a third. This strategy is called binning. It gets harder when some test tasks have build tasks as prerequisites, but assuming you find a way to handle that, a typical setup can look like the diagram below. Here the test tasks are delayed until all necessary build artifacts are ready, while the build and lint tasks can start right away.

CI using binning

The problem with the binning approach is that you'll end up with some idle time on one or more jobs. Nx's distributed task execution reduces that idle time to the minimum possible by assigning each individual task to an agent job based on the task's average run time. Nx also guarantees that tasks are executed in the correct order and uses distributed caching to make sure that build artifacts from previous tasks are present on every agent job that needs them.
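To make the idle-time argument concrete, here is a minimal sketch of run-time-based assignment. It is not Nx Cloud's actual scheduler, which this page does not describe. It assumes a simple greedy rule (longest task first, always handed to the least-loaded agent) and ignores task dependencies. The helper name greedy_makespan and the durations in the usage note are hypothetical.

```shell
# Hypothetical helper, not part of Lerna or Nx Cloud.
# Usage: greedy_makespan <agent-count> <task-duration>...
# Prints the finish time of the busiest agent when tasks are handed out
# longest-first, each one to the agent with the least work so far.
greedy_makespan() {
  gm_agents=$1
  shift
  printf '%s\n' "$@" | sort -rn | awk -v n="$gm_agents" '
    BEGIN { for (i = 1; i <= n; i++) load[i] = 0 }
    {
      # Pick the agent with the smallest accumulated load.
      best = 1
      for (i = 2; i <= n; i++) if (load[i] < load[best]) best = i
      load[best] += $1
    }
    END {
      # The overall duration is set by the most heavily loaded agent.
      max = 0
      for (i = 1; i <= n; i++) if (load[i] > max) max = load[i]
      print max
    }'
}
```

As an illustration, take three lint tasks of 1 minute each, three test tasks of 5, 4 and 3 minutes, and two build tasks of 3 minutes each. Binned by type on three jobs, the test job alone needs 12 minutes while the lint job sits idle after 3. Spread across three agents, greedy_makespan 3 1 1 1 5 4 3 3 3 prints 7.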

When you set up Nx's distributed task execution, your task graph will look more like this:

CI using DTE

Not only will CI finish faster, but the debugging experience also stays the same as if you ran all of your CI on a single job. That's because Nx uses distributed caching to recreate all of the logs and build artifacts on the main job.

Find more information in this detailed guide to improve your worst case CI times.

Set up

To distribute your task execution, you need to (1) connect to Nx Cloud and (2) enable DTE in your CI workflow.

1. Connect to Nx Cloud via the interactive, browser-based workflow
npx nx connect-to-nx-cloud

Configure your CI workflow

Every organization manages its CI/CD pipelines differently, so it's not possible to cover every case. The following examples of configuring DTE for the Nx CLI in popular providers should give you a good starting point, and they are straightforward to adapt to Lerna-specific commands such as lerna run (instead of nx run-many or nx affected):

Note that only cacheable operations can be distributed because they have to be replayed on the main job.

For more details on setting up DTE, read this guide.

CI Execution Flow

Distributed task execution can work on any CI provider. You are responsible for launching jobs in your CI system. Nx Cloud then coordinates the way those jobs work together. There are two different kinds of jobs that you'll need to create in your CI system.

  1. One main job that controls what is going to be executed
  2. Multiple agent jobs that actually execute the tasks

The main job execution flow looks like this:

# Coordinate the agents to run the tasks
- npx nx-cloud start-ci-run
# Run any commands you want here
- lerna run lint --since=main & lerna run test --since=main & lerna run build --since=main
# Stop any runaway agents
- npx nx-cloud stop-all-agents

The agent job execution flow is very simple:

# Wait for tasks to execute
- npx nx-cloud start-agent

The main job looks more or less the same as it would without any distribution. The only thing you need to do is invoke npx nx-cloud start-ci-run at the beginning and, optionally, npx nx-cloud stop-all-agents at the end.

The agent jobs run long-running start-agent processes that execute all the tasks associated with a given CI run. The only thing you need to do to set them up is to invoke npx nx-cloud start-agent. This process will keep running until Nx Cloud tells it to terminate.

Note that the main job and the agent jobs must have the same environment and the same source code. They start around the same time, and once the main job completes, all the agents are stopped.

It's also important to note that an Nx Cloud agent isn't a machine but rather a long-running process that runs on a machine. In other words, Nx Cloud doesn't manage your agents; you need to do that in your CI config (check out the CI examples above).

Nx Cloud is an orchestrator. The main job tells Nx Cloud what you want to run, and Nx Cloud distributes those tasks across the agents. Nx Cloud automatically moves files from one agent to another and from the agents to the main job.

The end result is that when, say, lerna run build --since=main completes on the main job, all the file artifacts created on the agents have been copied over to the main job, as if the main job had built everything locally.

Running Things in Parallel

--concurrency is propagated to the agents. For example, npx lerna run build --since=main --concurrency=3 --dte tells Nx Cloud to run up to 3 build targets in parallel on each agent. So if you have, say, 10 agents, you will run up to 30 builds in parallel across all of them.

You also want to run as many commands in parallel as you can. For instance,

- lerna run lint --since=main
- lerna run test --since=main
- lerna run build --since=main

is worse than

- lerna run lint --since=main & lerna run test --since=main & lerna run build --since=main

The latter schedules all three commands at the same time, so if an agent cannot find anything to build, it starts running tests and lints. The result is better agent utilization and shorter CI time.
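One thing to keep in mind with the & form in a plain shell step: only the last command runs in the foreground, so the shell does not wait for the backgrounded commands, and their exit statuses are not reflected in the step's result. The following is a minimal sketch of one way to handle that, assuming a POSIX shell. The run_parallel helper is hypothetical and is not part of Lerna or Nx Cloud.

```shell
# Hypothetical helper, not part of Lerna or Nx Cloud.
# Runs each argument as its own command in the background, waits for all
# of them, and returns non-zero if any command failed.
run_parallel() {
  rp_pids=""
  rp_status=0
  for rp_cmd in "$@"; do
    sh -c "$rp_cmd" &
    rp_pids="$rp_pids $!"
  done
  for rp_pid in $rp_pids; do
    # wait returns the exit status of the given background process.
    wait "$rp_pid" || rp_status=1
  done
  return "$rp_status"
}

# Intended use in the main job, with the commands from the flow above:
#   npx nx-cloud start-ci-run
#   run_parallel \
#     "lerna run lint --since=main" \
#     "lerna run test --since=main" \
#     "lerna run build --since=main"
#   status=$?
#   npx nx-cloud stop-all-agents
#   exit "$status"
```

All three commands are still scheduled at the same time, so agent utilization is unchanged. The difference is that the main job only moves on once every command has finished, and a failing lint or test run fails the job.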

Relevant Repositories and Examples