Run impact analysis on fewer nodes

If an environment has a lot of nodes, impact analysis might take a long time to run. You can analyze only a subset of your total nodes, but there are tradeoffs.

Important: If you're experiencing degraded performance when running impact analysis, we recommend that you first change the impact analysis destination to use a dedicated compiler or a pool of compilers. If you're already using compilers for impact analysis and the runtime or performance burden is still unacceptable, consider running impact analysis on fewer nodes. However, keep in mind the following tradeoffs:

Not all nodes are analyzed. By definition, running impact analysis on fewer nodes means that some nodes don't get analyzed. For example, if you analyze only 10% of your nodes, the remaining 90% are not analyzed. When your code is deployed, the excluded nodes might undergo unexpected changes that impact analysis did not detect.
Additional heap space is consumed. To run impact analysis on fewer nodes, you must create one or more dedicated impact analysis environments. Each impact analysis environment has the same code as its corresponding primary environment (for example, production and production-ia). Because every environment consumes heap space in Puppet Server, deploying the same code to these additional environments consumes additional heap space.

Impact analysis runs on nodes in a designated environment. Therefore, if your control repo pipeline runs impact analysis on your production environment, it analyzes all nodes in the production environment node group. If you have a lot of nodes, this can take a long time to run and might be taxing on system resources. If your nodes are mostly similar, it might make sense to run impact analysis on a subset of your total nodes, rather than always analyzing every node. However, this requires changing your environment structure to accommodate one or more impact-analysis environments.

For example, if you want to run impact analysis on a few nodes before deploying code to all production nodes, you'll need to set up a production impact analysis environment. First, create a production-ia branch in your control repo and deploy the new environment. Next, create a production-ia environment node group as a child of your production environment node group. Then, add nodes to the production-ia group representing a subset of your total production nodes.

Tip: You'll only analyze the nodes in the production-ia group, so make sure the nodes in this group are a good representation of your total production nodes. For example, make sure to include different operating systems or geographic locations, as well as any outliers and known problematic nodes.
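
As a rough illustration of choosing a representative subset, the following Python sketch groups a node inventory by operating system and location, picks a fixed number of ordinary nodes from each group, and always includes nodes flagged as outliers. The `Node` fields, the `pick_representative_nodes` function, and the sample inventory are hypothetical and are not part of any Puppet API; in practice you would build the inventory from your own node data and then add the selected nodes to the production-ia group yourself.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    # Hypothetical inventory record; field names are assumptions for this sketch.
    certname: str
    os: str
    location: str
    outlier: bool = False  # known problematic or unusual node


def pick_representative_nodes(nodes, per_group=1):
    """Return sorted certnames: every outlier, plus up to per_group
    ordinary nodes for each (os, location) combination."""
    groups = defaultdict(list)
    for node in nodes:
        if not node.outlier:
            groups[(node.os, node.location)].append(node)

    # Outliers are always included, regardless of group size.
    selected = {node.certname for node in nodes if node.outlier}
    for members in groups.values():
        # Sort by certname so the selection is the same on every run.
        for node in sorted(members, key=lambda n: n.certname)[:per_group]:
            selected.add(node.certname)
    return sorted(selected)


inventory = [
    Node("web01.example.com", "RedHat", "us-east"),
    Node("web02.example.com", "RedHat", "us-east"),
    Node("web03.example.com", "RedHat", "eu-west"),
    Node("db01.example.com", "Ubuntu", "us-east"),
    Node("legacy01.example.com", "RedHat", "us-east", outlier=True),
]

subset = pick_representative_nodes(inventory)
print(subset)
```

Raising `per_group` trades a longer impact analysis run for broader coverage of each operating system and location combination.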

You now have two environments where your production code is deployed: production-ia, which contains some production nodes, and production, which contains all production nodes. To run impact analysis on the smaller production-ia group, you need to add the new production-ia environment to your control repo pipeline:

Add a deployment for the production-ia environment, in addition to the production environment's deployment.
Tip: Since you're deploying the same code to production-ia and production, you can configure your pipeline to auto-promote to the production deployment stage after completing the production-ia deployment.
Edit the impact analysis task so that it only runs on nodes in the production-ia environment. Make sure the impact analysis task is set to Run for selected environments and includes only the production-ia environment. Since your goal is to analyze only a subset of nodes, you don't want to run impact analysis on the production environment anymore.
Check the promotion settings between the impact analysis stage and the production-ia deployment stage. If you want to review the impact analysis report before deploying your code, make sure your pipeline doesn't auto-promote to the deployment stage.
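
The three pipeline changes above can be summarized as a small consistency check. This Python sketch describes the pipeline with a hypothetical dictionary; it is not the product's pipeline format or API, and the key names (`deployments`, `impact_analysis_environments`, `auto_promote_to_deployment`) are invented purely for illustration.

```python
def check_ia_pipeline(pipeline, ia_env="production-ia", primary_env="production",
                      review_before_deploy=True):
    """Return a list of problems; an empty list means the (hypothetical)
    pipeline description matches the steps described above."""
    problems = []

    # Both the impact analysis environment and the primary environment
    # need their own deployment.
    for env in (ia_env, primary_env):
        if env not in pipeline["deployments"]:
            problems.append(f"missing deployment for {env}")

    # Impact analysis should target only the smaller environment.
    if pipeline["impact_analysis_environments"] != [ia_env]:
        problems.append(f"impact analysis should run only on {ia_env}")

    # Auto-promotion removes the chance to review the report before deploying.
    if review_before_deploy and pipeline["auto_promote_to_deployment"]:
        problems.append("auto-promotion is on, so the report cannot be reviewed before deployment")

    return problems


example_pipeline = {
    "deployments": ["production-ia", "production"],
    "impact_analysis_environments": ["production-ia"],
    "auto_promote_to_deployment": False,
}

print(check_ia_pipeline(example_pipeline))
```

If you don't need to review the report before deployment, passing `review_before_deploy=False` skips the promotion check.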

Although this example uses the production environment, you can set up a similar structure for any environment you want to partially analyze, such as UAT or preproduction.

You can help us improve this feature: We invite you to tell us why you run (or would run) impact analysis on fewer nodes, whether the above approach works in your infrastructure, and what changes you'd like us to make. Please visit our product board to learn more, vote for features you like, and tell us your thoughts.