Why HPA Should Be Disabled During Performance Testing for Kubernetes Microservices

When organizations adopt Kubernetes for running microservices, one of the most powerful features they often leverage is the Horizontal Pod Autoscaler (HPA). HPA automatically adjusts the number of pod replicas in a Deployment (or other scalable workload) based on observed CPU, memory, or custom metrics, ensuring that workloads scale efficiently in response to demand.

While HPA is extremely valuable in production environments, it can skew results during performance testing if left enabled. Let’s unpack why it’s best practice to disable HPA when running performance or load tests.
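As a refresher, a typical HPA object looks something like the following. This is a minimal sketch using the `autoscaling/v2` API; the service name and thresholds are illustrative, not taken from any real cluster:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: checkout-service-hpa       # hypothetical service name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: checkout-service
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out when average CPU crosses 70%
```

As long as an object like this exists, the controller will keep reconciling the replica count toward its targets, regardless of what your load generator is doing.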

1. The Goal of Performance Testing

The primary objective of performance testing is to understand:

  • How your application behaves under expected and peak loads.
  • The maximum capacity a service can handle before performance degradation.
  • Latency, throughput, and error thresholds of the system.

To achieve this, the test environment must remain predictable and controlled. If HPA is active, the environment dynamically changes during the test, making it harder to attribute observed performance to the application itself.

2. HPA Masks Bottlenecks

HPA automatically spins up new pods when resources are under pressure. This elasticity is great in production, but during testing it can hide bottlenecks.

  • If your application has inefficient queries, poor caching, or memory leaks, autoscaling might temporarily compensate.
  • Instead of surfacing the root cause, the system “throws more pods” at the problem.
  • This leads to misleading test results and prevents teams from uncovering real optimization opportunities.

3. Inconsistent Test Results

Performance tests should be repeatable. When HPA is enabled:

  • The number of pods at the start and end of the test may differ.
  • Results vary depending on how aggressively HPA scaled resources.
  • Benchmark comparisons (e.g., before and after code changes) become unreliable.

By disabling HPA, you ensure that every test run uses the same fixed resources, making your measurements consistent and comparable.
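In practice, "fixed resources" usually means deleting (or temporarily suspending) the HPA object for the duration of the run, e.g. with `kubectl delete hpa <name>`, and pinning the replica count directly on the Deployment. A minimal sketch, with `checkout-service` as a hypothetical workload; note that fixing resource requests and limits matters just as much as fixing the replica count for repeatable results:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: checkout-service           # hypothetical service under test
spec:
  replicas: 4                      # fixed for the whole test run
  selector:
    matchLabels:
      app: checkout-service
  template:
    metadata:
      labels:
        app: checkout-service
    spec:
      containers:
        - name: checkout-service
          image: example.com/checkout-service:1.0   # placeholder image
          resources:
            requests:              # identical requests/limits across runs
              cpu: "500m"          # keep scheduling and throttling behavior
              memory: 512Mi        # comparable between benchmarks
            limits:
              cpu: "1"
              memory: 1Gi
```

With the autoscaler gone and the replica count pinned, any change in latency or throughput between runs can be attributed to the code or configuration change you are testing.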

4. Cost and Overprovisioning Concerns

If you’re running large-scale tests, HPA may aggressively scale out services, creating far more pods than anticipated. This not only inflates infrastructure costs during testing but also gives a false sense of capacity. Your service may appear highly scalable in tests, but in reality, the efficiency and cost implications in production could be unsustainable.

5. When to Reintroduce HPA in Testing

It’s important to note that HPA should not be ignored altogether in testing; it simply needs to be introduced at the right stage:

  1. Early-stage performance testing → Keep HPA disabled to identify application-level bottlenecks and establish baselines.
  2. Scalability testing → Re-enable HPA to validate scaling policies, ensure autoscaling responds correctly, and fine-tune thresholds.

This phased approach ensures both application efficiency and scalability resilience are properly validated.
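When moving to the scalability-testing phase, the HPA is reapplied and effectively becomes the system under test itself. At that point the `behavior` section of the `autoscaling/v2` API is worth exercising explicitly, since stabilization windows strongly affect how scaling responds to bursty load. Again a sketch with illustrative values, not recommended settings:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: checkout-service-hpa       # hypothetical service name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: checkout-service
  minReplicas: 2
  maxReplicas: 10
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0     # react quickly to load spikes
    scaleDown:
      stabilizationWindowSeconds: 300   # avoid flapping once a burst ends
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

A scalability test run then validates questions the earlier baseline tests deliberately excluded: how quickly new pods become ready under load, whether the thresholds trigger at the right time, and whether scale-down is conservative enough to survive a second burst.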

Key Takeaway

Disabling HPA during performance testing creates a controlled environment, prevents autoscaling from masking issues, and produces consistent, reliable results. Once your microservices are tuned and stable, HPA can then be tested separately to validate scalability strategies.

In short:
Performance Testing = Fixed Resources
Scalability Testing = Enable HPA

This distinction helps teams uncover bottlenecks early and ensures their Kubernetes-based microservices are both optimized and scalable.