Power Caps On Three RTX 3090s: Bursts Versus Sustained Load

Power caps on RTX 3090 rigs are workload-specific. A number that looks perfect for one serving stack can be a bad trade for another.

On Nodehome's current three-card setup, 300W has been reasonable for bursty interactive inference: temperatures can peak under load, but they fall quickly between requests. That is very different from sustained all-GPU work, where the chassis heat-soaks and a lower cap becomes part of the operating policy.

The measured stress-test tradeoff was not subtle. Dropping from 300W to 250W, a cap reduction of about 17 percent, cut aggregate gpu-burn throughput by roughly 27 percent. That does not mean 250W is wrong. It means the lower cap should be chosen deliberately for sustained thermal control, not assumed to be free.
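
The arithmetic behind that tradeoff can be sketched in a few lines. This is an illustration only: the function name is hypothetical, and it assumes each card draws roughly its cap under gpu-burn, which is worth confirming against measured draw before reading much into the result.

```python
def relative_efficiency(cap_high_w, cap_low_w, throughput_loss):
    """Throughput per watt at the low cap, relative to the high cap.

    ASSUMPTION: under sustained load each card draws about its cap,
    so the ratio of caps stands in for the ratio of real power draw.
    """
    power_ratio = cap_low_w / cap_high_w        # 250 / 300
    throughput_ratio = 1.0 - throughput_loss    # 1 - 0.27
    return throughput_ratio / power_ratio


# Figures from the stress test described above.
power_cut = 1 - 250 / 300
ratio = relative_efficiency(300, 250, 0.27)

print(f"power cut: {power_cut:.1%}")
print(f"throughput per watt vs 300W: {ratio:.3f}")
```

Under that assumption, throughput per watt at 250W comes out at about 0.88 of the 300W figure. On this stress test the lower cap buys thermal headroom, not efficiency. If real draw at 300W sits below the cap, the ratio shifts, so logged draw is the better input.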

The broader local-builder lesson:

  • use the higher cap for short interactive inference if thermals recover cleanly,
  • use lower caps for long, scoped runs that hold all GPUs under sustained load,
  • compare caps on the actual workload, not just on anecdotes,
  • do not assume a 220W or 250W sweet spot transfers across runtimes, quantization formats, card coolers, or chassis airflow.
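
Comparing caps on the actual workload can be as simple as logging draw and temperature at each cap while the real job runs. The following is a minimal sketch, not Nodehome's tooling. It assumes nvidia-smi is on the PATH, the helper names are hypothetical, and the SAMPLE readings are made-up placeholders for illustration, not measurements.

```python
import subprocess

QUERY = "index,power.draw,power.limit,temperature.gpu"

# ILLUSTRATIVE placeholder text in nvidia-smi CSV form; not real measurements.
SAMPLE = """0, 298.41, 300.00, 71
1, 301.12, 300.00, 74
2, 296.80, 300.00, 69"""


def parse_gpu_csv(text):
    """Parse `--format=csv,noheader,nounits` output into one dict per GPU."""
    rows = []
    for line in text.strip().splitlines():
        idx, draw, limit, temp = [field.strip() for field in line.split(",")]
        rows.append({
            "index": int(idx),
            "draw_w": float(draw),
            "limit_w": float(limit),
            "temp_c": int(temp),
        })
    return rows


def read_gpus():
    """Query the live GPUs. Not called here, so the sketch runs without a GPU."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_csv(out)


def set_cap_command(gpu_index, watts):
    """Build, but do not run, the command that sets a power limit (needs root)."""
    return ["nvidia-smi", "-i", str(gpu_index), "-pl", str(watts)]


def summarize(units_done, seconds, rows):
    """Reduce one run at one cap to throughput, total draw, and peak temperature."""
    return {
        "throughput": units_done / seconds,
        "total_draw_w": sum(r["draw_w"] for r in rows),
        "peak_temp_c": max(r["temp_c"] for r in rows),
    }


rows = parse_gpu_csv(SAMPLE)
print(summarize(units_done=1200, seconds=60, rows=rows))
print(" ".join(set_cap_command(0, 250)))
```

In practice, units_done would be whatever the workload counts as useful output (tokens, requests, batches), and read_gpus() would be sampled repeatedly during the run so that heat-soak shows up in the log instead of a single snapshot.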

The right target is not maximum watts. It is useful work per degree of heat, for the workload you are actually running.