
I am running a CUDA kernel on my A100 GPU, and I've noticed that at some points the power consumption is higher than the range reported by nvidia-smi:

[Image: nvtop screenshot]

The picture was taken from nvtop.

Is this something I should be worried about?

Hennes
Bub Espinja
  • "Burst" performance has, for at least the last five years, been a fluid thing in computing. Intel processors have "turbo boost", where they self-overclock depending on the thermal limits and power capabilities of the chip itself. Those overclocks also depend on time and instantaneous temperature, so a "cold" chip will overclock higher, and the clock speed will drop as the device warms up. I would be surprised if Nvidia did not have similar capabilities. – Mokubai Dec 24 '20 at 10:51

2 Answers

The only worrisome aspect is the temperature, which seems to be at an unimpressive 52°C. This doesn't make sense if the power draw is truly above the max.

So, take your pick. Either:

  • The power draw figure is false
  • The temperature reported is false
  • nvtop is not working correctly with your GPU.

I would suggest verifying the temperature using other applications. If they report the same readings, then you don't need to worry. Check the CPU, the GPU, and the motherboard.
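One way to cross-check the readings is to query the driver directly instead of relying on a monitoring front end. The following is a minimal sketch, assuming `nvidia-smi` is on the PATH and that the GPU reports all four fields (some cards return `[N/A]` for some of them, which this sketch does not handle). The helper names `parse_gpu_line`, `query_gpu`, and `over_limit` are made up for illustration.

```python
import subprocess

# Query fields understood by `nvidia-smi --query-gpu=...`.
FIELDS = ["temperature.gpu", "power.draw", "power.limit", "power.max_limit"]


def parse_gpu_line(line):
    """Parse one CSV line of nvidia-smi output into a {field: float} dict."""
    values = [v.strip() for v in line.split(",")]
    return {name: float(v) for name, v in zip(FIELDS, values)}


def query_gpu(gpu_index=0):
    """Ask nvidia-smi for the current readings of one GPU."""
    out = subprocess.run(
        ["nvidia-smi", f"--id={gpu_index}",
         "--query-gpu=" + ",".join(FIELDS),
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_line(out.strip().splitlines()[0])


def over_limit(reading):
    """True if the reported draw exceeds the currently enforced power limit."""
    return reading["power.draw"] > reading["power.limit"]
```

Calling `query_gpu()` while the kernel runs and passing the result to `over_limit` shows whether the driver itself reports a draw above the enforced limit, along with the temperature at that moment.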

harrymc
  • Thanks for your answer. I use nvtop because it makes monitoring easier. The data is the same as in nvidia-smi. – Bub Espinja Dec 24 '20 at 10:03
  • There is also a fourth option beyond those listed above: that the card itself is reporting false values. In any case, I would suggest keeping an eye on the temperature readings, but otherwise, as long as the computer doesn't crash, trust those temperature readings. – harrymc Dec 24 '20 at 10:07

The power draw of a GPU is uneven: it has spikes and lows. The specified power draw of a card is meant to be read as a rolling average over one second, during which time it can fluctuate above and below that value. This is one of the reasons why PSU ratings are recommended to be well above the sum of the specified component power draws in a GPU-heavy rig.

nvidia-smi and friends report not the rolling average but the momentary power draw, which can of course exceed the specified value. If you sample your GPU power draw at random over a statistically meaningful number of samples, the average is very likely to come out very close to spec.
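To see the difference between momentary readings and an averaged figure, the samples can be collected and averaged by hand. This is a rough sketch, assuming `nvidia-smi` is on the PATH and reports `power.draw` and `power.limit` for the card. The function names, the window size, and the sampling interval are illustrative choices, not anything prescribed by Nvidia.

```python
import subprocess
import time
from collections import deque


def read_power_watts(gpu_index=0):
    """Return (power_draw, power_limit) in watts as reported by nvidia-smi."""
    out = subprocess.run(
        ["nvidia-smi", f"--id={gpu_index}",
         "--query-gpu=power.draw,power.limit",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    draw, limit = (float(x) for x in out.strip().splitlines()[0].split(","))
    return draw, limit


def collect_samples(n, interval_s=0.1, gpu_index=0):
    """Take n power readings, interval_s seconds apart (assumed values)."""
    samples, limit = [], None
    for _ in range(n):
        draw, limit = read_power_watts(gpu_index)
        samples.append(draw)
        time.sleep(interval_s)
    return samples, limit


def rolling_means(samples, window):
    """Yield the mean of the last `window` samples after each new sample."""
    recent = deque(maxlen=window)
    for s in samples:
        recent.append(s)
        yield sum(recent) / len(recent)


def summarize(samples, limit, window):
    """Compare individual readings and windowed averages against the limit."""
    means = list(rolling_means(samples, window))
    full_windows = means[window - 1:] or means  # skip partially filled windows
    return {
        "peak_sample": max(samples),
        "peak_rolling_mean": max(full_windows),
        "samples_over_limit": sum(s > limit for s in samples),
    }
```

For example, `samples, limit = collect_samples(100)` followed by `summarize(samples, limit, window=10)` covers roughly ten seconds. If the answer's reasoning holds, `peak_sample` may exceed the limit while `peak_rolling_mean` stays at or near it.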

Eugen Rieck