EVGA's VRM Thermals Not the Killer of Cards - Final Test
2016-11-23
this is the end-all be-all of content
surrounding EVGA VRM temperatures we
attached thermocouple probes directly to
the PCB's backside hotspot and to MOSFETs
number two and seven which are commonly
shown scorched in photos of defective
cards and we ran dozens of tests ranging
from optimistic use case scenarios to
completely unrealistic torture tests
testing was done with and without
thermal pads and also with and without
the VBIOS update and with a stock
unmodified card as well today we're
showing a closer look at EVGA's 10-series
ACX 3.0 cooler thermal issues Buildzoid
from Actually Hardcore Overclocking
will be joining me about halfway
through before getting to that this
content is brought to you by our patreon
backers so you can go to our patreon
page patreon.com/gamersnexus to help us
out directly and to help us fund
in-depth testing like this which is
frankly impossible to make money off of
because it took so much time keep in
mind that everything here is
exhaustively discussed in the article as
well linked in the description below
there are more charts there a good amount
more but there's still a ton here so
there's plenty of content to consume the
goal is to sort of finish the book on
the EVGA thermal analysis and discussion
and see if these potential posted online
board failures are a result of thermals
as sort of initial testing suggested or
something else let's recap the basics
again we recently validated a Tom's
Hardware test which suggested that EVGA
ACX devices were heating up on the
backside of the VRM to north of 100
Celsius
note that VRMs though can handle 100 C
no problem but the temperatures that
Tom's had shown hitting 114 C in some
reports and using thermal imaging were
beginning to enter the range of being
concerning the fact that a few users
began sharing photos around this time of
scorched PCBs furthered this concern of
temperature related damage to EVGA cards
we validated Tom's Hardware's
testing methodology by deploying our own
thermal camera just like they did but
noted that emissivity and the Delta
between the backside of the PCB and the
frontside VRMs could be significant so
we decided thermal imaging wasn't
sufficient even though we did validate
their results using that method
EVGA then issued thermal pad mods
optionally and a VBIOS update that
increased the aggression of the fan
speed curve we declared at this point
that both of these by way of thermal
imaging were enough to fix the problem
and until today this problem was largely
assumed to be because of EVGA's lack of
thermal pads between the base plate and
the heatsink and then on the back side
of the card between the back plate and
the back of the PCB the thermal pad that
was later added in the thermal mod goes
right here and covers the backside of
the inductors the power stages capacitor
banks things like that but we found
there's more to it than that
so the thermals are not the heart of the
issue here in our testing we applied
K-type thermocouples to three spots one
of them is the back of the PCB which on
a backplate would be right around where
the e is in the logo that's where one of
the hotspots is that both Tom's testing
and our testing previously showed the two
other spots where we placed the
thermocouples and as you'll see some of
them are still hanging off here though
they're mostly gone now are
MOSFET number two and MOSFET number
seven up here those are hot spots and we
got direct readings for those the
thermocouples used are flat and
self-adhesive from Omega as recommended
by thermal engineers in the industry
including Bobby Kinstle of
Corsair whom we previously
interviewed we explained in a previous
video why these flat thermocouples that
have the adhesive on them do not pose
issues for thermal transfer between the
FETs and the thermal pads that are
included on the card and how they're not
electrically conductive so that's also a
good thing we also talked about avoiding
EMI in that previous video and that was
basically the preparation video for this
one so if you're curious how the testing
was set up go there
we ran several different tests for this
there were four primary tests different
configurations of the card physically or
in firmware the first one was just the
card stock with the original v bios less
aggressive fan speed no thermal pads
applied at least none of the extras
second test just the v bios update no
thermal pads applied third test thermal
pads and the v bios and fourth test
which will not be in this video but will
be in the article is with the thermal
pads and no v bios and then in addition
to that we ran
about half a dozen more tests on those
four scenarios and the data being
published right now is primarily
Kombustor's implementation of FurMark
and how the card responds to that Metro
Last Light and DiRT Rally on loop for a
long period of time for a real-world game
example overclocking and overvolting
and the impact on the results from the
previous two tests named brief SLI
testing just for those of you who
requested it on Twitter or elsewhere and
finally a high ambient torture test
which is probably the most interesting
and it will be the last one that I
talked about
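
A minimal sketch of the test matrix just described, in Python. The four primary configurations are simply the two VBIOS states crossed with the two thermal pad states. The labels and variable names here are our own, made up for illustration, and nothing in the video says every workload was run on every configuration.

```python
from itertools import product

# Labels are illustrative; they are not taken from EVGA or from the test logs.
VBIOS_STATES = ("original VBIOS", "updated VBIOS")
PAD_STATES = ("no extra thermal pads", "thermal pad mod")

# Workloads named in the video.
WORKLOADS = (
    "FurMark (Kombustor) burn-in",
    "Metro: Last Light loop",
    "DiRT Rally loop",
    "overclock and overvolt",
    "brief SLI test",
    "high-ambient torture test",
)

# Two binary choices give the four primary configurations.
CONFIGS = list(product(VBIOS_STATES, PAD_STATES))

for vbios, pads in CONFIGS:
    print(f"{vbios} + {pads}")
```

Writing the configurations as a cross product makes it easy to see that only one variable changes between neighbouring tests, which is what lets the fan profile and the pads be credited separately.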
as for the VRM EVGA's TjMax is 150 C
and probably one would hope should
initiate OTP at 180 C the power stages
best operate at 100 C continuous but
have a Tcase of 125 C inductors
really don't matter that much for heat
dissipation since they can take so damn
much in terms of thermals since they're
really just copper wire coiled inside of
a natural heatsink but they do heat up
neighboring components the FETs have a
thermal pad contacting the base plate
and that is included with the original
design it's just the original cards that
have no pad to allow transfer from the
base plate to the heatsink so that heat
gets kind of trapped there and saturates
the base plate with nowhere to go and
before getting into the testing in our
tutorial for how to apply the thermal
pad mod we saw some misguided comments
about EVGA's suggested placement of
thermal pads atop the chokes basically
sitting on top of the chokes and on top
of the base plate and connecting to the
fins of the heatsink that would be this
side of the heatsink where obviously
there's no cold plate here if you had a
cold plate here sure it would be better but
it's not necessary and as we'll be
showing the thermal pads between the
base plate and just the fins improve the
temperature significantly so this myth
that I've seen sort of circulating that
they're not doing anything and it's just
wrong measurements or something is
not true and based on nothing and so now
that we've got experience with this
dozens of hours testing it I can tell
you definitively it doesn't matter
another thing that I saw that I'd like
to debunk quickly was a concern that
because the thermal pad is placed where
it is through their recommendation
there's no air getting down to the VRM
anymore well there's no air getting
there anyway
because there's a baseplate on top of it
it's physically obstructed and not only
that the chokes where there is I guess
you could call it an air channel are
flanked by capacitor banks on one side
which leads to the vram and GPU anyway
so that's irrelevant
and on the other side flanked by a
thermal pad which allows really no air
transfer to the MOSFETs so air can't get
in there and air getting to the chokes
doesn't matter anyway because they can
take so much heat and in addition to
that a thermal pad on top of them has
far and away greater thermal
conductivity than air air you might have
something like 0.03 watts per meter
Kelvin at 25 C where a thermal pad could
be seven could be 10 and I don't know
what EVGA's pads are but I assure you
it's greater than air I did ask them
that what their thermal conductivity is
haven't gotten a response yet oh and one
more thing some posts claim that black
screen issues on EVGA's first batch of
cards were caused by VRMs overheating
that's wrong on many levels a thermally
compromised vrm will not throw a black
screen that can be fixed by a restart
it'll go up in a puff of smoke and
you'll never be able to turn it on again
that's how they fail the black screen
issues have been resolved by EVGA and
were unfortunately for the company
another in a long list of issues and
resolutions for this series of graphics
cards but they're not related as far as
we know to these VRM thermal concerns
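
To put rough numbers on the air-versus-pad point from a moment ago, here is a small Python sketch using Fourier's law for steady conduction through a flat layer, q = k * A * dT / d. Everything about the geometry is an assumption for illustration (1 square centimeter of contact, a 1 mm gap, a 40 K difference across it). The air figure is a commonly cited textbook value of roughly 0.026 W/(m·K) near room temperature, and the pad figure is the lower of the two example values mentioned in the video, not a measured value for EVGA's pads.

```python
def conducted_watts(k_w_per_mk, area_m2, delta_t_k, thickness_m):
    """Steady-state conduction through a flat layer (Fourier's law)."""
    return k_w_per_mk * area_m2 * delta_t_k / thickness_m

# Assumed material values (see the note above).
K_AIR = 0.026  # W/(m*K), still air near 25 C, textbook approximation
K_PAD = 7.0    # W/(m*K), example pad value from the video

# Hypothetical geometry, chosen only to make the comparison concrete.
AREA_M2 = 1e-4       # 1 cm^2 of contact
THICKNESS_M = 1e-3   # 1 mm gap or pad thickness
DELTA_T_K = 40.0     # temperature difference across the layer

air_w = conducted_watts(K_AIR, AREA_M2, DELTA_T_K, THICKNESS_M)
pad_w = conducted_watts(K_PAD, AREA_M2, DELTA_T_K, THICKNESS_M)

print(f"still air: {air_w:.3f} W")
print(f"thermal pad: {pad_w:.1f} W")
print(f"pad conducts about {pad_w / air_w:.0f}x more")
```

This ignores convection and contact resistance, so it is only an order-of-magnitude comparison, but the ratio depends only on the two conductivities and not on the made-up geometry.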
anyway enough clearing up those gripes
let's use the data to show what's going
on internally if you have any questions
whatsoever about how this testing was
conducted check the article linked below
because it probably answers it in the
methodology section we're gonna start
with a stock card that's with the
original V bios and with no thermal pads
applied through the thermal mod just a
warning this video will be very heavy on
data that's the point of it I'm trying
to sort of close the book on this series
of questions about it but that's the way
it is a lot of data a lot of charts
here's a table with the updated noise
results originally we tested the RPMs at
1500 to 1600 which was the pre
update rpm and we tested that against
the 2200 rpm that EVGA had informed us
would be applied through the V BIOS
update
after that testing EVGA changed their V
BIOS update so that the max fan speed
was closer to 2000 rather than 2200 so
that's the updated dBA results you're
seeing on the screen now this first
temperature test is the stock card
without an overclock running Kombustor
FurMark as a burn-in remember that
FurMark is sort of a power virus it loads
the VRM more heavily than any game will
ever do also note that FurMark doesn't
blast the clock as much as a game
would but load is still heavy the
colours will be the same for every
chart shown yellow is MOSFET seven
counting bottom-up and is a significant
hotspot on the card mosfet 2 is a common
scorch point in photos that we've seen
online and is toward the bottom of the
card and that's orange PCB is cyan and
is measured on the hotspot on the rear
side of the video card with the
backplate on GPU temperature is white
and measured by software and ambient
temperature is also critical to these
tests so we'll log that as well later
we double ambient and that's represented by
a darker blue line at the bottom we're
seeing the PCB achieving temperatures
just shy of 100 C after a 1-hour burn-in
the MOSFETs are both at around 90 to 94
C with MOSFET seven running a bit
warmer ambient was in the low 20s for
this test case ambient as we show in
this chart is upwards of 40 C in some
enclosures but that would account for
gains in temperature just not a
one-to-one gain so you can't add for
example 20 C to the results we're seeing
if you have a 40 C case like we tested
in our case review this week we'll test
this situation later though so far these
numbers are all numbers that the card is
built to handle and that's with FurMark
here's Metro Last Light running a burn-in
we're seeing temperatures closer to 85 C
for the PCB backside and MOSFET 7 with
MOSFET number two around 80 C that's
around 10 to 20 Celsius cooler than with
FurMark depending on what you're
looking at and other games that show
similar performance results this chart
shows the overclocking impact on a 1080
FTW without the V BIOS update and
without thermal pads as benchmarked
using FurMark temperatures get a little
warmer here now nearing 105 Celsius on
the PCB and MOSFET number seven that's
hot enough that high case ambient would
decrease your efficiency as you near 110
C but you should still be within safe
operating range
as we show in the forthcoming high
ambient tests before the high ambient
test and before applying the thermal
pads and the V BIOS updates the next
goal is to test SLI 1080 FTWs with a
one slot spacing between them so that's
basically the cards are not immediately
adjacent but they're somewhere around
that far apart maybe a good inch and a
half or two depending on what
motherboard you're using this chart has
some manual tuning in it because I was
basically trying to figure out where's
the worst case scenario so there were
some changes going on live during the
testing the first half of this test was
without any tester interference at all
we're at around 90 to 94 C for the FETs
and the PCB temperatures with GPU
temperatures maintained around 80 C the
thermal limit to which fan rpm is
slaved and that triggers a fan rpm of about
80 percent with SLI that's in the native
profile for the cards this is where we
decided to start playing around with the
idea of torture testing and so dropped
the fan rpm back to its 60% speed from
auto so auto was around 80 and a single
card would run at around 60% so it was
worth testing this pushes the
temperatures up to about 105 to 107 C
it's starting to get a little more
interesting here's the interesting part
you may have noticed that the vrm
component temperatures for SLI are
actually a little bit lower than the
same scenario with a single card and
there's a good reason for that the fan
rpm is entirely controlled by the GPU
the fan doesn't know anything about the
VRM temperatures it doesn't know how hot the
power stages are how hot the inductors
are how hot the PCB is or how hot the
capacitors are it also doesn't know the
fan doesn't know how hot the RAM is
either so the GPU sort of controls
everything with regard to fan speed and
that was where one interesting
hypothesis came about which is if you
have a hotter GPU and therefore a
higher rpm if you follow that hotter GPU
higher rpm so we end up with this 80%
speed in the situation of an SLI test
then suddenly your VRM starts to
actually look better than if you had the
reverse where maybe there's a lower GPU
temperature but there's a lot of load on
the
power design of the card this is where
FurMark comes in and you end up with a
hotter VRM and a lower fan speed because the
GPU is cooler and you end up with really
a worst case scenario now it's time to
move on to the V BIOS update if you want
to see how just the thermal pads perform
without the V BIOS update check the
article below
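
The fan behavior described a moment ago, where fan speed follows GPU temperature and nothing else, can be sketched in Python as below. This is a minimal illustration under stated assumptions: the curve points are hypothetical and are not EVGA's actual VBIOS table, and the function name is ours. The only part taken from the video is that roughly 80 C on the GPU lands at about 80 percent fan, and that the VRM temperature is not an input at all.

```python
from bisect import bisect_right

# Hypothetical curve points: (GPU temperature in C, fan duty in percent).
# Illustrative only; not EVGA's real fan table.
CURVE = [(40.0, 30.0), (70.0, 60.0), (80.0, 80.0), (90.0, 100.0)]

def fan_percent(gpu_temp_c):
    """Linearly interpolate fan duty from GPU temperature alone."""
    temps = [t for t, _ in CURVE]
    if gpu_temp_c <= temps[0]:
        return CURVE[0][1]
    if gpu_temp_c >= temps[-1]:
        return CURVE[-1][1]
    i = bisect_right(temps, gpu_temp_c)
    (t0, d0), (t1, d1) = CURVE[i - 1], CURVE[i]
    return d0 + (d1 - d0) * (gpu_temp_c - t0) / (t1 - t0)

# Two made-up scenarios: the VRM reading is carried along but never used.
scenarios = [
    {"name": "hot GPU, moderate VRM", "gpu_c": 80.0, "vrm_c": 92.0},
    {"name": "cool GPU, hot VRM", "gpu_c": 70.0, "vrm_c": 105.0},
]
for s in scenarios:
    duty = fan_percent(s["gpu_c"])
    print(f'{s["name"]}: fan {duty:.0f}% (VRM at {s["vrm_c"]:.0f} C is ignored)')
```

With a curve like this, the scenario with the hotter VRM gets the slower fan, which is the worst case the video describes for a VRM-heavy load that leaves the GPU relatively cool.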
here's the first chart of the V BIOS
updates this one shows the FTW VRM with
the new VBIOS applied but without the
thermal pad mod we're hitting
temperatures of around 85 C on the PCB
and around 80 C on the second MOSFET
compared to the stock card with the
original v bios this more aggressive fan
speed profile improves performance by
about 6 to 10 Celsius on the VRMs and
about 15 Celsius on the PCB that's just
the fan profile this is mostly the same
as what we reported with the original
PCB temperature testing using thermal
imaging and reconfirms our statement
that the V BIOS update alone is enough
to resolve the issues though it's still
good that EVGA is now shipping the cards
with the thermal pads and here's what it
looks like with a video game this is
Metro last light at some of its most
demanding settings on the GPU side looped
for a full hour
the outcome is temperatures around 75 C
on the PCB and MOSFETs and that's more
than acceptable these VRMs can handle
way higher heat than that and this is
mind you with an ambient temperature of
around 23 to 25 C as opposed to the
earlier values near 20 C this is the
last chart for the V BIOS updates at
least before we move on to the thermal
pad mod this chart shows overclocking of
plus 30 percent with the master switch
engaged and overvolting of plus 100
percent the VRMs are hitting
temperatures around 85 to 90 Celsius
obviously warmer but it's still an
improvement of more than 10 Celsius when
looking at the old V bios and its OC
performance again acceptable
temperatures a bit warm certainly a bit
warmer than some of the competitors but
acceptable and within range here's a
look at the thermal pads in addition to
the VBIOS when running FurMark at
stock clocks so now we've added those
pads that EVGA ships for free the card is
way below where it was stock somewhere
to the tune of 20 to 30 Celsius
depending on which test and component
you look at furthering the decision that
yes these should be included on the
cards and temps are running around 85 to
87 Celsius for the backplate and are
around 75 to 80 Celsius for the FETs
while FurMark burn-in is running so all
this data is sort of if not for you for
me beginning to be a bit of an overload
it's difficult just to organize it so
the rest of these types of tests will be
in the article again
but I've got one more set of tests
that's pretty interesting and it is for
the high ambient torture tests the idea
here was trying to kill the card
actively so this was no longer a normal
use case it entered into the realm of a
challenge of how do I make this thing
set itself on fire just to give you an
idea here's some footage of our CPU
radiator dumping Heat straight into the
VRM fan this is with prime95 running large
FFTs crunching on an overclocked CPU and
with only one of the radiator fans
turned on our CPU is at probably about
170 watt power draw during these tests a
lot this is more than just a
simulation of a high ambient case
temperature it's actually worse than
that because it's directed airflow not
just sort of sitting air the effective
temperature that the card was breathing
was often 40 Celsius of pure heat from
another source of heat which is the CPU
and I even lowered the fan rpm on the
EVGA card to 50% provided 30% more power
overvolted it to 100% of allowance and
overclocked it by 125 megahertz
another test I tried was connecting a
case fan pointed strictly at the GPU
such that the GPU could run a colder
temp
thus a higher clock and lower fan
speed but in a way that the vrm would
still be incinerated by a lack of cool
air because it's again directional so
that was the footage of some of this
going on and here's the chart with the
outcome none of it worked this chart is
the worst of the two no thermal pads
installed and an old v bios and after
overclocking around the 3000 second mark
which is where you're seeing that tick
up I was able to hit about 126.8
Celsius max on the PCB backside and
about 108 to 109 Celsius on the VRMs
there was one tick to 110 that I saw but
didn't really reflect in the data the
ambient fluctuation you see is from when
the radiator was moved closer to the
card so it would incinerate itself as
much as possible moving on to the
torture test with the new VBIOS and
the thermal pads installed the best case
scenario with the worst case testing
environment the worst I got was about 82
Celsius on the PCB
and that's a massive decrease mostly the
thermal pad is doing work too so don't
listen to those comments saying that the
thermal pads do nothing because they are
utterly completely false and based on
nothing we've done enough testing at
this point that I can confidently tell
you the thermal pads do actually
contribute fairly noticeably to
performance just looking at the fan
speeds of these two charts you'll see
that we're even below at some points the
overall speed from the previous chart
when I was manually tuning the cards for
worst-case scenarios and yet performance
is better with the thermal pads applied
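
As a quick arithmetic check on the torture test figures quoted above, here is a short Python sketch. The readings are the ones spoken in the video as we read them (the second PCB figure is taken to be 82 C). Comparing a thermocouple reading on a MOSFET against the 125 C Tcase rating cited earlier is our own simplification, since the measurement point may not match how a datasheet defines case temperature.

```python
# Figures quoted in the video for the high-ambient torture test.
stock_pcb_max_c = 126.8   # old VBIOS, no thermal pads, PCB backside
modded_pcb_max_c = 82.0   # new VBIOS plus thermal pads, PCB backside
stock_fet_max_c = 109.0   # upper end of the 108 to 109 C MOSFET readings
t_case_limit_c = 125.0    # power stage Tcase rating cited earlier

# Drop on the PCB backside between the worst and best configurations.
pcb_drop_c = round(stock_pcb_max_c - modded_pcb_max_c, 1)

# Remaining margin on the hottest stock MOSFET reading (simplified comparison).
fet_margin_c = t_case_limit_c - stock_fet_max_c

print(f"PCB backside drop with VBIOS and pads: {pcb_drop_c} C")
print(f"stock MOSFET margin to Tcase: {fet_margin_c} C")
```

Even in the worst configuration under deliberately abusive conditions, the MOSFET readings stay under the cited rating, which is the basis for saying the card was hot but not being killed by heat.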
but that doesn't answer for the fact
that every day we're seeing posts on
Reddit and overclocking forums where
you're seeing allegedly anyway scorched
or singed or in some cases dead cards
from EVGA so why is that happening I
reached out to EVGA and got a number
that I can share with you all the number
is a 200 DPPM rate for their cards
that's defective parts per million so
out of 1 million cards shipped video
cards complete cards they've seen 200
that are defective and returned for repair
or replacement through the RMA process and
that is all repairs and replacements for
this series of cards this means that for
every 1 million there's 200 defects or as a
percentage about 0.02
percent of all cards shipped according to
EVGA I have no way of validating
that and we're told that this number is
fairly consistent with previous
generations so it's just the defects at
least from what EVGA seems to think are
more noticeable this time because of the
way the internet works there's media
coverage there's people posting about
issues on social media and so now
everyone posts their cards even though
maybe in the past it would have just
been an email to EVGA and a
replacement sent out so it is noisier
and the 200 DPPM rate is really all I
have to go off of for the answer to why
do we keep seeing these things pop up if
thermals are not an issue which they are
not by the way this testing pretty much
proves it's not a thermal issue so it's
something else something else is going
on here at least for the 200 DPPM
cards the 0.02 percent of all cards shipped
something going on obviously and it
could just be things like manufacturing
defects workmanship things like that
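
The conversion from defective parts per million to a percentage is simple enough to sketch in Python. The function names are ours, and the 50,000 unit shipment is a made-up number used only to show the scale. EVGA's actual shipment volume is not given in the video.

```python
def dppm_to_percent(dppm):
    """Convert defective parts per million to a percentage."""
    return dppm / 1_000_000 * 100

def expected_defects(units_shipped, dppm):
    """Expected number of defective units at a given DPPM rate."""
    return units_shipped * dppm / 1_000_000

rate_dppm = 200  # figure EVGA provided, per the video

print(f"{rate_dppm} DPPM = {dppm_to_percent(rate_dppm):.2f}%")
print(f"per 1,000,000 cards: {expected_defects(1_000_000, rate_dppm):.0f} defects")
# Hypothetical shipment size, for scale only.
print(f"per 50,000 cards: {expected_defects(50_000, rate_dppm):.0f} defects")
```

At that rate even a handful of failure photos online is consistent with a large number of cards shipped, which is the point being made about visibility versus frequency.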
here's a quick clip of a conversation I
had with Buildzoid from Actually
Hardcore Overclocking
he's got his own channel we talked about
the EVGA 10 series cards in a bit more
electrical depth with him I'm gonna say
it's the capacitors they have a bad batch
of like the one consistent thing
that's like starting to show up is you
have scorch marks coming like this
latest one it comes right off of like
you have the scorch mark coming right
off the capacitor right there was a
really early one where somebody had a
blown-out capacitor just like capacitor
completely fried in pieces that was like
the first report and there I was like oh
no that's that's just a manufacturing
defect that's not a thermal issue so
this most recent one as well is just
like the capacitors well it's not blown
to bits but you can see that the solder
on one side of it just like got shot
right off so that one's another like
that's a capacitor failure and basically
all of the other ones are like varyingly
more severe capacitor failures could
cause those exact types of damage and
then along with the fact that EVGA is
claiming that oh there isn't a thermal
issue that would line up with why there
isn't a thermal issue you have a bad
batch of capacitors so I'm thinking it's
like because I read through the like the
capacitor failure stuff which I gave you
a link to yeah and basically all of its
gonna lead to pretty much short
circuit explosion sort of scenarios
where the capacitor will be damaged
partially over time it'll get worse at
some point it's gonna short out and it's
gonna blow up and since this isn't
really temperature bound that explains
why it blows up at idle why it blows up
without the capacitor actually being
like properly powered or anything you
know the one where it blows up from the
PCIe slot right at startup it especially
considering about the thermals even
without the thermal pads for most people
won't be that bad right considering like
the temperatures you showed me from the
testing it it's like fine that's normal
for a VRM hundred degrees here a hundred
degrees there is fine it's just like so
I I don't see
so it's basically I'm like I'm almost
certain that EVGA just has a bad batch
of capacitors on some of the cards
hopefully not all of them that would be
a disaster but I think they probably
just have a bad batch of capacitors and
they're just going up in flames and now
it's really up to EVGA to just state
publicly that that's the case and confirm
it or disprove it because they obviously
don't have a thermal issue with the
MOSFETs because we tortured them so
thanks to Buildzoid for joining for
that and for his expertise it's clear to
me at this point that there's no thermal
issue with the EVGA cards at least not
one that kills them the issue is
something else and it's probably
something like individual components it
might be in the supply chain maybe they
got a batch of bad caps from their
supplier or it could be something else
like workmanship or whatever other
manufacturing defects but these things
are not dying because the VRMs are
getting too hot and that's even true for
the stock card before v bios before the
thermal pads now that said obviously
high temperatures are not great so it could
be the case that you accelerate the
death of the card caused by something
else with high heat but I still don't
think that's the most likely cause
regardless if you have one of these now
you should still at least get the v bios
update it is installed through a double
click and very easy and your performance
improves at least thermally and it's
worth doing the thermal pad mod if you
feel like you can otherwise send it in
or replace it with them because the
temperature improvement is just so great
it feels wrong to throw away a free
improvement to that margin especially
for some of the worst case scenarios
we've seen even though it's contacting
here
you're still transferring a ton of heat
from a hot component with a shell casing
to metal so it's not going to air
anymore that's good and it's free just
apply that at least if you don't have a
card and you're thinking of buying one
at this point there's a few things
right there were the black screen issues that
have been resolved and I have seen those
personally and there were the thermal
issues which are really mostly a
non-issue but there's this sort of echo chamber
of the internet
making the existing damage to cards
caused potentially by other things not
thermal related look a lot worse than
maybe you would feel comfortable with
making a purchase so that's acceptable
that's totally fine if you don't want to
buy it because you're like this looks
like a lot of problems I don't feel
comfortable then don't buy it it's as
simple as that if you kind of had your
heart set on one of these and these
questions that arose about thermals made
you step back and go well let me see how
this resolves you can probably buy it
and be fine everything sold after
November first will have the thermal pad
replacement and the V BIOS update maybe
there's a few items out there on shelves
and small shops but other than that
they're all updated at this point some
were pulled back by EVGA and updated
even after being sent to retailers so I
think that mostly recaps it assuming
what EVGA told me is correct
assuming a 0.02 percent failure rate
this seems like a low enough occurring
issue that you could buy one of these
and still be ok but that is not an
excuse for EVGA's lack of thermal pads
and the slack VBIOS profile because
the temperatures were way higher than
they need to be even though it was still
safe
it just wasn't competitive and that has
been fixed so I think now we can kind of
close the book on all this that covers
every angle possible if you want more I
guess hit the link I'm exhausted on this
topic so we won't be covering it again
thank you for watching as always patreon
link in the post-roll video do click that
because it helps us fund these types of
things where it basically turns into a
huge research project for fun subscribe
for more and I'll see you all next time