Gadgetory



EVGA's VRM Thermals Not the Killer of Cards - Final Test

2016-11-23
This is the end-all, be-all of content surrounding EVGA VRM temperatures. We attached thermocouple probes directly to the PCB's backside hotspot and to MOSFETs number two and seven, which are commonly shown scorched in photos of defective cards, and we ran dozens of tests ranging from optimistic use-case scenarios to completely unrealistic torture tests. Testing was done with and without thermal pads, with and without the VBIOS update, and with a stock, unmodified card as well. Today we're taking a closer look at the thermal issues of EVGA's 10-series ACX 3.0 cooler. Buildzoid from Actually Hardcore Overclocking will be joining me about halfway through.

Before getting to that, this content is brought to you by our Patreon backers. You can go to our Patreon page, patreon.com/gamersnexus, to help us out directly and to help us fund in-depth testing like this, which is frankly impossible to make money off of because it took so much time. Keep in mind that everything here is exhaustively discussed in the article as well, linked in the description below. There are more charts there, a good amount more, but there's still a ton here, so there's plenty of content to consume. The goal is to sort of finish the book on the EVGA thermal analysis and discussion, and to see whether these potential board failures posted online are a result of thermals, as initial testing sort of suggested, or of something else.

Let's recap the basics. We recently validated a Tom's Hardware test which suggested that EVGA ACX devices were heating up on the backside of the VRM to north of 100 Celsius. Note that VRMs can handle 100 C no problem, but the temperatures that Tom's had shown, hitting 114 C in some reports using thermal imaging, were beginning to enter the range of being concerning. The fact that a few users began sharing photos of scorched PCBs around this time furthered the concern about temperature-related damage to EVGA cards. We validated Tom's Hardware DE's testing methodology by deploying our own
thermal camera, just like they did, but noted that emissivity and the delta between the backside of the PCB and the front-side VRMs could be significant. So we decided thermal imaging wasn't sufficient, even though we did validate their results using that method. EVGA then issued an optional thermal pad mod and a VBIOS update that increased the aggression of the fan speed curve. We declared at that point that both of these, by way of thermal imaging, were enough to fix the problem. Until today, this problem was largely assumed to be because of EVGA's lack of thermal pads between the base plate and the heatsink, and on the back side of the card between the backplate and the back of the PCB. The thermal pad that was later added in the thermal mod goes right here and covers the backside of the inductors, the power stages, capacitor banks, things like that. But we found there's more to it than that, and thermals are not the heart of the issue here.

In our testing, we applied K-type thermocouples to three spots. One of them is the back of the PCB, which on the backplate would be right around where the E is in the logo. That's where one of the hotspots is that both Tom's testing and our previous testing showed. The two other spots where we placed thermocouples (as you'll see, some of them are still hanging off here, though they're mostly gone now) are MOSFET number two and MOSFET number seven, up here. Those are hotspots, and we got direct readings for them. The thermocouples used are flat and self-adhesive, from Omega, as recommended by thermal engineers in the industry, including Bobby Kinstle of Corsair, whom we previously interviewed. We explained in a previous video why these flat, adhesive thermocouples do not pose issues for thermal transfer between the FETs and the thermal pads that are included on the card, and how they're not electrically conductive, so that's also a good thing. We also talked about avoiding EMI in that previous video, and
that was basically the preparation video for this one, so if you're curious how the testing was set up, go there.

We ran several different tests for this. There were four primary tests, each a different configuration of the card, physically or in firmware. The first was just the stock card with the original VBIOS: less aggressive fan speed, no thermal pads applied, at least none of the extras. The second test was just the VBIOS update with no thermal pads applied. The third was thermal pads and the VBIOS update. The fourth, which will not be in this video but will be in the article, is thermal pads with no VBIOS update. In addition to that, we ran about half a dozen more tests on those four scenarios. The data being published right now is primarily: Kombustor's implementation of FurMark and how the card responds to it; Metro: Last Light and DiRT Rally on loop for a long period of time as real-world gaming examples; overclocking and overvolting and their impact on the results from the previous two tests; brief SLI testing, just for those of you who requested it on Twitter or elsewhere; and finally a high-ambient torture test, which is probably the most interesting and will be the last one I talk about.

As for the VRM, EVGA's TjMax is 150 C, and it probably, one would hope, should initiate OTP at 180 C. The power stages best operate at 100 C continuous but have a Tcase of 125 C. Inductors really don't matter that much for heat dissipation, since they can take so damn much in terms of thermals (they're really just copper wire coiled inside of a natural heatsink), but they do heat up neighboring components. The FETs have a thermal pad contacting the base plate, and that is included with the original design. It's just that the original cards have no pad to allow transfer from the base plate to the heatsink, so heat gets kind of trapped there and saturates the base plate with nowhere to go.

Before getting into the testing: in our tutorial for how to apply the thermal pad mod, we
saw some misguided comments about EVGA's suggested placement of thermal pads atop the chokes, basically sitting on top of the chokes and the base plate and connecting to the fins of the heatsink. That would be this side of the heatsink, where obviously there's no cold plate. If you had a cold plate, sure, it would be better, but it's not necessary, and as we'll be showing, the thermal pads between the base plate and just the fins improve temperatures significantly. So this myth I've seen circulating, that the pads aren't doing anything and it's just wrong measurements or something, is not true and is based on nothing. Now that we've got dozens of hours of experience testing this, I can tell you definitively that it doesn't matter.

Another thing I saw that I'd like to debunk quickly was a concern that, because the thermal pad is placed where it is per their recommendation, there's no air getting down to the VRM anymore. Well, there was no air getting there anyway: there's a baseplate on top of it, so it's physically obstructed. Not only that, the chokes, where there is what I guess you could call an air channel, are flanked by capacitor banks on one side, which leads to the VRAM and GPU anyway, so that's irrelevant, and on the other side by a thermal pad, which allows really no air transfer to the MOSFETs. So air can't get in there, and air getting to the chokes doesn't matter anyway because they can take so much heat. In addition to that, a thermal pad on top of them has far and away greater thermal conductivity than air. With air you might have something like 0.03 watts per meter-kelvin at 25 C, where a thermal pad could be 7, could be 10. I don't know what EVGA's pads are, but I assure you it's greater than air. I did ask them what their thermal conductivity is and haven't gotten a response yet.

Oh, and one more thing: some posts claim that black screen issues on EVGA's first batch of cards were caused by VRMs overheating. That's wrong on many
levels. A thermally compromised VRM will not throw a black screen that can be fixed by a restart. It'll go up in a puff of smoke and you'll never be able to turn it on again; that's how they fail. The black screen issues have been resolved by EVGA and were, unfortunately for the company, another in a long list of issues and resolutions for this series of graphics cards, but they're not related, as far as we know, to these VRM thermal concerns.

Anyway, enough clearing up those gripes. Let's use the data to show what's going on internally. If you have any questions whatsoever about how this testing was conducted, check the article linked below, because it probably answers them in the methodology section. We're going to start with a stock card: that's with the original VBIOS and with no thermal pads applied through the thermal mod. Just a warning, this video will be very heavy on data. That's the point of it. I'm trying to close the book on this series of questions, so that's the way it is: a lot of data, a lot of charts.

Here's a table with the updated noise results. Originally we tested the pre-update fan speed of 1500 to 1600 RPM against the 2200 RPM that EVGA had informed us would be applied through the VBIOS update. After that testing, EVGA changed their VBIOS update so that the max fan speed was closer to 2000 rather than 2200, so those are the updated dBA results you're seeing on the screen now.

This first temperature test is the stock card without an overclock, running Kombustor's FurMark as a burn-in. Remember that FurMark is sort of a power virus; it loads the VRM more heavily than any game will ever do. Also note that FurMark doesn't blast the clock as much as a game would, but load is still heavy. The colors will be the same for every chart shown. Yellow is MOSFET seven, counting bottom-up, and is a significant hotspot on the card. MOSFET two is a common scorch point in photos that we've seen online and is toward the bottom of the card, and that's
orange. PCB is cyan and is measured on the hotspot on the rear side of the video card, with the backplate on. GPU temperature is white and measured by software. Ambient temperature is also critical to these tests, so we'll leave that in as well (later, we'll double ambient), and it's represented by a darker blue line at the bottom.

We're seeing the PCB reaching temperatures just shy of 100 C after a one-hour burn-in. The MOSFETs are both at around 90 to 94 C, with MOSFET seven running a bit warmer. Ambient was in the low 20s for this test. Case ambient, as we show in this chart, is upwards of 40 C in some enclosures, and that would account for gains in temperature, just not one-to-one. You can't, for example, add 20 C to the results we're seeing if you have a 40 C case like the one we tested in our case review this week. We'll test this situation later, though. So far, these are all numbers that the card is built to handle, and that's with FurMark.

Here's Metro: Last Light running a burn-in. We're seeing temperatures closer to 85 C for the PCB backside and MOSFET seven, with MOSFET number two around 80 C. That's around 10 to 20 Celsius cooler than with FurMark, depending on what you're looking at, and other games show similar results.

This chart shows the overclocking impact on a 1080 FTW without the VBIOS update and without thermal pads, as benchmarked using FurMark. Temperatures get a little warmer here, now nearing 105 Celsius on the PCB and MOSFET number seven. That's hot enough that high case ambient would decrease your efficiency as you near 110 C, but you should still be within safe operating range, as we show in the forthcoming high-ambient tests.

Before the high-ambient test, and before applying the thermal pads and the VBIOS update, the next goal is to test SLI 1080 FTWs with one-slot spacing between them. So the cards are not immediately adjacent, but they're somewhere around that far apart, maybe a good inch and a half or two, depending on what motherboard
you're using. This chart has some manual tuning in it, because I was basically trying to figure out where the worst-case scenario is, so there were some changes going on live during the testing. The first half of this test was without any tester interference at all. We're at around 90 to 94 C for the FET and PCB temperatures, with GPU temperature maintained around 80 C, the thermal limit to which fan RPM is slaved. That triggers a fan speed of about 80 percent with SLI, and that's in the native profile for the cards. This is where we decided to start playing around with the idea of torture testing, and so dropped the fan back from auto to its 60 percent speed. Auto was around 80 percent, and a single card would run at around 60 percent, so it was worth testing. This pushes the temperatures up to about 105 to 107 C. It's starting to get a little more interesting.

Here's the interesting part. You may have noticed that the VRM component temperatures for SLI are actually a little bit lower than in the same scenario with a single card, and there's a good reason for that. The fan RPM is entirely controlled by the GPU. The fan doesn't know anything about the VRM temperatures: it doesn't know how hot the power stages are, how hot the inductors are, how hot the PCB is, or how hot the capacitors are. The fan doesn't know how hot the VRAM is, either. So the GPU sort of controls everything with regard to fan speed, and that was where one interesting hypothesis came about. If you have a hotter GPU and therefore a higher RPM, as with this 80 percent speed in the SLI test, then suddenly your VRM starts to actually look better than if you had the reverse, where maybe there's a lower GPU temperature but a lot of load on the VRM side of the card. This is where FurMark comes in: you end up with a hotter VRM and a lower fan speed because the GPU is cooler, and you end up with really a worst-case scenario.

Now it's time to move on to the VBIOS update. If you want to see how just the thermal pads perform without the VBIOS update, check the article below. Here's the first chart of the VBIOS updates. This one shows the FTW VRM with the new VBIOS applied but without the thermal pad mod. We're hitting temperatures of around 85 C on the PCB and around 80 C on the second MOSFET. Compared to the stock card with the original VBIOS, this more aggressive fan speed profile improves temperatures by about 6 to 10 Celsius on the VRMs and about 15 Celsius on the PCB. That's just the fan profile. This is mostly the same as what we reported with the original PCB temperature testing using thermal imaging, and it reconfirms our statement that the VBIOS update alone is enough to resolve the issues, though it's still good that EVGA is now shipping the cards with the thermal pads.

Here's what it looks like with a video game. This is Metro: Last Light at some of its most demanding settings on the GPU side, looped for a full hour. The outcome is temperatures around 75 C on the PCB and MOSFETs, and that's more than acceptable. These VRMs can handle way higher heat than that, and this is, mind you, with an ambient temperature of around 23 to 25 C, as opposed to the earlier values near 20 C.

This is the last chart for the VBIOS updates, at least before we move on to the thermal pad mod. This chart shows overclocking of plus 30 percent with the master switch engaged and overvolting of plus 100 percent. The VRMs are hitting temperatures around 85 to 90 Celsius. That's obviously warmer, but it's still an improvement of more than 10 Celsius over the old VBIOS and its OC performance. Again, acceptable temperatures: a bit warm, certainly a bit warmer than some of the competitors, but acceptable and within range.

Here's a look at the thermal pads in addition to the VBIOS when running FurMark at stock clocks. Now that we've added those pads that EVGA ships for free, the card is way below where it was stock, somewhere to the tune of 20 to 30 Celsius, depending on
which test and component you look at, furthering the position that yes, these should be included on the cards. Temps are running around 85 to 87 Celsius for the backplate and around 75 to 80 Celsius for the FETs while the FurMark burn-in is running.

All this data is, if not for you then for me, beginning to be a bit of an overload. It's difficult just to organize it, so the rest of these types of tests will be in the article. But I've got one more set of tests that's pretty interesting, and it's the high-ambient torture tests. The idea here was to actively try to kill the card, so this was no longer a normal use case. It entered the realm of a challenge: how do I make this thing set itself on fire?

Just to give you an idea, here's some footage of our CPU radiator dumping heat straight into the VRM fan. This is with Prime95 running large FFTs, crunching on an overclocked CPU, with only one of the radiator fans turned on. Our CPU is at probably about 170 watts of power draw during these tests. This is more than just a simulation of a high-ambient case temperature. It's actually worse than that, because it's directed airflow, not just air sitting there. The effective temperature of the air the card was breathing was often 40 Celsius, pure heat from another source, which is the CPU. I even lowered the fan RPM on the EVGA card to 50 percent, provided 30 percent more power, overvolted it to 100 percent of the allowance, and overclocked it by [unclear] 25 megahertz. Another test I tried was connecting a case fan pointed strictly at the GPU, such that the GPU could run a colder temperature, and thus a higher clock and lower fan speed, but in a way that the VRM would still be incinerated by a lack of cool air, because it's again directional. So that was the footage of some of this going on, and here's the chart with the outcome.

None of it worked. This chart is the worse of the two: no thermal pads installed and the old VBIOS. After overclocking around the 3000-second mark, which is where you're seeing that
tick up, I was able to hit about 126.8 Celsius max on the PCB backside and about 108 to 109 Celsius on the VRMs. There was one tick to 110 that I saw, but it didn't really reflect in the data. The ambient fluctuation you see is from when the radiator was moved closer to the card so that it would incinerate itself as much as possible.

Moving on to the torture test with the new VBIOS and the thermal pads installed, the best-case scenario in the worst-case testing environment: the worst I got was about 82 Celsius on the PCB, and that's a massive decrease. That's mostly the thermal pad doing work, too, so don't listen to those comments saying that the thermal pads do nothing, because they are utterly, completely false and based on nothing. We've done enough testing at this point that I can confidently tell you the thermal pads do contribute fairly noticeably to performance. Just looking at the fan speeds of these two charts, you'll see that at some points we're even below the overall speed from the previous chart, when I was manually tuning the cards for worst-case scenarios, and yet performance is better with the thermal pads applied.

But that doesn't answer for the fact that every day we're seeing posts on Reddit and overclocking forums showing, allegedly anyway, scorched or singed or in some cases dead cards from EVGA. So why is that happening? I reached out to EVGA and got a number that I can share with you all. The number is a 200 DPPM
rate for their cards. That's defective parts per million. So out of 1 million complete video cards shipped, they're seeing 200 that are defective and returned, repaired, or replaced through the RMA process, and that is all repairs and replacements for this series of cards. This means that for every 1 million there are 200 defects, or about 0.02 percent of all cards shipped, according to EVGA. I have no way of validating that. We're told that this number is fairly consistent with previous generations, so it's just that the defects, at least from what EVGA seems to think, are more noticeable this time because of the way the internet works. There's media coverage, there are people posting about issues on social media, and so now everyone posts their cards, even though in the past it maybe would have just been an email to EVGA and a replacement sent out. So it is noisier, and the 200 DPPM rate is really all I have to go off of for the answer to why we keep seeing these things pop up if thermals are not an issue. Which they are not, by the way; this testing pretty much proves it's not a thermal issue. So it's something else. Something else is going on here, at least for the 200 DPPM
cards, the 0.02 percent of all cards shipped. Something is going on, obviously, and it could just be things like manufacturing defects, workmanship, things like that.

Here's a quick clip of a conversation I had with Buildzoid from Actually Hardcore Overclocking. He's got his own channel. We talked about the EVGA 10-series cards in a bit more electrical depth with him.

I'm gonna say it's the capacitors, a bad batch. The one consistent thing that's starting to show up is the scorch marks. Like this latest one: you have the scorch mark coming right off the capacitor. There was a really early one where somebody had a blown-out capacitor, just a capacitor completely fried, in pieces. That was the first report, and there I was like, oh no, that's just a manufacturing defect, that's not a thermal issue. This most recent one as well is just the capacitors. It's not blown to bits, but you can see that the solder on one side of it just got shot right off, so that one's another capacitor failure. And basically all of the other ones are varyingly more severe; capacitor failures could cause those exact types of damage. And then, along with the fact that EVGA is claiming that there isn't a thermal issue, that would line up with why there isn't a thermal issue: you have a bad batch of capacitors. I read through the capacitor failure stuff, which I gave you a link to (yeah), and basically all of it is gonna lead to pretty much short-circuit, explosion sorts of scenarios, where the capacitor will be damaged partially, over time it'll get worse, and at some point it's gonna short out and it's gonna blow up. And since this isn't really temperature bound, that explains why it blows up at idle, why it blows up without the capacitor actually being properly powered or anything. You know, the one where it blows up from the PCIe slot right at startup. Especially considering
that the thermals, even without the thermal pads, won't be that bad for most people, right? Considering the temperatures you showed me from the testing, it's fine. That's normal for a VRM. A hundred degrees here, a hundred degrees there is fine. So basically, I'm almost certain that EVGA just has a bad batch of capacitors on some of the cards. Hopefully not all of them; that would be a disaster. But I think they probably just have a bad batch of capacitors and they're just going up in flames. Now it's really up to EVGA to state publicly that that's the case, to confirm it or disprove it, because they obviously don't have a thermal issue with the MOSFETs, because we tortured them.

So, thanks to Buildzoid for joining for that and for his expertise. It's clear to me at this point that there's no thermal issue with the EVGA cards, at least not one that kills them. The issue is something else, and it's probably something like individual components. It might be in the supply chain, maybe they got a batch of bad caps from their supplier, or it could be something else like workmanship or other manufacturing defects. But these things are not dying because the VRMs are getting too hot, and that's true even for the stock card, before the VBIOS update and before the thermal pads. Now, that said, high temperature's obviously not great, so it could be the case that high heat accelerates the death of a card that's caused by something else, but I still don't think that's the most likely cause.

Regardless, if you have one of these now, you should still at least get the VBIOS update. It's installed through a double click, it's very easy, and your performance improves, at least thermally. And it's worth doing the thermal pad mod if you feel like you can; otherwise, send the card in to EVGA or replace it through them, because the temperature improvement is just so great. It feels wrong to throw away a free improvement of that margin, especially for some of the worst-case
scenarios we've seen. Even though it's contacting here, you're still transferring a ton of heat from a hot component with a shell casing to metal, so it's not going to air anymore. That's good, and it's free. Just apply that, at least.

If you don't have a card and you're thinking of buying one at this point, there are a few things. There were the black screen issues, which have been resolved, and I have seen those personally. And there were the thermal issues, which are really mostly a non-issue, but there's this sort of echo chamber of the internet making the existing damage to cards, caused potentially by other things that aren't thermal related, look a lot worse, maybe to the point that you wouldn't feel comfortable making a purchase. That's acceptable. It's totally fine if you don't want to buy it. If you're like, this looks like a lot of problems, I don't feel comfortable, then don't buy it. It's as simple as that. If you kind of had your heart set on one of these, and these questions that arose about thermals made you step back and go, well, let me see how this resolves, you can probably buy it and be fine. Everything sold after November first will have the thermal pad replacement and the VBIOS update. Maybe there are a few items out there on shelves in small shops, but other than that they're all updated at this point. Some were pulled back by EVGA and updated even after being sent to retailers.

So I think that mostly recaps it. Assuming what EVGA told me is correct, assuming a 0.02 percent failure rate, this seems like a low enough occurrence that you could buy one of these and still be OK. But that is not an excuse for EVGA's lack of thermal pads and the slack VBIOS profile, because the temperatures were way higher than they needed to be. Even though it was still safe, it just wasn't competitive, and that has been fixed. So I think now we can kind of close the book on all this. That covers every angle possible. If you want more, I guess hit the link. I'm exhausted on this topic, so we won't be covering it again.
Thank you for watching. As always, Patreon link in the post-roll video. Do click that, because it helps us fund these types of things, where it basically turns into a huge research project for fun. Subscribe for more, and I'll see you all next time.
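
For anyone who wants to check the failure-rate arithmetic quoted in the video (200 DPPM, or about 0.02 percent of cards shipped), here is a minimal Python sketch. The function names are made up for illustration, and the only input taken from the video is the 200 DPPM figure attributed to EVGA; nothing here validates that number.

```python
def dppm_to_percent(dppm: float) -> float:
    """Convert defective parts per million into a percentage of units shipped."""
    return dppm / 1_000_000 * 100


def expected_defects(dppm: float, units_shipped: int) -> float:
    """Expected number of defective units for a given shipment volume."""
    return dppm * units_shipped / 1_000_000


if __name__ == "__main__":
    quoted_dppm = 200  # figure quoted in the video, attributed to EVGA
    print(f"{dppm_to_percent(quoted_dppm):.2f} percent of cards shipped")
    print(f"{expected_defects(quoted_dppm, 1_000_000):.0f} defects per million cards")
```

The same helper could be pointed at a smaller shipment volume to get a feel for how few failed cards a 0.02 percent rate implies, which is the point being made about why a handful of forum posts can look like a bigger problem than it is.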