What Are CUDA Cores?

What is a CUDA core? Folks on the green team, pay special attention to this one. CUDA stands for Compute Unified Device Architecture. It's a term coined by Nvidia that mirrors, to an extent, stream processors on AMD's side, and in essence both names describe exactly what they do: compute and stream graphical data.

"Now, if you go back and think about where we started: GTC's first year was 2009. We introduced Tesla. We invented CUDA in the year 2007, I think. That GPU, many of you guys will probably still remember: the GeForce 8800 GTX. The 8800 is potentially one of the most important GPUs ever created, and it was a sacrifice we made early on to put CUDA on every single GPU, long before people found value in it. By making it available on every single GPU, from desktops to laptops to supercomputers to data centers, and now in mobile devices and now in your cars, by putting CUDA in every single GPU, we make it as easy as possible for you to develop and to deploy your software."

But there's much more to it than that, as we'll discuss here shortly. First off, let's address a common misconception: an Nvidia graphics card with a higher number of CUDA cores than another must be more powerful. This is only true to an extent, that extent being cards within the same family. For instance, the GTX 960 packs 1,024 Maxwell CUDA cores, while the more powerful GTX 970 contains 1,664 of them. The 970 is more powerful, and both are based on the same architecture, so this makes sense. VRAM availability and clock speed also play a role here, but for the most part, core counts within the same family will indicate relative GPU strength. The same generalization can be made for stream processors in AMD GPUs, by the way.

So what about across different architectures, say from Maxwell to Pascal? Here things get dicey. The 980 Ti, one of Maxwell's biggest and baddest, contains 2,816 CUDA cores, while the Pascal 1080 features only 2,560 of them. However, thanks to narrower fin arrays of Pascal transistors and a much more compact profile overall, the 1080 is the clear winner when it comes to gaming performance. There are advantages to having more physical cores, such as in video rendering, but these can be eliminated with simple overclocks and driver optimizations.

Now, when I said "compact profile," I was referring to the fabrication node. Pascal features a 16-nanometer architecture, meaning that individual features within each Pascal die can be precisely defined to within 16 nanometers. More on that in an upcoming Minute Science episode. Maxwell architecture, by contrast, is based on a 28-nanometer fabrication. So by principle of design, theoretically more transistors can be packed into each CUDA core within Pascal GPUs. This plays a substantial role in single-core performance, GPU die size, and overall power consumption, all three of which go hand in hand.

You see, when it comes to single-core performance, it's undeniable: individual Pascal cores are more efficient and more powerful than their Maxwell counterparts. But while this would normally be thanks to an increased number of transistors per CUDA core, that is actually not the case for the 1080 compared to the 980 Ti. In the case of what you're about to see, this comes down purely to clock speed and GPU die size.

Let's break down a 980 Ti and a 1080. The 980 Ti contains 8.1 billion transistors with 2,816 cores. The 1080 contains 7.2 billion transistors with 2,560 cores. Assuming a uniform distribution of transistors per core (we get into some heavy technical topics if we assume anything else), this puts the Ti at roughly 2,876,420 transistors per core and the 1080 at 2,812,500 transistors per CUDA core.

Now, at this point, if you just blindly judged the two cards in question based on the numbers you just saw, I wouldn't really blame you, but you would be incorrect. Cue GPU frequencies. The 980 Ti can overclock up to around 1,500 MHz with a non-reference cooler; my MSI Gold Edition overclocks to a stable 1,531. But a typical GTX 1080 can attain a stable 2,000 MHz, and that's an easy 2,000 to obtain: 500 MHz over the Ti. So while transistor counts per core and per GPU might not be far apart, Pascal transistors can attain higher frequencies thanks to their reduced size.

Picture these transistors like pistons in a car. Smaller pistons are lighter and typically travel shorter distances per stroke, meaning that RPM, the equivalent of frequency in this case, is generally higher. Larger pistons are heavier and usually travel larger distances from top dead center to bottom, resulting in reduced rotations per minute under full load. It's why a 1.6-liter Formula One engine revs to well over 12,000 RPM while an 8.4-liter Viper engine is limited to around 6,000. The same principle, in theory, applies to transistors. In general, smaller transistors consume less power and demand lower voltages overall, resulting in higher overclocking headroom, and this is exactly what Nvidia banked on with their GP104 lineup.

On paper, clock for clock, the 980 Ti and 1080 are neck and neck. If we reduced the frequency of the 1080's core to that of the 980 Ti, both cards would perform similarly, omitting memory frequency differences between GDDR5 and GDDR5X. However, thanks to a much smaller fabrication node and a significantly smaller GPU die overall (we're talking 314 square millimeters versus 601), the 1080 crushes its current competition while conserving power and venturing into uncharted GPU frequency territory.

Architecturally speaking, CUDA cores are grouped into chunks regarded as streaming multiprocessors. These SMs maximize throughput by organizing clusters of information into streams for the sake of parallel processing. CUDA cores within each SM essentially divide instruction sets into even distributions of, in the cases of Maxwell and Pascal, 128 and 64 respectively. Nvidia decided to reduce the number of CUDA cores in each streaming multiprocessor by half in an effort to increase the ability of Pascal GPUs to render and shade simultaneously. For those of you who are especially astute in this topic, this brings us awfully close to the issue of asynchronous compute; again, saved for another Crash Course.

However, what you need to know for now is that CUDA has been, and likely will remain, the preferred architectural design for Nvidia GPUs. It has been since 2006, and based on the current trend, it's very unlikely that CUDA will ever fully support asynchronous compute at a hardware level. Given this current configuration, it just doesn't add up. From what I'm seeing so far, and several articles agree with me here, it is clear that Nvidia's goal in reducing cores per SM was to increase throughput via greater register access. However, from the looks of things at this point, I have my doubts as to whether or not this current trend will align with that of the gaming industry, where AMD has a clear edge in Vulkan and DirectX 12 titles. I've already discussed how DirectX 11 and 12 adapt to different GPU hardware in a video you can check out right here.

If you enjoyed this video, be sure to give this one a thumbs up. Give it a thumbs down if you feel the complete opposite or if you hate everything about life. Be sure to click the subscribe button if you haven't already, and stay tuned for more Crash Course and Minute Science episodes here on the channel, as well as an ultra cheap (well, not ultra cheap, it's a moderately cheap) PC build featuring the Q6600, an overclockable Asus P5Q motherboard, and my good friend Bo, who's starting up his gaming YouTube channel, which we'll be able to showcase and check out here shortly. All the footage is already on the computer; I just have to edit it. It's a lot of footage, not as much as I had with McLovin, but it's still a lot, so give me a few days and I'll have that one up. This is Science Studio. Thanks for learning with us.
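
For readers who want to replay the 980 Ti versus 1080 arithmetic from the transcript, here is a minimal Python sketch. It uses only the figures quoted in the video (transistor counts, core counts, and the overclocked frequencies the host mentions). The names `CARDS`, `transistors_per_core`, and `core_clock_product` are hypothetical, and the cores-times-clock product is an assumed rough proxy for raw throughput, not a benchmark: it ignores memory bandwidth, drivers, and per-architecture efficiency.

```python
# Figures as quoted in the transcript; they are not independently verified here.
CARDS = {
    "GTX 980 Ti": {"transistors": 8_100_000_000, "cores": 2816, "oc_mhz": 1531},
    "GTX 1080": {"transistors": 7_200_000_000, "cores": 2560, "oc_mhz": 2000},
}


def transistors_per_core(card):
    """Naive uniform split of every transistor on the die across the CUDA cores."""
    return card["transistors"] / card["cores"]


def core_clock_product(card, mhz=None):
    """Assumed throughput proxy: core count times clock in MHz.

    Pass mhz to force a clock, e.g. for a clock-for-clock comparison.
    """
    clock = card["oc_mhz"] if mhz is None else mhz
    return card["cores"] * clock


if __name__ == "__main__":
    for name, card in CARDS.items():
        print(
            f"{name}: {transistors_per_core(card):,.0f} transistors per core, "
            f"core-clock product {core_clock_product(card):,} at {card['oc_mhz']} MHz"
        )

    # Clock-for-clock: run the 1080 at the 980 Ti's quoted overclock.
    same_clock = CARDS["GTX 980 Ti"]["oc_mhz"]
    print(f"GTX 1080 at {same_clock} MHz: {core_clock_product(CARDS['GTX 1080'], same_clock):,}")
```

Under this proxy the per-core transistor counts come out nearly equal, and the 1080 only pulls ahead once its higher clock is taken into account, which is the point the video makes.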

Gadgetory

2016-08-02