What is a CUDA core? Folks on the
green team, pay special attention to this
one. CUDA stands for Compute Unified
Device Architecture. It's the term dubbed
by Nvidia that mirrors, to an extent,
stream processors on AMD's side, and in
essence both describe exactly what they
do: compute and stream graphical data. Now,
if you go back and think about where we
started, GTC's first year was 2009. We
introduced Tesla. We invented CUDA in the
year 2007, I think. The GPU, many of
you guys will probably still remember:
the GeForce 8800 GTX. The 8800 is potentially
one of the most important GPUs ever
created. And it was a sacrifice we made
early on, to put CUDA on every single
GPU long before people found
value in it. By making it available on
every single GPU, from desktops to
laptops to supercomputers to data
centers, and now in mobile devices, and
now in your cars,
by putting CUDA in every single GPU, we
make it as easy as possible for you to
develop and to deploy your software. But
there's much more to it than that, as
we'll discuss here shortly.
First off, let's address a common
misconception: an Nvidia graphics card
with a higher number of CUDA cores than
another must be more powerful. This is
only true to an extent, that extent being
cards within the same family. For
instance, the GTX 960 packs 1,024 Maxwell
CUDA cores, while the more powerful GTX
970 contains 1,664 of them. The 970 is
more powerful and both are based on the
same architecture, so this makes sense.
VRAM availability and clock speed also
play a role here, but for the most part,
core counts within the same family will
indicate relative GPU strength. The same
generalization can be made for stream
processors in AMD GPUs, by the way.
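
The same-family rule of thumb above can be written down as a small Python sketch. The Gpu class and the likely_faster helper are hypothetical names made up for illustration, not part of any Nvidia tool, and the comparison deliberately ignores VRAM and clock speed, which the video notes also play a role.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gpu:
    name: str
    architecture: str
    cuda_cores: int


def likely_faster(a: Gpu, b: Gpu) -> Gpu:
    """Return the card with more CUDA cores, but only within one architecture."""
    if a.architecture != b.architecture:
        # Core counts stop being a fair yardstick across architectures.
        raise ValueError("core counts are not comparable across architectures")
    return a if a.cuda_cores >= b.cuda_cores else b


# Core counts as quoted in the video.
gtx_960 = Gpu("GTX 960", "Maxwell", 1024)
gtx_970 = Gpu("GTX 970", "Maxwell", 1664)

print(likely_faster(gtx_960, gtx_970).name)
```

Raising an error for mixed architectures, rather than guessing, mirrors the point that the shortcut only holds inside one family.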
So what about across different
architectures, say from Maxwell to Pascal?
Here things get dicey. The 980 Ti, one of
Maxwell's biggest and baddest, contains
2,816 CUDA
cores,
while the Pascal 1080 features
only 2,560
of them. However, thanks to narrower fin
arrays of Pascal transistors and a much
more compact profile overall, the
1080 is the clear winner when it
comes to gaming performance. There are
advantages to having more physical cores,
such as video rendering, but these can be
eliminated with simple overclocks and
driver optimizations. Now, when I said
compact profile, I was referring to the
fabrication node. Pascal features a 16
nanometer architecture, meaning that
individual features within each Pascal
die can be precisely defined to within
16 nanometers. More on that in an
upcoming Minute Science episode. Maxwell
architecture, by contrast, is based on a
28 nanometer fabrication, so by principle
of design, theoretically more transistors
can be packed in each CUDA core
within Pascal GPUs. This plays a substantial
role in single-core performance, GPU die
size and overall power consumption, all
three of which go hand in hand.
You see, when it comes to single-core
performance, it's undeniable: individual
Pascal cores are more efficient and more
powerful than their Maxwell counterparts.
But while this would normally be thanks
to an increased number of transistors
per CUDA core, that is actually not the
case for the 1080 compared to the 980 Ti.
In the case of what you're about to see,
this comes down purely to clock speed
and GPU die size. Let's break down a
980 Ti and a 1080. The 980 Ti contains 8.1
billion transistors with 2,816
cores. The 1080
contains 7.2 billion transistors with
2,560 cores. Assuming a uniform
distribution of transistors per core (we
get into some heavy technical topics if
we assume anything else), this puts the Ti
at roughly 2,876,420
transistors per core and the 1080 at
2,812,500 transistors per
CUDA core. Now, at this point, if you just
blindly judged the two cards in question
based on the numbers you just saw, I
wouldn't really blame you, but you would
be incorrect. Cue GPU frequencies. The 980
Ti can overclock up to around 1,500
megahertz with a non-reference cooler (my
MSI Gold Edition overclocks to a stable
1,531), but a typical GTX 1080 can attain
a stable 2,000 megahertz, and that's an
easy 2,000 to obtain: 500 megahertz over the
Ti.
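
The per-core arithmetic and the clock comparison above can be reproduced with a short Python sketch. All figures are the ones quoted in the video; the even split of transistors across cores is the video's own simplification, and the dictionary layout and function name are just illustrative choices.

```python
# Figures as quoted in the video. Dividing transistors evenly across
# CUDA cores is a simplification, not how a real die is laid out.
cards = {
    "GTX 980 Ti": {"transistors": 8_100_000_000, "cores": 2816, "oc_mhz": 1500},
    "GTX 1080": {"transistors": 7_200_000_000, "cores": 2560, "oc_mhz": 2000},
}


def transistors_per_core(card: dict) -> float:
    """Naive transistors-per-core figure under the uniform-split assumption."""
    return card["transistors"] / card["cores"]


for name, card in cards.items():
    per_core = transistors_per_core(card)
    print(f"{name}: {per_core:,.0f} transistors per core, {card['oc_mhz']} MHz overclock")

# Relative clock advantage of the 1080 at the overclocks mentioned above.
clock_advantage = cards["GTX 1080"]["oc_mhz"] / cards["GTX 980 Ti"]["oc_mhz"] - 1
print(f"GTX 1080 clock advantage: {clock_advantage:.0%}")
```

The per-core figures land within about two percent of each other, while the clock gap is roughly a third, which is why frequency rather than transistor count carries the comparison.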
So while transistor counts per core and
per GPU might not be far apart,
Pascal transistors can attain higher
overhead frequencies thanks to their
reduced sizes. Picture these transistors
like pistons in a car. Smaller pistons
are lighter and typically travel shorter
distances per stroke, meaning that
RPMs,
the equivalent of frequency in this case,
are generally higher. Larger pistons are
heavier and usually travel larger
distances from top dead center to bottom,
resulting in reduced rotations per
minute under full load. It's why a 1.6
liter Formula One engine revs to well
over 12,000 rpm while an 8.4
liter Viper engine is limited to
around 6,000. The same principle,
in theory, applies for transistors. In
general, smaller transistors consume less
power and demand lower voltages overall,
resulting in higher overclocking
headroom, and this is exactly what Nvidia
has done with their GP104 lineup. On
paper, clock for clock, the 980 Ti and
1080 are neck and neck. If we reduced the
frequency of the 1080's core to that
of the 980 Ti, both cards would perform
similarly, omitting memory frequency
differences between GDDR5 and GDDR5X.
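
One rough way to put numbers on the clock-for-clock claim is theoretical peak FP32 throughput. The sketch below assumes each CUDA core retires one fused multiply-add (counted as 2 floating-point operations) per clock, which is the usual back-of-the-envelope convention and not a measured result. It also ignores memory bandwidth entirely, just as the comparison above sets aside GDDR5 versus GDDR5X. The function name is a hypothetical helper.

```python
def peak_fp32_tflops(cuda_cores: int, clock_mhz: float) -> float:
    """Theoretical peak, assuming 2 FLOPs (one FMA) per core per clock."""
    return cuda_cores * 2 * clock_mhz * 1e6 / 1e12


CORES_980_TI = 2816
CORES_1080 = 2560

# Clock for clock: both cards held at the 980 Ti's roughly 1,500 MHz overclock.
ti_matched = peak_fp32_tflops(CORES_980_TI, 1500)
pascal_matched = peak_fp32_tflops(CORES_1080, 1500)

# The 1080 at the roughly 2,000 MHz overclock mentioned in the video.
pascal_overclocked = peak_fp32_tflops(CORES_1080, 2000)

print(f"980 Ti @ 1500 MHz: {ti_matched:.2f} TFLOPS")
print(f"1080   @ 1500 MHz: {pascal_matched:.2f} TFLOPS")
print(f"1080   @ 2000 MHz: {pascal_overclocked:.2f} TFLOPS")
```

At matched clocks the 980 Ti comes out about 10 percent ahead on paper simply because it has more cores. Once the 1080 runs at 2,000 MHz it pulls ahead by roughly 20 percent, which is the gap the frequency argument is pointing at.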
However, thanks to a much smaller
fabrication and a significantly smaller
GPU die overall (we're talking 314 square
millimeters versus 601), the 1080 crushes
its current competition while conserving
power and venturing into uncharted GPU
frequency territory. Architecturally
speaking, CUDA cores are grouped into
chunks regarded as streaming
multiprocessors. These SMs maximize
throughput by organizing clusters of
information into streams for the sake of
parallel processing. CUDA cores within
each SM essentially divide instruction
sets into even distributions of, in the
cases of Maxwell and Pascal, 128 and
64 respectively. Nvidia decided
to reduce the number of CUDA cores in
each streaming multiprocessor by half in
an effort to increase the ability of
Pascal GPUs to render and shade
simultaneously. And for those of you who
are especially astute in this topic, this
brings us awfully close to the issue of
asynchronous compute, again saved for
another Crash Course. However, what you
need to know for now is that CUDA has
been, and likely will be, the preferred
architectural design for Nvidia GPUs. It
has been since 2006, and based on the
current trend, it's very unlikely that
CUDA will ever fully support
asynchronous compute at a hardware level.
Given this current configuration, it just
doesn't add up. From what I'm seeing so
far, and several articles agree
with me here, it is clear that Nvidia's
goal in reducing cores per SM was to
increase throughput via greater register
access. However, from the looks of things
at this point, I have my doubts as to
whether or not this current model trend
will align with that of the gaming
industry, where AMD has a clear edge in
Vulkan
and DirectX 12 titles. I've already
discussed how DirectX 11 and 12 adapt to
different GPU hardware in a video you
can check out right here. If you enjoyed
this video, be sure to give this one a
thumbs up. Give it a thumbs down if you
feel the complete opposite or if you
hate everything about life. Be sure to
click the subscribe button if you
haven't already, and stay tuned for more
Crash Course and Minute Science episodes
here on the channel, as well as an ultra
cheap, well, not ultra cheap, it's a
moderately cheap PC build featuring the
Q6600 and an overclockable P5Q Asus
motherboard, and my good friend Bo, who's
starting up his gaming YouTube channel,
which we'll be able to showcase and check
out here shortly.
All the footage is already on the
computer. I just have to edit it. A lot of
footage, not as much as I had with
McLovin, but it's still a lot, so give me
a few days and I'll have that
one up for you. This is Science Studio.
Thanks for learning with us.