Pascal is here the GP104 has arrived
on the GTX 1080 graphics card that we
have here we've benchmarked this for
frame rates thermals power noise and
more and what we're looking at today is
the GTX 1080 Founders Edition the
main news here just to sort of bring
everyone up to speed GP104 is the
Pascal architecture this is the first
Pascal architecture card shipping for
consumers for the gaming market and the
big change here is that it's got a
refined or brand-new actually process
node so it's running a 16 nanometer
FinFET process as opposed to Maxwell's
28 nanometer planar process and that has
some inherent power efficiency and
performance per watt gains that will
hopefully be reflected in our tests that
we're showing you today before diving
into all this I want to just sort of let
everyone know we've got a 9,000 word
article review of this card written on
the website hit the link in the
description below for the full article
because we can't go into all the same
detail in a video that we can in that
article and the article includes things
like more depth on asynchronous compute
more depth on SM architecture and more
depth on the memory subsystem things like
that so you'll find that link below but
let's dive into this thing look at some
of the testing data and specs the GTX
1080 is priced at $700 currently for the
founders Edition which is equivalent to
the previous reference nomenclature see
our previous video for that MSRP for
board partners is $600 so we'll see a
range of 600 and higher and the cards
ship on May 27th for the founders
edition this is the specs table for all
the recent Nvidia video cards shown on
the screen now on the far left is the
Tesla P100 accelerator card which runs
the GP100 cut-down version of big Pascal
that was the first Pascal architecture
device but was shipped with scientific
tasks in mind it was a compute card not a gaming card
the GTX 1080
debuts the GP104 version of Pascal and
a lot of the architecture is similar in
terms of sort of top level ideas and the
process node is the same but SM
architecture is varied in some ways
compute preemption is similar in some
ways there's a couple differences there
that we'll talk about
the GTX 1080 has 2560 CUDA cores split
between 4 GPCs and 20 SMs the reduced
cores per SM mean more dedicated
resources per SM partition so to speak
as SMs are split
into quote-unquote partitions of cores and
schedulers and warps and buffers and
things like that and we'll talk about
that momentarily other than the
process shrink the most obvious change
to GP104 for consumers is the
introduction of 8 gigabytes of GDDR5X
memory that X is new and important GDDR5X
VRAM from Micron has an effective
operating frequency of 10 gigahertz on
this particular device and it's got a
256 bit interface again on this
particular device it offers a mid step
between the 40-ish percent slower GDDR5
and the future of HBM2 high bandwidth
memory of course shipped originally on
Fiji AMD cards and HBM2 will be
coming to Vega later and hopefully to
NVIDIA sometime maybe in the next year
but it's got low yields right now GDDR5X
operates at 10 gigabits per second
per die on GP104 and has the potential
to grow to 13 to 14 gigabits per second
with Micron's future advances and GDDR5
for reference operates at about 8
gigabits per second versus the 13 to 14
maximum capacity that Micron is
targeting for GDDR5X the GTX 1080 runs
at 1733 MHz boosted with a GPU
Boost 3.0 update and can overclock
beyond 2 gigahertz as you'll see later
it's got nine teraflops of FP32 compute
at stock whereas the GTX 980 Ti has 5.63
teraflops and the GTX 1070 has 6.5
teraflops most notably the GTX 1080 has
a TDP of 180 watts and requires only a
single 8 pin power header which is a
pretty big change as well now of course
I know you all want to see the
benchmarks right away so we'll start
there starting with thermals noise and
then fps and you can find all of our
testing methodology on the website in
the article if you're curious how we
conducted these tests for gaming we
tested OpenGL DirectX 11 12 and Vulkan
we also tested 1080 1440 and 4K
resolutions games include Doom Ashes of
the Singularity Talos Principle Tomb Raider
The Division GTA V Shadow of Mordor
Metro Last Light and more we will not
show all the charts here as that would
be insane so again article for those as
many of you know we used a thermal
chamber recently to validate our thermal
testing methodology and found it to be
highly accurate thanks to our ambient
logging actively with a thermocouple
reader here's our equilibrium chart as
we call it for ease the GTX 1080
operates at 57.5 Celsius under load
and
96 Celsius idle comparatively that's
roughly 49 percent warmer than the Fury
X which is liquid cooled at 36.39
Celsius and is effectively
identical to the GTX 980 Ti reference
edition the Founders Edition cooler is
able to keep the more powerful GTX 1080
at about the same temperature as the GTX
980 Ti's reference cooler does for the
GM200 chip and then of course we've got
the hybrid card on here as well from
EVGA and that is just insane but it's a
much different design and it's more
expensive here's a look at thermals over
time thermal torture commences at
the same 120 second mark for all devices
because we use a custom program after
adding ambient back in we would see that
the thermals hit about 80 Celsius and
stay there for these NVIDIA devices
and especially the 1080 the fan will
adjust itself accordingly to maintain
this thermal level and that is something
we'll talk about with throttling this
next test is a new one for us the
purpose here is to run an endurance test
and generate a chart that looks for
throttling under torture scenarios we
disabled the open bench fans to this end
and just let the GPU try to cool itself
and create a worst-case scenario for it
hoping to discover if it would throttle
the frequency against a thermal barrier
somewhere that's what we're looking to
discover here unlike the earlier charts
this temperature is represented as an
absolute value not a delta you'll notice
that over the two-hour test period
running Dirt Rally at completely maxed
settings by all accounts the GPU seemed to
sit around 80 Celsius and peaks
occasionally and then also drops in
frequency occasionally here is a cropped-in
look at those dips where you can see
what's causing the drops and if you
look closely you'll see that it's
basically the temperature hitting about
82 Celsius causing the frequency
fluctuations which show a range of about
60 megahertz each time the GPU diode
hits 82 Celsius absolute temperature and
that can trigger a slight latency
increase or a slight frame rate
fluctuation at the exact moment of the
frequency drop but it is basically
imperceptible and not something to
really be concerned about because
overall we've seen this five times over
a period of two hours enough of that
let's talk power and noise and then
games total system power consumption
this is not per card but for the whole
system offers a different comparison
for the GTX 1080 FE cards we see that
they sit around 300 watts this is a bit
more power than the GTX 980 required non
Ti but a fair bit less than the GTX
980 Ti that's also shown here
as for noise all this methodology and
the meters and setup we used are defined
in the article pretty important to check
that out if you want to know more idle
noise levels are more or less
imperceptibly varied between all the
cards tested after the five-minute GPU
load period the R9 290X pushes the
loudest auto dB output at 49.3
decibels and would be
perceptible even from within an
enclosure the R9 Fury X may only be
39.08 decibels but
still the one we've got produces
that high-pitched pump whine that we
wrote about ages ago the MSI Twin Frozr
card keeps the lowest dB level at
30.37 aided by its dual fan push setup
and massive alloy heat sinks and the GTX
980 Ti VR Edition pushes one of the
loudest outputs at 40.7
decibels with the 1080 running within
margin of error at 48.8
decibels effectively identical no card
realistically hits 100% fan speed
they tend to sit around 50 percent if
they can help it but the stats show the
R9 290X at 70 decibels on its reference
design the GTX 1080 at 57.2
dB and the 980 Ti noticeably louder
at 63.4 dB the 980 at
59.52 dB the Fury X
and Twin Frozr cards again around the
quietest for reference conversational
speech is about 65-ish maybe 70 decibels
depending on how loudly you talk alright time
to test some games we're opening with
Doom which we just tested this is an
OpenGL game not a Vulkan or DX game and
it runs on two different versions
there's 4.3 that it runs on for AMD
that's something done by id Software and
4.5 OpenGL for NVIDIA so keep that in
mind let's jump into this we already
posted the full benchmark by the way for
this game if you want to learn more
about it 4k shows the GTX 1080 as being
the only card getting within throwing
distance of 60 FPS one or two settings
tweaks would push the GTX 1080 into full
60 plus FPS action the GTX 1080 FE at
stock clock outperforms a stock clock
GTX 980 Ti by 13.8% against the Fury X
that delta is widened to 21.4% but
that's still really in the real world
not that noticeable the gtx 1080 is off
to a good start though and the value
proposition is a bit better against
these similarly priced devices 1440p
shows a significant performance lead for
the 1080 landing at 98.3 FPS and
with the best low frame times on our bench
the GTX 1080 is the most tightly
timed card for frame delivery that we've
tested as of this instant and against
the predecessor GTX 980 non Ti
performance gains are staggering at
30% and over against the
980 Ti however the performance
difference is again 13% similar to the
last one we looked at here's the 1080p
chart in this test it's clear that we're
bottlenecking somewhere else in the
system probably on our 5930K CPU and
that CPU bind does mean that we're
seeing the 980 Ti and 1080 push
effectively an identical performance
output either way either card would
enable close to 144 Hertz gaming if an
appropriate CPU is paired with them we
tested a few new API games but Ashes is
the most interesting as it's the most
reliable with the most data we rip the
satellite shot 2 data from Ashes which
shoves large batches down the pipe
and chokes components we have two sets
of data for Ashes featuring millisecond
latency between frames and its
improvement yields from DX12 and the
hard frame rate comparison chart between
DX11 and DX12 what you're looking at
right now is the 1080p high comparison
of DX12 and DX11 performance in Ashes the GTX
1080 holds a clear lead over
everything when using DX12 this is
because of GP104's clear improvements
in asynchronous compute which I'll
explain in a few minutes notice that the
fury X and 390x are both choking on some
sort of DirectX 11 optimization issue
where they can't circumvent a CPU
hang-up or other bottleneck in the
system hence their identical
performance due to getting bottlenecked and
this is something which NVIDIA's DX11
drivers are good at working around
looking at DX12 performance though AMD
shows some of the biggest raw gains from
the new API and that's great news for
them as the industry trends toward DX12
and Vulkan now look at the GTX 980 its
DX11 performance ranked it high among
the cards but once we sort by DX12 it's
clear that the 980 is the loser on the
bench
4K high shows similar performance
changes as does 4K with crazy settings
the GTX 1080 has made obvious
improvements in DX12 optimization and
frame rate and here's a frame time chart
with 1080p and high lower is better here
it's measured in milliseconds notice
that AMD's bane
is its DX11 frame latency which creates
the stuttering seen in DX11 the 1080 has
an absurdly low 13.38
millisecond average frame time
which is good but that's not to take
away from the improvements made across
the board for AMD and NVIDIA both with
their newest architectures here's the
percent change chart for latency the
Fury X sees a 120% latency reduction the
390X sees a massive 76.9
percent latency reduction both a
reward of AMD's investment in async
compute while the older GM204 Maxwell
architecture struggles to stay positive
the GTX 1080 and GP104 however combine
brute force compute with async
improvements to generate a 48.65
percent jump in DX12
which is massive news for NVIDIA who
struggled in the last generation with
DX12 so they're finally gaining some
serious ground here and have become a
real contender in the new API space to
see more DX12 and Vulkan performance
check the article we're trying to keep
this bit of it short because it is
pretty complex data let's look at DX11
at 4K the GTX 1080 holds nearly a 30%
lead against both the GTX 980 Ti and the
R9 Fury X the 1080 is the first single
GPU that is able to sustain our GTA V 4K
benchmark with all main settings on very
high and ultra it pushes 56 FPS which is
well within the acceptable range without
many in-game hiccups at 1080p the GTX
1080 pushes into 125 FPS and leaves
behind the 980 Ti and Fury X again both
of which sit at about 16% lower frame
rates the Fury X drops its 0.1% low
values below 60 FPS but that's really not
too bad here and generally pushes
less consistent frame times than the 1080
now we're looking at Black Ops 3 we've
had to expand our scale for these charts
past 160 FPS because the GTX 1080
crushes frame rates at 202.3 FPS
average for 1080p but if you're buying
this then you really aren't gonna plan
on 1080p let's all be honest again
that 200 is an average with 1%
lows that exceed 130 FPS and the
GTX 1080 is exceptional in its frame
rate performance here and runs tight
frame times meaning consistent latency
between frame delivery and the R9
Fury X also performs very well in
Black Ops 3 and outperforms the
980 Ti just barely assuming a reference
980 Ti anyway and that's all we've
really got for this test the Fury X
is still about 24% behind the 1080 even
at 1440p and the 1080 is nearly at 144
Hertz range and would easily sustain
such high frame rates with a few
settings tweaks if you needed it as for
4K the card pushes 68 FPS average and
holds a firm lead over the Fury X in
its average frame rate but a massive
lead over the Fury X's 0.1% low
dips where this Fury X card seems to
show its 4 gigabyte limitation in
some of these 4K high setting scenarios
because the vram is just being tapped so
heavily this is a consistent issue in
some of the DX11 games with the Fury X
though DX12 is somewhere AMD excels and
you've already
seen that in some of the test results
for sake of time we're gonna stop the
game benchmark charts here if you want
to see more link in the description below
you'll find Metro and a couple of other
DX11 DX12 and Vulkan analysis items that
you can look at but now we're going to
talk about overclocking overclocking has
changed thanks to GPU boost 3.0 but not
too much first of all we have an
interview with Tom Petersen
technical marketing director at
NVIDIA that's live with this video you
can check that out for more depth on how
overclocking works the main thing here
first of all GPU boost 3.0 enables a new
feature called scan OC what this does is
it plots a frequency voltage curve on
your particular GPU and it looks for
specific frequencies where voltage may
need to be peaked or lowered or whatever
to sustain a stable overclock that is an
automated tool you basically click scan
and it runs some sort of burnin program
basically a reskinned FurMark for EVGA's
Precision and then that looks for what
your maximum potential overclock is the
reason this exists is because when
performing an overclock there's
this chance that you'll end up with a
pretty high and stable OC high frequency
but once you start throwing specific
applications or games into the mix maybe
The Witcher 3 triggers a specific
point of failure that's not triggered on
other games the only real solution is to
step down the OC or to create multiple
profiles for multiple games and that
just kind of sucks so scan OC bypasses
that that's the thing I was not able to
get it functioning properly for this
review it crashed
ceaselessly but we did get something together
through the old means and that was just
by manually sitting there and OCing it
myself and I have several sheets here of
passes and failures I'm not gonna
read all of them for you but the one
that we settled on was a pretty light
OC in terms of what I think this card's
capable of but it was the max that I
could get out of the Founders Edition
cooler and that's because of some of the
thermal throttling we talked about
earlier so the max OC was landing between
2025 MHz and 2050 MHz it was
landing within that range the maximum I
hit I think was 2080 MHz and it's
not too bad the memory clock I was
hitting 5400 MHz I didn't try to
push that higher you probably could but
I pushed it 400 MHz and kind of left
it voltage I was sitting at 1.031
volts and my OC was a 120 percent power
target so we're giving an extra 20%
of TDP to the card a 220 MHz OC
to the core which produced that
2025 to 2050 MHz
output 400 to the memory and then a 37%
voltage increase which really didn't
push 37 percent extra volts to it
because we've talked about that with Maxwell
it's the same here it doesn't really
always give it all that it's got or that
you asked for now in terms of FPS here's
the impact we got I'm doing this off of
paper here because we just did this test
and interrupted the video to do it the GTA
V 4K benchmark sees an improvement from
56 FPS stock frequency to 65 FPS
overclocked that's a pretty big gain that's
a full nine FPS Mordor moves from 60.7
FPS stock to 65 OC'd at 4K
not a huge gain but not bad and then at
1440p it moves from 106 to 113 FPS
reasonable Doom moves from 98 FPS stock
at 1440 to 109 FPS OC'd a pretty big
gain actually a bit over 10% I believe
and then 51.7 FPS stock to 59 FPS at 4K
overclocked so what this tells us is
that first of all the headroom for
OCing is large for this card it is only
limited on the Founders Edition by the
cooling potential which we'll talk about
very
soon I have a pretty special feature
for that but that's the main limitation
it tells us that AIB boards the
add-in boards from the AIB partners will
be significantly better at overclocking
in terms of their cooling potential now
whether or not the silicon can handle it
that's a different story we have to
really get more hands-on samples to
figure out exactly where the sort of
silicon lottery plays out with this
particular chip but that's all I got for
you with overclocking all right this is
the hard part I'm gonna try and condense
all the pertinent architecture
information into just a few minutes and
we'll see how it goes we're gonna be
talking about GPCs SMs the memory
subsystem and other items like that if
you want some of the visual guides again
link below for those let's start with a
block diagram this is GP104 unlike
GP100 which is much different in a couple
of key ways the GP104 GPU does not
in fact have 15 billion transistors and it
also doesn't have six GPCs with ten
SMs each that's what GP100's got
GP104 is shrunken down to four GPCs
with 20 total SMs in some ways it can be
thought of as a shrunken Maxwell it's
very similar in a couple of key areas
there are 20 TPCs for GP104 and the
total TMU count hits 160 which is
calculated simply by multiplying the TMUs
per SM by the SM count so there's
8 TMUs per SM times 20 equals 160
TMUs like GP100 GP104 partitions
its SMs into two blocks of cores each
with 64 CUDA cores that's 128 per SM
partitioning the cores into smaller
clusters helps allocate more dedicated
resources to these cores like warp
schedulers that queue threads dispatch
units register files cache and memory
that gets accessed frequently things like
that each SM has its own 256 kilobyte
register file a 96 kilobyte shared memory
unit 48 kilobyte L1 cache and 8 TMUs
that's again per SM stepping up a level
we zoom out and kind of look at GPCs
each GPC has its own raster unit as well
so that's four total for GP104
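As a rough check on the numbers quoted so far, the unit counts and headline throughput figures can be worked out from the per-SM values. This is only a back-of-envelope sketch: the function name `gp104_summary` is made up for illustration, the FP32 figure assumes each CUDA core retires one fused multiply-add (2 FLOPs) per clock at the 1733 MHz boost clock, and the bandwidth figure assumes the 10 gigabit per second rate applies to each pin of the 256 bit interface.

```python
# Back-of-envelope GP104 arithmetic using the figures quoted in the video.
SM_COUNT = 20
CORES_PER_SM = 128       # quoted per-SM CUDA core count
TMUS_PER_SM = 8          # quoted per-SM TMU count
BOOST_CLOCK_MHZ = 1733   # quoted boost clock
MEM_RATE_GBPS = 10       # assumption: data rate per pin
BUS_WIDTH_BITS = 256     # quoted memory interface width


def gp104_summary():
    """Return derived totals; the name and layout are illustrative only."""
    cuda_cores = SM_COUNT * CORES_PER_SM
    tmus = SM_COUNT * TMUS_PER_SM
    # Assumption: 2 FLOPs per core per clock (one fused multiply-add).
    fp32_tflops = cuda_cores * 2 * BOOST_CLOCK_MHZ * 1e6 / 1e12
    # Bits per second across the bus, divided by 8 to get gigabytes per second.
    bandwidth_gb_s = MEM_RATE_GBPS * BUS_WIDTH_BITS / 8
    return {
        "cuda_cores": cuda_cores,
        "tmus": tmus,
        "fp32_tflops": fp32_tflops,
        "bandwidth_gb_s": bandwidth_gb_s,
    }


if __name__ == "__main__":
    for name, value in gp104_summary().items():
        print(f"{name}: {value}")
```

Under those assumptions the totals come out to 2560 cores, 160 TMUs, roughly 8.87 teraflops (which rounds to the nine teraflops mentioned earlier) and 320 GB/s of memory bandwidth.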
new to GP104 is PolyMorph Engine 4.0
the PolyMorph engine was introduced on Fermi each
TPC has a PolyMorph engine that executes
specific tasks mostly related to
NVIDIA's simultaneous multi-projection
tool that was recently introduced SMP is
used to ensure multi display surround
display or non flat display output is
warped according to its position
relative to the user so if you've got
monitors sort of on an angle towards you
the PolyMorph engine deals with all of
that translation now let's compress
asynchronous compute in a similar
fashion asynchronous compute paves the
way for leveraging low-level APIs and
asynchronous command queuing allows GPU
resources to be allocated between
non-dependent
tasks there are three major changes to
async compute in Pascal from previous
architectures and it affects these items
one overlapping workloads two real-time
workloads and three compute preemption
for gaming GPU resources are often split
into graphics and compute segments e.g. a
selection of cores cache and other
elements assigned to graphics while the
remainder is assigned to compute and the
resources are partitioned to for
instance rendering and post-processing
effects and it may be the case that one
of those partitioned clusters say the
cluster that's handling rendering
completes its workload prior to its
partner may be handling something else
compute whatever that compute allocation
may still be crunching a particularly
complex problem when the render
allocation completes its job leaving the
units allocated to rendering idle that's
wasted resources you don't want that you
don't want to leave those units idle
async command queuing structures allow
for resource adjustment on the fly
enabling concurrent in-flight jobs to
all reach completion this is called
dynamic load balancing and allows
workloads to scale as resources become
available or busy maybe 50% of resources
are allocated to compute and the other
50% to rendering and things like that
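To put a number on why idle units matter, here is a toy model comparing a fixed 50/50 split against dynamic load balancing. It is an idealized sketch and not how the hardware scheduler is implemented: the function names, the abstract "work units", and the assumption that work divides perfectly across execution units are all made up for illustration.

```python
def static_partition_time(graphics_work, compute_work, units=100, graphics_share=0.5):
    """Time to finish both jobs when each keeps its fixed share of units."""
    g_units = units * graphics_share
    c_units = units - g_units
    # The frame is done only when the slower partition finishes.
    return max(graphics_work / g_units, compute_work / c_units)


def dynamic_balance_time(graphics_work, compute_work, units=100, graphics_share=0.5):
    """Time to finish both jobs when the idle partition joins the busy one."""
    g_units = units * graphics_share
    c_units = units - g_units
    t_graphics = graphics_work / g_units
    t_compute = compute_work / c_units
    first_done = min(t_graphics, t_compute)
    # Work left in the slower job at the moment the faster job completes.
    if t_graphics <= t_compute:
        remaining = compute_work - c_units * first_done
    else:
        remaining = graphics_work - g_units * first_done
    # Assumption: all units can now work on the remaining job.
    return first_done + remaining / units


if __name__ == "__main__":
    print(static_partition_time(100, 300))
    print(dynamic_balance_time(100, 300))
```

With 100 units of graphics work and 300 units of compute work, the fixed split takes 6 time units while the balanced case takes 4, because the units that finished rendering are put to work on the compute job instead of sitting idle.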
if one job completes the other job can
consume idle resources and speed up
completion and reduce latency between
frames in this image the command push
buffer stores triangle and pixel data
and halfway through working on its
latest draw call which you see on the
far right you'll notice that it's
stopped working on that draw call the
reason is because Pascal allows for
preemption requests to arrive hit that
push buffer and basically demand
priority over other tasks and at that
point the push buffer pauses its
execution saves all the rasterized and
shaded pixel data frees itself for use
elsewhere and then can execute that
command later when the preemption is
done
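The pause, save, and resume sequence described above can be sketched as a simple loop. This is purely illustrative and says nothing about the real hardware or driver: `run_with_preemption`, its parameters, and the per-pixel granularity are hypothetical, chosen only to show a draw call being interrupted partway through and picked up again where it left off.

```python
def run_with_preemption(draw_call_pixels, preempt_at, urgent_task):
    """Process a draw call pixel by pixel, yielding once to an urgent task.

    draw_call_pixels: number of pixels in the (hypothetical) draw call
    preempt_at: pixel index at which the preemption request arrives
    urgent_task: callable standing in for the high-priority work
    """
    log = []
    done = 0
    while done < draw_call_pixels:
        if done == preempt_at:
            saved = done                      # save progress made so far
            log.append(f"preempt at pixel {saved}")
            log.append(urgent_task())         # run the urgent work first
            log.append(f"resume at pixel {saved}")
            preempt_at = None                 # request has been handled
        done += 1                             # shade the next pixel
    log.append(f"draw call complete ({done} pixels)")
    return log


if __name__ == "__main__":
    for line in run_with_preemption(4, 2, lambda: "timewarp"):
        print(line)
```

The point of the sketch is only the ordering: the urgent task runs in the middle of the draw call, and no finished pixel work is thrown away.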
Pascal can perform pixel level thread
level and instruction level compute
preemption and that allows some very
low-level changes to the data
structure so that things like
timewarp for VR can be dropped in as needed
if something's really critical
so the user doesn't vomit from VR now
GDDR5X introduces a lot of changes too
and I'd love to talk about it more but
we're going to keep that short just talk
about memory compression very briefly
NVIDIA has moved to its fourth
generation delta color compression and
DCC functions by looking at all the
color data temporally which means
basically frame to frame and then
reduces colors losslessly into delta
values so instead of fetching all the
color data from an absolute value DCC
can group as an example the blues in
the skybox of a game together store a
neutral value that resides between them
all and then use that neutral sort of
mean to reach outward with Delta values
and then create the colors that are
needed later as they are required this
compression approach reduces bandwidth
consumption by about 20 percent
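The neutral value plus deltas idea can be shown with a few lines of code. This is a generic delta encoding sketch and not NVIDIA's actual DCC algorithm: the function names, the choice of a median-like anchor, and the 8 bit sample values are all assumptions made for illustration.

```python
def delta_encode(values):
    """Pick a neutral anchor value and store every sample as an offset from it."""
    anchor = sorted(values)[len(values) // 2]   # median-like "neutral" value
    deltas = [v - anchor for v in values]
    return anchor, deltas


def delta_decode(anchor, deltas):
    """Rebuild the original samples exactly, so the scheme is lossless."""
    return [anchor + d for d in deltas]


def bits_per_delta(deltas):
    """Signed bits needed to store the largest offset in the group."""
    span = max(abs(d) for d in deltas)
    return span.bit_length() + 1                # +1 for the sign


if __name__ == "__main__":
    # Hypothetical blue channel values from a patch of sky.
    blues = [200, 202, 199, 201, 203, 200, 198, 201]
    anchor, deltas = delta_encode(blues)
    print(anchor, deltas, bits_per_delta(deltas))
    print(delta_decode(anchor, deltas) == blues)
```

For this made-up patch of similar blues the offsets fit in 3 bits each instead of 8, which is the kind of saving that cuts bandwidth when neighbouring colors are close together.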
DCC can do eight to one compression
maximally but also offers four to one
and the original Maxwell two to one
and more on that in the article and so
we come to the conclusion so this thing
the GTX 1080 where does it stand well first of
all this one the Founders Edition is seven
hundred dollars you should expect to see
cards hitting 650 I would think
technically they can go as low as
six hundred by MSRP standards but we'll
see 650 600 cards as the AIB partners
start shipping their devices in terms of
performance NVIDIA has definitely
improved some of their DX12 Vulkan and
other API performance issues from the
past it's a good card it's performed the
best out of all the cards we've tested
and the 1080 has about a 30% lead over
the 980 this one and the Fury X not
shown here now one thing to note this
1080 in some games well actually
almost all games it's about 13% ahead of
the 980 Ti the 980 Ti was very powerful
if you have one now I
wouldn't upgrade to this because
13% is kind of like who cares so it's got
13% lead there if you have something
like a hybrid the EVGA Hybrid which I
think is somewhere around 1200
MHz boosted the gap actually
shrinks considerably so you end up with
more of a 10% some games even lower
difference between the two so a lot of
the gains here I think for Pascal
can be chalked up to sort of the
frequency increase it's more than
1700 MHz when it's
running boosted now that is a massive
increase over this which I could get
this up to probably 1400 1500
stable 1500 for sure I was able to hit
before if you did that you'd be pretty
darn close with these two devices but the
EVGA Hybrid is more expensive the
1080's a better architecture better sort
of at the ground level it's got eight
gigabytes of memory which is actually
becoming relevant now and overall the
1080 is definitely the card to buy out
of the 980 980 Ti
and 1080 in the future the Fury X is kind of
a it's an interestingly placed card it's
not something I would buy over the 1080
right now because the price is very
similar I think I've seen it like 650 in
some places versus 700 for this so the
only thing you really gain is the lower
thermals and HBM but we're seeing that
0.1% frame time kind of hurt the Fury
X in games where it's exceeding the
4 gigabyte allocation on that card the
GTX 1080 Founders Edition runs
reasonably cool but AIBs will
obviously outperform that pretty easily
NVIDIA's trying to grant its reference
cards longer staying power by continuing
to sell them through EOL but I would
still point you toward the board
partners for fiercer competition that
will drive prices down they'll pre
overclock the cards and the 1080 will be
a good purchase at the top and as the
price drops down more towards that 600
low-end that they're listing 700 is
pretty steep it's not unfair it's just
a lot of money and if you can wait a
couple months for the market to
stabilize and start pumping out these
cards then it's definitely a good wait
and you'll save some money get something
that runs cooler as well it's already an
easy choice though over the 980 Ti and the
R9 Fury X both of which are priced
similarly and lower-performing sometimes
13% for the 980 Ti sometimes up to 30%
for the Fury X so definitely an obvious
win there for the 1080 so thank you for
watching I know this was huge link in the
description below for more information
Patreon link in the post-roll video if you want to help
us out directly for these efforts I'll see
you all next time