NVidia Volta V100 GPU Detailed: Completely New Architecture
2017-05-11
NVIDIA detailed more of its Volta architecture yesterday, a name that we first covered in 2015, when Volta was shown as targeted for 2018 delivery. NVIDIA seems to be on target for 2018, with its initial server-grade Tesla V100 accelerators shipping in DGX servers by quarter three of this year. V100 is a major architecture milestone for NVIDIA and marks a divergence from architectures that more closely resemble the likes of Kepler, Maxwell, and Pascal. Today, we're detailing the initial V100 layout.

Before that, this video is brought to you by Corsair's new Vengeance RGB LED RAM, which ships with custom-screened ICs for better overclocking performance and stability. Given that memory is highly relevant for performance with the new Ryzen CPUs, now is a good time to do research on high-performance kits; start with the Vengeance RGB LED kit at the link in the description below.
The V100 is not a consumer product, just like P100 was not, but eventually we'll probably have something like V102, which would be consumer-targeted and would effectively be an "1180 Ti," if we were to take today's nomenclature. It looks like NVIDIA is on target for 2018 delivery of a potential consumer-grade Volta architecture implementation on video cards aimed at the gaming market. For today, we're looking at the initial Volta architecture, which includes things like the new Tensor cores; those aren't really something gamers will be able to make use of, but they are critical to business operations, deep learning, HPC, and all kinds of things that involve data-heavy crunching and compute. Later graphics versions of Volta will change in their organization somewhat, just like with Pascal: P100 was 64 cores per SM, whereas GP102 and all the other consumer chips were at 128 cores per SM. So there's a bit of difference between compute and graphics products, but the base architecture is the same, so we can start with V100 and see what it looks like today.
Starting with the block diagram: this is the full-size V100 block diagram, and it's one of the largest pieces of silicon we've seen on a GPU. The die size for V100 is 815mm², around 30% larger than P100's already large 610mm². For comparison, the consumer-grade 1080 Ti's GP102 chip is 471mm², just to give some perspective. The full V100 hosts 5376 FP32 CUDA cores across 84 SMs, using a similar organizational hierarchy to GP100 at 64 cores per SM, with a 1:2 FP64 ratio and a 2:1 FP16 ratio. The result is about 15 TFLOPS of single-precision compute, about 8 TFLOPS of double-precision, and about 30 TFLOPS of FP16, or half-precision, compute. For deep learning, FP16 is useful, as the extra precision is not needed when working with such large data matrices.
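As a sanity check on those throughput figures, here's a quick back-of-the-envelope sketch; the boost clock used below is an assumption for illustration only (NVIDIA quoted peak rates, not a locked clock for the full die), and an FMA counts as two floating-point operations:

```cpp
// Rough FLOPS math for the figures above. The clock is an assumed value.
#include <cstdio>

int main() {
    const double fp32_cores  = 5376;   // full GV100 die
    const double clock_ghz   = 1.4;    // assumed boost clock, not confirmed
    const double fp32_tflops = fp32_cores * 2.0 * clock_ghz / 1000.0; // FMA = 2 ops
    const double fp64_tflops = fp32_tflops / 2.0;   // 1:2 FP64 ratio
    const double fp16_tflops = fp32_tflops * 2.0;   // 2:1 FP16 ratio
    printf("FP32 ~%.1f TFLOPS, FP64 ~%.1f TFLOPS, FP16 ~%.1f TFLOPS\n",
           fp32_tflops, fp64_tflops, fp16_tflops);
    return 0;
}
```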
Despite a similar organizational layout to P100, the actual architecture is vastly different. Scheduling and thread execution may have been reinvented with V100, but we don't really know how just yet, and the introduction of Tensor cores complicates things.
So, alongside the 5376 CUDA cores, which we're fairly familiar with at this point, there are also 672 Tensor cores; these are the new ones. Each SM holds 8 Tensor cores, just like each SM holds 64 CUDA cores, and a Tensor core is targeted at HPC, deep learning, and machine learning-type neural net applications. CUDA cores are more known; they can work and apply to all of these things, especially with FP16 toward deep learning, but Tensor cores are special. That's because they are a cluster of ALUs in a single core, a single unit that specializes in FMA (fused multiply-add) operations, and it can execute those operations across two 4x4 matrices. What does that mean? It means that each Tensor core can execute 64 FMAs per clock, so 64 fused multiply-add operations per clock of a single Tensor core, which roughly quadruples what a previous CUDA core could do in the same amount of time, one clock cycle, in terms of FMA operations.
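To make that concrete, here's a minimal scalar sketch of the per-clock Tensor core operation, D = A x B + C on 4x4 matrices; the real unit works on FP16 inputs with wider accumulation in hardware, so this CPU reference only illustrates why the count works out to 64 FMAs:

```cpp
#include <cstdio>

// Scalar reference for the per-clock Tensor core operation: D = A * B + C,
// where all four matrices are 4x4. Counting the inner multiply-adds shows
// where the 64 FMAs per Tensor core per clock figure comes from.
static int tensor_core_fma_reference(const float A[4][4], const float B[4][4],
                                     const float C[4][4], float D[4][4]) {
    int fma_count = 0;
    for (int i = 0; i < 4; ++i) {
        for (int j = 0; j < 4; ++j) {
            float acc = C[i][j];              // start from the accumulator matrix
            for (int k = 0; k < 4; ++k) {
                acc += A[i][k] * B[k][j];     // one fused multiply-add
                ++fma_count;
            }
            D[i][j] = acc;
        }
    }
    return fma_count;                         // 4 x 4 outputs, 4 FMAs each = 64
}

int main() {
    float A[4][4] = {}, B[4][4] = {}, C[4][4] = {}, D[4][4] = {};
    printf("FMAs per 4x4 matrix op: %d\n", tensor_core_fma_reference(A, B, C, D));
    return 0;
}
```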
That's helpful for compute work, and it's helpful again for deep learning; it's a huge deal in that regard. For us, in gaming, it's not really something we care about right now, and nothing we can see an immediate use for, but that doesn't mean it's irrelevant. It's obviously a huge deal for the Volta architecture: Tensor cores quadrupling CUDA core execution throughput, in terms of FLOPS per clock, is a big deal for those server-side, very expensive, money-driving businesses. That's the main thing with V100 that NVIDIA was talking about at their event; Tensor cores are something to pay attention to, but not something you need to worry about as a consumer anytime soon.
Here's a shot of the SM layout. Each SM contains the two TPCs' blocks of FP64, FP32, INT, and Tensor core units, just like always, except for the additional Tensor cores. This is just a block diagram and not necessarily representative of actual die space allocation to the Tensor cores, but it's looking like a large part of the die space of V100 is allocated to this new type of core. The usual 4 TMUs are present on each SM, totaling 336 TMUs across 84 streaming multiprocessors, versus P100's 240 TMUs across 60 SMs. These are indicated by the "Tex" marks on the bottom of the diagram, and that's again an SM layout diagram, whereas the block diagram shows them all working together.
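As a quick roll-up of how those per-SM counts produce the full-die totals quoted above:

```cpp
// How the per-SM counts scale up to the full-die figures.
#include <cstdio>

int main() {
    const int sms = 84;
    printf("CUDA cores:   %d\n", sms * 64);  // 5376 FP32 cores
    printf("Tensor cores: %d\n", sms * 8);   // 672
    printf("TMUs:         %d\n", sms * 4);   // 336 (P100: 60 SMs * 4 = 240)
    return 0;
}
```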
The architecture will be built on a new 12nm FinFET process; NVIDIA and TSMC stuck the NVIDIA name on it, so it's "FinFET NVIDIA" (FFN), which is just special optimizations, ones we don't know about, made to the process by TSMC. So, 12 nanometers, and it's a completely new architecture; this isn't Pascal-based, it's not Maxwell- or Kepler-based, or anything like that. The die size, again, is 815mm², totaling 21.1 billion transistors. That is at the reticle limit for what TSMC can manufacture, the absolute hard limit of their manufacturing tools, the physical tools in the factory (the fab, I should say). You can't really get bigger than that right now.
When the die size increases on GPUs, if you're not aware, it generally means that yield goes down. This is just part of why the bigger GPUs, like something you'd find on a 1080 Ti, are more expensive than the physically smaller ones: yield goes down, so cost goes up, because there's greater risk involved; you lose more of your silicon on each run of the wafers, and that drives up cost. For consumers, though, it eventually comes down our way, because as these large enterprise, B2B-type operations pay for the new technology (because they can really benefit from it), the cost goes down, the yields go up, and the die size eventually shrinks a bit for something like an "1180 Ti" or whatever it may be in the future. That's why you normally see the GPUs on gaming-grade cards come in smaller than on the accelerator cards.
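To illustrate that die size and yield relationship, here's a simple sketch using the basic Poisson yield model (yield falls off exponentially with die area); the defect density is an arbitrary assumption to show the trend, not a real TSMC figure:

```cpp
// Simple Poisson yield model: yield = exp(-defect_density * die_area).
// The defect density below is an illustrative guess, not a real fab number.
#include <cmath>
#include <cstdio>

int main() {
    const double defects_per_mm2 = 0.001;                      // assumed
    const double die_areas_mm2[] = { 471.0, 610.0, 815.0 };    // GP102, GP100, GV100
    for (double area : die_areas_mm2) {
        double yield = std::exp(-defects_per_mm2 * area);
        printf("%.0f mm^2 die -> ~%.0f%% of dies defect-free\n", area, yield * 100.0);
    }
    return 0;
}
```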
The GV100 accelerator is the first product shipping with the Volta architecture. It is a cut-down version of the V100 block diagram that we just went through, though not that cut down: it's 5120 CUDA cores rather than the 5376 that the full Volta block is capable of, from what we've seen so far. So it's a little cut down, but not by much. It is HBM2-equipped, just like the GP100 accelerator was (that would be the previous high-end Tesla card), and HBM2 here is limited to 16GB. They can't go higher until someone can manufacture HBM with a greater count of dies per stack, and because NVIDIA is sticking to four stacks for this new GPU, just like for GP100 accelerators, that means they're limited to 16GB of HBM2. It's still a 4096-bit bus, just like GP100 was, but the efficiency has gone up, from what they've said.
at a low level just yet we've got a
decent out of starter information but
efficiency should have gone up that also
means that there could be a change in
perk or efficiency which is something
we've seen the past architectures for
example Kepler - Maxwell there's
something like a 30% increase in
performance per watt out of the cores or
just efficiency overall which means that
you can't always compare just raw core
count across architecture even from the
same manufacturer because those cores
might be physically capable of a
different amount of throughput or
different operations or whatever
depending on which art you're looking at
how old it is things like that so
As for optimizations, of course, we don't know the details just yet, other than top-level things like thread execution and organization having changed, but there are no real details to speak of right now. That said, they should be coming out pretty soon, because white papers for these new architectures normally get pushed out shortly after the GTC event, which is where this was announced and which is in the process of concluding this week. So a consumer target next year, in 2018, looks pretty likely.
That's what NVIDIA has said all along, for the most part, so that hasn't changed. If you're in the server business and for some reason watching us, I guess the thing for you to know would be that this GV100 accelerator costs $18,000 standalone, or you can buy a DGX server (the full specs are online on their site) for $150,000, which means you get them earlier, in quarter three. If you're in the business of making money off these things, I guess getting it earlier is probably something that would make you money, but that's not really our territory.
So we'll wait for Volta on consumer products, but as more information on GV100 comes out, we'll deep dive it just like we did with GP100, and of course, Vega is theoretically happening at some point, so stay tuned for that as well. Thank you for watching. As always, you can go to patreon.com/gamersnexus to help us out with this type of reporting, or you can go to store.gamersnexus.net, where we've got shirts like this one or the graphed logo shirt, now restocking in cotton. Subscribe for more. I'll see you all next time.