Hey, welcome back to Hardware Unboxed. So we
now know that Ryzen 5 is just around the
corner, and even though it's only a few
weeks away I just couldn't wait, and we
took a sneak peek at the gaming
performance by running a few simulated
tests. This was done by disabling a
few cores on the Ryzen 7 processors. I
do expect the simulated performance to
be pretty much spot-on with what we will
see from Ryzen 5 in a few weeks' time,
and really that is great news because
things look very good. The Ryzen 5
simulated performance video came about
because a week prior a heap of you asked
me to play around with the downcore feature
and mimic the core configurations of the
6-core and 4-core models, and it wasn't
because just 24 hours earlier, before my
video, Linus released the same kind of
test looking at simulated Ryzen
5 performance. The testing for my video
alone took an entire two days. Anyway, in
the event that Linus beats me once again
by looking at the impact CCX latency
has on performance,
please note I'm not copying him. Testing
for this video began on the 22nd of
March, I promise. Once again, I decided to
make this video because so many of you
requested it. So what exactly are we
going to be testing? Well, actually, before
we get to that, for those of you who
don't know, here is a quick explanation
of how Ryzen CPUs are designed. Ryzen
7 features 8 cores in total, and with the
addition of simultaneous multi-threading,
or SMT for short,
there are 16 threads on offer. However,
not all 8 cores are located within the
same die; rather, they are spread across
two modules, or CPU complexes as AMD
calls them. The CPU complexes, or CCX for
short, are connected using an interface
called Infinity Fabric, but we won't
cover that in detail here. Let's focus on
the core configuration. Ryzen 7 and the
upcoming Ryzen 5 processors feature
two CCX modules, which means to a degree
half the cores are separated, and as a
result, for them to work together means
there will likely be a performance
penalty. In contrast, Intel's 10-core
desktop CPUs work within a single die.
The Broadwell-E architecture stacks the
cores around a shared level 3 cache. The
fully enabled silicon offers 10 cores,
and this is how the 6950X is
configured. The 6900K features two cores
disabled, while the 6800K has 4 cores
disabled.
Typically, processors with defective cores
get binned as lower-end parts, so what
would have been a ten-core 6950X
becomes an eight-core 6900K or a six-core
6800K. So what's key to know here is
that latency between any core is the
same. Moving back to Ryzen, it has been
discovered that the latency penalty
between cores of different CCXs is
over twice that of cores within the same
CCX. So basically, for cores to
communicate within the same CCX you're
looking at around a 40 nanosecond delay.
Meanwhile, when going between CCXs, so
one core over here in one CCX, a core
over here in another CCX, there's about
a hundred nanosecond latency penalty
when talking between CCXs, and that
takes the total time to around 140
nanoseconds, as opposed to 40 nanoseconds
within the same CCX, as I just said. It
is believed that this added latency is why
Ryzen isn't as impressive for gaming
as you might expect it to be based on
productivity performance. And the reason
why AMD has gone with this modular design
is, well, the simple fact that it is just
that: modular. The design has allowed AMD's
new Zen-based Naples server chips to
pack up to 32 physical cores per chip
using multiple CCX modules, so
essentially Ryzen is a server chip
that's been scaled down for desktop
computing. I should note that as Intel
scales up the amount of cores their Xeon
CPUs contain, they also use a modular
design, though it only splits the CPU
into two. Their method is called Cluster
on Die, or COD for short, and this is ideal for
highly NUMA-optimized workloads, but
again, we won't go into detail about this
here. Getting back to the matter at hand,
let's talk about the upcoming Ryzen 5
models. These six-core and four-core CPUs
are based on the same physical chip as
Ryzen 7, so this means all models feature
two CCXs, each with 4 cores, though not
all of them will be enabled. Basically,
this means Ryzen 7 CPUs that feature one
or more defective cores will be binned as
Ryzen 5 parts. The six-core models
feature one core disabled per CCX,
while the quad-core parts feature two
cores disabled per CCX.
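To make the binning scheme just described a bit more concrete, here is a minimal Python sketch. It assumes only what was said above: two CCXs with four physical cores each, and the same number of cores disabled in each CCX. The names `CORES_PER_CCX`, `CCX_COUNT` and `core_layout` are made up for illustration and don't correspond to any AMD tool or BIOS setting.

```python
# Hypothetical model of the core layouts described above.
CORES_PER_CCX = 4  # physical cores in each CCX on the full chip
CCX_COUNT = 2      # every model keeps both CCXs


def core_layout(disabled_per_ccx):
    """Return the number of enabled cores in each CCX."""
    enabled = CORES_PER_CCX - disabled_per_ccx
    if not 0 <= enabled <= CORES_PER_CCX:
        raise ValueError("disabled_per_ccx must be between 0 and 4")
    return [enabled] * CCX_COUNT


# Labels are descriptive only; the core counts come from the text above.
for label, disabled in [("8-core", 0), ("6-core", 1), ("4-core", 2)]:
    layout = core_layout(disabled)
    print(label, "+".join(str(n) for n in layout), "=", sum(layout), "cores")
```

So the six-core parts come out as 3+3 and the quad-core parts as 2+2, which is the layout being emulated in the testing here.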
The news that the quad-core Ryzen 5
parts would still utilize two CCX
units disappointed quite a few people, as
they were hoping that the four-core
models would be better for gaming as
they wouldn't suffer the latency
penalty when working between CCX
units. With just two cores per CCX the
latency penalty will be amplified, as
it's far more likely crosstalk will
occur with fewer cores. That being the
case, a shipload of you have asked me to
test the Ryzen 7 processors in a
2+2 configuration and then compare it
with a 4+0 configuration. That
is to say, emulating the Ryzen 5
quad-cores with two cores per CCX
and then testing them again with four
cores in a single CCX,
with the second CCX completely disabled,
the idea being that the latter
configuration won't suffer CCX crosstalk
latency as all four cores will be
working within the same CCX.
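The reasoning about crosstalk being more likely with fewer cores can be sketched with a bit of counting. This is a rough Python illustration, not a measurement: it uses the approximate 40 and 140 nanosecond figures quoted earlier, and it assumes every pair of cores is equally likely to talk to each other, which real game workloads and the OS scheduler won't follow. The function name `pair_stats` is hypothetical.

```python
from itertools import combinations

SAME_CCX_NS = 40    # approx. latency within one CCX (figure quoted above)
CROSS_CCX_NS = 140  # approx. total latency between CCXs (40 + 100 penalty)


def pair_stats(layout):
    """Return (share of core pairs that cross CCXs, mean pair latency in ns).

    `layout` lists enabled cores per CCX, e.g. (2, 2) or (4, 0).
    Assumes all core pairs communicate equally often (a simplification).
    """
    cores = [(ccx, i) for ccx, count in enumerate(layout) for i in range(count)]
    pairs = list(combinations(cores, 2))
    cross = sum(1 for a, b in pairs if a[0] != b[0])
    same = len(pairs) - cross
    mean_ns = (same * SAME_CCX_NS + cross * CROSS_CCX_NS) / len(pairs)
    return cross / len(pairs), mean_ns


for layout in [(4, 4), (3, 3), (2, 2), (4, 0)]:
    share, mean_ns = pair_stats(layout)
    print(layout, f"cross-CCX pairs: {share:.0%}", f"mean: {mean_ns:.0f} ns")
```

Under that equal-likelihood assumption, the share of core pairs that span both CCXs rises from roughly 57% in a 4+4 layout to roughly 67% in a 2+2 layout, while 4+0 has none. That is the theoretical case for 4+0; whether it shows up in games is what the testing checks.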
In theory this means games should run
better, but we'll have to go find out. So
for testing we have six games in total,
all of which were tested at 1080p using
the Titan XP to try and remove any kind
of GPU bottleneck. So let's go and check
out the results. First up we have F1 2016,
and here we see running a single CCX
for the 4+0 configuration,
performance is much the same as the 2+2
configuration. Still, this game
provided a strong result for AMD, as the
quad-core Ryzen 5 part clocked at
4 gigahertz is slightly faster than
the 7600K clocked at 4.8 gigahertz. As
we have seen in previous tests, Far Cry
Primal is a game that Ryzen really
struggles with. Evidently, though, the
performance issues aren't caused by the
CCX latency, as running all four cores
within a single CCX did not improve
performance in this title. This test was
a bit pointless but I included it anyway
since we already have the 2+2 results
from last week's video. As you can see,
the Titan XP is maxed out in front using
either configuration on the Ryzen
processor. Ghost Recon Wildlands is
another GPU-intensive game, and here we
see much the same performance using
either the standard 2+2 configuration or
the 4+0 configuration.
Mafia III is a title where I suspected
removing the CCX latency might help
improve performance further, but I was
wrong. We've seen no real difference here.
Testing with Battlefield 1 shows very
minor performance improvements when
using a single CCX. Here the 4+0
configuration allows for 3% more
performance, not exactly a huge increase,
but with roughly the same boost to the
minimum and average frame rates it seems
like removing the CCX crosstalk
here does lead to slightly better
performance. Interestingly, though, if we
look at the 1% and 0.1% frame time
performance in Battlefield 1, the 4+0
and 2+2
configurations deliver the same results.
So it's really starting to look like the
increased latency incurred with
crosstalk between the CCXs doesn't
really impact gaming performance, at
least in the games we tested. The
horrible Far Cry Primal performance, for
example, certainly isn't CCX related. Now,
you might be wondering, how do I actually
know if the BIOS was configuring the
Ryzen CPU as it claimed? When set to
4+0, for example, how do I
know it wasn't just still in a 2+2
configuration? Well, the easiest way to
determine this is by measuring the level
3 cache performance. Here we are
looking at the cache latency, and as
expected the level 1 and level 2
cache performance remains much the same
regardless of the configuration, as this
isn't shared cache. In other words, each
core has its own dedicated level 1 and
level 2 cache. The level 3, on the other
hand, which is split into 8 megabyte
chunks of shared cache per CCX, will be
impacted by the core configuration. As we
can see here, keeping all four cores in
the same CCX we only have an 8 megabyte
level 3 cache, but it's all
under the same roof so it doesn't incur
a latency penalty. With both CCX modules
enabled we now have 16 megabytes of
level 3 cache, but of course it's spread
across both CCXs and this increases
latency. Looking at the level 3 cache
bandwidth, we see that the 2+2
configuration heavily cripples write
performance, reducing throughput from 210
gigabytes per second to just 91
gigabytes per second. The read
throughput also takes a hit, dipping from
211 gigabytes per second to 168
gigabytes per second. So we know for a
fact the downcore feature is working
and configuring the CPU as claimed.
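To put those level 3 bandwidth figures in relative terms, here is a small Python sketch that works out the percentage drop going from 4+0 to 2+2. The numbers are the ones quoted above; the helper name `percent_drop` is just for illustration.

```python
def percent_drop(before, after):
    """Percentage reduction going from `before` to `after`."""
    return (before - after) / before * 100


# Level 3 cache throughput in GB/s as quoted above: (4+0 result, 2+2 result)
l3_bandwidth = {
    "write": (210, 91),
    "read": (211, 168),
}

for kind, (single_ccx, split_ccx) in l3_bandwidth.items():
    drop = percent_drop(single_ccx, split_ccx)
    print(f"L3 {kind}: {single_ccx} -> {split_ccx} GB/s ({drop:.1f}% lower)")
```

That works out to a write throughput drop of a bit under 57% and a read throughput drop of a bit over 20%, which is a large enough gap to tell the two configurations apart.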
Before wrapping things up, here is a look
at the Battlefield 1 benchmark running
in either configuration. As you can see,
performance is much the same. This is a
custom Fraps path so the benchmark isn't
identical, but it's very close. We of
course report on the average minimum and
average frame rate from 3 runs.
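As a rough idea of what averaging the minimum and average frame rate over three runs looks like, here is a minimal Python sketch. It is an assumption-heavy illustration: the frame times below are invented, the function names are hypothetical, and the minimum is taken from the single slowest frame, which is not necessarily how Fraps or the charts in this video derive their minimums.

```python
from statistics import mean


def summarize_run(frame_times_ms):
    """Return (average fps, minimum fps) for one run of frame times in ms."""
    avg_fps = 1000 * len(frame_times_ms) / sum(frame_times_ms)
    min_fps = 1000 / max(frame_times_ms)  # slowest single frame (assumption)
    return avg_fps, min_fps


def summarize_runs(runs):
    """Average the per-run results across several runs."""
    results = [summarize_run(run) for run in runs]
    avg_of_avgs = mean(avg for avg, _ in results)
    avg_of_mins = mean(minimum for _, minimum in results)
    return avg_of_avgs, avg_of_mins


# Invented frame times (ms) for three short runs, for illustration only.
example_runs = [
    [8.0, 9.0, 10.0, 12.5],
    [8.5, 9.5, 10.0, 12.0],
    [8.0, 9.0, 11.0, 12.5],
]
average_fps, minimum_fps = summarize_runs(example_runs)
print(f"average: {average_fps:.1f} fps, minimum: {minimum_fps:.1f} fps")
```

Averaging across runs matters here because a manual pass through the game is never exactly the same twice, so a single run could exaggerate or hide a small difference between configurations.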
Finally, I also took a look at Mass
Effect Andromeda before wrapping things
up, and again this is another Fraps pass
measuring in-game performance. As such,
the benchmark runs, while very similar,
aren't identical. For the most part the
4+0 configuration looks much faster, but
having run the 60-second test three
times, on average it was just a single
frame faster at 115 fps to 114 fps. The
minimum frame rate was also just a
single frame faster for the 4+0
configuration.
Well, initially we were concerned with
AMD's decision to spread the Ryzen 5
quad-core CPUs across two CCXs rather
than keep them in a single module, and
this was because we were aware of the
latency penalty when communicating
between CCX modules, and we believed as a
more, you know, gaming-orientated CPU it
would be imperative that AMD avoided
this delay in communication. As it turns
out, at least based on the testing done
here, for the most part CCX crosstalk
won't have a noticeable impact on gaming
performance, so the fact that AMD has
decided to arrange the Ryzen 5
quad-core processors in a 2+2
configuration won't be disastrous for
gaming. So with CCX crosstalk latency not
looking to be the problem, the main
culprit now appears to be memory
bandwidth. Evidence has surfaced recently
suggesting that when using DDR4-3600
memory, for example, Ryzen's gaming
performance improves dramatically.
Because of this, a few viewers have
suggested I retest using DDR4-3600
memory to show what Ryzen is truly
capable of. Sounds good, and I certainly
don't disagree more testing needs to be
done. That said, once I manage to get one
of my Ryzen systems working with DDR
speeds above 3200 I will certainly
retest. Right now even getting
DDR4-3000 to work is a real chore, and
I've seen countless user reports from
new Ryzen owners struggling to get
DDR4-2666 working. So while the testing with
DDR4-3600 memory might be a good
indicator of Ryzen's untapped, or at
least future, performance, it's far from
representative of the kind of
performance most consumers are going to
see in its current condition. I feel like
most Ryzen owners will be using DDR4-2666
memory. The fact that you need to
play around with base clock overclocking
to exceed ddr4 3
and doesn't really make it a viable
option at this point. For now I really
just wanted to get this video out of the
way, especially as Ryzen 5
approaches. For the most part we saw next
to no difference
spreading the cores across two CCX
modules or keeping them under one roof
in a single CCX. Battlefield 1 was
really the only game that showed a
slight performance advantage when
sticking to a single CCX,
but yeah, a 3% gain using an extreme GPU at
1080p isn't exactly noteworthy stuff.
Well, that's all for this one, guys. I hope
you enjoyed the testing, and I bet a few
of you were quite surprised by the
findings. I know I was. Anyway, I'm your
host Steve, catch you again soon.