FFXV Bench: CPU numThread, SMT, NV/AMD GameWorks Scaling [Update]
2018-02-02
This content piece will explore the performance anomalies and command-line options for Final Fantasy XV's benchmark, with later pieces going into full detail on CPU and GPU benchmarks. Prior to committing to massive GPU and CPU benchmarks, though, we always pretest the game to understand its performance behaviors and scaling across competing devices. For Final Fantasy XV, we've already detailed the FPS impact of benchmark duration and the impact of graphics settings and resolution scaling, and we've used command line to automate and custom-configure the benchmarks. We've also discovered poor frame time performance under certain benchmarking conditions, and we'll explore all of that in today's video.
Before that, this video is brought to you by Thermaltake and the View 71 enclosure. The View 71 is a full-tower case that's capable of fitting three video cards in most configurations. It's also one of the better cooling cases in our recent case testing bench lineup. The View 71 has hinged tempered glass doors on either side that make it easy to open and show off, and it comes with at least one Riing fan, though you can get the RGB version if you prefer. Learn more at the link in the description below.
So, we've done a lot of research on Final Fantasy XV's benchmark already. A few notes here before getting started: this is not technically a beta, but it is a pre-launch benchmark utility and there's no final game present, so all of this stuff will probably change at least somewhat by the time the game rolls out in about a month. Keep that in mind. That said, it still serves as a good benchmarking tool, and we're excited to add it to the full suite once the game is launched completely, because it's actually very easy to work with for what we need.
So: not a complete game, and things will change. We are, however, using the latest drivers from Nvidia, which are supposed to be tuned for this. We contacted AMD; they did release a new driver set today, the day that the game came out, but it doesn't include any optimizations for the Final Fantasy XV benchmark. AMD noted to us that they plan on including those closer to the game's launch rather than the benchmark's, so you'll have to keep an eye out for that as well. They're probably optimizing as we speak, using the benchmark to do so.
This game also has a lot of graphics settings that we can see. They've been exposed, but we can't really change them easily right now, and we'll have more info on that as we go through this. For the original articles and everything that fed this video, you can go to gamersnexus.net. We will be posting follow-ups as well. Any time there's a live benchmarking activity going on like this, we're pushing stuff as soon as we can, and it hits the website first. So let's get started with the benchmarks for this one. As always, you can check additional details in the article linked in the description below if you need test methodology information. This video will basically set the stage for the next two, which will be the GPU and the CPU benchmarks.
We started out by testing for run-to-run variance, which would be used to help locate outliers and determine how many test passes we need to conduct per device. In this frame time plot, you can see that the first test pass, illustrated on a GTX 1070 with the settings noted in the chart, exhibits significantly more volatile frame times than what you'll see for the second pass. The frame-to-frame interval occasionally slams into a wall during the first six-minute test pass, causing noticeable, visible stutters in gameplay. The second pass was noticeably more consistent in frame time interval, though it did still encounter one spiked frame time in excess of 250 milliseconds, bad enough to notice a stutter. This was over a six-minute period, and that's one spike over 250 milliseconds as opposed to spikes over a full second for the first run, indicated in blue. A 1200-millisecond frame time in the first run means that you're left staring at the same frame for a full 1.2 seconds. In other words, technically speaking, you're getting less than 1 FPS for that 1.2-second interval.
The average still hits 65 FPS for both of these passes, actually all three of them, even in spite of the 1200-millisecond frame time, and that's because we're averaging nearly 30,000 frames of data. As a reminder, you'd need about a 16.667-millisecond frame time interval to achieve an effective 60 FPS. Run three exhibited similarly smooth behavior to run two, and we have now observed across six GPUs that the first run, particularly with the 1080p high settings, appears to have worse 1% and 0.1% lows than the subsequent runs.
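The frame time arithmetic above can be sketched in a few lines of Python. This is only an illustration, not the tooling used for these charts: the frame times are synthetic (a flat 15 ms pass with one 1200 ms spike), and `low_fps` assumes one common definition of "1% low" (the FPS equivalent of the mean of the slowest 1% of frames), which may differ from how the charts here were built.

```python
from statistics import mean


def average_fps(frame_times_ms):
    """Average FPS for a pass: frames rendered divided by seconds elapsed."""
    total_seconds = sum(frame_times_ms) / 1000.0
    return len(frame_times_ms) / total_seconds


def low_fps(frame_times_ms, fraction):
    """FPS equivalent of the mean of the slowest `fraction` of frames.

    Assumed definition of a "1% low" (fraction=0.01) or "0.1% low"
    (fraction=0.001); other definitions exist.
    """
    slowest_first = sorted(frame_times_ms, reverse=True)
    count = max(1, round(len(slowest_first) * fraction))
    return 1000.0 / mean(slowest_first[:count])


# Synthetic data (hypothetical, not measured): 30,000 frames at 15 ms each,
# with and without a single 1200 ms stutter.
smooth_pass = [15.0] * 30000
spiked_pass = [15.0] * 29999 + [1200.0]

# Frame time needed to sustain 60 FPS, in milliseconds.
frame_budget_60fps_ms = 1000.0 / 60.0

if __name__ == "__main__":
    print(f"60 FPS frame budget: {frame_budget_60fps_ms:.3f} ms")
    for name, data in (("smooth", smooth_pass), ("spiked", spiked_pass)):
        print(
            f"{name}: avg {average_fps(data):.2f} FPS, "
            f"1% low {low_fps(data, 0.01):.2f} FPS, "
            f"0.1% low {low_fps(data, 0.001):.2f} FPS"
        )
```

With this synthetic data, the single spike moves the average by a fraction of one FPS but pulls the 0.1% low down sharply, which is why the lows are tracked alongside the average.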
Next, we're moving on to graphics settings discussion. Reddit user randomstranger454 detailed all the lower-level settings options for the three presets in the game. The benchmark launcher only gives the ability to switch between presets of low, medium, and high, along with just a couple of resolutions. Knowing the lower-level details tells us where GameWorks and other graphics options are enabled and disabled, theoretically giving us a look at two things: one, a potentially closing relative performance gap between AMD and Nvidia as lower graphics options are configured, as these disable GameWorks technology; and two, a look at future options for the full game.
Let's start with the GameWorks options. The high preset is presently the only one in which the GameWorks graphics options are enabled, as far as we know, and two of those options supposedly remain disabled for the benchmark utility, at least sometimes. The ShadowWorks library is probably disabled at present, as is the voxel-accelerated ambient occlusion. That said, I'm using the words "probably" and "likely" and "maybe" here because the user who found those settings and stripped them out of the game's files did end up posting a screenshot later, after the user had managed to enable the on-screen display for the benchmark, and noticed that the on-screen display, which is not technically officially supported, did say that the apparently disabled VXAO options were enabled in that test. So we're not fully clear on whether we believe the file they came from or whether we believe the on-screen display. If we believe the file, then they might be disabled, but if we believe the display, then it looks like they're enabled, at least for 1080p high.
Either way, we previously detailed most of these graphics settings when they were unveiled back at GDC 2016. VXAO converts the screen space into voxels based upon geometric data, which reduces the complexity present from raw triangles and primitives. VXAO then runs a cone tracing pass for the shadowing computation, and the result is that ambient occlusion can theoretically be calculated more accurately, demonstrated with Nvidia's tank asset in this example.
So in the example on the screen, the blue voxels are partially occluded and the red voxels are completely covered by the volume of geometry. Cone tracing draws the lines from each point to calculate how much occlusion exists from the respective points, traced into the hemisphere around that point. To learn more about this, click our old article linked below. VXAO does require Maxwell architecture and up, including Pascal and, I guess, the Titan V if you wanted to count that. What we're not sure about is how well VXAO will work on AMD, if it works at all. As for the rest of the GameWorks options, those are, from what we understand, all enabled, just with VXAO and the shadow libraries being a big question mark right now.
Let's pull some quick data out of our upcoming GPU benchmark. This will look at relative performance scaling between the RX 580 8GB and GTX 1060 6GB cards while switching between medium and high settings. The idea is to see if relative scaling worsens with higher settings, as that's where Nvidia will theoretically have more optimization. Keep in mind that more than just GameWorks settings change between medium and high here, so it's not perfectly isolated as a test, but the GameWorks settings are most likely to be drivers of performance deltas, particularly for AMD, and it's for an obvious reason: AMD probably hasn't had as much access to the game, and they certainly haven't had as much time optimizing for their competitor's GameWorks solutions. So, makes sense.
In the chart, the GTX 1060 6GB card is baseline, marked at 100% performance. The GTX 1070, under both medium and high settings, maintains 137% of the GTX 1060's performance; it is almost equal for both presets. The RX 580 maintains 60% of the GTX 1060's performance when using high settings, or 66% of the GTX 1060's performance when using medium settings.
AMD is regaining ground at medium settings, which means that at least one of the settings enabled under high is more taxing for the AMD card than it is for the competing Nvidia card. This comes down to shader-level optimization and/or architectural-level differences, where shader-level optimization would also account for driver and library differences involving GameWorks. We don't have enough information yet to firmly state whether GameWorks is the driving reason for that six-percentage-point delta and performance increase as we move to medium settings with the RX 580 card, but it's a likely contributor, as it has been in the past. History dictates here that Nvidia's GameWorks libraries often use things like tessellation, which Nvidia's cards happen to be pretty good at, and AMD does struggle a bit with the heavier tessellation without some optimization on AMD's side or on the user side. There are settings in AMD's drivers that you can go through to help with this factor by lowering the amount of tessellation.
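Since the chart discussion above mixes "percent of baseline" with a "percentage point" delta, a short Python sketch may help keep the two apart. It uses only the percentages quoted in this video (GTX 1060 as 100%, RX 580 at 60% on high and 66% on medium); the function names are illustrative, and no raw FPS figures are assumed.

```python
def percent_of_baseline(fps, baseline_fps):
    """Express a card's FPS as a percentage of the baseline card's FPS."""
    return 100.0 * fps / baseline_fps


def percentage_point_delta(percent_a, percent_b):
    """Absolute difference between two baseline-relative percentages."""
    return percent_b - percent_a


def relative_change_percent(percent_a, percent_b):
    """Relative change from percent_a to percent_b, as a percentage."""
    return 100.0 * (percent_b / percent_a - 1.0)


# Values quoted in the video: RX 580 relative to the GTX 1060 (= 100%).
rx580_high = 60.0
rx580_medium = 66.0

if __name__ == "__main__":
    points = percentage_point_delta(rx580_high, rx580_medium)
    relative = relative_change_percent(rx580_high, rx580_medium)
    print(f"RX 580 gains {points:.0f} percentage points moving high -> medium")
    print(f"That is a {relative:.0f}% improvement in its relative standing")
```

The distinction matters when reading the chart: going from 60% to 66% of the baseline is a six-point delta, but it is a 10% improvement in the RX 580's standing relative to the GTX 1060.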
One thing that we've noticed, that we haven't yet published, is that the tessellation setting for terrain has a particularly heavy impact on some of the AMD cards that tend to struggle more with geometric complexity, drawing a lot of triangles at once, and the amount of tessellation for the terrain in this game is enough to bring those cards down a couple of ranks. You can account for this somewhat by going through your AMD driver settings, but this isn't something that we've tested for this piece because it's not really the point of this piece.
Moving on to CPU testing now: we ran command-line benchmarks using the num threads and num async threads commands, checking for performance disparities on our stock R7 1700 platform and our stock i7-8700K platform. Our thanks to Peerless Girl of the GN Patreon backer Discord for helping troubleshoot these commands. If you want to join us next time we talk about a new game coming out and benchmarking it, you can go to the link below, patreon.com/gamersnexus. So the goal here was to determine if either command impacts Intel or AMD differently, not to match the CPUs head-to-head; that'll come later.
And to get you up to speed if you haven't been following this game: basically, you're able to run command-line options for the game, so you can set flags for the exe when you launch it, and this has been particularly helpful in building the tests that we've been building. Some of those flags include a num threads and a num async threads setting, so you can set num threads equal to, say, eight, maybe half of your threads on an R7 1700. The theory here, and there's not really any official documentation on how these work, is that setting num threads to eight would reduce the number of threads that the game is going to load when it's handling all the game data, so this is something we tested here. For async threads, we're not really positive if it's even implemented yet or if it works at all. We tried it; we'll talk about that in a bit, though.
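For anyone scripting test passes like the ones described above, here is a rough Python sketch of building a launch command per thread-count setting. Everything specific in it is an assumption: the executable name `ffxv_benchmark.exe` is a placeholder, and the flag spellings `--numThreads` and `--numAsyncThreads` are guesses at how the spoken "num threads equals 8" is typed, since there is no official documentation. The sketch only builds argument lists and never launches anything.

```python
# Hypothetical names: the video does not give the executable name or the
# exact flag spelling, so adjust these to whatever the benchmark accepts.
BENCHMARK_EXE = "ffxv_benchmark.exe"
THREADS_FLAG = "--numThreads"
ASYNC_THREADS_FLAG = "--numAsyncThreads"


def build_benchmark_command(exe_path, num_threads=None, num_async_threads=None):
    """Return the argument list for one benchmark launch.

    Passing None for a setting omits its flag, which corresponds to the
    "baseline" (no flag at all) case.
    """
    command = [exe_path]
    if num_threads is not None:
        command.append(f"{THREADS_FLAG}={num_threads}")
    if num_async_threads is not None:
        command.append(f"{ASYNC_THREADS_FLAG}={num_async_threads}")
    return command


def build_test_matrix(exe_path, thread_counts):
    """Baseline first, then one command per requested thread count."""
    matrix = [build_benchmark_command(exe_path)]
    for count in thread_counts:
        matrix.append(build_benchmark_command(exe_path, num_threads=count))
    return matrix


if __name__ == "__main__":
    for command in build_test_matrix(BENCHMARK_EXE, [16, 8, 4]):
        print(" ".join(command))
```

Each list could be handed to `subprocess.run` once the real executable path and flag syntax are confirmed; keeping the baseline as its own entry makes it easy to compare "no flag" against an explicit 16-thread setting.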
First of all, here's the utilization difference on the R7 1700. This chart shows all of those tests at once. We're seeing the highest utilization when set to 16 threads, with baseline, meaning no flag at all in command line, also roughly equating to num threads equals 16; it is a 16-thread CPU. Halving it to eight threads noticeably reduces CPU utilization, so the function appears to be working, at least somewhat. Going to four threads further reduces utilization, aside from one spike toward the end of the test.
Ultimately, though, it comes down to FPS. Although an ad hoc test, we did collect data that seemed to indicate a baseline FPS, using a 1080 Ti, of about 131 FPS average. Using num threads equals four or eight gave us 135 FPS average, which is just outside of acceptable margins of test variance. Num threads equals 16 didn't seem to show uplift outside of error, and that's probably because it's the same thing as baseline. This appears to be a GPU limitation, and so we get into a problem where, to really show a difference with these settings, if they do indeed work, and it seems like they might, we start entering the realm of academic study. It's not really something you'd necessarily do, because what we have here is a GTX 1080 Ti at 1080p with medium settings, not even high, and we're still plotting about 94% utilization, apparently, on the GPU, with occasional spikes to 97%.
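The 131 FPS versus 135 FPS comparison above can be put into numbers with a small Python sketch. The FPS averages are the ones quoted in the video; the 3 FPS margin passed in below is an assumption borrowed from the upper end of the plus-or-minus 1 to 3 FPS range mentioned elsewhere in this video, not a figure measured for this specific CPU test.

```python
def percent_uplift(baseline_fps, test_fps):
    """Percentage change of test_fps relative to baseline_fps."""
    return 100.0 * (test_fps - baseline_fps) / baseline_fps


def outside_margin(baseline_fps, test_fps, margin_fps):
    """True if the difference exceeds the run-to-run margin of error."""
    return abs(test_fps - baseline_fps) > margin_fps


# Averages quoted in the video for the R7 1700 with a GTX 1080 Ti.
baseline_avg_fps = 131.0      # no flag
limited_threads_fps = 135.0   # num threads set to 4 or 8

# Assumed margin of error in FPS (see the lead-in above).
assumed_margin_fps = 3.0

if __name__ == "__main__":
    uplift = percent_uplift(baseline_avg_fps, limited_threads_fps)
    significant = outside_margin(
        baseline_avg_fps, limited_threads_fps, assumed_margin_fps
    )
    print(f"Uplift: {uplift:.2f}%")
    print(f"Outside assumed margin of {assumed_margin_fps} FPS: {significant}")
```

A 4 FPS gap on a 131 FPS baseline is roughly a 3% uplift, which only barely clears a 3 FPS margin; that is consistent with calling it "just outside" of test variance rather than a clear win.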
So we're basically at full load on the GPU. The CPU is not really there yet, and it's hard to say how much of that limited performance disparity between the different num threads flags comes from GPU limitations, but it's reasonable to assume at least some of it does. This is something we'll be able to explore in greater depth with our CPU benchmark and our GPU benchmark, which are coming up separately. For now, however, we can assume that some of the limited difference here comes from GPU limits, and to that extent, to really show differences, you enter territory where you've got two options.
You drop the resolution to something no one will ever play with, like 480p and low settings, and eliminate the GPU bottleneck. That would certainly show a difference, but it's entirely academic at that point, so it's not at all realistic. Still interesting, though. The other option is to use a low-end, really low-end CPU, like a G4560 or maybe an R3 or an old FX four-core or something like that. Maybe you'll start seeing differences there, but that's not something we're doing for today with this test.
As for num async threads, we're not really sure if that feature is working right now. We tried it, and we're also not sure how it's supposed to work; there's no official documentation on it. We tried num async threads equals 16, equals 8, and equals 4, and saw about the same performance across all of them. It's not anything worth generating a chart for. It's possible that it's not enabled right now, that it doesn't like the CPU, maybe, or that we just don't know how to use it properly: what number to type, or the order of where to type it, things like that. So if you have an idea on using this command and you've actually seen a difference from using it, please let us know below and give us some more information so that we can look into it for you. But for now, we're not really clear on if it does anything or works, at least on the 1700.
We also ran all this testing on the 8700K. Same problem there, except to a bigger degree: all the numbers were the same because the 1080 Ti is bottlenecking. And yes, we can eliminate that bottleneck; no, it is not realistic. So it exits real user scenarios; it's all academic. We're going to skip it for now.
The next part is standard deviation and test time. Standard deviation is another aspect of our data analysis for benchmarks, and we just posted a video about test duration and the minimum requirements for how long a test should be, so check that out if you haven't already. At time of filming this, we've only completed half of our Nvidia and a couple of our AMD cards because we were waiting for AMD to push today's driver revision prior to testing, which they just did. Using our still-limited data set, starting with 1080p high, we can see that standard deviation was relatively consistent across four runs, though it exhibited greater variance in our GTX 1080 tests than the others. We may rerun the GTX 1080 as a result of its wider deviation from the norm.
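The run-to-run standard deviation check described above can be sketched as follows. This is an illustration under stated assumptions, not the analysis used for these charts: the per-run averages below are made-up placeholder values for a single card, the sample standard deviation (n - 1) is assumed, and the rerun threshold is a hypothetical parameter.

```python
from statistics import mean, stdev


def summarize_runs(avg_fps_per_run):
    """Return (mean, sample standard deviation) of per-run average FPS."""
    return mean(avg_fps_per_run), stdev(avg_fps_per_run)


def needs_rerun(avg_fps_per_run, max_stdev_fps):
    """Flag a device whose run-to-run spread exceeds the chosen threshold."""
    _, spread = summarize_runs(avg_fps_per_run)
    return spread > max_stdev_fps


# Hypothetical per-run averages for one card over four passes (not measured).
example_runs = [64.8, 65.3, 65.1, 65.0]

# Hypothetical threshold in FPS; pick one that suits the data set.
example_threshold_fps = 1.4

if __name__ == "__main__":
    run_mean, run_stdev = summarize_runs(example_runs)
    print(f"Mean of runs: {run_mean:.2f} FPS")
    print(f"Std dev of runs: {run_stdev:.2f} FPS")
    print(f"Rerun suggested: {needs_rerun(example_runs, example_threshold_fps)}")
```

A card whose spread sits well above the others, as the GTX 1080 did here, is the kind of result this check would flag for another set of passes.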
The RX 580 has the least deviation, but this is also because it has the lowest frame rate with these settings. It's struggling at 1080p high, something we'll talk about in our GPU benchmark. Much of this has to do with tessellation. At 1080p medium, with the GameWorks settings mostly disabled or entirely disabled and tessellation presumably turned down with the settings, we're observing tighter results overall, with standard deviation on average FPS below 1.4 for the three presently tested devices, or below 1.5 FPS for the 1% and 0.1% low values. The RX 580 was also consistent in this testing, and this gives us a reasonable margin of error of plus or minus 1 to 3 FPS, depending on which card we're talking about. We'll further refine this data prior to our GPU benchmark publication and talk about it more there.
For test durations, we found
that the full six-minute benchmark produces roughly equivalent results to a 60-second pass or even a 30-second pass with GPU testing, with relative scalability cross-vendor also scaling equivalently between 6 minutes and 30 or 60 seconds. This feeds back into our video published just a couple days ago at this point, where we talked about the benchmark duration requirements and how what you're really looking for, generally, is relative performance of vendor A versus vendor B. As long as that scaling is the same, you're good to go. For something like this, though, this is a specific game benchmark, so we care more about absolute performance, because people watching our upcoming benchmarks want to know if their card can play, or what card they should buy to play the game at specific settings, so we'll be looking at it more from an absolute FPS standpoint. However, for the absolute FPS and the relative FPS, six minutes versus 30 seconds, even 60 seconds, 90 seconds, all the way down the line, for the most part it's about the same with GPU-bound testing. CPU-bound testing is a different story. We're still studying that, and it looks like we're going to be doing a longer-duration test with a different number of passes for our CPU benchmarks, because when testing CPUs in this game, there are really only a couple of spots in the game where the CPU is loaded heavily. The rest of it is very GPU-intensive, so you have to pinpoint those spots and run the benchmark for that specific location to really bind the CPU.
And one final thing here: as we
finished this video, we went back to add in this audio clip because we discovered that, with CPU testing on the AMD R7 1700 CPU under both stock and overclocked settings, we were observing a performance uplift once again by disabling SMT. So this kind of feeds back into the num threads thing, with eight giving you better performance. We'll talk about this more in the CPU benchmark fully. It's not a huge uplift, it's not like it's a 2x gain, but we're gaining 8 FPS on top of 150 in one of the benchmarks, to give you an idea. It's a couple percent, and that's something we'll talk about more soon. We also made some very interesting discoveries between AMD and Nvidia graphics card scaling performance under very specific benchmark conditions that we will also reveal in our upcoming GPU benchmarks, so make sure you subscribe for that.
Speaking of all that, CPU benchmarks, of course, are upcoming, as are GPU benchmarks. Subscribe for those if you haven't already so you can catch them as they go live. It should be relatively soon, maybe even today or within the next 12 or so hours; anyway, we'll see. Subscribe for that. You can go to patreon.com/gamersnexus to help us out directly, or join the Patreon Discord, where we were talking with everyone about the benchmarking process as it was going on behind the scenes. Thanks for watching. I'll see you all next time.