the point of this content is to show why
multitasking benchmarks are wholly
unreliable difficult to execute and
ultimately uninformative unless
controlled to a point of becoming a
quote-unquote normal benchmark there's
something here we can work with for the
future maybe but our first attempt at
multitasking benchmarking largely
demonstrates why no one really does this
mainly it's not reliable and variance
between tests produces unexpected (read:
wrong) results we're looking at the
G4560 and R3 1200 today testing under
conditions of gaming while running video
playback and other bloatware before
getting to that this coverage is brought
to you by the EVGA 1080 Ti SC2 and
NVIDIA's Destiny 2 bundle running up
through September 4th the 1080 Ti SC2
comes with asynchronous fan control for
its dual fans and 9 thermal sensors and
again includes Destiny 2 learn more at
the link in the description below
Patrick headed up the test planning and
execution for this content I worked with
him on a lot of it and he'll be joining
me at the end of this video to talk
about some of the testing procedures he
went through things that we found
inadequate or things that we thought
might potentially give us an option for
the future but the thing is this is the
kind of testing we do behind the scenes
to build a new test plan we don't
normally publish this type of data but
the idea here is to answer the very
highly requested community question of
can you test multitasking which
depending on which person you were
speaking with that meant a lot of
different things so we introduced stream
benchmarking alongside gaming and that
was a way of doing multi tasking
benchmarks but for some folks that was
too much multitasking so people wanted
other things that were maybe more normal
so to speak things like discord or
Chrome or video playback or music
playback or running Steam Uplay
Origin and stuff like that so that's
what we're visiting today and the point
of publishing this is to demonstrate why
it's hard to trust these multitasking
benchmarks and why we haven't quite
figured it out yet because there's just
there are so many variables when you
start introducing the real world
scenarios that everyone requests turns
out pretty much all the real world
scenarios involve internet attached
applications which do whatever the heck
they want a lot of the time to the point of
really throwing off results and we have
examples of those where in some cases
the Ryzen 3 CPU was just getting destroyed
in numbers but not because it was
actually bad just because some
application decided to fire off some
kind of background task while we were
benchmarking and that's why you don't
run tests with those things in the
background normally but you can try to
and if you're able to get these things
well one work with applications that
don't just randomly spawn more services
or randomly start downloading stuff or
start doing whatever in the background
it is that they do if you can figure
that out then you might have a
multitasking benchmark but it starts
looking more and more like a normal test
as you stray from the internet attached
applications that everyone wants us to
test like again Steam Battle.net Chrome
all that type of thing so in listening
to the community questions and thoughts
from a while ago and throughout the last
several CPU reviews we've noticed that
a lot of people think things like
Discord plus gaming is multitasking or
that Skype plus gaming is multitasking
and kind of at the purest definition of
it it is but for the folks who say that
they're seeing performance differences
that are noticeable from running discord
or Skype while playing a game there is
something seriously wrong with your
computer if you're seeing that even on a
G4560 that is not an intensive
workload so we're looking at something
in the middle we're not looking at just
one application discord or Skype because
it's pointless but if you look at stuff
like a bunch of bloatware all at once
then you start developing a real test so
we're doing that we have another test
where we tried to do video playback with
a 4k video while gaming and then tried
logging the 4k videos playback
performance while logging the gaming
performance because if you just take one
of them you don't get the full picture
maybe the game's FPS was perfectly fine but
the video's dropping frames so we tried
that as well
but we have more to work with what we
didn't do is the "I keep 100 tabs of
Chrome open" test because Chrome also
seems to do whatever it wants we looked
at it the thing is Chrome does all kinds
of caching it deactivates tabs sometimes
when they're not in use you have dynamic
page elements like advertisements that
just kind of fire off whenever and sure
you can address some of these problems in
testing by getting an ad
blocker or by trying to avoid things
like YouTube and Twitch but then again
we're straying from real-world scenarios
that people are interested in so you
really might as well just use a local
application but even those are
problematic as we'll show you so Chrome
not a good benchmark for a lot of
reasons but as we look through this
other stuff we learned that it's not
alone in being not a good benchmark for
a lot of reasons so here's what we've
got for the bloatware test this one
includes game clients Blizzard or
Battle.net or whatever they call it now
mixed with Origin Uplay and Steam all
open simultaneously monitoring software
so HWiNFO64 and MSI Afterburner
with HWiNFO64 logging actively
every two seconds chat clients like
Discord in a call and Skype peripheral
software like NZXT CAM Corsair CUE
Logitech Gaming Software and Overwolf
all of which had some sort of use except
Overwolf because we don't like it other
than that we had VLC with MP3 playback
looping and then all the tests were
performed for a period of about two
minutes with three passes for parity
just to figure out if this was even
worth pursuing further for longer
durations so here's the problem
something like half of these
applications or more are internet
attached either silently or noticeably
they might be doing things in the
background that we don't know about
unless you really sit there and look at
it and monitor packets and things like
that so that's a problem that's that
means that you have a dynamic testing
element to account for and then another
thing is if we get these benchmarks
working in a satisfactory manner we'd
also have to look more at loss from
baseline performance rather than exact
like absolute performance because if
you're looking at an R3 and a G4560 the
performance ultimately will be different
because the performance should be
different baseline in general we picked
a few games where it was about the same but
what you care about is which one does it
more efficiently so what's your
performance loss what's the delta
between a baseline test and a bloated
test so that's one of the things
we're looking at this was our nuclear
option we ran it this way because just
doing Discord and a game is not enough
and because people kept asking for
Internet attached application testing
while gaming so we've done basically all
of them at this point and it should
produce some kind of difference so
that's the
base testing plan we'll start with the
video stuff talk about why we scrapped
it then we'll go through the bloatware video
testing was our starting point the goal
was to play a game while playing back
one of our own 4k videos hosted locally
and playing back on a secondary monitor
then using software to log game
framerate and video playback dropped frames
at the same time VLC was used initially
because it has an expansive options menu
and it can count dropped frames but
it also came with some issues out of the
box VLC was hitting nearly 100% CPU
usage while watching a 4k video with
these lower end CPUs and toggling some
options in the codec section helped with
that but then the software was no longer
stock so you start straying from that
scenario in addition there's a sample
that we've got here of what it took to
log a single benchmark run because it
really wasn't trivial windowed mode is
mentioned here because VLC was crashing
when certain games were launched in
full-screen logging the framerate of
video playback was important as well
since one of the issues people
experience is video choppiness not just
loss of framerate in-game that meant two
framerate loggers on two windows at the
same time or relying on VLC's own dropped
frame meter we considered switching to
Windows Media Player but detected
playback of more than 60 frames per
second on a 60 frames per second video
which indicates some kind of Microsoft
or Windows Media Player shenanigans
after encountering so many issues with
monitoring playback reliably while also
playing back a video at 4k while gaming
video testing was shelved for the time
being and we moved on the next option
was one that Patrick dubbed the quote
nuclear option including numerous viewer
requested game clients peripheral
clients and other background software
this we thought would surely draw out
any differences between the G4560 and
R3 1200 in these multitasking benchmarks
and again we started this thinking that
we might have an actually valid
benchmark to look at performance
differences between the two CPUs but we
left with low confidence in multitasking
benchmarking in general without using
local non-internet attached software
let's start with Blizzard's client and
why it is 100% unreliable for any type
of benchmark while it's open
beginning with the expectations most of
you would agree that it is reasonable to
assume one of two outcomes here either
there's really no difference with the
Battle.net client open and you have
basically the same
performance between the G4560 and the R3 1200
with no loss baseline versus Battle.net being
open this isn't loss between the two CPUs
against each other it's what is baseline
performance of the CPU versus baseline
with Battle.net open that's what we're
looking at so one expectation
is that there's no difference the other
expectation is that maybe there's a
slight advantage for the r3 1200 but
what you wouldn't expect is that the
G4560 would perform 31% better than an R3
1200 I think we can all agree that's
pretty unreasonable to assume but that's
what we saw that's because this type of
testing is unreliable here's a chart for
anyone who skipped to this chart and
bypassed all the stuff I just said
you're not going to understand any of
the numbers and you're gonna post a
comment calling us shills if you skipped
the last few minutes stop now go back
because you're not going to understand
the context which is that this is not a
valid benchmark that's the whole point
of this video we've got a few main
figures here the G4560 baseline is
48.3 FPS with lows at 33 and 27.5 the R3
1200 is remarkably close by with the two
parts more or less tied and within
variance in the frame time department
when bloated with all of these
applications which we'll show on the
screen once again we lose 4 FPS on the
G4560 that's average dropping 8% in
performance and the hit to frame times
is a bit worse illustrated by 1%
and 0.1% lows here here's
a surprise though the R3 1200 drops 11
FPS or 24% and now suffers with half the
0.1% lows that align with more stutter
clearly something went wrong here we
started disabling applications
ultimately finding that Battle.net was
the culprit it decided to do something
in the background during the R3
tests that it did not do in the
background during the G4560 tests
solving for this bumps the R3 1200 up
to G4560 performance once again with
the two more or less tied when we axe
Battle.net from the R3 again these are
not definitive benchmarks we're not
telling you that one CPU is better than
the other or even that they're tied what
we're showing you is the results from
different test passes with an R3 1200
and a G4560 and why it doesn't really
make sense some of the numbers you get
sometimes and that's
because of the variance that's why we
don't do these tests normally I'd like
to do them because it is so heavily
requested but we're not gonna be able to
do these benchmarks with the
applications everyone wants to see
benchmarked Battle.net's behavior here
could also explain other weird frame
deltas when you have Battle.net open
while benchmarking something like
Overwatch or any other game that they
have on there so it's just kind of a
weird application that throws issues to
begin with let's go to Metro: Last
Light again we're looping tests for
about two minutes with three loops each
time here is Metro: Last Light's
benchmark between the same two CPUs
baseline we've got the G4560 at about
84 FPS and the R3 1200 at about 85 FPS
we chose this benchmark because they
were so close and performance was more
or less equal the G4560 drops to 75 FPS
average with bloatware or about 10%
performance loss the R3 1200 drops 31%
of its average performance again because
of completely unpredictable internet
attached software in the form of
Battle.net doing things in the background
and of course other software too it's not
just Battle.net disabling Battle.net gives
a more proportional loss as you can see
in the bloatware without Battle.net
results and who knows what other
software was responsible for performance
that we saw we got lucky in disabling
Blizzard's client finding that it had
issues and kind of moving ahead
but the thing is retesting the R3 1200
with Battle.net sometimes its numbers
look fine other times it's a loss same
is true for the G4560 it's just a
matter of what was going on what was
Blizzard doing what was the client doing
when you ran the test
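The "loss from baseline" comparison described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the tooling used for the charts; the function name `percent_loss` is made up here, and the bloated G4560 figure in the first example (44.3 FPS) is inferred from the quoted "lose 4 FPS" rather than read off a log.

```python
def percent_loss(baseline_fps: float, loaded_fps: float) -> float:
    """Return the share of baseline performance lost under load, in percent."""
    if baseline_fps <= 0:
        raise ValueError("baseline_fps must be positive")
    return (baseline_fps - loaded_fps) / baseline_fps * 100.0


# G4560 averages quoted in this video.
# First chart: 48.3 FPS baseline, roughly 4 FPS lost with bloatware (assumed 44.3).
g4560_first_chart = percent_loss(48.3, 44.3)
# Metro: Last Light: about 84 FPS baseline, about 75 FPS with bloatware.
g4560_metro = percent_loss(84.0, 75.0)

print(f"first chart loss: {g4560_first_chart:.1f}%")
print(f"Metro: Last Light loss: {g4560_metro:.1f}%")
```

Comparing each CPU against its own baseline is what lets two parts with different absolute framerates be judged on how much they give up, which is the number this kind of test would have to report.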
here's Ashes: Escalation where we see a 1
to 2 FPS loss with bloatware though the
G4560 ran Blizzard's Battle.net and the
R3 did not in this test so again this
isn't a test you can rely on for
comparative data between the two
aside from showing that we saw a little
performance loss in this particular
title with the applications doing
whatever they were doing when we had
them on in the background and this
brings up another challenge with
multitasking benchmarking or what we're
calling multitasking benchmarking
because this game is so taxing on the
CPU anyway because it is commanding all
of the CPU resources from AMD and Intel
we now have an issue where Ashes
clearly is more or less the same
performance but what's happening
in the background applications what are
they dropping that we don't know about
to try and keep up with
Ashes well the thing is you can look at
some of that
for example logging applications in the
past we've noticed that AIDA64 will drop
log intervals when it is incapable of
keeping up with the performance for
example running FurMark with Prime95
and keeping AIDA64 at normal priority
in Task Manager means that AIDA64 with
the hardware we were testing on when we
did this will drop intervals so
instead of an interval every second
you'll get an interval at second 1 an
interval at second 19 at 22 at 37 it's
kind of random and you're dropping
performance in that application AIDA64
but maybe not the other ones so that's
another really difficult thing you have
to keep an eye out for with AIDA or
HWiNFO those are easier you produce a
log file you check the interval average
and you know if it was logging the whole
time or not but what about things like
video playback software now you need
either another frame monitor that you
know is accurate for video playback or
an application that logs dropped frames okay
you can figure that out what about the
other software what about music playback
or streaming playback if you're trying
to work with Chrome or something like
that there are all these things it's not
just monitoring the game it's monitoring
everything else and trying to figure out
where is the performance loss occurring
especially when you start accounting for
things like Windows which schedules
stuff in ways that should theoretically
be beneficial to the user but it's not
the same between Intel and AMD so that's
hard too here's one more Rocket League
where we've again got roughly equal
performance to start which was an
intentional choice and then a gap of
about 8% when we use the bloatware
here's the thing with this one we have
no idea if we're getting game priority
on the G4560 and some negative effect
to the background software or if the R3
1200 is genuinely just slower or if
there's some sporadic and unpredictable
background process firing off as a
result of all the game clients and
internet attached applications trying to
do stuff at the end of the day this test
means nothing data is not reliable we
could make
AMD win or Intel win just by running the
test enough times that one of them has
some terrible thing going on in the
background to tank the performance as we
saw with Ryzen in that particular set
of tests with Battle.net so the
alternative to that is you could as a
tester not know that something just
happened publish those results and then
you end up with results that make a huge
disparity between two components that
might only exist because something
started occurring like maybe a download
or maybe some kind of auto video
playback in one of the clients you're
working with that starts tapping into
the CPU for some kind of encoding
process whatever something like that
could go on in one of the applications
depending what you're using so it
requires very careful selection of
applications and then some way to
monitor the performance of those
applications as well if it's video
playback or music playback or something
like that
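For the monitoring side, the log-interval check mentioned earlier (a logger set to one sample per second that actually writes at seconds 1, 19, 22 and 37) can be sketched like this. It assumes the timestamps have already been pulled out of the log as seconds; the function name `interval_report` and the 1.5x tolerance are illustrative choices, not anything AIDA64 or HWiNFO provide.

```python
from statistics import mean


def interval_report(timestamps_s, expected_interval_s, tolerance=1.5):
    """Summarize gaps between consecutive log timestamps (in seconds).

    A gap counts as late when it exceeds expected_interval_s * tolerance.
    """
    if len(timestamps_s) < 2:
        raise ValueError("need at least two timestamps")
    gaps = [later - earlier for earlier, later in zip(timestamps_s, timestamps_s[1:])]
    late = [gap for gap in gaps if gap > expected_interval_s * tolerance]
    return {
        "average_gap_s": mean(gaps),
        "max_gap_s": max(gaps),
        "late_samples": len(late),
    }


# A logger that kept up with its one-second interval.
healthy = interval_report([0, 1, 2, 3, 4, 5], expected_interval_s=1)
# The starved pattern described above: samples at seconds 1, 19, 22 and 37.
starved = interval_report([1, 19, 22, 37], expected_interval_s=1)

print(healthy)
print(starved)
```

An average gap far above the configured interval is the sign that the background application lost performance even if the game's framerate looked normal.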
so a lot of trouble there it's easy to
overlook the difference caused by a
background result that was not the case
for the other product and it's easy to
just have applications that for whatever
reason are scheduled differently between
tests so very difficult to do this kind
of benchmarking and ideally you do it
with stuff that's not internet attached
so the next step to this would be we
ditch all of these game clients
especially battlenet and start doing
stuff with Excel you can make Excel
enumerate or iterate through some
formula ad infinitum and just sit there
and process a formula non-stop while you
play a game ok and then at the end of it
you look at how far did Excel get and what was
the framerate of the game you compare
the two numbers unfortunately that's not
really something people do too much
certainly there are people who do that
but our audience probably not so much
and ultimately the comments we would get
for doing something like that would be
this isn't a real-world test we want
real-world multitasking
benchmarking so you're back to the
original problem of here's something
that's kind of synthetic we've created
to simulate multitasking benchmarking
although it is technically a real
multitasking thing it's just does
anyone do it or care about it you could
do that but it doesn't satisfy the
demand which is for more normal
applications so we've got Excel you make
it
iterate numbers you look at some media
player that's trustworthy in its
dropped-frame logging or some other way to
log it that doesn't crash like VLC was doing
under certain conditions you find
something like that maybe there's
something there to test but it starts
looking more and more like a normal test
environment and not the scenario that
people want to see which is unfortunate
because we'd love to fill that demand
it's just it's not easy and we don't
want to publish numbers for something
that clearly has so much variance and we
have no confidence in what's going on in
the background if I could look at it and
know something happened that didn't
happen in the other test and by
something happened I mean one of the
applications pinging a server or
tapping into the CPU in a different way if
we could look at it and identify that
reliably without investing an absurd
amount of effort and I mean we're
willing to invest effort but there's a
reasonable amount that you can do if
that were the case we'd run the tests
but until that point there's really no
point in running them so this is why we
don't test with that many variables we
tried this content piece started out with
the hopes we would do an actual G4560
vs. R3 1200
over-tasking benchmark we'll call it but it
just didn't turn out that way it turned
into a content piece of this is why this
thing's hard to do with the applications
we used and then besides most people who
talk about feeling a faster or smoother
response in just Windows after changing
CPUs it's probably because you
reinstalled windows unless you went from
something like a garbage Celeron up to
literally anything else
so that's another thing to consider is a
lot of this subjective user feeling of
responsiveness comes from things like
SSDs or if that was already in there
then reinstalling Windows but that's
enough of my thoughts on it we're gonna
get patrick on for a minute to talk
through some of his testing and see what
he thinks about the future of trying to
do something like this now that we've
learned a bit and then we'll close it
out okay so I've got Patrick on now
Patrick ran the tests figured out how to
do most of them let's start with that
let's start with what
going back to the video stuff what was
the process to start and adequately
execute all the logging of the video
while gaming we wrote this down in the
article in a little more detail but
basically what I was having to do is
have the video open on one monitor the
game open on another monitor switch the
video to be the active window so that
FRAPS would detect it set the game as
the process that PresentMon was
detecting hit a key combination to start
logging with FRAPS and PresentMon
hit play on the video tab back into the
game and start benchmarking the game and
do all of that within a reasonable amount of
time so that the benchmarks would be
synced up and so that we wouldn't be
logging frame rates before or after the
benchmark because then we get really
weird 0.1% lows it was just like a really
complicated process for something that
isn't super important really I mean like
we want to log framerate of both the
video and the game because a lot of the
issues that people have when
multitasking are not just with the game
but also like if they're watching a
video if the video is playing back badly
or the audio is skipping so we want
to test for that as well but then adding
that to the benchmark makes it
exponentially more difficult well also
and we could deal with a difficulty but
then you have stuff like one is FRAPS or
PresentMon or whatever monitoring the
video application even accurate yeah is
it even a frame output that is realistic
or what's occurring on screen and then
also they probably add some level of
conflict with each other yeah like I've
definitely run tests in the past
accidentally that we figured out were
running incorrectly where I've had
PresentMon and FRAPS logging the same
game and you'll see it because the
numbers don't look right so that's a
question we had VLC's dropped frame output
too so VLC does have a feature where it
will display dropped frames and that's a
pretty useful feature if what you're
doing is watching VLC but when you're
doing multiple
things at the same time and the video
isn't the active window we were getting
playback that was stuttering but then
not being reflected in the dropped frame
counter I think we pretty much dropped
VLC after that point I don't know right
now my feeling is I just generally don't
trust multitasking benchmarks with a lot
of the applications people want us to
test because they're like Battle.net
is internet attached yeah and we saw a
30% performance loss like for who knows
what reason
yeah there was some pretty weird stuff
and just the the nature of the test
makes the test difficult I mean people
that are talking about difficulties with
multitasking they're talking about
unpredictable behaviors and weird
unpredictable behaviors aren't lab
friendly right they're hard to
test for they're hard to account for and
they're hard to reproduce this isn't us
complaining saying this is hard feel bad
for us this is saying like this is what
everyone wants to see and we want to see
it too that's why I just paid Patrick
for like a week to try and do this is we
want to see these tests I'd like to do
them I'd love to be one of the only
sites that has a way of doing them in
you know a trustworthy fashion
unfortunately until we can really figure
out key applications that are both
representative of what users want and
friendly to benchmarking processes where
you know what's going on and there's not
variance until we can figure those
applications out and everyone agrees
that they're good to test then we don't
have a multitasking test for you that's
anything beyond today which was like
here's an exploratory look at it because
otherwise like I was saying before we
cut in with Patrick you can do stuff
like we talked about using Excel to just
enumerate some formula yeah that would
be reliable does it count and we are
just not comfortable with the level of
accuracy that we would get from doing
like a casual benchmark with like just
Chrome like a YouTube video playing we I
mean we could do that pretty easily but
then we wouldn't be comfortable
publishing results and standing by them
yeah it'd be trivial to do but like how do you
how do you isolate the software from the
network from the OS from the hardware
because we want to isolate the hardware
yeah but network alone could be a big
factor in things so at the end of the
day we don't want to say that the R3
1200 is a better or worse CPU than the G
4560 based on this I mean before the
best time yeah just a really complex
question and really a lot of variance
in the answers to that question yep
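One way to put a number on that variance is to compare the passes of the same test against each other before comparing CPUs at all. The sketch below is hypothetical throughout: the pass averages are invented to show the shape of the problem, and the 3% cutoff is an arbitrary assumption, not a threshold from this testing.

```python
from statistics import mean


def pass_spread(fps_passes):
    """Return (average FPS, spread between best and worst pass in percent of the average)."""
    if len(fps_passes) < 2:
        raise ValueError("need at least two passes")
    avg = mean(fps_passes)
    spread_pct = (max(fps_passes) - min(fps_passes)) / avg * 100.0
    return avg, spread_pct


def is_trustworthy(fps_passes, max_spread_pct=3.0):
    """Flag a set of passes as usable only if they agree within max_spread_pct."""
    _, spread_pct = pass_spread(fps_passes)
    return spread_pct <= max_spread_pct


# Invented numbers: three passes that agree, and three where a background
# client did something during the second pass.
steady_passes = [84.2, 83.9, 84.5]
disturbed_passes = [85.0, 58.6, 84.1]

print(pass_spread(steady_passes), is_trustworthy(steady_passes))
print(pass_spread(disturbed_passes), is_trustworthy(disturbed_passes))
```

A check like this does not say what went wrong in the disturbed pass, only that the three passes should not be averaged into one published number.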
and ultimately if your form of
multitasking is discord YouTube
your inbox and playing a game it's not
gonna matter which CPU you buy like it
will matter for other reasons for which
we have the r3 1200 review so you can
check that out if you want those reasons
well that's all for this one thank you
all for watching and for filing that
request seriously it's it's very
interesting to look into we like knowing
these things even if we didn't come to
the conclusion that I wanted which was a
real test that shows real differences
that we could trust didn't get that
today but we still got interesting
content and we have stuff we can work on
for the future so thank you for
requesting it if you want to leave
comments with application suggestions
for like video playback maybe you know
of one that is really good at logging
its performance please leave them below
we'll look into it for next time
but I think we're good for now so thank
you for watching subscribe for more
patreon.com/gamersnexus to help us out
directly and we will see you all next
time