Gadgetory



Ryzen CCX Performance: 2+2 vs. 4+0

2017-03-26
Hey, welcome back to Hardware Unboxed. So we now know that Ryzen 5 is just around the corner, and even though it's only a few weeks away, I just couldn't wait, and we took a sneak peek at the gaming performance by running a few simulated tests. This was done by disabling a few cores on the Ryzen 7 processors. I do expect the simulated performance to be pretty much spot-on with what we will see from Ryzen 5 in a few weeks' time, and really that is great news, because things look very good.

The Ryzen 5 simulated performance video came about because a week prior a heap of you asked me to play around with the downcore feature and mimic the core configurations of the six-core and four-core models. It wasn't because, just 24 hours before my video, Linus released the same kind of test looking at simulated Ryzen 5 performance; the testing for that video alone took two entire days. Anyway, in the event that Linus beats me once again by looking at the impact CCX latency has on performance, please note I'm not copying him. Testing for this video began on the 22nd of March, I promise. Once again, I decided to make this video because so many of you requested it.

So what exactly are we going to be testing? Well, actually, before we get to that, for those of you who don't know, here is a quick explanation of how Ryzen CPUs are designed. Ryzen 7 features 8 cores in total, and with the addition of simultaneous multi-threading, or SMT for short, there are 16 threads on offer. However, not all 8 cores are located within the same die; rather, they are spread across two modules, or CPU complexes as AMD calls them. The CPU complexes, or CCX for short, are connected using an interface called Infinity Fabric, but we won't cover that in detail here. Let's focus on the core configuration. Ryzen 7 and the upcoming Ryzen 5 processors feature two CCX modules, which means, to a degree, half the cores are separated, and as a result, for them to work together means there will likely be a performance penalty. In contrast,
Intel's 10-core desktop CPUs work within a single die. The Broadwell-E architecture stacks the cores around a shared level 3 cache. The fully enabled silicon offers 10 cores, and this is how the 6950X is configured. The 6900K features two cores disabled, while the 6800K has four cores disabled. Typically, processors with defective cores get binned as lower-end parts, so what would have been a ten-core 6950X becomes an eight-core 6900K or a six-core 6800K. So what's key to know here is that latency between any two cores is the same.

Moving back to Ryzen, it has been discovered that the latency penalty between cores of different CCXs is over twice that of cores within the same CCX. So basically, for cores to communicate within the same CCX, you're looking at around a 40 nanosecond delay. Meanwhile, when going between CCXs, so one core over here in one CCX and a core over here in another CCX, there's about a 100 nanosecond latency penalty, and that takes the total time to around 140 nanoseconds, as opposed to 40 nanoseconds within the same CCX. It is believed that this added latency is why Ryzen isn't as impressive for gaming as you might expect it to be based on productivity performance.

The reason why AMD has gone with this modular design is, well, the simple fact that it is just that: modular. The design has allowed AMD's new Zen-based Naples server chips to pack up to 32 physical cores per chip using multiple CCX modules. So essentially, Ryzen is a server chip that's been scaled down for desktop computing. I should note that as Intel scales up the number of cores their Xeon CPUs contain, they also use a modular design, though it only splits the CPU into two. Their method is called cluster-on-die, or COD for short, and this is ideal for highly NUMA-optimized workloads, but again, we won't go into detail about this here.

Getting back to the matter at hand, let's talk about the upcoming Ryzen 5 models. These six-core and four-core CPUs are based on the
same physical chip as Ryzen 7, so this means all models feature two CCXs, each with 4 cores, though not all of them will be enabled. Basically, this means Ryzen 7 CPUs that feature one or more defective cores will be binned as Ryzen 5 parts. The six-core models feature one core disabled per CCX, while the quad-core parts feature two cores disabled per CCX.

The news that the quad-core Ryzen 5 parts would still utilize two CCX units disappointed quite a few people, as they were hoping that the four-core models would be better for gaming, since they wouldn't suffer the latency penalty when working between CCX units. With just two cores per CCX, the latency penalty will be amplified, as it's far more likely crosstalk will occur with fewer cores. That being the case, a shipload of you have asked me to test the Ryzen 7 processors in a 2+2 configuration and then compare that with a 4+0 configuration. That is to say, emulating the Ryzen 5 quad-cores with two cores per CCX, and then testing them again with four cores in a single CCX and the second CCX completely disabled. The idea being that the latter configuration won't suffer CCX crosstalk latency, as all four cores will be working within the same CCX. In theory this means games should run better, but we'll have to go find out.

So for testing we have six games in total, all of which were tested at 1080p using the Titan XP to try and remove any kind of GPU bottleneck. So let's go and check out the results.

First up we have F1 2016, and here we see that running a single CCX for the 4+0 configuration, performance is much the same as the 2+2 configuration. Still, this game provided a strong result for AMD, as the quad-core Ryzen 5 part clocked at 4 GHz is slightly faster than the 7600K clocked at 4.8 GHz.

As we have seen in previous tests, Far Cry Primal is a game that Ryzen really struggles with. Evidently though, the performance issues aren't caused by the CCX latency, as running all four cores
within a single CCX did not improve performance in this title.

This next test was a bit pointless, but I included it anyway since we already have the 2+2 results from last week's video. As you can see, the Titan XP is maxed out here using either configuration on the Ryzen processor.

Ghost Recon Wildlands is another GPU-intensive game, and here we see much the same performance using either the standard 2+2 configuration or the 4+0 configuration.

Arma 3 is a title where I suspected removing the CCX latency might help improve performance further, but I was wrong: we've seen no real difference here.

Testing with Battlefield 1 shows very minor performance improvements when using a single CCX. Here the 4+0 configuration allows for 3% more performance. Not exactly a huge increase, but with roughly the same boost to the minimum and average frame rates, it seems like removing the CCX crosstalk here does lead to slightly better performance. Interestingly though, if we look at the 1% and 0.1% frame time performance in Battlefield 1, the 4+0 and 2+2 configurations deliver the same results.

So it's really starting to look like the increased latency incurred with crosstalk between the CCXs doesn't really impact gaming performance, at least in the games we tested. The horrible Far Cry Primal performance, for example, certainly isn't CCX related.

Now you might be wondering: how do I actually know if the BIOS was configuring the Ryzen CPU as it claimed? When set to 4+0, for example, how do I know it wasn't just still in a 2+2 configuration? Well, the easiest way to determine this is by measuring the level 3 cache performance. Here we are looking at the cache latency, and as expected, the level 1 and level 2 cache performance remains much the same regardless of the configuration, as this isn't shared cache. In other words, each core has its own dedicated level 1 and level 2 cache. The level 3, on the other hand, which is split into 8 MB chunks of shared cache per
CCX, will be impacted by the core configuration. As we can see here, keeping all four cores in the same CCX, we only have an 8 MB level 3 cache, but it's all under the same roof, so it doesn't incur a latency penalty. With both CCX modules enabled, we now have 16 MB of level 3 cache, but of course it's spread across both CCXs and this increases latency. Looking at the level 3 cache bandwidth, we see that the 2+2 configuration heavily cripples write performance, reducing throughput from 210 GB/s to just 91 GB/s. The read throughput also takes a hit, dipping from 211 GB/s to 168 GB/s. So we know for a fact the downcore feature is working and configuring the CPU as claimed.

Before wrapping things up, here is a look at the Battlefield 1 benchmark running in either configuration. As you can see, performance is much the same. This is a custom Fraps pass, so the benchmark isn't identical, but it's very close. We of course report the average minimum and average frame rates from three runs.

Finally, I also took a look at Mass Effect Andromeda before wrapping things up, and again this is another Fraps pass measuring in-game performance; as such, the benchmark runs, while very similar, aren't identical. For the most part the 4+0 configuration looks much faster, but having run the 60-second test three times, on average it was just a single frame faster, at 115 fps to 114 fps. The minimum frame rate was also just a single frame faster for the 4+0 configuration.

Well, initially we were concerned with AMD's decision to spread the Ryzen 5 quad-core CPUs across two CCXs rather than keep them in a single module. This was because we were aware of the latency penalty when communicating between CCX modules, and we believed that, as a more, you know, gaming-orientated CPU, it would be imperative that AMD avoided this delay in communication. As it turns out, at least based on the testing done here, for the most part CCX crosstalk won't
have a noticeable impact on gaming performance, so the fact that AMD has decided to arrange the Ryzen 5 quad-core processors in a 2+2 configuration won't be disastrous for gaming.

So with CCX crosstalk latency not looking to be the problem, the main culprit now appears to be memory bandwidth. Evidence has surfaced recently suggesting that when using DDR4-3600 memory, for example, Ryzen's gaming performance improves dramatically. Because of this, a few viewers have suggested I retest using DDR4-3600 memory to show what Ryzen is truly capable of. Sounds good, and I certainly don't disagree; more testing needs to be done. That said, once I manage to get one of my Ryzen systems working with DDR4 speeds above 3200, I will certainly retest. Right now, even getting DDR4-3000 to work is a real chore, and I've seen countless user reports from new Ryzen owners struggling to get DDR4-2666 working. So while the testing with DDR4-3600 memory might be a good indicator of Ryzen's untapped, or at least future, performance, it's far from representative of the kind of performance most consumers are going to see. In its current condition, I feel like most Ryzen owners will be using DDR4-2666 memory. The fact that you need to play around with base clock overclocking to exceed DDR4-3200 doesn't really make it a viable option at this point.

For now, I really just wanted to get this video out of the way, especially as Ryzen 5 approaches. For the most part, we saw next to no difference between spreading the cores across two CCX modules and keeping them under one roof in a single CCX. Battlefield 1 was really the only game that showed a slight performance advantage when sticking to a single CCX, but yeah, a 3% gain using an extreme GPU at 1080p isn't exactly noteworthy stuff.

Well, that's all for this one, guys. I hope you enjoyed the testing, and I bet a few of you were quite surprised by the findings; I know I was. Anyway, I'm your host Steve, catch you again soon.
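
The latency and bandwidth figures quoted in the video can be turned into a rough back-of-the-envelope sketch. The Python below is only an illustration, not anything shown in the video: it takes the quoted 40 ns intra-CCX latency and 100 ns cross-CCX penalty, and assumes (a big simplification) that every pair of active cores communicates equally often. The function names `cross_ccx_fraction`, `mean_pair_latency_ns` and `percent_change` are made up for this example.

```python
from itertools import combinations

# Approximate figures quoted in the video.
INTRA_CCX_NS = 40.0           # core-to-core latency inside one CCX
CROSS_CCX_PENALTY_NS = 100.0  # extra latency when the two cores sit in different CCXs


def cross_ccx_fraction(cores_per_ccx):
    """Fraction of active core pairs whose two cores are in different CCXs."""
    cores = [(ccx, idx) for ccx, count in enumerate(cores_per_ccx) for idx in range(count)]
    pairs = list(combinations(cores, 2))
    if not pairs:
        return 0.0  # fewer than two active cores: no pairs to compare
    cross = sum(1 for a, b in pairs if a[0] != b[0])
    return cross / len(pairs)


def mean_pair_latency_ns(cores_per_ccx):
    """Average core-to-core latency, ASSUMING every pair talks equally often."""
    return INTRA_CCX_NS + CROSS_CCX_PENALTY_NS * cross_ccx_fraction(cores_per_ccx)


def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


if __name__ == "__main__":
    layouts = [("4+0", (4, 0)), ("2+2", (2, 2)), ("3+3", (3, 3)), ("4+4", (4, 4))]
    for label, layout in layouts:
        print(f"{label}: {cross_ccx_fraction(layout):.0%} of core pairs cross CCXs, "
              f"~{mean_pair_latency_ns(layout):.0f} ns mean pair latency")
    # L3 bandwidth going from the 4+0 to the 2+2 configuration (GB/s, as quoted).
    print(f"L3 write: {percent_change(210, 91):+.1f}%")
    print(f"L3 read:  {percent_change(211, 168):+.1f}%")
```

Under that uniform-traffic assumption, two thirds of the core pairs in a 2+2 layout span CCXs, compared with 60% for 3+3, four sevenths for 4+4, and none for 4+0, which is the reasoning behind the worry that the quad-core parts would amplify the penalty. Real games do not spread their inter-thread traffic evenly, which may be part of why the measured frame rates barely move.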