Comparative Deep-Dive: Zen vs. Bulldozer Architectures

AMD's glory days come in waves, as do Intel's, although up until very recently it seemed like Intel had almost full control of the consumer-grade CPU market. Ryzen has added the competition this space so desperately needed. We could have a bit more, if you ask me, but Ryzen provided builders with a value-based alternative that still pushes solid frame rates without running extremely hot, requiring DDR3, requiring beefy overclocking motherboards, or sporting PCIe 2.0. Yeah, the platform is that old. We covered the viability of FX processors in our last video, which you can check out right here, but today we'll take one last deep dive into the Bulldozer and Piledriver architectures and compare them with current Zen and Zen+ offerings. Just a heads up: this one's gonna be a bit technical.

So let's start right away with the Bulldozer block diagram. Bulldozer was the codename for the first set of FX CPUs, including the FX-8150 and FX-6100, typically those names with a 1 in the second placeholder. Vishera, or Piledriver, CPUs were essentially refreshes of the 32-nanometer Bulldozer architecture. Internally, everything is nearly identical; the notable improvements included integer scheduling and power consumption. We'll address both families interchangeably throughout this video because the block diagrams do look nearly identical.

The basic breakdown of the Bulldozer architecture is as follows: a fetcher splits requests and instructions between two decoders, where control words are created, sent to the dispatch, and fed to two unique integer schedulers and a single floating-point scheduler. The center block in each integer cluster is a set of ALUs and AGUs, which perform arithmetic operations and calculate addresses, respectively. These are important for memory calls between the CPU and the main memory, also called system RAM. That's these guys right here. The CPU and RAM communicate via the memory bus, which is an ultra-low-latency highway for sending and receiving temporary calculations and instructions. It's made up of two parts, the address bus and the data bus, where the latter is in charge of transferring information to and from the CPU. The address bus tells the data bus where to find the data required by the CPU. You can see how those two kind of work hand in hand.

Now, back to our block diagram. Instructions not sent to the integer clusters are fed to the FPU, or floating-point unit. Here, numbers are approximated and expressed in scientific notation, so values requiring decimals or involving division, for example, may be sent to the FPU for processing, simply for its ability to distinguish the significand, base, and exponent. In a nutshell, the key difference between an FPU and an ALU involves the decimal point: if values cannot be expressed as whole numbers, the FPU is probably involved. By contrast, ALUs are intended to perform logical operations involving AND, OR, NOT; you get the point. Math may still be involved, but it won't stray very far from simple addition and subtraction. This is why so many pipelines exist within integer clusters. In particular, if there are more cores and pipelines at a program's disposal, processes can be parallelized and thus expedited.

A few side points I want to touch on. GPUs are excellent parallel processors thanks to their typically several thousand cores, whether they be stream processors from AMD or CUDA cores from Nvidia. Graphically driven programs are resource intensive and extremely demanding in real time, so the GPU handles these as a result. And another thing with respect to the FPU: back in the good old days, basically before I was alive, FPUs used to be add-ons that you could buy and install after the fact if you were running some seriously heavy programs. Nowadays, FPUs are almost totally integrated.

So what happens with the three sets of data, then? Two sets of instructions are executed on the integer paths and another on the floating-point level. Where do they meet? The LSU. No, not that LSU; this LSU, the load-store unit. Well, they don't technically meet here; that would be the core interface unit, which we'll discuss next. But in this case, the LSU literally does what its name suggests: it loads and stores instructions to and from memory. It's how the AGUs and FPU send and retrieve data from the system memory and from the memory subsystem. So this ties back into the memory bus we discussed earlier. I just wanted to throw that out there because we were talking about memory.

Now, down here toward the bottom, we have two important blocks, and then we'll talk about the cache. We're still just on the Bulldozer architecture, by the way. We haven't even touched Ryzen yet, though you'll see many familiar acronyms, so we won't need to repeat ourselves too many times. This write coalescing block basically acts as a filter for repeated write requests, easing the load on the L2 cache. The core interface unit, tied just below, is the network unifier, linking all important aspects of the module and allowing the integer clusters (ICs) to communicate with the L2 cache directly.

This cache reduces the latency incurred when executing certain tasks. So if, for example, you use a particular program quite a bit, and there are certain instructions the CPU can send to the cache, then when you open the program it can be expedited. It can open quicker, or run some particular line of code within that program very quickly, because it's stored in cache up front. It's extremely fast because it's already preloaded and doesn't have to run through the pipelines. System RAM does a similar thing, though the process is much slower because the latency is also significantly higher. On top of that, not all data in system RAM just bypasses the pipeline. Sometimes it has to be reprocessed; sometimes there are instructions that haven't yet been processed that are stored in system RAM temporarily. In the case of cache, it's almost always data that's already been executed and is just kind of there temporarily for repetition's sake, so that you don't have to keep running the same process over and over again, which is very expensive from a resource perspective. Anyway, this is one of the reasons why CPUs support various levels of onboard cache, and in general, the smaller it is, the faster it is. Level 1 cache is the fastest and, again, the smallest as well, so you can't use a lot of it.

All right, are you ready for Ryzen now? I know that was a lot to digest. This script took several days to write on account of all the research involved. However, I think you'll find this part of the video a bit easier to understand if a lot of this is new to you, just because we'll be comparing, not necessarily talking about how they work or why they work a certain way. So here we go.

One of the key differences between the architectures is the lack of a split integer cluster layout in Ryzen. Bulldozer packed two unique schedulers and a single FPU in each module, and there were up to four modules in FX processors, meaning in certain cases these CPUs could act as eight-core CPUs and in other cases they'd act as four-cores. This partly explains why certain programs, including Cinebench, initially detected chips like the FX-8150 as four-core, eight-thread units instead of eight-core, as was advertised by AMD. Bulldozer implemented what was referred to as a clustered multithreading module, which literally implies that some aspects of the unit are sharing resources, including the FPU and the L2 cache. Needless to say, a huge shortcoming of an FX processor could be identified when the floating-point pipeline was fully saturated, since there were only up to four of those per die in quote-unquote eight-core CPUs. Six-core CPUs only had three of these. Ryzen largely avoided this issue by delegating a single integer cluster and FPU per core.

This block diagram gets into a bit more detail than the last one we saw, but the key things to point out are the retire queue, the dispatcher, and the integer cluster. Ryzen CPUs boast simultaneous multithreading, or SMT, which is a way for schedulers to prioritize and sort data through logical pipelines. It's basically a more efficient scheduler. We discuss Hyper-Threading, which is Intel's implementation, in this video right here; it's basically the same general process. You can see the rename/reallocate block inside the integer cluster, which sends redundancies to the retire queue. It's a way to filter out repetitions and keep the expensive loads off the pipelines themselves. The same is true for the floating-point unit. Inside the integer cluster, we see six unique pipelines: four ALUs and two AGUs. We only saw two and two before in Bulldozer. The extra ALUs speed up logical operations and allow Ryzen to handle more instructions per core. They're also important for SMT, allowing an added scheduler to saturate pipelines with fewer skips and errors. This is one of the big reasons why Ryzen is so much more efficient per core; Ryzen's IPC is generally a lot higher than Bulldozer's.

Another key difference between the two architectures has to do with cache. As discussed earlier, two Bulldozer ICs, which are essentially cores of their own, at least according to AMD themselves, share one large chunk of L2 cache. And remember, the larger the cache, the slower it operates. Ryzen cut this size down from two megabytes to 512 kilobytes, and that resulted in a much quicker response from the cache, though the trade-off there is not being able to store as much data in it. So in general, Zen's Level 1 and Level 2 caches are roughly two times as fast as the previous architecture's. Again, though, that size drop is not really going to play too heavily into the CPU's ability to perform everyday tasks.

Another radical shift from Bulldozer is Ryzen's CCX, or core complex, dependency. Within each CCX, four Zen cores exist, and two CCXs exist per die, allowing for up to 8 cores and 16 threads per chip. CCXs are connected to each other via the Infinity Fabric, which is AMD's way of interconnecting CPUs to GPUs as well as clusters of cores to other cores. Latencies between cores within each CCX are extremely low by comparison, while latencies between CCXs can be up to 10 times higher, depending largely on system RAM frequency, which is one of the variables involved. This is why first-generation Ryzen CPUs typically yielded higher frame rates when faster memory kits were used.

Ryzen also sports AVX2 rather than AVX, which was first introduced in the Sandy Bridge and Bulldozer architectures. Things like XFR were bundled in, Precision Boost, et cetera: things you probably wouldn't even know were present unless you purposely looked for them. But the options and tweaks have definitely improved the user experience and even allow us to tinker a bit more with the UEFI in question, which is a good thing all around. So I say all of that to say this very obvious statement: Ryzen was a big step in the right direction given the technology of its time. We went from 32 nanometers down to 14, and now 12-nanometer degrees of precision, in just a few short years.

And a few important notes I want to mention. Ryzen chips not ending with a G do not include IGPs, which are a totally different discussion for a different video. Bulldozer omitted onboard graphics as well. Now, if we looked at the block diagram of an Intel CPU, it would look a little different. I mean, the CPU side would look the same, but you'd also have to worry about the IGP on board as well. It's the integrated graphics processor, which means that if you have one of those, you can just plug an HDMI cable into your motherboard and run off of the chip's graphics instead of having to use a discrete card like a 980 Ti or an RX 580 to actually get a picture on your monitor. With Ryzen and Bulldozer, you couldn't do that unless you were using something like a 785G chipset (I have 786 on the script; I'm pretty sure it's 785) or something similar bundled with the board. It's basically a dedicated graphics driver that's embedded on the board itself. It's not included in the CPU, so the rule still applies.

I do hope you've enjoyed this video, by the way. I know it's a lot to take in, but I appreciate you giving me the chance to explain this stuff as best I can with the limited primary resources at my disposal. I have a lot of fun with these, and I hope you are willing to contribute, or at least willing to be somewhat entertained or interested in this kind of stuff, because I feel like this is how the channel started, and we've kind of strayed away from that. As the channel grew, a lot more people were interested in just PC builds and benchmarks, but this is the heart of the channel right here, and this is why the channel is called Science Studio. So for those wondering, these videos are why the name is still what it is.

Leave a comment down below, click that red subscribe button if you're feeling especially special, and give me a thumbs up if you liked the video. If you like this content, look for stuff on the channel, past videos on topics like these. You guys are awesome. I will catch you in the next one. This is Science Studio. Thanks for watching, and thanks for learning.
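
As a footnote to the FPU discussion above, here is a minimal Python sketch of how a floating-point value breaks down into a significand and an exponent. It leans on the standard library's `math.frexp` and `math.ldexp`, which work in base 2 rather than the base 10 of everyday scientific notation. The helper name `decompose` and the sample values are assumptions made for illustration; none of this comes from the video or from AMD's hardware documentation.

```python
import math


def decompose(value: float) -> tuple[float, int]:
    """Split value into (significand, exponent) with value == significand * 2**exponent."""
    significand, exponent = math.frexp(value)
    return significand, exponent


# Illustrative sample values (assumed, not from the video).
for sample in (6.5, 0.15625, 1024.0):
    significand, exponent = decompose(sample)
    # ldexp reverses frexp, rebuilding the value from its two parts.
    rebuilt = math.ldexp(significand, exponent)
    print(f"{sample} = {significand} * 2**{exponent} (rebuilt: {rebuilt})")
```

Whole numbers like 1024.0 decompose the same way as fractional ones; the point is only that a floating-point format stores the significand and exponent separately, which is the bookkeeping the FPU handles and an integer ALU does not.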

Gadgetory

2019-03-17