Gadgetory



Primitive Discarding in Vega: Mike Mantor Interview

2017-08-05
Hey everyone, we're at the AMD event for RX Vega and Threadripper, with some Radeon Pro WX material as well. I'm joined by Mike Mantor, a Corporate Fellow at AMD. Mike works heavily on the architecture side of things, so he's very knowledgeable on actual low-level GPU architecture. We're focusing on one topic today: primitive discarding and culling.

Before getting to that, this content is brought to you by the Thermaltake Floe Riing RGB closed-loop liquid cooler, a 360mm radiator plus three 120mm RGB-illuminated Thermaltake Riing fans, with one of the faster pumps you'll find on a closed-loop cooler. You can learn more at the link in the description below.

So the first question, I guess, is: what is the small-primitive discard, or what do we need to know about the basics before diving into it?

Let's just talk about the basics at a high level. When an application renders objects, every object has characteristics: a model is built for an object, which is then rendered into a 3D scene. An object is usually modeled as a complete object, so no matter how you're viewing it or where it is in your field of view, the full representation is there. These models move around in view space, and the graphics processor processes the triangles of the objects that are in the field of view and decides whether or not they're visible.

One of the first things that happens when we draw an object is that, sooner or later, we run a shader that processes the vertices and creates vertex positions in a common view space. Part of the state data that defines that common view space is a view frustum, which is used to decide whether the triangles of an object (the primitives of an object) are actually within the view or outside of it. If objects are completely outside the view, there's no reason to send them down the graphics pipeline. Many times, applications do a first level of culling before the object is ever even sent to the GPU, or they may send a coarser representation of the geometry, just a few triangles, to query the GPU as to whether the object is even in the view. If it's not in the view, there's no reason to send it to the rasterizer, if you can predetermine that. And if it's a very complex object, instead of taking the time to render it, you send down a bunch of bounding volumes and find out whether each object is visible, or might be visible in some way.

So when the object comes down the graphics pipe and we do the position processing, you can think of there being different kinds of culling. Part of the object might be outside of the field of view, so part of the triangles are gone. The back side of the object is made up of triangles too, and those are not visible or viewable. And then, obviously, you can have an object positioned in space such that part of its triangles are far away, and when they project into screen space they become very small triangles, even the front-facing ones.

So I've really described three different scenarios: the triangle may be outside of the view frustum, the triangle might be back-faced, or the triangle can somehow become very small in screen space. For most content that comes to the GPU, we see that on average more than 50% of the triangles coming down the pipeline are in a place where they don't need to be rendered, meaning we don't need to scan-convert and process them. The earlier we can determine that a triangle is out of the view frustum, back-face culled, or too small to hit any samples when you go to render it, the quicker we can remove any effect of that triangle from the rendering process.
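To make the view-frustum part of this concrete, here is a minimal sketch of frustum rejection in homogeneous clip space. It is purely illustrative: the Vec4 type, the outcode helper, and the Direct3D-style depth convention are assumptions, not AMD's hardware implementation. The idea is that a triangle can be safely discarded when all three of its vertices fall outside the same frustum plane, a conservative test that never discards a visible triangle.

```c
#include <stdbool.h>

/* Hypothetical clip-space vertex. After the vertex shader, a point is
 * inside the frustum when -w <= x <= w, -w <= y <= w, and 0 <= z <= w
 * (a Direct3D-style depth range is assumed here). */
typedef struct { float x, y, z, w; } Vec4;

/* One bit per frustum plane the vertex lies outside of. */
static unsigned outcode(Vec4 v) {
    unsigned code = 0;
    if (v.x < -v.w) code |= 1u << 0;  /* left   */
    if (v.x >  v.w) code |= 1u << 1;  /* right  */
    if (v.y < -v.w) code |= 1u << 2;  /* bottom */
    if (v.y >  v.w) code |= 1u << 3;  /* top    */
    if (v.z < 0.0f) code |= 1u << 4;  /* near   */
    if (v.z >  v.w) code |= 1u << 5;  /* far    */
    return code;
}

/* Conservative trivial-reject: if all three vertices are outside the
 * same plane, the triangle cannot intersect the frustum. Anything that
 * fails this test is kept, even if it only *might* be visible. */
bool frustum_trivial_reject(Vec4 a, Vec4 b, Vec4 c) {
    return (outcode(a) & outcode(b) & outcode(c)) != 0u;
}
```

Note that the rule Mantor gives next ("if any one vertex is inside, the triangle could potentially create pixels") is the keep side of the same conservative logic; a triangle whose vertices are all outside, but not all outside the same plane, can still cross the frustum and must also be kept.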
So your immediate goal is reducing the load on, I guess, the overall pipeline? Getting stuff out of the way that doesn't need to be there, so you can free up resources for other tasks?

Yeah, and just avoiding unnecessary work. In the chips we've been building for a while now, we have a vertex process that runs, and it could be a domain shader, it could be a vertex shader, or it could be a vertex process on the output of a geometry shader that's doing amplification or decimation. The point where you finally have the final positions of the vertices of a triangle is one point where we can always find out whether the triangle is inside of the frustum, back-faced, or too small to hit any samples.

For frustum testing, there's a mathematical way to figure out whether or not a vertex is inside the view frustum, and if any one of the vertices is inside the view frustum, then we know that the triangle could potentially create pixels.

From a back-face culling perspective, with the three vertices you can find two edges, a first edge and a second edge. You can take the cross product of those to determine the facing of the triangle, and you can then dot-product that with the eye ray. If it's a positive result, the triangle is facing in the direction of the view; if it's negative, it's a back-faced triangle and you don't need to render it. Now, geometry can be defined as two-sided or one-sided, so state data goes into whether or not you can opportunistically throw a triangle away. You can be rendering something where you can actually fly inside of an object and see the interior of it, and then when you come back outside you see it from the outside; in those cases you can't do back-face culling.

The last one is really the zero-area triangle. Depending on how fine-grained your sampling is, a triangle can become so small that it has effectively zero visibility, just because it's that small in screen space, and you can determine that it's not going to invoke any sample in the sampling process. Say the triangle lands between the samples that you're taking: getting rid of it cannot touch the image, because it isn't touching any samples. And again, it's a test.
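Below is a hedged sketch of those last two tests. The Vec3 helpers are assumptions made for illustration, the sign convention of the back-face test depends on the winding order and handedness your pipeline uses, and real hardware performs these tests in fixed function (typically in screen space), not necessarily as written here.

```c
#include <math.h>
#include <stdbool.h>

typedef struct { float x, y, z; } Vec3;

static Vec3  sub(Vec3 a, Vec3 b)   { return (Vec3){ a.x - b.x, a.y - b.y, a.z - b.z }; }
static float dot(Vec3 a, Vec3 b)   { return a.x * b.x + a.y * b.y + a.z * b.z; }
static Vec3  cross(Vec3 a, Vec3 b) {
    return (Vec3){ a.y * b.z - a.z * b.y,
                   a.z * b.x - a.x * b.z,
                   a.x * b.y - a.y * b.x };
}

/* Back-face test as described: two edges, a cross product for the
 * facing, then a dot product against the eye ray. Here a non-negative
 * dot marks a back face under an assumed winding convention, and state
 * (two-sided geometry) can disable the cull entirely. */
bool backface_cull(Vec3 v0, Vec3 v1, Vec3 v2, Vec3 eye, bool two_sided) {
    if (two_sided) return false;      /* state forbids the cull */
    Vec3 normal  = cross(sub(v1, v0), sub(v2, v0));
    Vec3 eye_ray = sub(v0, eye);      /* from the eye toward the triangle */
    return dot(normal, eye_ray) >= 0.0f;
}

/* Small-primitive test: with one sample per pixel at (i + 0.5, j + 0.5),
 * a triangle whose screen-space bounding box encloses no sample point
 * can never be hit by the rasterizer and can be discarded. */
bool small_prim_cull(float min_x, float min_y, float max_x, float max_y) {
    bool no_sample_x = ceilf(min_x - 0.5f) > floorf(max_x - 0.5f);
    bool no_sample_y = ceilf(min_y - 0.5f) > floorf(max_y - 0.5f);
    return no_sample_x || no_sample_y;
}
```

Both tests are conservative in the sense Mantor emphasizes later in the interview: they only discard triangles that provably contribute nothing to any sample.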
As we've talked a bit today about our primitive shader, which I think is what kind of prompted some of this discussion: traditionally, when we've done a vertex shader, we do position and then we do the attribute calculation, which is per-vertex data. After we've produced the per-vertex data to be used in the interpolation process, for the attribute data that goes into a pixel shader, we have to store that attribute data in a place that can be accessed when we're launching pixel shaders. In the Vega architecture, one of the challenges we set out to take care of was increasing the efficiency with which we operate on work, so one of the places we could make an improvement in the latest architecture is being able to discard primitives before we store the attribute data. That way, the storage in between the vertex processing and the pixel processing can be more effective at storing data that's really going to be needed by pixel shaders that are actually going to be launched.

So I guess, if you're looking at this in terms of a canonical view of the render pipeline, this type of culling happens essentially in one of the first stages?

It sounds like that, yes. The culling without a primitive shader happens after all vertex shading, which means we've already created that attribute data. With the primitive shader, we can do it earlier: we do the position processing, and then in the primitive shader we have access to the three vertices of a primitive (or two, or one, depending on what kind of primitive it is), we do the culling process, and then we conditionally do the attribute processing and the writing of the attribute data. So one of the advantages of this new primitive processing is that you can remove the geometry before any attribute data is stored and before any primitive data gets into the pipeline. Once you put a primitive into the pipeline, somewhere later it potentially takes a clock cycle to take it out and discard it. If we can discard it before we send it down the river, the river doesn't have it in there.

Right.
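In pseudocode terms, the reordering Mantor describes looks roughly like the sketch below. This is a sketch under assumptions: the stage names are hypothetical, declared only to show the structure, and the real Vega path is a mix of fixed-function hardware and driver-generated shader code.

```c
#include <stdbool.h>

typedef struct Primitive Primitive;  /* three (or fewer) vertices plus state */

/* Hypothetical pipeline stages, named only for illustration. */
void shade_positions(Primitive *p);            /* position part of vertex work      */
void shade_and_store_attributes(Primitive *p); /* per-vertex attributes to storage  */
bool cull(const Primitive *p);                 /* frustum + back-face + small-prim  */
void rasterize(Primitive *p);

/* Traditional path: attribute work and storage are spent even on
 * triangles that are later discarded, and a dead primitive still
 * occupies pipeline slots until it is removed. */
void legacy_vertex_path(Primitive *p) {
    shade_positions(p);
    shade_and_store_attributes(p);
    if (!cull(p))
        rasterize(p);
}

/* Primitive-shader path: cull right after position processing, so a
 * discarded triangle never touches attribute fetch, attribute math, or
 * the storage between vertex and pixel processing. */
void primitive_shader_path(Primitive *p) {
    shade_positions(p);
    if (cull(p))
        return;
    shade_and_store_attributes(p);
    rasterize(p);
}
```

The structural point matches the clock-cycle comment above: in the legacy ordering a culled primitive still costs work downstream before it is removed, while the primitive-shader ordering never sends it "down the river" at all.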
And what about differences with Vega specifically versus previous architectures you've worked on? Are there any major milestones and changes? Primitive shaders are the big one we've been talking about, but if we look back at Fiji, for example, are there any major differences of note for Vega?

Oh yes. As we've talked about in the releases you've seen, the optimizations that we did to increase the frequency were a big challenge. In the same process as the Polaris class of GPUs, we've uplifted the frequency a minimum of about 400MHz.

Approaching 1700MHz?

Actually exceeding 1700. We're saying we can run at least 1700, and depending on how the product is defined, the device is capable and will burst up above 1700.

So when you're pulling the primitives out of the pipeline earlier, it sounds like, if I understand correctly, you're saving potentially on a lost cycle, because you're removing the primitive before it ever consumes the cycle. Is there an effect on memory bandwidth as well, or anything else outside of cycles?

Oh yes, there can be. A lot of times, to use the example of a cup where you've got a bunch of back-faced triangles on the backside, when we cull a primitive we can cull a group of primitives that are interconnected, which means there are certain vertices that will not need attribute processing at all. Attribute processing involves fetching per-vertex attribute data from memory into the shader and then doing some kind of calculation on it to prepare it for use in shading, or at least placing it in the on-chip storage for that use. So if I were rendering my cup, and I sent it down the pipeline and didn't do any back-face culling for the triangles on the backside, all of the vertices making up the primitives back there would have to fetch their attribute data, when really all we needed was what's visible on the front side of things.

What sort of resource gets used on the GPU when you're fetching that data? What's being engaged?

The whole cache hierarchy and the memory pins. We already do vertex reuse in most cases, so each vertex is only fetched once and then referenced from that one fetch. In other words, a vertex fetch can bring in cache lines, and neighboring vertices, if they happen to be co-located in memory, can find their data in those cache lines. But it engages the whole cache hierarchy into the shader array, the shaders doing whatever per-vertex calculations you have to do, and then storing the data off into internal storage on the chip. It's an interesting amount of circuitry that's engaged.

Is there an average percentage of objects that you're able to cull?

Yeah, as we said when we first started talking: in most scenes that we look at, greater than 50% of the triangles are culled, which kind of makes sense just from the object perspective, since roughly half of an object is usually visible and the other half isn't. But it can vary quite greatly depending on the scene and the geometry; sometimes it can be as high as 80 to 90 percent.

And it sounds like the process steps through figuring out which items to cull in a way that, theoretically, should have no visible impact on what the user sees? In other words, you're not reducing the visible geometric complexity?

No, there's no change in the complexity of anything needed for any sample or pixel. Anything that the rasterizer is going to sample is not discarded, so it's all very conservative discarding. Actually, that's very important: you can only discard the things that have no impact on the scene.

Right. For our Vega: Frontier Edition coverage, "energy" was one of the applications that really saw a massive uplift in performance versus previous architectures. Is this tied to what you've been describing?

Yes.

Any other major items of note here you want to go over before we close out?

No, I think that does it. Hopefully this helps you and your viewers understand.

Well, thank you, Mike. I appreciate it.

Thank you.

Thank you all for watching, as always. You can subscribe for more; links are in the description below for more information, or for transcriptions of some of these to make them a bit easier to consume. I'll see you all next time. (We had Sam Naffziger on previously, and there was someone in the comments making fun of the Corporate Fellow title; they thought it meant some kind of marketing executive. No, it's much different.)