AMD's Raja Koduri on Dx12 Performance, GPUOpen, Moore's Law
2016-10-22
This is part two of our interview with AMD's Raja Koduri, SVP & Chief Architect of the Radeon Technologies Group. In this part of the interview, we talked about GPUOpen and the Boltzmann Initiative — which is pretty interesting for users who interact with CUDA but might want AMD hardware — and we also talked about how software is a major part of the optimization problem, more so than hardware these days. For part one, where we spoke with Raja about shader intrinsics, check the channel; that's also linked in the description below, along with an article version of this interview that contains a large transcribed portion. So I'll let you get to the interview.

You mentioned GPUOpen a few moments ago, so let's go into that, because I know that's a big topic lately.
Yeah. So the first question: what is GPUOpen? We kind of projected, a few years ago, that with the transition to low-level APIs — DX12, Vulkan — and with opening up the underlying guts of the GPU through shader intrinsics and all, the value-add for getting performance out of a system, for getting the best performance out of the GPU — the best practices, the best techniques to render shadows, do lighting, draw trees, whatever; there are different ways to do those, but what's the best way? — we figured out that that value-add moves into the engines. It's basically in the game engines; the games themselves have to do more of the heavy lifting of figuring out what the most optimal thing to do is.
The drivers themselves have become very thin. I can't do something super special inside the driver anymore to work around a game's inefficiency and render better. We used to do that in DX11 and prior APIs: when we focused on a particular game and found that the game wasn't doing the most efficient thing for our hardware, we'd have patches — we called those application profiles — for each application. We'd say, you could draw exactly the same thing if you changed a particular shader they have to something else. So we did that kind of manual optimization work within the drivers. But with these low-overhead APIs, we don't touch anything: whatever the game passes to us through the API goes to the hardware; it's nothing that we do.
So we said: we have a lot of optimization knowledge inside AMD — and so do our competitors — so how do we get all of that knowledge easily accessible to game engines and game developers? We have lots of interesting libraries and tools inside AMD; let's make them accessible to everybody, let's put them out in the open. That's why we created GPUOpen — and we invite developers to contribute as well, and build this ecosystem of libraries, middleware, and tools that are completely open, that work not just on AMD hardware but on other people's hardware too. The goal is to make every game and every VR experience get the best out of the hardware. We started the portal with that vision and goal. We had a huge collection of libraries internally that we put out, and it's gotten good traction. It also became a good portal for developers to share best practices: recently we had some nice blogs from developers sharing their techniques, and more often than not these blogs have links to source code as well — "hey, this is how I did this, this is how I did that."

I think that's right — that's probably an important discussion point, just this idea that the GPU obviously does all the hardware-level work, but there's a lot going on in software. If we're just brute-forcing visual effects, or volumetric particle effects or whatever, that isn't necessarily the best approach.

Yeah. Software is more than 50 percent — if not a higher portion — of the performance you see on a system. The GPUs — I mean, we have transistors that wiggle around and do five teraflops or something, but if they're not properly scheduled and appropriately used by these software techniques, they're wasted.
So that was the intent — and also tools, making developers productive in debugging either their quality issues or their performance issues. What we noticed was that, collectively, as an industry, we haven't done a good job of providing a consistent set of tools. Frankly — putting on a developer hat — I don't want to be learning one set of tools for NVIDIA, one set of tools for AMD, another set of tools for Intel, and another set of tools when I get onto a game console. What ends up happening is that developers don't use anybody's tools, relying instead on just printf debugging and that kind of stuff. That's not a good place for GPUs to be. So that was one of our goals with GPUOpen: hey, let's put out our tools as well, with full source code — our entire tool chain — and we want to encourage people to help us get these tools working on other hardware as well.
We're not married to our tools; we're actually quite open to using other people's tools too. If some other company wants to contribute a tool and put it out in the open, we'd be more than happy to pitch in and help get those tools working.
I think the opportunity the industry has, as we make the transition to these immersive experiences — we're at the beginning stage, the first year or so. Where we're going to be in four years is amazing, but if we just use today's software on today's hardware, the performance we'd need to support a 16K-by-16K headset at 120 Hz is a million times more — to get to a photoreal level. And you're not going to get a million times more with Moore's law. My goal is that we need to get there before I'm dead or retired, and we're not going to get there by just doing what we're doing, because the entire software framework needs to change. The software developers — the game developers — need to be a thousand times more productive than they are today for them to give us that million-X gain. Of course hardware will move forward; we'll have better hardware and all, but it may be four times faster in four years, or maybe eight times faster in different segments — not a million times faster. But software can make it a million times faster.

We've seen that the amount of wasted computation frame to frame in a scene is quite high, but that requires different ways of thinking about generating these pixels, and there's some fascinating work going on — both things we're looking at and a variety of things from game developers. This whole VR thing has sparked a bunch of fundamental research back into computer graphics again: am I drawing these things the most efficient way possible? Right now, each frame is so complex, and at 60 Hz you're regenerating the frame over and over again. You look at the difference between this frame and the next frame — it's these hundred pixels — but I redrew the entire frame, because that's the way the whole pipeline works.
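Both claims — the raw pixel-rate gap to a 16K headset, and how little actually changes between consecutive frames — can be sketched with some back-of-envelope Python. The 1080p60 baseline and the tiny 8×8 "frames" are my own illustrative assumptions, not figures from the interview (the million-X number also folds in per-pixel shading cost, not just pixel count):

```python
# Illustrative back-of-envelope numbers only.

# Pixel throughput for the headset Koduri describes vs. a 1080p60 baseline:
target_rate = 16384 * 16384 * 120        # 16K x 16K at 120 Hz
baseline_rate = 1920 * 1080 * 60         # common 1080p60 baseline (assumption)
print(f"raw pixel-rate gap: {target_rate / baseline_rate:,.0f}x")

# Frame-to-frame redundancy: count how few pixels differ between two
# consecutive "frames" (tiny synthetic 8x8 example).
frame_a = [[(10, 10, 10)] * 8 for _ in range(8)]
frame_b = [row[:] for row in frame_a]
frame_b[3][4] = (255, 128, 0)            # one small change, e.g. a spark

changed = sum(
    1
    for y in range(8)
    for x in range(8)
    if frame_a[y][x] != frame_b[y][x]
)
total = 8 * 8
print(f"pixels changed: {changed}/{total} ({100 * changed / total:.1f}%)")
# Yet a traditional pipeline re-renders all 64 pixels every frame.
```

The pixel-rate gap alone is only a few hundred X; the rest of the claimed million X would have to come from shading quality, which is exactly why he argues software, not process scaling, has to supply most of it.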
I guess that speaks to — as one of the easiest examples — delta color compression on the memory side. It's the idea of basically not pulling the full number for each color.

Yeah, that's for memory itself. Delta color compression is one of the techniques we have in Polaris, and it saves a ton of memory bandwidth, because there's so much correlation between neighboring pixels, or neighboring texels. So delta color compression is one of those techniques — but those are the things that I, or the hardware, can do within its own control. Imagine the kinds of things software can do, because it knows the context of the scene: it knows what is changing and what's not changing, what can be reused and what can't. I can do certain things for developers, but I'll be guessing; they don't need to guess — they know what's coming next.
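The correlation-between-neighbors idea behind delta color compression can be sketched in a few lines. This is only the underlying concept — the real DCC in Polaris is a lossless, block-based hardware scheme — and the scanline values and the 2-bit packing below are illustrative assumptions:

```python
def delta_encode(scanline):
    """Store the first value plus per-pixel differences.

    Neighboring pixels are highly correlated, so the deltas tend to be
    small numbers that pack into far fewer bits than raw 8-bit channels.
    """
    anchor = scanline[0]
    deltas = [b - a for a, b in zip(scanline, scanline[1:])]
    return anchor, deltas

def delta_decode(anchor, deltas):
    # Rebuild each pixel by accumulating deltas: lossless round trip.
    out = [anchor]
    for d in deltas:
        out.append(out[-1] + d)
    return out

# A smooth gradient, typical of sky or a shaded surface:
scanline = [100, 101, 101, 102, 104, 104, 105, 107]
anchor, deltas = delta_encode(scanline)
print(deltas)                                      # small values only
assert delta_decode(anchor, deltas) == scanline    # nothing was lost

# 8 bits per raw pixel vs. an anchor byte plus 2-bit deltas here:
raw_bits = 8 * len(scanline)
packed_bits = 8 + 2 * len(deltas)
print(raw_bits, "->", packed_bits, "bits")
```

The bandwidth win comes entirely from the data being predictable; on noisy data the deltas get large and a real implementation falls back to storing the block uncompressed.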
One of the classic examples I give: in many games, especially on lower-end hardware, when you have a big explosion in a scene, everything stutters — everything slows down because of that massive explosion. I don't know, in the hardware, that an explosion is coming — but the game developer knows. Say they had a mechanism to hint to the hardware, or to the drivers: I can boost to the max clock, if I have some clock headroom, just for two or three or four frames. That won't take me beyond my thermal budget, because I'm staying within the TDP limit. When game developers start thinking in those terms, there's more juice available in the hardware that they can take advantage of — different DPM states and things like that. They could say, "Hey, for most of my game I don't really need it, because I'm smooth anyway — so don't waste energy, don't heat up the graphics card. But when I need it, for this explosion or some massive moment, I need you to be there" — instead of being oversubscribed already and running at peak temperature. That's one example. I'm just saying that when developers start thinking about performance that way, there is so much interesting stuff available.
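No such hint API shipped publicly as far as I know, so the following is a purely hypothetical sketch of the idea Koduri describes: a governor that runs at a quiet base clock, and grants a short max-clock burst when the game signals a heavy moment, as long as the burst fits in the remaining thermal budget. Every name and number here is invented for illustration:

```python
class BoostGovernor:
    """Hypothetical model of a developer clock-boost hint.

    Normally the GPU idles along at a base clock; a game that knows an
    explosion is coming asks for a few frames at max clock, and the
    request is granted only if the energy headroom covers it.
    """

    BASE_MHZ = 1000
    BOOST_MHZ = 1400

    def __init__(self, budget_joules):
        self.budget = budget_joules      # remaining thermal/TDP headroom
        self.boost_frames_left = 0

    def hint_heavy_frames(self, frames, cost_per_frame=2.0):
        # Grant the boost only when it stays inside the budget.
        cost = frames * cost_per_frame
        if self.budget >= cost:
            self.budget -= cost
            self.boost_frames_left = frames
            return True
        return False

    def clock_for_next_frame(self):
        if self.boost_frames_left > 0:
            self.boost_frames_left -= 1
            return self.BOOST_MHZ
        return self.BASE_MHZ

gov = BoostGovernor(budget_joules=10.0)
assert gov.hint_heavy_frames(3)          # explosion incoming: 3 frames granted
clocks = [gov.clock_for_next_frame() for _ in range(5)]
print(clocks)                            # boosted for 3 frames, then base
assert not gov.hint_heavy_frames(3)      # 4.0 J left cannot cover 6.0 J
```

The point of the sketch is the inversion of responsibility: the budget check stays with the hardware/driver, but the *timing* knowledge comes from the game.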
One topic I think is partnered with GPUOpen — I was talking to Scott Wasson about this idea of CU reservation, through TrueAudio Next; that's exactly the example he gave — where you can reserve some of the CUs just for one function. Can you speak more about how that works?
Yes. The interesting thing about the GCN architecture — and I think even today it's the only architecture capable of this — the first thing is the whole notion of asynchronous compute, where you can dispatch a compute task to the GCN engine and it can run asynchronously to whatever graphics task is already running. It uses the CU resources that aren't fully used by the graphics engine, so it can come in and go out without halting or pausing anything that's going on.

Now, within that class of features, we also give the ability to say that the task coming in and going out is a real-time task — like audio. Audio isn't very intensive, but it needs to be real-time: when I submit an audio job, it has to finish within a prescribed number of milliseconds. So we needed the ability for the engine to say, no matter what: for audio, you can use all the resources, but I need at least one CU always available — or two, or whatever the task needs. That feature is architected into our hardware, to do the kind of reservation that guarantees real-time.

If you don't need real-time, you can select the async compute method — you can slide in and slide out — but it's not guaranteed, because the graphics engine could be occupying all the CUs. That's rare, by the way — completely rare. Even when the graphics engine is oversubscribing all the CUs, what we find is there's always one or two CUs where you can slide in, get work done, and get out. But for audio we can't take that risk, because you could have something very intense — like the explosion case — and you don't want your audio to arrive 50 milliseconds later. I see the explosion and then the crack of the audio comes late — you don't want that. That's why TrueAudio Next uses the concept of CU reservation, and the API is flexible enough to give the developer control over how many CUs to reserve, based on how much load they're putting on.
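A toy model of the difference between best-effort async compute and a reserved CU might look like this — real scheduling happens in hardware, and the CU counts below are arbitrary assumptions, not GCN specifics:

```python
# Toy model of CU reservation as described for TrueAudio Next: out of
# N compute units, one is held back for real-time work, so a graphics
# burst can never starve the audio job. Illustrative only.

TOTAL_CUS = 8
RESERVED_FOR_AUDIO = 1

def schedule(graphics_demand, audio_job_pending):
    """Return (cus_granted_to_graphics, audio_runs_this_frame)."""
    # Graphics may only oversubscribe the non-reserved CUs.
    cus_for_graphics = min(graphics_demand, TOTAL_CUS - RESERVED_FOR_AUDIO)
    # The reserved CU is always free, so a pending audio job always runs
    # and its deadline is met regardless of graphics load.
    return cus_for_graphics, audio_job_pending

# Quiet scene: graphics wants 5 CUs, audio slides in with no contention.
print(schedule(5, True))    # (5, True)

# "Explosion" frame: graphics would take all 8 CUs, but it is capped at 7
# and the audio deadline is still met.
print(schedule(8, True))    # (7, True)
```

Without the reservation line, the second case would hand all 8 CUs to graphics and the audio job would have to wait — the 50-milliseconds-late crackle described above.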
So then, what is the Boltzmann Initiative? Is that what this is, or is it separate?

No, Boltzmann is separate. The Boltzmann Initiative is related to our GPU compute. We have a long history — the industry has a long history — of GPU compute APIs. OpenCL is a standard for computing, and then there are proprietary initiatives like CUDA and others. The holy grail for GPU computing, if you talk to the programmers in the computing world, has always been that they would like GPUs to be programmable directly from the current tool chains they use — C, C++, Python, whatever languages they use for their daily work. What they really want is: "I have some compute-intensive task that can benefit from a GPU; I should just be able to use it from my language," not have to learn some new language like OpenCL.

That was the holy grail we were working towards, and as we worked towards that goal, what we discovered was that the architecture of OpenCL and the graphics APIs doesn't suit supporting all these languages — and scripting languages — well; we needed a completely different approach. Second, most of the successful language frameworks — the Perls, the Pythons, and all — and LLVM, which has become the big compiler infrastructure everybody uses, are based on open-source frameworks, and it is really hard to integrate a closed runtime framework like graphics drivers into those stacks. That was the genesis for our Boltzmann Initiative: what if we do a compute stack that's completely open, all the way from top to bottom, including the kernel-mode drivers?
The way the stack is structured, these frameworks can integrate at any level they choose — any level of abstraction. Some frameworks want to go all the way down to the machine code directly; Boltzmann allows that. Some frameworks want to stay one level above — roughly a Vulkan-level equivalent abstraction on the compute side; we give them that. Some frameworks want to go all the way up to a higher-level language, like OpenCL or C++ extensions, or other libraries we have sitting on top of this; we allow that too. So Boltzmann is the first open compute stack for GPUs, and it's one of the key steps we took toward my goal of opening up the GPU.

The GPU has been a black box for 20 years now — a black box abstracted by a very thick API and a really thick runtime, so there's voodoo magic behind there. We're trying to get the voodoo magic out of the GPU software stack. We believe there's still voodoo magic in transistors and how we assemble them, and there's voodoo magic in game engines, compute engines, libraries, middleware, and the experiences. But the voodoo magic at these middle driver levels isn't actually beneficial to anybody, because it's preventing the widespread adoption of GPUs.
It sounds like, at least at some level — well, we work with Adobe Premiere a lot, and Premiere and these other tools, Maya, often have OpenCL acceleration and CUDA acceleration. One of the things I was curious about: is there a way to take CUDA code and make it work more efficiently on your hardware?

I'm glad you asked.
One of the elements of the Boltzmann Initiative was a framework and a tool we put out — again, fully open-sourced — called HIP, and what HIP does is exactly what you're asking for: it takes CUDA code and runs it on AMD hardware very efficiently. We've actually had millions and millions of lines of CUDA code converted over with the tool, and like the rest of the tools we put out, it's completely open-source and can support other people's hardware as well. So yes — we have no religion against enabling CUDA code on our hardware.
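HIP's conversion tooling works largely as source-to-source translation of CUDA code. The sketch below imitates that idea in a few lines of Python; the renamed API entry points (hipMalloc, hipLaunchKernelGGL, and so on) are real HIP names, but the function itself is a toy for illustration, not the actual hipify tool:

```python
import re

# A few of the real CUDA -> HIP renames; the actual hipify tools cover
# the full runtime API surface.
API_MAP = {
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaFree": "hipFree",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
}

def toy_hipify(cuda_src: str) -> str:
    """Toy sketch of hipify: rename runtime API calls and rewrite the
    triple-chevron kernel launch into a hipLaunchKernelGGL call."""
    out = cuda_src
    for cuda_name, hip_name in API_MAP.items():
        out = out.replace(cuda_name, hip_name)
    # kernel<<<grid, block>>>(args) -> hipLaunchKernelGGL(kernel, grid, block, 0, 0, args)
    out = re.sub(
        r"(\w+)<<<\s*([^,>]+),\s*([^>]+)>>>\(([^)]*)\)",
        r"hipLaunchKernelGGL(\1, \2, \3, 0, 0, \4)",
        out,
    )
    return out

cuda = "cudaMalloc(&d, n); saxpy<<<grid, block>>>(n, a, x, y); cudaDeviceSynchronize();"
print(toy_hipify(cuda))
# hipMalloc(&d, n); hipLaunchKernelGGL(saxpy, grid, block, 0, 0, n, a, x, y); hipDeviceSynchronize();
```

Because the two runtimes mirror each other call-for-call, most of a port really is mechanical renaming like this; the remaining work is in kernels that use vendor-specific intrinsics or warp-size assumptions.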
Cool. One of the earlier projects you worked on was DXTC — looking at, I guess, the modern equivalents, that's sort of the ancestor of texture compression. What do you work with today to more efficiently process game graphics? We've talked about some of the stuff, like shader intrinsics — what else is going on within the GPU?
Now you remind me that that was nineteen years ago — oh my god. Yes, DXTC was one of the first standardized compression formats, and it's still supported in almost all hardware — I think it's even in mobile, from phones to big computers. Compression has evolved since, but the fundamental construct for GPU-based texture compression — hardware compression — hasn't evolved that radically. It improved in quality, and we got more interesting data types that the compression supports: the first instantiation of DXTC was good for RGB texture maps, but then we got other very interesting data types, like normal maps and light maps and radiance maps — those changed a lot. So the compression evolved — ATI, and AMD as well, contributed to the evolution of DXTC into the next-generation formats and other things — and of course there are new formats and higher compression rates coming too. But I'd call all of those steps incremental, evolutionary, from a compression standpoint.
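The fixed-rate block idea DXTC introduced can be sketched on a grayscale 4×4 block: keep two endpoint values, and a 2-bit index per texel that selects between the endpoints and two interpolated values. The real format works on RGB565 endpoints and achieves 6:1 on RGB data; this simplified grayscale version keeps only the structure:

```python
def compress_block(block16):
    """Simplified DXTC/BC1-style compression of a 4x4 grayscale block:
    two endpoints plus a 2-bit palette index per texel. Lossy, but the
    compressed size is fixed, so the GPU can fetch any block directly."""
    lo, hi = min(block16), max(block16)
    # Four representable values: the endpoints and two 1/3-2/3 interpolants.
    palette = [lo, hi, (2 * lo + hi) // 3, (lo + 2 * hi) // 3]
    indices = [min(range(4), key=lambda i: abs(palette[i] - t)) for t in block16]
    return lo, hi, indices

def decompress_block(lo, hi, indices):
    palette = [lo, hi, (2 * lo + hi) // 3, (lo + 2 * hi) // 3]
    return [palette[i] for i in indices]

block = [10, 12, 20, 30, 10, 15, 25, 30, 11, 14, 22, 28, 10, 16, 24, 30]
lo, hi, idx = compress_block(block)
approx = decompress_block(lo, hi, idx)

raw_bits = 16 * 8                    # 16 texels x 8 bits each
packed_bits = 2 * 8 + 16 * 2         # 2 endpoint bytes + 2-bit indices
print(raw_bits, "->", packed_bits, "bits per block")
print("max error:", max(abs(a - b) for a, b in zip(block, approx)))
```

The key property is random access: every block compresses to the same number of bits, so the texture unit can decode any texel without touching the rest of the image — exactly what a variable-rate format like JPEG cannot offer cheaply, which is the trade-off discussed next.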
The revolutionary stuff that developers ask for — every developer's dream — would be: hey, if you could sample a texture straight out of, like, JPEG or something, which has much higher compression rates; you can get thousand-to-one compression depending on the content. Those are variable-rate compression schemes. If you understand the hardware mechanics of compression and decompression, that sounds good on paper — but for the decompression hardware to run at the rate a DXT-class decompressor runs at, it would be a chip bigger than the entire GPU, just to do JPEG decompression. So — just so you understand the reason why we don't do it — it's not that we don't know how to do it; we can. It's just a cost nobody is going to like.

But I think there's a happy medium — a level of compression in between — between giving you the benefits of JPEG-like compression and the speed of DXT-class algorithms. How do you connect them, marry them in a better way, so that all my assets — from authoring time, to download time, to coming onto the computer, into GPU memory — can stay compressed all the way into GPU memory? Then the whole experience speeds up. One thing I always say: hardware got so much faster, but man, game loading times are still the same as they've been for the last 15 years. If you have an SSD or something — well, that's a hardware solution to a software problem. And with an increasingly attention-deficit population, on devices where things start instantly, I think games need to start instantly too. So I think there is a happy medium there, and I think you'll see the industry solving it together over the next several years: how do we make these terabyte, triple-A game titles just load instantly?

Yeah, that'd be awesome.
Way down the road, is something like the SSG kind of interesting to you for that?

I'm glad you brought that up, because that's actually one of the driving factors. Even though we've positioned it first for professional users and others, you can kind of see the path there, and the usefulness for gaming users and other things.

Yeah — to take that load off the CPU, or whatever, with the SSD on the board.

Yeah. Well, lots of information, as always. We'll have a recap of this in the article linked in the description below. Raja, thank you for joining me.

Thank you, Steve — my pleasure.

We'll see you all next time.