Today NVidia announced that they are releasing an open source kernel driver for their GPUs, so I want to share with you some background information and how this will impact Linux graphics and compute going forward.
One thing many people are not aware of is that Red Hat is the only Linux OS company that has a strong presence in the Linux compute and graphics engineering space. There are of course a lot of other people working in the space too, like engineers working for Intel, AMD and NVidia, people working for consultancy companies like Collabora, and individual community members, but Red Hat as an OS integration company has been very active in trying to ensure we have a maintainable and shared upstream open source stack. This engineering presence is also what has allowed us to move important technologies forward, like getting hiDPI support for Linux some years ago, or working with NVidia to get glvnd implemented to remove a pain point for our users when it came to the NVidia driver and Mesa fighting over the OpenGL driver .so file. We see ourselves as the open source community’s partner here, fighting to keep the Linux graphics stack coherent and maintainable, and as a partner for the hardware OEMs to work with when they need help pushing major new initiatives around GPUs for Linux forward.
As the only Linux vendor with a significant engineering footprint in GPUs, we have been working closely with NVidia for a couple of years now, trying to help prepare the ground for NVidia moving to a model with an open source kernel driver. That effort has now borne fruit in the form of today’s announcement from NVidia about releasing an out-of-tree kernel driver for their GPUs. People like Kevin Martin, the manager for our GPU technologies team, Ben Skeggs, the maintainer of Nouveau, Dave Airlie, the upstream kernel maintainer for the graphics subsystem, Nouveau contributor Karol Herbst and our accelerator lead Tom Rix have all taken part in meetings, code reviews and discussions with NVidia over the last month on how to make this happen. So let me talk a little about what this release means (and also what it doesn’t mean) and what we hope to see come out of this long term.
First of all, what is in this new driver?
What has been released is an out-of-tree source code kernel driver which has been tested to support CUDA use cases on datacenter GPUs. There is code in there to support display, but it is not complete or fully tested yet. Also, this is only the kernel part; a big part of a modern graphics driver is found in the firmware and userspace components, and those are still closed source. But it does mean we now have an NVidia kernel driver that will start being able to consume the GPL-only APIs in the Linux kernel, although this initial release doesn’t consume any APIs the old driver wasn’t already using. The driver also only supports NVidia Turing chip GPUs and newer, which means it is not targeting GPUs from before 2018. So for the average Linux desktop user, while this is a great first step and hopefully a sign of what is to come, it is not something you are going to start using tomorrow.
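For readers who want to see how the kernel itself labels a loaded driver, here is a minimal Python sketch that parses text in the format of /proc/modules and decodes the per-module taint flags the kernel prints in parentheses at the end of a line (P for a proprietary license, O for an out-of-tree build, E for an unsigned module). The function name `module_taints` and the sample lines are made up for illustration, and which flags a given NVidia module actually shows on your system is something to check rather than assume.

```python
from typing import Dict, List

# Per-module taint letters as printed by the kernel in /proc/modules.
TAINT_MEANINGS = {
    "P": "proprietary (license not GPL-compatible)",
    "O": "out-of-tree (built outside the kernel source tree)",
    "E": "unsigned",
}


def module_taints(proc_modules_text: str) -> Dict[str, List[str]]:
    """Map each module name to a list of decoded taint descriptions.

    Hypothetical helper: expects text in the /proc/modules format, where
    an optional last field such as "(POE)" carries the taint letters.
    """
    result: Dict[str, List[str]] = {}
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        name = fields[0]
        flags = ""
        last = fields[-1]
        if last.startswith("(") and last.endswith(")"):
            flags = last[1:-1]
        result[name] = [TAINT_MEANINGS.get(f, f"other ({f})") for f in flags]
    return result


if __name__ == "__main__":
    # Illustrative sample lines, not captured from a real system.
    sample = (
        "nvidia 100 0 - Live 0x0000000000000000 (POE)\n"
        "nouveau 200 1 - Live 0x0000000000000000\n"
    )
    for module, taints in module_taints(sample).items():
        print(module, taints or ["no taint flags"])
```

On a real machine you would pass in the contents of /proc/modules instead of the sample string. An in-tree driver such as Nouveau normally carries no taint flags, which is one concrete way to see the difference between an in-tree driver and an out-of-tree one.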
What does it mean for the NVidia binary driver?
Not too much immediately. The binary kernel driver will continue to be needed for older pre-Turing NVidia GPUs, and until the open source kernel module is fully tested and extended for display use cases you are likely to continue using it for your system even if you are on Turing or newer. Also, as mentioned above, a big chunk of the driver is found in the firmware and userspace bits, and they are going to continue to be around even once the open source kernel driver is fully capable.
What does it mean for Nouveau?
Nouveau is the in-kernel graphics driver for NVidia GPUs today. It is fully functional, but is severely hampered by not having had the ability to, for instance, re-clock NVidia GPUs, meaning that it can’t give you full performance like the binary driver can. So what does this new driver mean for Nouveau? Once again very little initially, but a lot in the long run. First, a little background: the Linux kernel does not allow multiple drivers for the same hardware, so in order for a new NVidia kernel driver to go in, the current one will have to go out or at least be limited to a different set of hardware. The current one is of course Nouveau. And just like the binary driver, a big chunk of Nouveau is not in the kernel, but consists of the userspace pieces found in Mesa and the Nouveau-specific firmware that NVidia currently releases. So regardless of the long term effort to create a new open source in-tree kernel driver for NVidia hardware based on this new open source driver, Nouveau will be staying around to support pre-Turing hardware.
So the plan we are working towards from our side, but which is likely to take a few years to come to full fruition, is to come up with a way for the NVidia binary driver and Mesa to share a kernel driver. The details of how we will do that are something we are still working on and discussing with our friends at NVidia, but it is likely to be a brand new driver designed to address both the needs of the NVidia userspace and the needs of the Mesa userspace. Along with that evolution we hope to work with NVidia engineers to refactor the userspace bits of Mesa that are now targeting just Nouveau so that they can interact with this new kernel driver, and also work so that the binary driver and Nouveau can share the same firmware. This has clear advantages for both the open source community and the binary driver. For the open source community it means that we will now have a kernel driver and firmware that allow things like changing the clocking of the GPU to provide the kind of performance people expect from an NVidia graphics card, and it means that we will have an open source driver that will have access to the firmware and kernel updates from day one for new generations of NVidia hardware. For the ‘binary’ driver it means, as stated above, that it can start taking advantage of the GPL-only APIs in the kernel, distros can ship it and enable secure boot, and it gets an open source consumer of its kernel driver, allowing it to go upstream.
Whether this new shared kernel driver will be known as Nouveau or something completely different is still an open question, and of course whether it happens at all depends on whether we, the rest of the open source community and NVidia are able to find a path together to make it happen.
What does this release mean for Linux distributions like Fedora and RHEL?
In the immediate near term it will not have a major impact. But over time it provides a pathway to radically simplify supporting NVidia hardware due to the opportunities discussed elsewhere in this document. Long term we hope to be able to offer a similar experience with NVidia hardware to the one we can offer today for Intel and AMD hardware in terms of out-of-the-box functionality. That means day 1 support for new chipsets and a high performance open source Mesa driver for NVidia, and it will allow us to sign the NVidia driver alongside the rest of the kernel to enable things like secure boot support. Since this first release is targeting compute, one can expect that these options will first be available for compute users and then for graphics at a later time.
What are the next steps?
Well, there is a lot of work to do here. NVidia needs to continue the effort to make this new driver feature complete for both compute and graphics display use cases. We need to work together to come up with a plan for what the future unified kernel driver can look like and a model around it that works for both the community and NVidia. We also need to add things like a Mesa Vulkan driver, similar to how we have RADV for AMD. We at Red Hat will be playing an active part in this work as the only Linux vendor with the capacity to do so, and we will also work to ensure that the wider open source community has a chance to participate fully, like we do for all open source efforts we are part of.
If you want to hear more about this, I did talk with Chris Fisher and Linux Action News about this topic. Note: I did state some timelines in that interview which I didn’t make clear were my own guesstimates and not in any form official NVidia timelines, so I apologize for the confusion.