Sometimes I get really exasperated by the number of issues, especially hardware issues, that remain prominent in Linux systems after so many years. And I’m convinced that one of the root causes is the kernel development model. While it is far from being the only one, you understand a lot once you take a look at it.
The stable API “nonsense”
To explain the development model, the kernel documentation contains a document written by Greg Kroah-Hartman called stable_api_nonsense.txt. I think this document gets one thing very right: it advertises itself as complete nonsense.
Many things in this document can get you a good laugh if you are used to software development; unfortunately, they stop being funny when you consider how much they affect us as users and developers. Let’s start with the executive summary:
You think you want a stable kernel interface, but you really do not, and you don't even know it. What you want is a stable running driver, and you get that only if your driver is in the main kernel tree. You also get lots of other good benefits if your driver is in the main kernel tree, all of which has made Linux into such a strong, stable, and mature operating system which is the reason you are using it in the first place.
What it should say instead is:
You know you want a stable kernel interface, but you don't have it, and we will never provide it. You want stable interfaces to focus on fixing bugs in your driver instead of updating it for interface changes, and to make integration of your driver in the main kernel tree easy. You would get a lot of good benefits if your driver was in the main kernel tree, but it won't make it unless you adapt to our bizarre processes, all of which have made the Linux kernel into a constantly moving target which is the reason many hardware vendors don't want to support it in the first place.
Our compiler does not have the same ABI as yours
This is what happens when the only thing that interests the people in charge is writing code. Writing this shows that these developers have no interest in making the system usable. It becomes even more unbearable when such technically sharp people make up false technical arguments:
Depending on the version of the C compiler you use, different kernel data structures will contain different alignment of structures, and possibly include different functions in different ways (putting functions inline or not.)
Really, I wonder how the Microsoft developers, the GNOME guys, the glibc guys, and basically all library developers who know what they are talking about, manage to have stable ABIs despite compilers changing all the time. One of the incredibly complicated techniques they use is to rely only on guaranteed functionality of the C specification, instead of doing many clever and funky hacks. For things more related to development itself, there are also incredibly complicated techniques like opaque structures, which avoid a lot of ABI breakage. Actually the only thing you need, if you design your ABI properly, is to be rigorous.
Depending on what kernel build options you select, a wide range of different things can be assumed by the kernel
This statement of the problem contains its own solution: stop offering build options no one cares about (remember, people install binary packages from their distributions) and guarantee stability instead. Rather than implementing that solution, kernel developers deliberately choose to deal with this insanity in the worst possible way: by encouraging it.
Only lame developers need stable APIs
But the nonsense doesn't stop at the ABI, which after all is merely a problem for distributors. To annoy developers as well, let’s make up reasons to break the API all the time:
Linux kernel development is continuous and at a rapid pace, never stopping to slow down. As such, the kernel developers find bugs in current interfaces, or figure out a better way to do things.
Explained this way, it looks like a good thing. If you've done serious development or integration, you know that it is actually a problem. If you never slow down to look at what you've done, you're only going to add new bugs while trying to fix the old ones.
When they do so, function names may change, structures may grow or shrink, and function parameters may be reworked. If this happens, all of the instances of where this interface is used within the kernel are fixed up at the same time, ensuring that everything continues to work properly.
That says it all. In the Linux kernel, changes are not made in an incremental way. Every non-minor change is going to have an impact on hundreds of different modules. The amount of work needed to accomplish these changes is absolutely insane. When a project like GNOME decides to phase out an API, it takes several years before it is actually replaced. In the kernel, it can simply happen between two minor releases, together with a rewrite of the whole ATA stack.
The lolution
Of course, Greg KH has a simple solution for you, in the “What to do” section.
So, if you have a Linux kernel driver that is not in the main kernel tree, what are you, a developer, supposed to do? Releasing a binary driver for every different kernel version for every distribution is a nightmare, and trying to keep up with an ever changing kernel interface is also a rough job.
No shit!
Simple, get your kernel driver into the main kernel tree.
Simple, isn't it? Unfortunately this is not going to happen while you are too busy adapting it to the constantly moving interfaces. This is quite feasible if you are writing a driver for an Ethernet network adapter, but for a video card it is another story, and for a virtualization layer? Well, the Xen developers have been struggling for years to integrate their technology into the kernel, and it is still far from done. No wonder you get good drivers only for Ethernet cards and not for 3D cards or even WiFi chips.
This doesn’t go without other, worse consequences for users and distributors. Not everyone can afford to run Ubuntu, Debian unstable or Fedora, especially if they have simple requirements like not changing the whole system every 6 months. Corporate users using Debian stable, RHEL, SLES or Ubuntu LTS need a stable kernel that doesn’t move, and that doesn’t break everything when upgraded.
The introduction of projects like DKMS to distribute drivers in source form while keeping them easily installable is a step in the right direction, despite being a workaround for a broken situation underneath. However, these efforts are ruined by the constantly changing APIs, which will keep requiring changes to the sources as well as the binaries.
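For reference, DKMS drives this from a small dkms.conf file shipped alongside the driver sources; a minimal sketch (the package name, version, and module name here are hypothetical) looks like:

```shell
# dkms.conf -- tells DKMS how to rebuild the module for each installed kernel
PACKAGE_NAME="examplewifi"          # hypothetical out-of-tree driver
PACKAGE_VERSION="1.0"
BUILT_MODULE_NAME[0]="examplewifi"
DEST_MODULE_LOCATION[0]="/updates"  # where the built module gets installed
AUTOINSTALL="yes"                   # rebuild automatically for new kernels
```

With this in place, `dkms install examplewifi/1.0` rebuilds the module for every kernel on the system, which addresses the binary-compatibility half of the problem, but not the source-level API churn described above.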
Why it can still work
In fact, with such processes, you may wonder how it is possible that the system is actually usable. The answer is simple: while being smaller than many other projects, the Linux kernel has many, many more developers.
A rough estimate suggests that Linux has 2,000 developers, many of whom are paid to work on it full-time, for a 15 Mloc codebase. Compare that with X.org, which has a few dozen developers for 3 Mloc; with GNOME, 400 developers for 20 Mloc; or with OOo, 150 developers for 10 Mloc.
Yes, you need 10 times as many people to maintain the kernel code as for a comparable project. That's of course what happens when you keep them busy with permanent refactoring of the code.
This development method, while certainly having the advantage of bringing lots of innovation and fast integration of new features, is also diverting a very large portion of the open source community into rewriting FireWire stacks every 6 months. And as long as they are busy with that, they are not writing correct X drivers or fixing bugs in applications.
