20 October 2008 @ 02:07 pm
A little rant on the Linux kernel development model  

Sometimes I get really annoyed by the number of issues, especially hardware issues, that remain prominent in a Linux system after so many years. And I’m convinced that one of the root causes is the kernel development model. While it is far from being the only cause, you understand a lot once you have a look at it.

The stable API “nonsense”

To explain the development model, the kernel documentation contains a document written by Greg Kroah-Hartman called stable_api_nonsense.txt. I think this document gets one thing very right: as its name advertises, it is complete nonsense.

Many things in this document could get you a good laugh if you are used to software development; unfortunately, they are not funny given how much they affect us as users and developers. Let’s start with the executive summary:

You think you want a stable kernel interface, but you really do not, and you don't even know it. What you want is a stable running driver, and you get that only if your driver is in the main kernel tree. You also get lots of other good benefits if your driver is in the main kernel tree, all of which has made Linux into such a strong, stable, and mature operating system which is the reason you are using it in the first place.

What it should say instead is:

You know you want a stable kernel interface, but you don't have it, and we will never provide it. You want stable interfaces to focus on fixing bugs in your driver instead of updating it for interface changes, and to make integration of your driver in the main kernel tree easy. You would get a lot of good benefits if your driver was in the main kernel tree, but it won't make it unless you adapt to our bizarre processes, all of which have made the Linux kernel into a constantly moving target which is the reason many hardware vendors don't want to support it in the first place.

Our compiler does not have the same ABI as yours

This is what happens when the only thing that interests the people in charge is writing code. Writing this shows that these developers have no interest in making the system usable. It becomes even more unbearable when such technically sharp people make up false technical arguments:

Depending on the version of the C compiler you use, different kernel data structures will contain different alignment of structures, and possibly include different functions in different ways (putting functions inline or not.)

Really, I wonder how the Microsoft developers, the GNOME guys, the glibc guys, and basically all library developers who know what they are talking about manage to have stable ABIs despite the compiler changing all the time. One of the incredibly complicated techniques they use is to rely only on guaranteed behavior of the C specification, instead of resorting to clever and funky hacks. For things more closely related to development itself, there are also incredibly complicated techniques like opaque structures, which avoid a lot of ABI breakage. Actually, the only thing you need, if you design your ABI properly, is rigor.
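The opaque-structure technique mentioned above fits in a few lines of C (the `widget` API here is hypothetical, invented purely for illustration): callers only ever hold a pointer to an incomplete type, so the library can reorder, realign or extend the structure in a later release without breaking already-compiled callers.

```c
/* Public header side: only an opaque pointer is exposed.  Callers never
 * see the struct layout, so its size and alignment are not baked into
 * their binaries. */
struct widget;                          /* incomplete type for callers */
struct widget *widget_new(int id);
int  widget_id(const struct widget *w);
void widget_free(struct widget *w);

/* Private implementation side: the layout is free to change. */
#include <stdlib.h>

struct widget {
    int id;
    /* New fields can be appended here in a later release; existing
     * binaries keep working because they only ever hold pointers and
     * go through the accessor functions above. */
};

struct widget *widget_new(int id)
{
    struct widget *w = malloc(sizeof *w);
    if (w)
        w->id = id;
    return w;
}

int widget_id(const struct widget *w)
{
    return w->id;
}

void widget_free(struct widget *w)
{
    free(w);
}
```

The cost is one pointer indirection and an accessor call per field, which is why this is usually reserved for boundaries that must stay stable.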

Depending on what kernel build options you select, a wide range of different things can be assumed by the kernel

This statement of the problem also contains its solution: stop exposing build options no one cares about (remember, people install binary packages from their distributions), and guarantee stability instead. Rather than implementing that solution, kernel developers deliberately choose to deal with this insanity in the worst possible way: by encouraging it.
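The mechanism behind the build-option problem is easy to sketch (the structure and the `CONFIG_EXTRA_STATS` option below are invented for illustration, not taken from the real kernel): the same structure compiles to different sizes and field offsets depending on configuration, so a module built against one configuration reads garbage under another.

```c
#include <stddef.h>

/* Hypothetical in-kernel structure whose layout depends on a
 * compile-time configuration option. */
struct task_stats {
    long runtime;
#ifdef CONFIG_EXTRA_STATS       /* invented option name */
    long wait_time;             /* only present in some builds */
#endif
    long nr_switches;           /* this field's offset shifts when the
                                   option above is toggled */
};

/* With CONFIG_EXTRA_STATS unset, nr_switches sits immediately after
 * runtime; with it set, everything past runtime moves by sizeof(long).
 * A module compiled against the wrong configuration dereferences the
 * wrong offsets without any compile- or load-time error. */
```

The post's point is that this only stays a problem as long as distributions cannot agree on one configuration for such options.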

Only lame developers need stable APIs

But the nonsense doesn't stop at the ABI, which after all is merely a problem for distributors. To be more annoying to developers themselves, let’s make up reasons to break the API all the time:

Linux kernel development is continuous and at a rapid pace, never stopping to slow down. As such, the kernel developers find bugs in current interfaces, or figure out a better way to do things.

Explained this way, it looks like a good thing. If you've done serious development or integration work, you know that it is actually a problem. If you never slow down to look at what you've done, you're only going to add new bugs while trying to fix the old ones.

When they do so, function names may change, structures may grow or shrink, and function parameters may be reworked. If this happens, all of the instances of where this interface is used within the kernel are fixed up at the same time, ensuring that everything continues to work properly.

That says it all. In the Linux kernel, changes are not made gradually. Every non-minor change will have an impact on hundreds of different modules. The amount of work needed to accomplish these changes is absolutely insane. When a project like GNOME decides to phase out an API, it takes several years before it is actually replaced. In the kernel, this can simply happen between two minor releases, together with a rewrite of the whole ATA stack.
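The multi-year phase-out that projects like GNOME practice can be sketched with standard compiler support (the function names here are hypothetical): the old entry point keeps working for several releases and merely warns at compile time, instead of vanishing between two minor versions.

```c
/* New interface: callers are migrated to this over several releases. */
int frob_v2(int item, int flags);

/* Old interface: still present and still working, but compiling code
 * that calls it produces a warning (GCC/Clang attribute), giving
 * out-of-tree users a whole deprecation cycle to adapt before the
 * symbol is finally removed in the next major version. */
__attribute__((deprecated("use frob_v2 instead")))
int frob(int item);

int frob_v2(int item, int flags)
{
    return item * 2 + flags;    /* placeholder behavior for illustration */
}

int frob(int item)
{
    return frob_v2(item, 0);    /* old API forwards to the new one */
}
```

Nothing here is kernel-specific; the complaint in the post is precisely that the kernel declines to use any such mechanism for its internal interfaces.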

The lolution

Of course, Greg KH has a simple solution for you, in the “What to do” section.

So, if you have a Linux kernel driver that is not in the main kernel tree, what are you, a developer, supposed to do? Releasing a binary driver for every different kernel version for every distribution is a nightmare, and trying to keep up with an ever changing kernel interface is also a rough job.

No shit!

Simple, get your kernel driver into the main kernel tree.

Simple, isn't it? Unfortunately this is not going to happen while you are too busy adapting your driver to the constantly moving interfaces. It is quite feasible if you are writing a driver for an Ethernet adapter, but for a video card it is another story; and for a virtualization layer? Well, the Xen developers have been struggling for years to integrate their technology into the kernel, and it is still far from done. No wonder you get good drivers only for Ethernet cards, and not for 3D cards or even wifi chips.

This doesn’t come without other, worse consequences for users and distributors. Not everyone can afford to run Ubuntu, Debian unstable or Fedora, especially if they have simple requirements like not changing the whole system every 6 months. Corporate users running Debian stable, RHEL, SLES or Ubuntu LTS need a stable kernel that doesn’t move, and that doesn’t break everything when upgraded.

The introduction of projects like DKMS, which distribute drivers in source form while keeping them easy to install, is a step in the right direction, despite being a workaround for a broken situation underneath. However, these efforts are ruined by the constantly changing APIs, which will keep requiring changes to the sources as well as to the binaries.
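For reference, a DKMS-managed driver only needs a small dkms.conf next to its sources (the module name `foo` is made up; the variable names are the documented DKMS ones). DKMS then rebuilds the module automatically for each newly installed kernel, which solves the binary-compatibility half of the problem but, as noted above, not the source-level API churn:

```shell
# dkms.conf for a hypothetical out-of-tree module "foo"
PACKAGE_NAME="foo"
PACKAGE_VERSION="1.0"
BUILT_MODULE_NAME[0]="foo"
DEST_MODULE_LOCATION[0]="/kernel/drivers/misc/"
AUTOINSTALL="yes"   # rebuild on every kernel upgrade
```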

Why it can still work

In fact, with such processes, one may wonder how it is possible that the system is actually usable. The answer is simple: while being smaller than many other projects, the Linux kernel has many, many more developers.

A rough estimate shows that Linux has 2000 developers, many of whom are paid to work on it full-time, for a 15 Mloc codebase. Compare that with X.org, which has a few dozen developers for 3 Mloc; with GNOME, 400 developers for 20 Mloc; or with OOo, 150 developers for 10 Mloc.

Yes, you need 10 times as many people to maintain the kernel code as for a comparable project. That is, of course, what happens when you keep them busy with a permanent refactoring of the code.

This development method, while certainly having the advantage of bringing lots of innovation and fast integration of new features, is also diverting a very large portion of the open source community into rewriting FireWire stacks every 6 months. And as long as they are busy with that, they are not writing correct X drivers or fixing bugs in applications.

 
 
 
(Anonymous) on October 20th, 2008 12:58 pm (UTC)
Thanks!
Hi Joss!

Having tried to maintain an external module, I've always felt what you describe here.

Similar issues happen in various projects where management is too developer-oriented. I can think, of course, of ffmpeg, which never managed to release stable APIs and uses this awful "static link and changelog from CVS" policy...

But the thing is, there are alternatives to ffmpeg, or at least they are possible. What alternative is there for the Linux kernel? Hurd? A fork? Not realistic...

Toots
(Anonymous) on October 20th, 2008 01:00 pm (UTC)
good wifi driver
You should not make sweeping statements like those about wifi drivers.
I have an iBook, and thanks to a driver integrated into the kernel and written WITHOUT any specs by very good people, I have a really good wifi driver.
I am also sure that if the kernel development model were bad, paid people could have dealt with it more easily than volunteers.
The truth is that Broadcom never intended to develop a driver, because they do not think they would earn more money by doing so.
(Anonymous) on October 20th, 2008 03:44 pm (UTC)
Re: good wifi driver
Yes, bc52 managed to get into the kernel, but it also suffered from the kernel development model while it was out of tree. The same goes for the Ralink drivers. I'm quite sure both development teams had a fun ride chasing the kernel while writing these drivers.
(Anonymous) on October 20th, 2008 01:43 pm (UTC)
I like it!
In the end every driver should be open source and in the kernel. And the kernel should get new features really fast.

At least that is what I would like to see.

This development model really forces people to open source their drivers (see all server hardware vendors and ATI, Atheros etc.).

If we had a stable API we would maybe have new drivers for older distros (that backporting is now enterprise distro work) and easier-to-maintain out-of-tree drivers (I don't care for those TBH). But we would have way fewer open source drivers and way more buggy closed source drivers that usually suck really badly. Just look at the stats about the closed source Nvidia drivers or Windows.
np237 on October 20th, 2008 01:53 pm (UTC)
Re: I like it!
This development model really forces people to open source their drivers (see all server hardware vendors and ATI, Atheros etc.).


This is simply not true. This drives hardware vendors away from Linux instead of encouraging them to develop drivers. It never convinced ATI until the management changed, and Atheros users just had to use a second-rate driver for years.

If you want to force drivers to be free, you just need to start enforcing the GPL. Stop being lax with nVidia and force them, legally, to comply with the copyleft.
(Anonymous) on October 20th, 2008 03:21 pm (UTC)
I guess you should now try to explain another article co-authored by the same author:

http://www.linuxfoundation.org/publications/linuxkerneldevelopment.php

Interestingly enough it shows that more and more people and companies get involved in the kernel as time goes by.

Please compare that to the amount of hardware supported by Solaris. It must have accumulated tons of supported hardware with its stable APIs over the last ten years, right? How easy is it for you to get drivers for Solaris?
(Anonymous) on October 20th, 2008 08:56 pm (UTC)
Developer head count
Developer head count is not a good comparison, as I'm sure you realize.

The last kernel developer I exchanged emails with worked for a specific hardware vendor and wrote some of the drivers for that vendor's hardware. As that practice spreads, the kernel will end up with one or more developers per vendor, and this is a good thing I think.

X might be a fair comparison, but almost invariably the conversation goes "I have a problem with card X", "I don't have card X, can you run test Y, and compile in feature Z, and disable XaaFooBar, and wibble wobble gobbledy gook". So I think X has rather more developers than the headline figure, most of them unskilled.

I have had a couple of drivers break in the last few years, though, because the kernel developers moved on. In one case the vendor's developers messed up; in the other, the vendor went bust and no one seemed to care after that. So I don't know if it is simply the development model, since I suspect I'd be in the same boat if the same thing happened in the proprietary world.

In the Windows world the OEM process usually irons out oddities, and the volume means that vendors pressure their suppliers for good component drivers (and yet Dell still manages to ship drivers that regularly blue-screen XP, and Microsoft has shipped core drivers with stupid flaws). This OEM pressure doesn't happen as much in the Linux world, although you see a bit of it in the server market with Red Hat Enterprise; a lot of the server-class hardware drivers aren't exactly what I'd call enterprise class. On the other hand the results are better than Microsoft's, and Linux supports more classes of hardware, so something is working.
gravityboy on October 21st, 2008 03:53 pm (UTC)
Re: Developer head count
No, X really doesn't have more developers than that. If you want to count mesa and DRI, then yeah, there's a couple more who don't overlap with X, but it really is under 50 people. That's two orders of magnitude less than the kernel.
Grok McTanys (grok_mctanys) on October 20th, 2008 09:33 pm (UTC)
Grok McTanys (grok_mctanys) on October 21st, 2008 06:36 am (UTC)
While that gives an outline of the possible dangers of a stable ABI and prolific binary modules, another article on the benefits of the current system is Greg K-H's 2006 OLS keynote speech Myths, Lies, and Truths about the Linux kernel.

In short, even in 2006 Linux supported more devices, on more architectures, than any other OS ever has. Yes, there are some gaps on a small range of high-profile consumer devices (3d graphics cards, wifi) which manufacturers are particularly secretive about, but the Linux model works better for pretty much every other type of device ever made than any other development model ever has.
Grok McTanys (grok_mctanys) on October 21st, 2008 11:13 am (UTC)
"Really, I wonder how the Microsoft developers, the GNOME guys, the glibc guys, and basically all library developers who know what they are talking about, manage to have stable ABIs despite the compiler changing all the time."

Well, that depends on which ABI you're talking about. If you're talking about the public ABI, the one exposed to other applications, then yes they do, but then so does the kernel. As Greg points out in that very document, the public kernel ABI (the kernel<->userspace ABI) is completely stable.

If you're talking about internal ABIs, the interfaces between different parts of the glibc internals, or the GNOME internals, then in those projects, just like in the kernel, there are no guarantees. They are free to move those internal ABIs around if they choose, and anyone with external code outside the main tree that relies on those ABIs gets no sympathy from anyone if it suddenly breaks, even between minor version upgrades.

Drivers are part of the kernel. They rely on APIs/ABIs, yes, but those are internal. Internal APIs/ABIs are always subject to change without warning in any project. If you've got code that messes with or relies on those internals, the only way to make sure it's kept up to date is to get it submitted upstream.
np237 on October 24th, 2008 07:55 am (UTC)
I guess I should comment on these two documents as well.

Linux in a binary world is absolutely non-serious and full of bullshit. It shows what would happen if the kernel started allowing (legally) binary drivers without doing anything (technically) to support them. Of course it is a doomsday scenario, and you should note that I’m suggesting the exact opposite: seriously disallowing binary drivers (including the nVidia ones) while providing a decent driver interface.

As for Myths, Lies, and Truths about the Linux kernel, I have addressed most of its contents in my post, since it only re-tells the stable_api_nonsense document in a less technical way.
- It’s good to support more devices than any other operating system, but you need to support 100% of the hardware that is being sold right now, not legacy hardware everyone dumped in the garbage long ago.
- Rewriting USB stacks is fine. What is not fine is having zero management of the kernel-driver ABI, the kind of thing healthy open source projects started dealing with a very long time ago.
- By calling closed-source modules illegal without actually suing the people distributing them, the kernel developers are weakening their copyright, and as such risk losing it completely should a serious case of GPL violation appear.
- And worst of all, Greg KH constantly assumes that a stable ABI would only be useful for proprietary drivers, which is a total fallacy. Look at the number of GPL’ed out-of-tree drivers and at the number of proprietary ones, and think about it for a minute.
ext_128912 on October 21st, 2008 03:46 am (UTC)
comparison
So, what kernel projects get it right?

The Linux developers claim that their processes yield better quality than the alternatives. That they get a better range of drivers than any of the other free or semi-free OS kernels seems obvious. Comparing the OS's reliability to Windows and other proprietary systems is an obvious win for Linux, although I'm not sure how much of that is due to the kernel and how much is due to Linux continuing to run after things like X servers crash.

I think this would be a much better post if you had given a reference to another OS doing it right to illustrate each point.
np237 on October 21st, 2008 08:57 am (UTC)
Re: comparison
I don’t think any project is perfect, especially in this matter. Windows is doing pretty well for hardware support and binary drivers, but OTOH its backward compatibility is a heavy burden that prevents innovation in the kernel. Other OSes are far from doing as well as either Linux or Windows, or do not have the same constraints (as with MacOS), so it’s hard to see them as models.

I think a better balance has to be found between the two rather extreme positions that exist today. It is not unacceptable to break ABIs and APIs from time to time. It is unacceptable to break them at every minor release, without any kind of warning and deprecation process. Take inspiration from projects that are healthy and do not have these ABI instability issues: GNOME, Python or KDE would be a good start.
(Anonymous) on October 22nd, 2008 06:56 am (UTC)
Large parts of Xen failed to get in for the simple reason that they did development in isolation for years before they ever tried to send their changes upstream. Lo and behold, when they tried to push it into the kernel, the kernel community wanted changes made so it would fit in better (for instance, you might actually want support for *other* virtualization software too). Had they worked with the kernel community from the beginning, they would have figured out the right architecture from the beginning, and ended up with code they could submit upstream with minimal trouble.