Mere Code

Diverse Topics of General Interest to the Practicing Programmer

I don't want to talk about documentation

Most of the time, I don’t like talking about documentation. I particularly don’t like talking about documenting code. There’s not much to say.

I do like talking about testing, though: I enjoy writing code TDD-style, tests for big, big, big programs are really hard, and most everyone else is doing it wrong.

Thing is, every time I talk about testing in the Python world, I have to talk about documentation, because someone always brings up doctest and thinks it’s a good way of testing code. It’s not. Andrew has clearly explained the problems with both the principle and the implementation. He concludes that the only thing doctest is any good for is writing self-testing documentation about Python code, and I agree.

However, I humbly suggest that for many projects, this is a solution in search of a problem. Which means I am going to have to talk about documenting code. Before that, a short plea.

Please stop talking about documentation and testing at the same time! They are both actually quite tricky, and you can never, ever, ever effectively address both of them with the same initiative. They are different! Just stop it!

OK, let’s talk about documenting code.

Why bother?

Before you even begin to talk about the best way to document your code, you must seriously consider why you are even bothering.

Time spent documenting code is time not spent fixing bugs. Instead of writing docs you could be talking to users, making your software internationalizable, improving its website, improving the user documentation, making the test suite run faster or any number of things that directly help your end users or your existing developers.

You might want to document your code as part of an initiative to get more contributors, or to make your existing contributors’ lives happier. If so, great, but make sure that documenting code will actually achieve these goals.

If not, take pride in your lack of code documentation! It is the direct fruit of you doing better things with your life. Stand up, walk out the door and skip down the street, clutch the first suit-wearing stranger you see by his lapels and shout “My code is under-documented, yippee ki-yay!”

More seriously: know why you are documenting your code. Don’t just do it out of guilt, and don’t feel guilty if your code is under-documented while your users are many and happy.

Guiding principles

Audience and benefit
Do not even bother to write a document unless you have an audience in mind and a clear idea of the benefit they’ll get out of reading it. And no, “help them understand the branch puller XML-RPC API” is not a clear benefit.

As an example, I’m writing this blog post primarily for Python programmers at work and in the open source projects I care about. My aim is to convince them to be silent about doctest when we’re talking about testing and to see the whole picture when talking about documentation so that they’ll have good unit tests and won’t misdirect energy toward inappropriate documentation. I have a secondary aim of learning where I’m wrong by reading the comments.

Clear code
If someone is reading documentation that’s about code, then they can probably read code. You can probably save everyone a lot of trouble by picking better names, adding a couple of docstrings, fixing the bits you’re embarrassed by and deleting the crap that you don’t need.

To put it another way, when people say “this needs documentation” they often mean “I don’t understand this” (similarly, “we have a communication problem” often means “you are not doing what I want”). The best way to help them is not necessarily to write documentation.
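As a hypothetical illustration of the "better names" point (the function names and data shape are invented for this example), here is the same one-liner before and after a rename. The second version no longer needs its explanatory comment:

```python
# Before: the names say nothing, so a comment has to do the explaining.
def proc(d, t):
    # d maps branch name -> last-modified timestamp; t is a cutoff timestamp.
    return [k for k, v in d.items() if v < t]


# After: better names plus one short docstring carry the same information.
def stale_branches(last_modified_by_name, cutoff):
    """Return the names of branches last modified before ``cutoff``."""
    return [name for name, modified in last_modified_by_name.items()
            if modified < cutoff]
```

Both functions behave identically. The difference is how much the reader has to work out for themselves.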

Value is in the output
Documentation that’s not being read is worthless and probably incorrect, much like code that is not being executed. Documentation that cannot be found cannot be read. How is your audience going to find your documentation? Is it going to be in a format they like to read? Don’t bother writing anything until you’ve figured this out.

Different approaches

No documentation, just code
Some people believe that no human language text should ever sully their code base. There are plenty of good sentiments behind this idea: source code is a powerful tool for describing how to think about a problem; textual documentation about code frequently goes out of date and it’s often used as a crutch for bad code.

Personally, I think it’s a bad idea to have no documentation. Even the best coders read good prose faster than good code, and text has a wonderful power of summary that code lacks. Sometimes it’s impossible to communicate the intent of the code in the code itself (for example, you might be working around a POSIX insanity). Nothing wrong with using a crutch when your leg is broken.
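A small sketch of that kind of crutch, assuming you are writing to a raw file descriptor with Python’s `os.write`. The helper name `write_all` is made up for illustration. The loop looks redundant unless you know that POSIX permits short writes, and that is exactly what the comment is for:

```python
import os


def write_all(fd, data):
    """Write every byte of ``data`` to the file descriptor ``fd``."""
    view = memoryview(data)
    while view:
        # This loop is not paranoia. POSIX allows write() to transfer fewer
        # bytes than requested (a "short write"), e.g. on pipes and sockets,
        # and nothing in the code itself can tell the reader that.
        written = os.write(fd, view)
        view = view[written:]
```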

API reference documentation
Instead of having no documentation, you can use Python’s docstring feature to add a mini-document describing a class or function. A docstring can tell you how to use it, what to expect from it and, most importantly, why you should care. Because Python functions don’t have explicit type declarations, docstrings can be very useful (is that branch parameter a Bazaar branch, a Launchpad IBranch object or the URL for a branch?). Also, because docstrings sit so close to the code, they are much less likely to be out of date or incorrect.
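For instance, a docstring for a hypothetical function (`mirror_path_for` and its behaviour are invented here) can settle the “which kind of branch?” question up front and say why a caller should care. The `:param:` fields are the reStructuredText field-list style that Sphinx understands. That is one common convention, not a requirement:

```python
import posixpath


def mirror_path_for(branch_url, mirror_root):
    """Return the local path where the branch at ``branch_url`` is mirrored.

    Use this instead of building paths by hand, so that every tool agrees
    on where a given branch's mirror lives.

    :param branch_url: The branch's URL as a plain string, such as
        ``"http://example.com/code/trunk"``. Not a Bazaar ``Branch``
        object and not a Launchpad ``IBranch``.
    :param mirror_root: Absolute path of the directory holding all mirrors.
    :return: A path inside ``mirror_root``, named after the URL's last segment.
    """
    # Strip any trailing slash so ".../trunk/" and ".../trunk" agree.
    name = branch_url.rstrip("/").rsplit("/", 1)[-1]
    return posixpath.join(mirror_root, name)
```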

Specifications
Some people like having specifications as part of the documentation for their code. I haven’t really seen this in practice, so I can’t comment much. I can say that I find good comments on unit tests extremely helpful, and now almost always write such a comment before I write the test.
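Here is a minimal sketch of such a comment, using the standard `unittest` module. `normalize_branch_name` is a hypothetical function invented for the example. The comment records why the behaviour matters, which the assertion alone cannot:

```python
import unittest


def normalize_branch_name(name):
    """Return ``name`` lower-cased, with surrounding whitespace removed."""
    return name.strip().lower()


class TestNormalizeBranchName(unittest.TestCase):

    def test_ignores_case_and_padding(self):
        # Users type branch names by hand, so "  Trunk " and "trunk" must
        # name the same branch; otherwise we end up with duplicates.
        self.assertEqual("trunk", normalize_branch_name("  Trunk "))
```

Save it as, say, `test_branches.py` and run `python -m unittest test_branches`.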

Guides, tutorials and howtos
Rather than consulting a reference, you sometimes want to be guided through a task or to be introduced to some new area of the problem domain. In these cases, it’s pretty hard to beat a solid chunk of prose with some code examples. It’s here that doctest shines, since it’s quick to write, can be rendered nicely and can be executed to guarantee the code is not hopelessly wrong.
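A sketch of that workflow, assuming the guide lives in a plain-text file. The guide’s content and the helper `check_howto` are invented for illustration. `doctest.testfile` runs every `>>>` example in the file and reports any output that doesn’t match:

```python
import doctest
import os
import tempfile

# A tiny howto: the prose is for people, the ">>>" lines are examples that
# doctest executes and compares against the output shown below them.
HOWTO = '''
Sorting branch names
====================

Python's built-in ``sorted`` returns a new list and leaves the original alone:

    >>> names = ["trunk", "2.1", "devel"]
    >>> sorted(names)
    ['2.1', 'devel', 'trunk']
    >>> names
    ['trunk', '2.1', 'devel']

Pass ``key=len`` to order them by length instead:

    >>> sorted(names, key=len)
    ['2.1', 'trunk', 'devel']
'''


def check_howto(text):
    """Write ``text`` to a file, run its examples; return (failed, attempted)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "howto.txt")
        with open(path, "w", encoding="utf-8") as f:
            f.write(text)
        # module_relative=False: treat ``path`` as an ordinary filesystem path.
        return doctest.testfile(path, module_relative=False)
```

From the command line, `python -m doctest -v howto.txt` does the same job. Because the file name doesn’t end in `.py`, doctest treats it as a text file rather than a module.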

Summary

Code documentation is not intrinsically valuable. It has no value unless you give careful thought to why you want to do it and how it is going to connect you to your audience. Once you’ve done that, prose documentation can be very helpful, but you can also get a lot of the same benefits by cleaning up your code base.

Doctest is neither necessary nor sufficient for good code documentation. Do not use it simply because it is there. Use it when it fits.

Now, please can we go back to talking about testing?

Comments

mbp on 2010-05-25 04:14
I agree. If documentation is helping you delight your customers, then add it; otherwise you are wasting your time and should feel ashamed rather than proud.

So documentation could help you satisfy your customers by

1- directly helping them: developers are attracted to the Apple platform because they feel it has good consistent developer-oriented documentation, whereas Ubuntu's tends to be inward-looking and patchy

2- making you more effective, by letting people get into the code or letting them find conceptual bugs in the course of writing docs

3- drawing in new contributors: perhaps the best way to satisfy this is to update the documentation when a would-be contributor says "how do I…?" or "what is the …?"

So in Launchpad, perhaps the most useful kind of technical documentation is external API docs.
jkakar on 2010-05-01 12:12
Code without a documented API is incomplete, in the same way that it is without tests. Documenting the tests clearly is also important. Like you, I typically write the docstring for a test before I write the test itself. On the topic of API documentation, I far too often see comments that describe what the code does, instead of what its purpose is. I can read the code to see what it does, but often that doesn't help me understand what its purpose is. Writing good API documentation is a skill.

At a previous job it was mandated that a specification must be written before doing any coding. It was ridiculous, because it wasn't really about discovering the domain, but about documenting the classes you would be implementing. These specifications were expected to include UML diagrams showing class hierarchies and the sequence of calls that would be made to satisfy (very poorly identified) use cases. It was horrible. In the beginning I tried to comply, but I quickly realized that this was a complete waste of time: it was super boring and the code I ended up writing never matched the specification.

I think this kind of documentation about code is pointless. It's another artifact to maintain and inevitably it gets out of date. That said, I do find high-level documentation useful. In many cases these are diagrams. For example, an architecture diagram showing, at a high-level, the components in a system and how they interact can be very enlightening. It's often that high-level perspective that is hard to determine by reading code. Also, documentation that describes user stories and provides context about the domain can be useful. In most cases I find a list of 'As a X I want to Y so that Z' stories adequate. I just want something to help me get into the problem space, I don't need tons of prose.

One interesting thing at the previous job where there was a push for detailed specifications is that the programmers there felt reading code was too hard. They didn't want to do it, they weren't good at it, and so they would look for documentation as a way to figure it out. There was no review process there. When I first got involved with reviews I realized how poor my reading skills were. Now that I've been doing it for a while, my ability to dive into code and read and comprehend it has improved immensely.

I don't think we should be proud of a lack of documentation. I think we should determine what kind of documentation is useful, write it and then produce no more. Documenting everything we possibly can is a bad idea.
jml on 2010-04-29 17:06
Yeah.

I'm a huge believer in putting code clarity above everything else, and think good code design can do a lot to reduce the amount of documentation you need, much like a well-designed product doesn't need a massive user manual.

Still, I'm simply bewildered that people can say "My code is self-documenting" with a straight face. Maybe if it's a really small sample of code.
Thomas Hervé on 2010-04-29 16:58
I do agree with you; I was just pointing out that telling people "it's OK not to write docs" is not OK.

Also, I long ago abandoned the idea that I could write code that is easy to understand. It may be possible for self-contained code, but any interaction with a library, an API, or an external application will result in some kind of stupid error management (like the POSIX case you mention). That's why you need tests too :)
jml on 2010-04-29 16:26
First, I'm not against documenting code. I'm against documenting code without knowing why you are doing it.

Second, you are conflating "documenting code" with "making code easy to understand". Only the second one is valuable in itself. The first is a means to that end, and only one of many.

Did you read past there?
Thomas Hervé on 2010-04-29 16:17
If not, take pride in your lack of code documentation! It is the direct fruit of you doing better things with your life. Stand up, walk out the door and skip down the street, clutch the first suit-wearing stranger you see by his lapels and shout "My code is under-documented, yippee ki-yay!"

I couldn't disagree more. Because once you're in the street, there is a good chance that you'll get hit by a bus. And then the person hired to replace you (let's face it, we won't mourn you forever) will have to take over your undocumented piece of crap, and it will take him months to understand your APIs because of the lack of documentation.

Also, given that you six months from now won't be the same person you are today, you're doing him a big favor by documenting your code. That is, you.