Thursday, March 13, 2008

User Experience — When Reality Attacks

I've just got back from a hectic week in London, where members of the Bazaar community got together and thrashed out a bunch of important topics.

We talked about "user experience" and how we all want Bazaar to be a joy to use. More than one person said that we have been focusing too much on features and performance instead of user experience. The term was never really pinned down, but it's fair to say that there are things other than convenience and speed that affect how users feel while using Bazaar and that we need to work on those things, once we figure out what they are.

I think I might know the name of one of them: errors. Next post: "Notes on error".


Saturday, August 11, 2007

Within You Without You

Testing is hard, writing testing frameworks is easy. In an effort to make testing easier, big projects like Twisted, Bazaar and Zope write their own testing frameworks. That way they control both the test runner and the tests that are run. It's actually quite convenient.

However, it's led to a significant problem:
There are many similar implementations of xUnit in Python, each with subtle incompatibilities.

Running Twisted tests in the Zope test runner? Watch out for the threads that the reactor maintains between tests. Running Bazaar tests with Trial? On my machine, I get told that elementtree doesn't have an 'ElementTree' attribute. Hmm.

When talking about this problem, I often refer loosely to "PyUnit compatibility". The idea is that:

  1. Every Python test runner should support running vanilla Python standard library unittest.TestCase tests.

  2. Every Python unit test should be able to be run using the mechanisms in unittest.py in the Python standard library.


In other words, this code should Just Work:

import unittest
from yourframework import testing

class PythonTestCase(unittest.TestCase):
    def test_something(self):
        pass

class FrameworkTestCase(testing.TestCase):
    def test_something(self):
        pass

if __name__ == '__main__':
    python_test_result = unittest.TestResult()
    framework_test_result = testing.TestResult()
    FrameworkTestCase('test_something').run(python_test_result)
    FrameworkTestCase('test_something').run(framework_test_result)
    PythonTestCase('test_something').run(python_test_result)
    PythonTestCase('test_something').run(framework_test_result)
    # At this point, python_test_result and framework_test_result
    # should hold equivalent data.

If your framework is PyUnit compatible, then the above fragment should give the same results whether it is run directly or in your runner. Things get a little bit hazier when it comes to test discovery.

So, if your unit test requires that it be run inside a special suite (e.g. TrialSuite) in order to work correctly, it is not PyUnit compatible. If your test runner does some critical set up that enables features that your tests need, then it is not PyUnit compatible.

This leads to a kind of thinking where certain features belong on the base test case and others belong in the test runner. Putting features in the wrong place might not lead to a strict incompatibility, but it can lead to significant inconvenience. (And what are automated tests if not a convenience?)

Two examples from Twisted:

Temporary Working Directory

Trial creates a _trial_temp working directory and changes into that directory to run tests. In Trial, this feature is provided by the test runner. It should be provided by the base TestCase class.

  • It's not clear that every test needs this feature.

  • Twisted tests now assume that they can create files with impunity. When Twisted tests are run in a different test runner, they leave garbage files everywhere.
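
A minimal sketch of what providing this on the base TestCase could look like, using only the standard library. This is not Trial's code: the TempDirTestCase name and the "test_tmp_" prefix are assumptions made for illustration.

```python
import os
import shutil
import tempfile
import unittest


class TempDirTestCase(unittest.TestCase):
    """Hypothetical base class: each test runs inside its own scratch directory."""

    def setUp(self):
        super().setUp()
        original_cwd = os.getcwd()
        self.temp_dir = tempfile.mkdtemp(prefix="test_tmp_")
        os.chdir(self.temp_dir)
        # Cleanups run in reverse order of registration, so the chdir back
        # happens first and the scratch directory is removed afterwards.
        self.addCleanup(shutil.rmtree, self.temp_dir, ignore_errors=True)
        self.addCleanup(os.chdir, original_cwd)


class WritesFilesTest(TempDirTestCase):
    """Only tests that opt in by subclassing pay for the directory."""

    def test_creates_file(self):
        with open("scratch.txt", "w") as f:
            f.write("data")
        self.assertTrue(os.path.exists("scratch.txt"))
```

Because the set up and clean up travel with the test case, the test behaves the same in any runner and leaves nothing behind.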


Timeouts

By default, any Trial test that runs for more than two minutes will fail with a timeout error. The timeout period can be configured on a per-test, per-test-class, per-module or per-package basis. Trial implements this feature on TestCase; it should be implemented on the runner.

  • Even in the Twisted test runner, this makes debugging more painful. You must do all of your debugging in under two minutes.

  • Intuitively, you might think that the runner should control how tests are run.

  • Tests that don't descend from Trial's TestCase can still hang.

  • Two minutes might be good enough for me on a Monday, but I might be busier on Friday. I should be able to change the timeout without changing code.
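
To make the alternative concrete, here is a rough sketch of a timeout that lives on the runner side and can be changed without touching code. It is not how Trial works: the TEST_TIMEOUT environment variable and the run_with_timeout helper are invented for this example, and a worker thread is only one of several ways to do it.

```python
import os
import sys
import threading
import unittest

DEFAULT_TIMEOUT = 120.0  # seconds; the two-minute default discussed above


def get_timeout():
    """Read the timeout from a hypothetical TEST_TIMEOUT environment variable."""
    return float(os.environ.get("TEST_TIMEOUT", DEFAULT_TIMEOUT))


def run_with_timeout(test, result, timeout=None):
    """Run `test` in a worker thread and record an error if it overruns.

    Works for any unittest-style test, not just one particular base class.
    Returns True if the test finished within the timeout.
    """
    if timeout is None:
        timeout = get_timeout()
    worker = threading.Thread(target=test.run, args=(result,), daemon=True)
    worker.start()
    worker.join(timeout)
    if worker.is_alive():
        # A thread cannot be killed from outside; this sketch only reports
        # the overrun and lets the runner carry on with the next test.
        try:
            raise TimeoutError("%s exceeded %.1f seconds" % (test.id(), timeout))
        except TimeoutError:
            result.addError(test, sys.exc_info())
        return False
    return True
```

Setting TEST_TIMEOUT=600 on a slow Friday, or passing a very large timeout while debugging, then needs no change to the tests themselves.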


Saturday, August 4, 2007

Heartbeats and Sails

Mark Shuttleworth:
What’s good enough performance? Well, I like to think in terms of “heartbeat time”. If the major operations which I have to do regularly (several times in an hour) take less than a heartbeat, then I don’t ever feel like I’m waiting. Things which happen 3-5 times in a day can take a bit longer, up to a minute, and those fit with regular workbreaks that I would take anyhow to clear my head for the next phase of work, or rest my aching fingers.

Take this rule of thumb and apply it to unit tests:

  • Tests for whatever chunk of code you are working on should take "less than a heartbeat".

  • Your entire testing suite (that you run 3-5 times in a day, right?) can take longer to run, up to a minute.
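
One way to find out which tests blow the budget is to have the result object time them. The sketch below assumes a one-second "heartbeat", which is my stand-in rather than a figure from the quote, and TimingResult is a made-up name; it relies only on the startTest and stopTest hooks of the standard unittest.TestResult.

```python
import time
import unittest

HEARTBEAT = 1.0  # seconds; an assumed stand-in for "a heartbeat"


class TimingResult(unittest.TestResult):
    """Hypothetical result class that records how long each test takes."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.elapsed_by_test = {}
        self._started = {}

    def startTest(self, test):
        super().startTest(test)
        self._started[test.id()] = time.perf_counter()

    def stopTest(self, test):
        elapsed = time.perf_counter() - self._started.pop(test.id())
        self.elapsed_by_test[test.id()] = elapsed
        super().stopTest(test)

    def slower_than(self, budget=HEARTBEAT):
        """Return the ids of tests that took longer than `budget` seconds."""
        return sorted(
            test_id
            for test_id, elapsed in self.elapsed_by_test.items()
            if elapsed > budget
        )
```

Passing a TimingResult to a suite's run() and printing slower_than() afterwards gives a short list of the tests to speed up first.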


Authors of tests and testing frameworks, there's your challenge.

Tests that take too long to run just won't get run. Programmers will postpone running the suite until the last possible moment. When using something like PQM or Buildbot, this can be disastrous. Other developers might have to wait hours for their code to land on trunk.

Gerard Meszaros's new book, xUnit Test Patterns, has some good ideas about what to do and what not to do to make your tests run in a couple of heartbeats.
