write testable documentation, not literate programs

Author: Chris Zheng  (z@caudate.me)
Library: v0.0.17
Date: 31 October 2013
Website: http://www.github.com/zcaudate/lein-midje-doc
Generated By: MidjeDoc

1   Quickstart

  1.1   Syntax Highlighting
  1.2   Generating from Source
  1.3   Installation
  1.4   Usage

2   Programming vs Documentation

3   Tooling for Documents

  3.1   Documentation Bugs
  3.2   Test Cases vs Docstrings
  3.3   Bridging the Divide
    3.3.1   Features
    3.3.2   Benefits

4   API Reference

  4.1   The Basics
    4.1.1   elements
    4.1.2   attributes
  4.2   Sectioning Elements
  4.3   Content Elements
    4.3.1   :paragraph
    4.3.2   :image
    4.3.3   :file
  4.4   Code Elements
    4.4.1   normal s-expressions
    4.4.2   facts form
    4.4.3   fact form
    4.4.4   comment form
    4.4.5   :code

5   A Bug's Life

  5.1   Version One
  5.2   Version Two
  5.3   The Bug Surfaces

6   End Notes


1    Quickstart

lein-midje-doc generates documents with code examples from midje test files. For motivation, see chapter 2 and chapter 3. For API descriptions, see chapter 4. lein-midje-doc has been used to generate its own documentation. The source can be found here.

1.1    Syntax Highlighting

For syntax highlighting of code examples, the pygments python package should be installed. Make sure that the directory containing pygmentize is on your PATH.

1.2    Generating from Source

The html documentation can be generated after downloading the project:

> git clone https://github.com/zcaudate/lein-midje-doc.git
> cd lein-midje-doc
> lein midje-doc once
> open ./index.html

1.3    Installation

lein-midje-doc is a leiningen plugin. Install by adding entries in ~/.lein/profiles.clj:

 {:user {:plugins [...
                   [lein-midje-doc "0.0.17"]
                   [lein-midje     "3.0.1"]]}}

1.4    Usage

  1. Start with a new or existing project.
  2. In project.clj make sure that midje is in the dependencies and place a :documentation entry in the defproject form (e.1.1).
  3. Create a file in test/docs/my_first_document.clj (e.1.2).
  4. Run lein midje-doc in a terminal within your project folder.
  5. The output documentation should be generated within the project directory as /my-first-document.html. An example of how this should look can be seen here.
  6. Run lein midje :autotest in another terminal window. This will ensure that the code examples in the documentation are correct.
  7. Use live-reload for the best experience of live documenting.

e.1.1  -  project.clj

(defproject ...
  :profiles {:dev {:dependencies [[midje "1.5.1"]]}}
  :documentation
  {:files {"<document-name>"             ;; my-first-document
           {:input "<input-file-path>"   ;; test/docs/my_first_document.clj
            :title "<title>"             ;; My First Document
            :sub-title "<sub title>"     ;; Learning how to use midje-doc
            :author "<name>"
            :email  "<email>"}}})

e.1.2  -  test/docs/my_first_document.clj

(ns docs.my-first-document
  (:require [midje.sweet :refer :all]))

[[:chapter {:tag "hello" :title "Hello Midje Doc"}]]

"This is an introduction to writing with midje-doc."

[[:section {:title "Defining a function"}]]

"We define function `add-5`"

[[{:numbered false}]]
(defn add-5 [x]
  (+ x 5))

[[:section {:title "Testing a function"}]]

"`add-5` outputs the following results seen in
 [e.{{add-5-1}}](#add-5-1) and [e.{{add-5-10}}](#add-5-10):"

(facts
  [[{:tag "add-5-1" :title "1 add 5 = 6"}]]
  (add-5 1) => 6

  [[{:tag "add-5-10" :title "10 add 5 = 15"}]]
  (add-5 10) => 15)

2    Programming vs Documentation

The phrase 'Literate Programming' has been very popular lately. The main idea is that the code is written in a way that allows both a machine and a person to understand what is going on. Most people seem to agree that it is a great idea.

However, humans and machines are fundamentally different and rely on completely different methods of communication.

In short: machines are programmed, while humans are engaged, inspired and taught. Programs are written linearly for machines. Documentation is written like a woven lattice for humans. The fundamental structures of programs and documentation are very different from each other. Therefore, thinking that documentation can be automatically generated from doc-strings is a mechanistic approach, not a humanistic one. Documents should be written for people, not machines. Our tools for documentation should reflect this as well.

3    Tooling for Documents

3.1    Documentation Bugs

Programming is a very precise art form. Programming mistakes, especially the little ones, can result in dire consequences and much wasted time. We therefore use tools such as debuggers, type checkers and test frameworks to make our coding lives easier and our source code correct.

Documentation is the programmer's means of communicating how to use or build upon a library, usually to a larger audience of peers. This means that any mistake in the documentation results in wasted time for all involved. A mistake in documentation can therefore have a greater effect than a mistake in source code, because it wastes everybody's time.

There are various tools for documentation like latex, wiki and markdown. However, none address the issue that code examples in the documentation are not checked for correctness. A fictitious example of how errors in documentation can be produced and propagated can be seen in chapter 5.

3.2    Test Cases vs Docstrings

The best descriptions of our functions are found not in the source files but in the test files. Test files are potentially the best documentation because they provide information about what a function outputs, what inputs it accepts and what exceptions it throws. Instead of writing vague phrases (e.3.1) in the doc-string, we can write the descriptions of what a function does directly with our tests (e.3.2).

e.3.1  -  source code (how to do something)

(defn split-string
  "The split-string function is used to split a string
  in two according to the idx that is passed."
  [s idx]
  [(.substring s 0 idx) (.substring s idx)])

e.3.2  -  test code (how something is used)

(facts "split-string usage:"

  (split-string "abcde" 1)
  => ["a" "bcde"]

  (split-string "abcde" 3)
  => ["abc" "de"])

It can be seen that the test cases provide a much better explanation of split-string than the source doc-string. The irony, however, is that when a readme says 'for documentation, please read the test files', the common consensus is that the project developer is too slack to write proper documentation. If we are truly honest about our own faults, this occurs because most programmers are too slack to read tests. We only want to read pretty documentation.
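
Tests can also record which inputs a function rejects, something e.3.2 does not show. The following is a minimal sketch of that idea. It is written with clojure.test rather than midje so that it runs without extra dependencies; the type hint on s and the out-of-range index are additions made for illustration and are not part of the original example.

```clojure
(require '[clojure.test :refer [deftest is testing]])

;; Same body as e.3.1. The ^String hint is an addition that avoids reflection.
(defn split-string
  [^String s idx]
  [(.substring s 0 idx) (.substring s idx)])

(deftest split-string-usage
  (testing "splits the string in two at the given index"
    (is (= ["a" "bcde"] (split-string "abcde" 1)))
    (is (= ["abc" "de"] (split-string "abcde" 3))))
  (testing "an index past the end of the string is rejected"
    (is (thrown? StringIndexOutOfBoundsException
                 (split-string "abcde" 10)))))
```

Read as documentation, the second testing block tells a user something the doc-string in e.3.1 leaves out: the index must lie within the string.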

3.3    Bridging the Divide

The lein-midje-doc plugin attempts to bridge the gap between writing tests and writing documentation by introducing four features.

3.3.1    Features

The features are:

  1. To generate .html documentation from a .clj test file.
  2. To express documentation elements as clojure datastructures.
  3. To render clojure code and midje facts as code examples.
  4. To allow tagging of elements for numbering and linking.

3.3.2    Benefits

In this way, the programmer as well as all users of the library benefit:

  1. Errors in documented code examples are caught when the tests are run.
  2. There is no need to cut and paste test examples into a readme file.
  3. Entire test suites can potentially be turned into nice looking documentation with relatively little work.

4    API Reference

4.1    The Basics

4.1.1    elements

Elements are constructed using a tag and a map contained within double square brackets. The format is shown in (e.4.1). Element tags are inspired by latex.

Clojure strings are treated as paragraph elements whilst clojure forms are treated as code elements. fact and comment forms are also considered code elements. Elements will be described in detail in their respective sections.

e.4.1  -  Element Notation

[[<tag> {<key1> <value1>, <key2> <value2>}]]

for example

[[:chapter {:title "Hello World" :tag "hello"}]]

4.1.2    attributes

Attributes add metadata to elements. They are written as a single hashmap within double square brackets. Attributes mean nothing by themselves; they change the properties of the element directly after them (e.4.2). Multiple attributes can be stacked to modify an element (e.4.3).

e.4.2  -  Attribute Notation

[[{:tag "my-paragraph"}]]
[[:paragraph {:content "This is a paragraph"}]]

is equivalent to

[[:paragraph {:content "This is a paragraph"
              :tag "my-paragraph"}]]

e.4.3  -  Stacked Attributes

[[{:numbered false}]]
[[{:lang "python"}]]
[[:code "
a = 1 + 1
print(a)   # outputs 2
"]]

produces the following python code

a = 1 + 1
print(a)   # outputs 2
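
The way stacked attributes fold into the element that follows them can be sketched in a few lines of plain Clojure. This is only an illustration of the behaviour described above, not the plugin's implementation: the function names are hypothetical, the sketch handles only elements whose second item is a map, it drops attributes that precede anything else, and it assumes that the element's own keys win when the same key appears in both places.

```clojure
(defn inner
  "Returns the inner vector of a [[...]] form, or nil for anything else."
  [form]
  (when (and (vector? form) (= 1 (count form)) (vector? (first form)))
    (first form)))

(defn attribute?
  "True for forms such as [[{:tag \"x\"}]]."
  [form]
  (let [v (inner form)]
    (boolean (and v (= 1 (count v)) (map? (first v))))))

(defn element?
  "True for forms such as [[:paragraph {:content \"...\"}]]."
  [form]
  (let [v (inner form)]
    (boolean (and v (keyword? (first v)) (map? (second v))))))

(defn apply-attributes
  "Merges each run of attribute maps into the element that follows it.
  Assumption: keys written on the element itself take precedence."
  [forms]
  (:out
   (reduce
    (fn [{:keys [pending out]} form]
      (cond
        (attribute? form)
        {:pending (merge pending (ffirst form)) :out out}

        (element? form)
        (let [[[tag attrs]] form]
          {:pending {} :out (conj out [[tag (merge pending attrs)]])})

        ;; Anything else passes through and clears pending attributes.
        :else
        {:pending {} :out (conj out form)}))
    {:pending {} :out []}
    forms)))

(apply-attributes
 [[[{:tag "my-paragraph"}]]
  [[:paragraph {:content "This is a paragraph"}]]])
;; => [[[:paragraph {:content "This is a paragraph", :tag "my-paragraph"}]]]
```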

4.2    Sectioning Elements

Sectioning elements are taken from latex and allow the document to be organised into logical sections. From highest to lowest order of priority, they are :chapter, :section, :subsection and :subsubsection, giving four levels of organisation.

Numbers for these elements are generated in sequential order (1, 2, 3, etc.). A tag can be generated from the title or specified explicitly, and is used for creating links within the document. :chapter, :section and :subsection elements are listed in the table of contents using their tags.

For example, suppose a chapter about animals has its content organised into the categories shown in (e.4.4). It is straightforward to turn these into sectioning elements (e.4.5), which then generate the section numbers for the categories. The numbers are shown in (e.4.6), with the generated tags listed as comments:

e.4.4  -  Animal Categories

- Mammals
- Birds
  - Can Fly
    - Eagle
    - Hummingbird
  - Flightless
    - Penguin

e.4.5  -  Animal Elements

[[:chapter {:title "Animals"}]]
[[:section {:title "Mammals"}]]
[[:section {:title "Birds"}]]
[[:subsection {:title "Can Fly"}]]
[[:subsubsection {:title "Eagle"}]]
[[:subsubsection {:title "Hummingbird"}]]
[[:subsection {:title "Flightless"}]]
[[:subsubsection {:title "Penguin"}]]

e.4.6  -  Animal Generated Numbering

1 Animals                  ;; animals
  1.1 Mammals              ;; mammals
  1.2 Birds                ;; birds
    1.2.1 Can Fly          ;; can-fly
      Eagle                ;; eagle
      Hummingbird          ;; hummingbird
    1.2.2 Flightless       ;; flightless
      Penguin              ;; penguin
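
The numbering and tagging shown in (e.4.6) can be approximated with a short sketch. It is not taken from the plugin's source: title->tag and number-elements are hypothetical names, the tag rule (lower-case, words joined by hyphens) is inferred from the tags listed in (e.4.6), and :subsubsection elements are left unnumbered because that is how they appear there.

```clojure
(require '[clojure.string :as str])

(defn title->tag
  "Hypothetical rule: lower-case the title and join its words with hyphens."
  [title]
  (-> title str/lower-case str/trim (str/replace #"\s+" "-")))

;; Depth of each numbered sectioning element.
(def levels {:chapter 0 :section 1 :subsection 2})

(defn number-elements
  "Walks [kind attrs] pairs in order and returns maps with :number, :title and :tag.
  Advancing a level resets the counters of every level below it."
  [elements]
  (:out
   (reduce
    (fn [{:keys [counters out]} [kind {:keys [title tag]}]]
      (let [tag (or tag (title->tag title))]
        (if-let [depth (levels kind)]
          (let [counters (-> (subvec counters 0 depth)
                             (conj (inc (nth counters depth)))
                             (into (repeat (- 2 depth) 0)))]
            {:counters counters
             :out (conj out {:number (str/join "." (take (inc depth) counters))
                             :title title
                             :tag tag})})
          ;; Unnumbered element (:subsubsection): tag only.
          {:counters counters
           :out (conj out {:number nil :title title :tag tag})})))
    {:counters [0 0 0] :out []}
    elements)))

;; The elements of (e.4.5), written as [kind attrs] pairs.
(def animals
  [[:chapter {:title "Animals"}]
   [:section {:title "Mammals"}]
   [:section {:title "Birds"}]
   [:subsection {:title "Can Fly"}]
   [:subsubsection {:title "Eagle"}]
   [:subsubsection {:title "Hummingbird"}]
   [:subsection {:title "Flightless"}]
   [:subsubsection {:title "Penguin"}]])

(mapv (juxt :number :tag) (number-elements animals))
;; => [["1" "animals"] ["1.1" "mammals"] ["1.2" "birds"] ["1.2.1" "can-fly"]
;;     [nil "eagle"] [nil "hummingbird"] ["1.2.2" "flightless"] [nil "penguin"]]
```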

4.3    Content Elements

Content elements include :paragraph, :image and :file elements.

4.3.1    :paragraph

Paragraph elements should make up the bulk of the documentation. They can be written as an element (e.4.7) or in the usual case, as a string (e.4.8). The string is markdown with templating - so that chapter, section, code and image numbers can be referred to by their tags (e.4.9).

e.4.7  -  Paragraph Element

[[:paragraph {:content "Here is some content"}]]

e.4.8  -  Paragraph String

"Here is some content"

e.4.9  -  Markdown String

[[:chapter {:title "Chapter Heading" :tag "ch-heading"}]]

"# Heading One
Here is some text.
Here is a tag reference to Chapter Heading - {{ch-heading}}

- Here is a bullet point
- Here is another one"
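
The {{tag}} templating described for paragraph strings amounts to a lookup-and-replace over the text. A minimal sketch, assuming a map from tag to generated number is already available: fill-tags and the numbers in the map are made up for illustration and are not the plugin's API.

```clojure
(require '[clojure.string :as str])

(defn fill-tags
  "Replaces each {{tag}} in text with the number registered for that tag.
  Tags that are not in the map are left as they are."
  [text tag->number]
  (str/replace text
               #"\{\{([^}]+)\}\}"
               (fn [[whole tag]]
                 (str (get tag->number tag whole)))))

(fill-tags "See [e.{{add-5-1}}](#add-5-1)" {"add-5-1" "1.1"})
;; => "See [e.1.1](#add-5-1)"
```

Leaving unknown tags untouched is a design choice of this sketch; a real tool might prefer to report them as errors.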

4.3.2    :image

The :image element embeds an image as a figure within the document. It is numbered and can be tagged for easy reference. The code example in (e.4.10) produces the image seen in Figure 1.

e.4.10  -  :image tag example

[[:image {:tag "clojure-logo" :title "Clojure Logo (source clojure.org)"
          :src "http://clojure.org/space/showimage/clojure-icon.gif"}]]

fig.1  -  Clojure Logo (source clojure.org)

4.3.3    :file

The :file element allows inclusion of other files into the document. It is useful for breaking up a document into manageable chunks. A :file element requires that the :src attribute be specified. A high-level view of a document can thus be achieved, making the source more readable (e.4.11). This is similar to the \include element in latex.

e.4.11  -  :file tag example

[[:file {:src "test/docs/first_section.clj"}]]
[[:file {:src "test/docs/second_section.clj"}]]
[[:file {:src "test/docs/third_section.clj"}]]

4.4    Code Elements

Code displayed in documentation falls into a few types:

  1. Code that needs to be run (normal clojure code).
  2. Code that needs verification, taking input and showing output (midje facts).
  3. Code that should not be run (namespace declaration examples).
  4. Code in other languages.

Each type is defined in its own way so that code examples render properly.

4.4.1    normal s-expressions

Normal s-expressions are rendered as is. Attributes can be added for grouping purposes. The source code shown in (e.4.12) would render the outputs (e.4.13) and (e.4.14).

e.4.12  -  separating code blocks through attributes

[[{:title "add-n definition" :tag "c-add-1"}]]
(defn add-n [n]
  (fn [x] (+ x n)))

[[{:title "add-4 and add-5 definitions" :tag "c-add-2"}]]
(def add-4 (add-n 4))
(def add-5 (add-n 5))

e.4.13  -  add-n definition

(defn add-n [n]
  (fn [x] (+ x n)))

e.4.14  -  add-4 and add-5 definitions

(def add-4 (add-n 4))
(def add-5 (add-n 5))

4.4.2    facts form

Placing documentation examples in a facts form allows the code to be verified for correctness using lein midje. Document element notation is still rendered, except directly before or after a midje arrow (=>). Consecutive code within a facts form is stacked as one common code block. An example is given below, where the source (e.4.15) gives two outputs: (e.4.16) and (e.4.17).

e.4.15  -  Facts Form Source

[[{:tag "facts-form-output" :title "Facts Form Output"}]]
(facts
  [[{:title "Defining an atom" :tag "c-facts-1"}]]
  (def a (atom 1))
  (deref a) => 1

  [[{:title "Updating the atom" :tag "c-facts-2"}]]
  (swap! a inc)
  (deref a) => 2)

e.4.16  -  Defining an atom

(def a (atom 1))
(deref a) => 1

e.4.17  -  Updating the atom

(swap! a inc)
(deref a) => 2

4.4.3    fact form

To render an entire block as a single code example, use the fact form. The source (e.4.18) will render the output (e.4.19).

e.4.18  -  Fact Form Source

[[{:tag "fact-form-output" :title "Fact Form Output"}]]
(fact
  (def a (atom 1))
  (deref a) => 1

  (swap! a inc)
  (deref a) => 2)

e.4.19  -  Fact Form Output

(def a (atom 1))
(deref a) => 1

(swap! a inc)
(deref a) => 2


4.4.4    comment form

Comments are clojure's built-in way of holding code that is not run, so the comment form is used for code that should be displayed but not executed. For example, (e.4.20) will output (e.4.21) without interfering with the midje tests.

e.4.20  -  Switching to a new namespace

[[{:title "Switching to a new namespace" :tag "c-com-1"}]]
(comment
  (in-ns 'hello.world)
  (use 'clojure.string)
  (split "Hello World" #"\s") ;=> ["Hello" "World"]
)

e.4.21  -  Switching to a new namespace

(in-ns 'hello.world)
(use 'clojure.string)
(split "Hello World" #"\s") ;=> ["Hello" "World"]

4.4.5    :code

The most generic way of displaying code is with the :code tag. It is useful when code in other languages is required in the documentation.

Python Example

The source and outputs are listed below:

e.4.22  -  Python for Loop Source

[[{:lang "python" :title "Python for Loop" :tag "c-py-1"}]]
[[:code "
myList = [1,2,3,4]
for index in range(len(myList)):
    myList[index] += 1
print(myList)
"]]

e.4.23  -  Python for Loop

myList = [1,2,3,4]
for index in range(len(myList)):
    myList[index] += 1
print(myList)

Ruby Example

The source and outputs are listed below:

e.4.24  -  Ruby for Loop Source

[[{:lang "ruby" :title "Ruby for Loop" :tag "c-rb-2"}]]
[[:code "
array.each_with_index do |element,index|
"]]

e.4.25  -  Ruby for Loop

array.each_with_index do |element,index|

5    A Bug's Life

5.1    Version One

A new clojure project is created.

> lein new fami  
> cd fami
> lein repl

A very useful function add-5 has been defined (e.5.1) and the corresponding tests specified (e.5.2). Usage examples for add-5 also appear in the readme and are scattered around various other documents (e.5.3).

This version of the library has been released as version 1.0.

e.5.1  -  src/fami/operations.clj

(ns fami.operations)

(defn add-5
  "add-5 is a function that takes any number of arguments
  and adds 5 to the sum"
  [& ns]
  (apply + 5 ns))

e.5.2  -  test/fami/test_operations.clj

(ns fami.test-operations
  (:require [fami.operations :refer :all]
            [midje.sweet :refer :all]))

(fact "add-5 should increment any list of numbers by 5"
  (add-5 5)  => 10
  (add-5 1 2 3 4) => 15)

e.5.3  -  readme.md, operations.md


Here are some of the use cases for add-5

(add-5 5)    ;; => 10
(add-5 1 2 3 4)   ;; => 15


5.2    Version Two

The library is super successful with many users. The code undergoes refactoring and it is decided that the original add-5 (e.5.1) is too powerful, so it is muted to accept only one argument. An additional function add-5-multi is added to make explicit that it takes multiple arguments (e.5.4). The tests throw an exception (e.5.5) and are quickly fixed (e.5.6).

This version of the library has been released as version 2.0.

e.5.4  -  src/fami/operations.clj  -  v2

(ns fami.operations)

(defn add-5   ;; The muted version
  "add-5 is a function that takes a number and adds 5 to it"
  [n]
  (+ n 5))

(defn add-5-multi
  "add-5-multi is a function that takes any number of arguments
  and adds 5 to the sum"
  [& ns]
  (apply + 5 ns))

e.5.5  -  failure message

FAIL "add-5 should increment any list of numbers by 5"
"Expected: 15
 Actual: clojure.lang.ArityException - Wrong number of args (4) passed to: fami.operations$add-5"

e.5.6  -  test/fami/test_operations.clj

(ns fami.test-operations
  (:require [fami.operations :refer :all]
            [midje.sweet :refer :all]))

(fact "add-5 should increment only one input by 5"
  (add-5 5) => 10
  (add-5 1 2) => (throws clojure.lang.ArityException))

(fact "add-5-multi should increment any list of numbers by 5"
  (add-5-multi 1 2 3 4) => 15)

5.3    The Bug Surfaces

Although the tests are correct, the documentation is not. Anyone who carefully follows the instructions in the documentation (e.5.3) can run into a clojure.lang.ArityException.
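
To make the failure concrete, the readme examples from (e.5.3) can be run against the version two definition of add-5. The sketch below does this in plain Clojure; check-example is a hypothetical helper written for this illustration and is not part of midje or lein-midje-doc.

```clojure
;; Version two of add-5, as in e.5.4: exactly one argument.
(defn add-5
  "add-5 is a function that takes a number and adds 5 to it"
  [n]
  (+ n 5))

(defn check-example
  "Hypothetical helper: calls a zero-argument example function and compares
  the result with the documented value. Exceptions are reported, not rethrown."
  [example expected]
  (try
    (let [actual (example)]
      (if (= actual expected)
        :ok
        [:mismatch actual]))
    (catch Exception e
      [:error (.getName (class e))])))

;; The two examples copied from the readme (e.5.3).
(check-example #(add-5 5) 10)
;; => :ok
(check-example #(add-5 1 2 3 4) 15)
;; => [:error "clojure.lang.ArityException"]
```

The second readme example only fails when it is actually executed, which is exactly what a readme kept as plain text never does.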

This is a trivial example of a much greater problem. When a project evolves and its codebase changes, the documentation becomes incorrect. Although problems in source and test code can be isolated through testing, fixing documentation is a miserable and futile exercise of cut and paste. With no real tool to check whether the code examples are still valid, the documentation becomes less and less correct until all the examples have to be rechecked and the documentation rewritten.

Then the codebase changes again ...

Once the library has been released to the world and people have started using it, there is no taking it back. Bugs propagate through miscommunication. Miscommunication with machines can usually be contained and fixed. Miscommunication with people is potentially much more difficult to contain.

6    End Notes

For any feedback, requests and comments, please feel free to lodge an issue on github or contact me directly.