cronj

task scheduling and simulation


Author: Chris Zheng  (z@caudate.me)
Library: v1.4.1
Date: 04 September 2014
Website: http://github.com/zcaudate/cronj
Generated By: MidjeDoc


1   Installation

2   Background

3   Design

  3.1   Separation of Concerns
  3.2   Thread Management
  3.3   Simulation Testing

4   Walkthrough

  4.1   Creating a Task
  4.2   Running Tasks
  4.3   Running Simulations
    4.3.1   Y2K Revisited
    4.3.2   Single Threaded
    4.3.3   Interval and Pause
    4.3.4   Speeding Up
  4.4   Task Management
    4.4.1   Showing Threads
    4.4.2   Killing Threads
  4.5   Pre and Post Hooks

5   API Reference

  5.1   cronj
    5.1.1   crontab
  5.2   system commands
    5.2.1   start!
    5.2.2   stop!
    5.2.3   shutdown!
    5.2.4   restart!
    5.2.5   stopped?
    5.2.6   running?
    5.2.7   uptime
  5.3   task scheduling
    5.3.1   enable-task
    5.3.2   disable-task
    5.3.3   task-enabled?
    5.3.4   task-disabled?
  5.4   task simulation
    5.4.1   simulate
    5.4.2   simulate-st
  5.5   thread management
    5.5.1   get-ids
    5.5.2   get-task
    5.5.3   get-threads
    5.5.4   exec!
    5.5.5   kill!

6   End Notes



1    Installation

Add to project.clj dependencies:

[im.chit/cronj "1.4.1"]

All functions are in the cronj.core namespace.

(use 'cronj.core)

2    Background

cronj was built for a project of mine back in 2012. The system needed to record video footage from multiple IP cameras in fifteen-minute blocks, as well as save pictures from each camera (one picture every second). All saved files needed a timestamp to allow for easy file management and retrieval.

At that time, quartzite, at-at and monotony were the most popular options. After drawing up a list of desired design features and weighing up the options, I decided to write my own library instead. As a core component of the original project, cronj has been in operation since October 2012. It has gone through a couple of major rewrites and some API rejuggling, but the API has been very stable from version 0.6 onwards.

There are now many more scheduling libraries in the Clojure world:

With so many options, and so many different ways to define task schedules, why choose cronj? The following sections describe the design decisions that make it beneficial. Impatient readers can cut to the chase by skipping to the Running Simulations section.

3    Design

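cronj is built around the concept of a task. A task has two components: a handler, which determines what is to be done, and a schedule, which determines when it should be run.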

Tasks are triggered by a scheduler, which in turn is notified of the current time by a timer. If a task is scheduled to run at that time, its handler is run in a separate thread.

; cronj                schedule
; --------------       +-------------------------+
; scheduler watches    |  '* 8 /2 7-9 2,3 * *'   |
; the timer and        +-------------------------+
; triggers tasks       |  :sec    [:*]           |
; to execute at        |  :min    [:# 8]         |
; the scheduled time   |  :hour   [:| 2]         |
;                      |  :dayw   [:- 7 9]       |
;                      |  :daym   [:# 2] [:# 3]  |
;                      |  :month  [:*]           |
;                      |  :year   [:*]           |
;                      +-----------+-------------+
; task                              |                        XXXXXXXXX
; +-----------------+   +-----------+-----+                XX         XX
; |:id              |   |           |     |\             XX  timer      XX
; |:desc            +---+-+:task    |     | \           X                 X
; |:handler         |   |  :schedule+     |  \         X     :start-time   X
; |:pre-hook        |   |  :enabled       | entry      X     :thread       X+----+
; |:post-hook       |   |  :opts          |    `.      X     :last-check   X     |
; |:enabled         |   |                 |      \      X    :interval    X      |
; |:args            |   _-------._--------,       \      XX             XX       |
; |:running         |    `-._     `..      `.      \       XX         XX         +
; |:last-exec       |        `-._    `-._    `.     \        XXXXXXXXX         watch
; |:last-successful |            `-._    `-._  `.    `.                          +
; +----------+------+                `-._    `-. `.    \                         |
;                         +----+----+----`-._-+-`-.`.--->----+----+----+----+----+----+
;                         |    |    |    |   `-..  | `. |    |    |    |    |    |    |
;                         +----+----+----+----+--`-.---'+----+----+----+----+----+----+
;                                                                          scheduler
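The flow in the diagram can be modelled in a few lines of plain Clojure. This is a minimal, hypothetical sketch and not cronj's implementation: times are plain integers standing in for seconds, each entry carries a `:due?` predicate in place of a parsed crontab schedule, and `due-entries` and `trigger!` are illustrative names only. The timer's job reduces to calling `trigger!` with each new time, while the scheduler's job is to pick the due entries and launch each handler on its own thread.

```clojure
;; Hypothetical model of the timer -> scheduler -> task flow (not cronj's API).
;; Times are plain integers; :due? stands in for a parsed crontab schedule.

(defn due-entries
  "Scheduler step: returns the entries that should run at time t."
  [entries t]
  (filter (fn [entry] ((:due? entry) t)) entries))

(defn trigger!
  "Launches the handler of every due entry on its own thread (a future),
  passing the trigger time t and the entry's opts. Returns the futures."
  [entries t]
  (mapv (fn [{:keys [handler opts]}]
          (future (handler t opts)))
        (due-entries entries t)))

;; Example entries: one runs on even times, the other on every tick.
(def entries
  [{:id :even :due? even?             :opts {:label "even"}
    :handler (fn [t opts] [(:label opts) t])}
   {:id :any  :due? (constantly true) :opts {:label "any"}
    :handler (fn [t opts] [(:label opts) t])}])

;; A "timer" is then anything that calls trigger! with each new time.
(comment
  (mapv deref (trigger! entries 4))  ;; both entries run
  (mapv deref (trigger! entries 5))) ;; only :any runs
```

Keeping the decision of what is due separate from the act of launching threads is what later allows a simulated timer to drive the same scheduler.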

3.1    Separation of Concerns

A task handler is just a function taking two arguments:

(fn [t opts]
  ;; perform the task here, using the trigger time t
  ;; and the user-supplied options map opts
  nil)

t is the time at which the task was triggered. Passing it in solves the problem of time synchronisation. For example, I may have three tasks scheduled to run at the same time:

These tasks will all finish at different times. To be able to reason retrospectively about how the three tasks were synchronised, each handler is required to accept the trigger time t as an argument.
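The point about a shared trigger time can be sketched as follows. The three handlers are hypothetical stand-ins for jobs such as video recording, picture saving and logging; each sleeps for a different length of time to simulate work, and a plain string stands in for the DateTime object that cronj would pass. Whatever order they finish in, every result carries the same t.

```clojure
;; Hypothetical handlers standing in for three jobs triggered at the same time.
(defn make-handler
  "Returns a handler that 'works' for ms milliseconds, then reports the
  job name together with the trigger time t it was given."
  [job ms]
  (fn [t opts]
    (Thread/sleep (long ms))
    {:job job :stamp t}))

(def handlers
  [(make-handler :video 30)
   (make-handler :picture 10)
   (make-handler :log 0)])

(defn run-all
  "Starts every handler on its own thread with the same trigger time t
  and waits for all of them to finish."
  [t]
  (->> handlers
       (mapv (fn [h] (future (h t {}))))
       (mapv deref)))

;; Every result is stamped with the trigger time, not the finish time.
(comment
  (run-all "2014-09-04T12:00:00"))
```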

opts is a hash map, for example {:path "/app/videos"}. In practice, user customisations such as server addresses and file names, along with job schedules, are usually specified at the top tier of an application, whilst handler logic usually lives in the middle tier. Having a separate opts argument allows for better separation of concerns and more readable code.
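As a sketch of this split, assuming a hypothetical picture-saving job: the handler below only knows how to build a file name from t, while the directory and camera name come from opts, so the same handler can be reused by several task entries configured at the top tier. `picture-path`, `call-task` and the option keys `:path` and `:camera` are illustrative, and t is again a plain string.

```clojure
;; Middle tier: handler logic only; deployment details arrive through opts.
(defn picture-path
  "Hypothetical handler: builds the file name for a picture taken at t."
  [t opts]
  (str (:path opts) "/" (:camera opts) "-" t ".jpg"))

;; Top tier: user configuration lives in each task entry's :opts.
(def picture-tasks
  [{:id "cam1-pictures" :handler picture-path :schedule "* * * * * * *"
    :opts {:path "/app/pictures" :camera "cam1"}}
   {:id "cam2-pictures" :handler picture-path :schedule "* * * * * * *"
    :opts {:path "/mnt/backup" :camera "cam2"}}])

(defn call-task
  "Calls a task's handler the way a scheduler would: with t and the task's opts."
  [task t]
  ((:handler task) t (:opts task)))
```

Changing where pictures are stored then only touches the task entries, never the handler.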

3.2    Thread Management

In reviewing other scheduling libraries, it was found that fully-featured thread management capabilities were lacking. cronj was designed with these features in mind:

3.3    Simulation Testing

Because the timer and the scheduler modules are completely decoupled, it was very easy to add a simulation component to cronj. Simulation has some very handy features:

4    Walkthrough

This section walks through the important and novel features of cronj and its main use cases. Highlights include simulation, task management and hooks.

4.1    Creating a Task

print-handler prints the value of (:output opts) followed by the time t.

(defn print-handler [t opts]
  (println (:output opts) ": " t))

print-task defines the actual task to be run. Note that it is just a map.

(def print-task
  {:id "print-task"
   :handler print-handler
   :schedule "/2 * * * * * *"
   :opts {:output "Hello There"}})
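Because a task is just a map and its handler just a function, a handler can be exercised without starting any scheduler, which is handy in unit tests. The sketch below repeats the two definitions above so that it runs on its own, and passes a plain string as a stand-in for the DateTime that cronj would supply.

```clojure
(defn print-handler [t opts]
  (println (:output opts) ": " t))

(def print-task
  {:id       "print-task"
   :handler  print-handler
   :schedule "/2 * * * * * *"
   :opts     {:output "Hello There"}})

;; Call the handler directly with a made-up trigger time and the task's own
;; opts, capturing what it prints instead of waiting for the schedule.
(def printed
  (with-out-str
    ((:handler print-task) "2013-09-29T14:42:54" (:opts print-task))))
```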

4.2    Running Tasks

Once the task is defined, cronj is called to create the task-scheduler (cj).

(def cj (cronj :entries [print-task]))

Calling start! on cj starts the timer, and print-handler is then triggered every two seconds. Calling stop! on cj stops all further output.

(start! cj)

;; > Hello There :  #<DateTime 2013-09-29T14:42:54.000+10:00>

       .... wait 2 secs ...

;; > Hello There :  #<DateTime 2013-09-29T14:42:56.000+10:00>

       .... wait 2 secs ...

;; > Hello There :  #<DateTime 2013-09-29T14:42:58.000+10:00>

       .... wait 2 secs ...

;; > Hello There :  #<DateTime 2013-09-29T14:43:00.000+10:00>

(stop! cj)

4.3    Running Simulations

Simulations are a great way to check an application for errors because they provide controlled, repeatable time inputs. This allows an entire system to be tested for correctness. simulate works by decoupling the timer from the scheduler and tricking the scheduler into triggering on the range of times provided.

4.3.1    Y2K Revisited

For instance, suppose we wish to test that print-handler is not affected by the Y2K bug. T1 and T2 are defined as the start and end times:

(def T1 (local-time 1999 12 31 23 59 58))

(def T2 (local-time 2000 1  1  0  0 2))

We can simulate events by calling simulate on cj with a start and an end time. The function triggers the registered tasks starting at T1 and advancing one second at a time until T2. Note that in this example, three threads are created for print-handler, so the printed output may be out of order because of the indeterminacy of threads (this is fixed in the next section).

(simulate cj T1 T2)

           .... instantly ...

;; > Hello There :  #<DateTime 1999-12-31T23:59:58.000+11:00>
;; > Hello There :  #<DateTime 2000-01-01T00:00:02.000+11:00>    ;; out of order
;; > Hello There :  #<DateTime 2000-01-01T00:00:00.000+11:00>
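The stepping described above can be sketched with java.time in place of the Joda-Time values used in this document (an assumption made so the block runs on a bare JVM). `sim-times` is a hypothetical helper, not part of cronj: it lists every simulated instant from start to end inclusive, and keeping only the even seconds (the `/2` schedule of print-task) leaves the same three trigger times as the output above.

```clojure
(import '(java.time LocalDateTime))

(defn sim-times
  "Hypothetical helper: every time from start to end inclusive,
  advancing by interval seconds (default 1)."
  ([start end] (sim-times start end 1))
  ([^LocalDateTime start ^LocalDateTime end interval]
   (->> (iterate (fn [^LocalDateTime t] (.plusSeconds t (long interval))) start)
        (take-while (fn [^LocalDateTime t] (not (.isAfter t end)))))))

(def T1 (LocalDateTime/of 1999 12 31 23 59 58))
(def T2 (LocalDateTime/of 2000 1 1 0 0 2))

;; "/2 * * * * * *" fires when the second is divisible by 2.
(def triggered
  (filter (fn [^LocalDateTime t] (zero? (mod (.getSecond t) 2)))
          (sim-times T1 T2)))
```

Five instants are visited (23:59:58 through 00:00:02), and only three of them match the schedule.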
           

4.3.2    Single Threaded

To preserve the order of the println outputs, use simulate-st instead. It runs the print-handler calls on a single thread, so the outputs stay in order. Because this type of simulation is sequential, simulate-st should only be used when the tasks do not contain significant pauses or blocking calls.

(simulate-st cj T1 T2)

           .... instantly ...

;; > Hello There :  #<DateTime 1999-12-31T23:59:58.000+11:00>
;; > Hello There :  #<DateTime 2000-01-01T00:00:00.000+11:00>
;; > Hello There :  #<DateTime 2000-01-01T00:00:02.000+11:00>
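The difference between the two modes can be sketched in plain Clojure. The helpers below are hypothetical and are not cronj's code: `run-threaded` starts one future per trigger time, so the order in which the handler records its calls depends on thread scheduling, while `run-single-threaded` calls the handler in sequence on the current thread and always preserves order.

```clojure
;; Hypothetical sketch of the two simulation modes (not cronj's code).
(defn run-threaded
  "One thread per trigger time; completion order is not guaranteed."
  [handler times]
  (let [futures (mapv (fn [t] (future (handler t {}))) times)]
    (run! deref futures)))

(defn run-single-threaded
  "Calls the handler for each trigger time in sequence on the current thread."
  [handler times]
  (run! (fn [t] (handler t {})) times))

(defn recording-handler
  "Returns [log handler]: the handler appends each trigger time to the log."
  []
  (let [log (atom [])]
    [log (fn [t _opts] (swap! log conj t))]))
```

Both versions call the handler for every time; only the sequential one guarantees the order, which is also why a blocking handler would hold up every later step.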
           

4.3.3    Interval and Pause

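simulate and simulate-st also take two optional arguments: interval, the number of seconds the simulated clock advances at each step (1 in the examples above), and pause, the number of milliseconds to wait between steps (none in the examples above, which is why they complete instantly).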

We can reproduce the real-time speed of the outputs by keeping the interval at 1 and increasing the pause to 1000 ms:

(simulate cj T1 T2 1 1000)

;; > Hello There :  #<DateTime 1999-12-31T23:59:58.000+11:00>

       .... wait 2 secs ...

;; > Hello There :  #<DateTime 2000-01-01T00:00:00.000+11:00>

       .... wait 2 secs ...

;; > Hello There :  #<DateTime 2000-01-01T00:00:02.000+11:00>
       

4.3.4    Speeding Up

In the following example, the interval is increased to 2 seconds and the pause is decreased to 100 ms. Two simulated seconds now pass for every 100 ms of real time, a 20x increase in the speed of outputs.

(simulate cj T1 T2 2 100)

;; > Hello There :  #<DateTime 1999-12-31T23:59:58.000+11:00>

       .... wait 100 msecs ...

;; > Hello There :  #<DateTime 2000-01-01T00:00:00.000+11:00>

       .... wait 100 msecs ...

;; > Hello There :  #<DateTime 2000-01-01T00:00:02.000+11:00>
       

Being able to adjust these simulation parameters makes for a really powerful testing tool and saves a great deal of development time. For example, we can quickly check a year's worth of output from a task scheduled to run once an hour by setting the interval to 3600 seconds and the pause to roughly the time the task takes to finish.
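The arithmetic behind these figures can be written down directly. Each step advances the simulated clock by interval seconds and waits pause milliseconds, so the speed-up over real time is interval × 1000 / pause. For the hourly task, a year is 365 × 24 = 8760 steps, so with a pause of p ms the simulation takes roughly 8760 × p ms, ignoring the scheduler's own overhead. The function names are illustrative only.

```clojure
(defn speed-up
  "Simulated seconds per real second, for an interval (s) and pause (ms)."
  [interval pause-ms]
  (/ (* interval 1000) pause-ms))

(defn simulation-ms
  "Approximate wall-clock time (ms) to simulate total-s seconds,
  ignoring the scheduler's own overhead."
  [total-s interval pause-ms]
  (* (quot total-s interval) pause-ms))

(def one-year-s (* 365 24 3600))
```

With a 50 ms pause, `(simulation-ms one-year-s 3600 50)` gives 438000 ms, so a year of hourly runs plays out in a little over seven minutes.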

Through simulations, task scheduling can be tested, and entire systems become easier to manage and reason about.

4.4    Task Management

To demonstrate the task management capabilities of cronj, we first create a cronj object with two task entries, :l1 and :l2, which do nothing but sleep for a long time:

(def cj
  (cronj :entries
         [{:id       :l1
           :handler  (fn [dt opts] (Thread/sleep 30000000000000))
           :schedule "/2 * * * * * *"
           :opts {:data "foo"}}
          {:id       :l2
           :handler  (fn [dt opts] (Thread/sleep 30000000000000))
           :schedule "0-2 * * * * * *"
           :opts {:data "bar"}}]))

4.4.1    Showing Threads

Here the tasks are triggered manually with exec!, purely for demonstration. In normal use, get-threads would be called after start! has been called.

(get-threads cj :l1)     ;; See that there are no threads running
=> []                    ;;    - :l1 is empty

(get-threads cj :l2)
=> []                    ;;    - :l2 is empty

(exec! cj :l1 T1)        ;; Launch :l1 with time of T1

(get-threads cj :l1)
=> [{:tid T1 :opts {:data "foo"}}]  ;; l1 now has one running thread

(exec! cj :l1 T2)        ;; Launch :l1 with time of T2

(get-threads cj :l1)
=> [{:tid T1 :opts {:data "foo"}}   ;; l1 now has two running threads
    {:tid T2 :opts {:data "foo"}}]


(exec! cj :l2 T2 {:data "new"})     ;; Launch :l2 with time of T2
(get-threads cj :l2)
=> [{:tid T2 :opts {:data "new"}}]  ;; l2 now has one running thread

(get-threads cj)     ;; if no id is given, all running threads can be seen
=> [{:id :l1, :running [{:tid T1 :opts {:data "foo"}}
                        {:tid T2 :opts {:data "foo"}}]}
    {:id :l2, :running [{:tid T2 :opts {:data "new"}}]}]

4.4.2    Killing Threads

(kill! cj :l1 T1)       ;; Kill :l1 thread starting at time T1
(get-threads cj :l1)
=> [{:opts {:data "foo"}, :tid T2}] ;; l1 now has one running thread

(kill! cj :l1)          ;; Kill all :l1 threads
(get-threads cj :l1)
=> []                   ;; l1 now has no threads

(kill! cj)              ;; Kill everything in cj
(get-threads cj)
=> [{:id :l1, :running []}    ;; All threads have been killed
    {:id :l2, :running []}]
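The bookkeeping behind exec!, get-threads and kill! can be modelled with an atom and futures. This is a hypothetical sketch, not cronj's implementation: `registry`, `exec-task!`, `running` and `kill-task!` are illustrative names, each running thread is keyed by its trigger time (its :tid), and killing a thread is approximated with future-cancel, which interrupts a sleeping handler.

```clojure
;; Hypothetical registry of running threads: {task-id {tid {:opts .. :future ..}}}
(def registry (atom {}))

(defn exec-task!
  "Runs handler on a new thread, recorded under task-id and trigger time tid.
  The entry is removed when the handler returns or is interrupted."
  [task-id handler tid opts]
  (let [registered (promise)
        f (future
            (try
              @registered ;; wait until the entry has been recorded
              (handler tid opts)
              (finally
                (swap! registry update task-id dissoc tid))))]
    (swap! registry assoc-in [task-id tid] {:opts opts :future f})
    (deliver registered true)
    f))

(defn running
  "Lists [{:tid .. :opts ..}] for task-id, like the get-threads output above."
  [task-id]
  (->> (get @registry task-id)
       (sort-by key)
       (mapv (fn [[tid {:keys [opts]}]] {:tid tid :opts opts}))))

(defn kill-task!
  "Cancels every thread of task-id, or only the one started at tid."
  ([task-id]
   (doseq [tid (keys (get @registry task-id))]
     (kill-task! task-id tid)))
  ([task-id tid]
   (when-let [{f :future} (get-in @registry [task-id tid])]
     (future-cancel f)
     ;; Remove the entry here as well, so it disappears even before the
     ;; interrupted thread reaches its finally clause.
     (swap! registry update task-id dissoc tid))))
```

Keying threads by trigger time is what makes "kill only the thread started at T1" a simple lookup.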

4.5    Pre and Post Hooks

The :pre-hook and :post-hook entries allow additional processing to be done outside of the handler. Hooks have the same function signature as the task handler. In the example below, data is passed from one function to the next:

(def cj
  (cronj
   :entries [{:id        :hook
              :desc      "This is showing how a hook example should work"
              :handler   (fn [dt opts]
                           (println "In handle, opts:" opts)
                           (Thread/sleep 1000) ;; Do Something
                           :handler-result)
              :pre-hook  (fn [dt opts]
                           (println "In pre-hook," "opts:" opts)
                           (assoc opts :pre-hook :pre-hook-data))
              :post-hook (fn [dt opts]
                           (println "In post-hook, opts: " opts))
              :opts      {:data "stuff"}
              :schedule  "* * * * * * *"}]))

(exec! cj :hook T1)

;; > In pre-hook, opts: {:data stuff}
;; > In handle, opts: {:data stuff, :pre-hook :pre-hook-data}

           .... wait 1000 msecs ....

;; > In post-hook, opts:  {:data stuff, :pre-hook :pre-hook-data, :result :handler-result}

As can be seen, the :pre-hook function can modify opts for use in the handler, while the :post-hook function receives the handler's return value under the :result key of opts and can do something with it. I mostly use hooks for logging.
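The data flow shown in the output can be sketched as a small pipeline. `run-with-hooks` is a hypothetical helper, not cronj's implementation; it assumes, as the printed output suggests, that the pre-hook returns the opts handed to the handler and that the post-hook receives those opts with the handler's return value added under :result. Treating a missing hook as a pass-through is a further assumption of this sketch.

```clojure
(defn run-with-hooks
  "Hypothetical sketch of the hook pipeline for a single trigger time dt."
  [{:keys [handler pre-hook post-hook]} dt opts]
  (let [pre    (or pre-hook (fn [_ o] o))   ;; assumed pass-through default
        post   (or post-hook (fn [_ o] o))  ;; assumed pass-through default
        opts'  (pre dt opts)
        result (handler dt opts')]
    (post dt (assoc opts' :result result))))

(def hook-task
  {:handler   (fn [dt opts] :handler-result)
   :pre-hook  (fn [dt opts] (assoc opts :pre-hook :pre-hook-data))
   ;; this post-hook just returns what it receives, so it can be inspected
   :post-hook (fn [dt opts] opts)})
```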

5    API Reference

5.1    cronj

cronj constructs a task-scheduler object.

(cronj :entries <vector-of-tasks>)

A simple example:

(def cnj
  (cronj :entries
         [{:id "print-task"
           :handler (fn [t opts] (println (:output opts) ": " t))
           :schedule "/2 * * * * * *"
           :opts {:output "Hello There"}}]))

5.1.1    crontab

Each cronj task has a :schedule entry. Its value is a string specifying when the task is supposed to run. The string uses a crontab-like format, but with seven elements separated by spaces. The elements are matched against the current time, expressed as seven numbers:

 second minute hour day-of-week day-of-month month year

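The rules for a match between the crontab and the current time are:

  A       matches the number A
  *       matches any number
  E1,E2   matches either E1 or E2
  A-B     matches any number between A and B inclusive
  /N      matches any number divisible by N
  A-B/N   matches any number divisible by N between A and B inclusive

where A, B and N are numbers, and E1 and E2 are expressions. All seven elements in the string have to match for the task to be triggered.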
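The matching rules can be sketched as a small predicate. This is a hypothetical illustration written from the rules described here, not cronj's own parser (which compiles the string into the internal form shown in the Design diagram). It assumes whitespace-separated elements and numeric time values in the order second, minute, hour, day-of-week, day-of-month, month, year.

```clojure
(require '[clojure.string :as str])

(defn element-matches?
  "Hypothetical: true if the crontab element string e matches the number v."
  [e v]
  (let [parse (fn [s] (Long/parseLong s))]
    (cond
      ;; E1,E2 : matches if any sub-expression matches
      (str/includes? e ",") (boolean (some #(element-matches? % v) (str/split e #",")))
      ;; * : matches any number
      (= e "*") true
      ;; /N or A-B/N : divisible by N, optionally within a range
      (str/includes? e "/")
      (let [[rng n] (str/split e #"/")]
        (and (zero? (mod v (parse n)))
             (or (= rng "") (element-matches? rng v))))
      ;; A-B : inclusive range
      (str/includes? e "-")
      (let [[a b] (map parse (str/split e #"-"))]
        (<= a v b))
      ;; A : exact match
      :else (= (parse e) v))))

(defn schedule-matches?
  "Hypothetical: true if all seven elements match the seven numbers in t,
  given as [second minute hour day-of-week day-of-month month year]."
  [schedule t]
  (let [elements (str/split (str/trim schedule) #"\s+")]
    (and (= 7 (count elements) (count t))
         (every? true? (map element-matches? elements t)))))
```

For example, second 15 of minute 9 on a Friday (day-of-week 5) in July 2014 satisfies the last schedule below, while the same time on a Thursday does not.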

;; Triggered every 5 seconds

"/5 * * * * * *"


;; Triggered every 5 seconds between 32 and 60 seconds

"32-60/5 * * * * * *"

;; Triggered every 5 seconds on the 9th and 10th
;; minute of every hour on every Friday from June
;; to August between years 2012 to 2020.

"/5  9,10  * 5 * 6-8 2012-2020"

5.2    system commands

System commands mainly work with the cronj timer.

5.2.1    start!

Starts the timer so that tasks are launched at their scheduled times.

(start! <cnj>)

5.2.2    stop!

Stops the timer. New task threads will not be launched. However, existing task threads are not killed and will finish naturally.

(stop! <cnj>)

5.2.3    shutdown!

Stops the timer. New task threads will not be launched, and existing task threads are killed immediately.

(shutdown! <cnj>)

5.2.4    restart!

Restarts the timer, killing all existing threads.

(restart! <cnj>)

5.2.5    stopped?

Checks whether the timer is stopped.

(stopped? <cnj>)

5.2.6    running?

Checks whether the timer is running. Complement of stopped?

(running? <cnj>)

5.2.7    uptime

Checks how long the timer has been running. Returns a long representing the time in msecs.

(uptime <cnj>)

5.3    task scheduling

5.3.1    enable-task

If a task has been disabled, meaning that it will not run at its allocated time, enable-task re-enables it.

(enable-task <cnj> <task-id>)

5.3.2    disable-task

Disables a task so that it will not run at its scheduled times.

(disable-task <cnj> <task-id>)

5.3.3    task-enabled?

Checks whether a task is enabled. Tasks are enabled by default.

(task-enabled? <cnj> <task-id>)

5.3.4    task-disabled?

Checks whether a task has been disabled.

(task-disabled? <cnj> <task-id>)

5.4    task simulation

5.4.1    simulate

Simulates the timer from start-time to end-time. The optional interval and pause parameters are explained, with more examples, in Running Simulations.

(simulate <cnj> <start-time> <end-time>)

(simulate <cnj> <start-time> <end-time> <interval> <pause>)

5.4.2    simulate-st

Simulates the timer from start-time to end-time. Works like simulate, but all tasks are executed on a single thread, so it should only be used with non-blocking handlers.

(simulate-st <cnj> <start-time> <end-time>)

(simulate-st <cnj> <start-time> <end-time> <interval> <pause>)

5.5    thread management

5.5.1    get-ids

Returns a list of all task ids:

(get-ids <cnj>)

5.5.2    get-task

Returns the task entry for the given id.

(get-task <cnj> <task-id>)

;; Example Output:
(get-task cnj "print-task")
=> {:task {:desc ""
           :running "#<Ova@1228148821 []>"
           :last-exec "#<Ref@791c61fe: nil>"
           :last-successful "#<Ref@3665a8d0: nil>"
           :handler "#<cronj_api$fn__8574 midje_doc.cronj_api$fn__8574@4c2e0b96>"
           :id "print-task"}
    :schedule "/2 * * * * * *"
    :tab-array (("#<tab$_STAR__$fn__2780 cronj.data.tab$_STAR__$fn__2780@62facbec>")
                (:*) (:*) (:*) (:*) (:*) (:*))
    :enabled true
    :opts {:output "Hello There"}
    :output "#<Atom@3f6225b8: nil>"}

5.5.3    get-threads

Returns a list of running threads. See Task Management for examples.

(get-threads <cnj>)  ;; Gets all running threads in <cnj>

(get-threads <cnj> <task-id>)  ;; Gets all threads for <task-id> in <cnj>

5.5.4    exec!

Launches a thread for the task, regardless of whether the task is scheduled to run at that time or has been disabled.

(exec! <cnj> <task-id>)  ;; launches a new thread for <task-id> using
                         ;; the current time and default opts

(exec! <cnj> <task-id> <dt>)  ;; launches a new thread for <task-id> using
                              ;; the time as <dt> and default opts

(exec! <cnj> <task-id> <dt> <opts>) ;; launches a new thread for <task-id> using
                                    ;; the time as <dt> and opts as <opts>

5.5.5    kill!

Kills running threads. See Task Management for examples.

(kill! <cnj>)    ;; Kills all running threads

(kill! <cnj> <task-id>)  ;; Kills all running threads for <task-id>

(kill! <cnj> <task-id> <dt>) ;; Kills only thread started at <dt> for <task-id>

6    End Notes

For any feedback, requests or comments, please feel free to lodge an issue on GitHub or contact me directly.

Chris.