CS2011 Lab 2: Timing analysis of algorithms (1 lab session)

Purpose.

This is an exercise to illustrate how different algorithms for the same task (sorting of lists) can have different performances. We undertake a timing analysis to see how the behaviour of the algorithm varies with input size.

Unlike the Exercise 1, this exercise is to be implemented in Standard ML. So you will need to revise the syntax of SML and elements of functional programming.

For lists, you should use the inbuilt type 'a list, where 'a is a type variable standing for the type of elements in the list. Recall that lists are created from the empty list nil or [], and by adding an element x to the front of a list xs to form the new list x::xs. Recall also that, unlike C, we may define functions by pattern matching.

We consider the (familiar) task of rearranging a list of elements which support an order operation (allowing us to compare elements: a reflexive, antisymmetric and transitive binary relation) into ascending order, a task sometimes called ``sorting''.

There is a wide range of sorting algorithms available (see the textbooks for the course). In this exercise, we code up several of the algorithms and look at their performance in terms of CPU cycles used in processing.

One of the simpler algorithms to code is called InsertSort. We code it below for lists of real numbers. All the code presented in this exercise can be copied from $CS2011/lab2/sorting.

   fun insert(x:real,[]) = [x]
     | insert(x, y::ys) = 
          if x<y then x::y::ys else y::insert(x,ys);

   fun insertsort([]) = []
     | insertsort(x::xs) = insert(x,insertsort(xs));

Make sure you understand this algorithm and run it on several examples of lists of real numbers. In order to run the timing analyses below you will need to load (Moscow) SML with the option -P full to load the timing units.

In order to assess the performance of the algorithm we need to test it on a range of lists, in particular we wish to see what happens on long lists, so that we can assess how fast it is as the size of the input list increases.

To do so we generate some lists of random numbers. Random number generators can be built using arithmetic operations. Strictly speaking we should call these pseudo-random number generators. There is a considerable literature on them (see Knuth's books, or Sedgewick's Algorithms if you are interested).

Here is one random number generator:

   local val a = 16807.0 and m = 2147483647.0
   in fun next(seed) =
            let val t = a * seed
            in t - m * real(floor (t/m)) end
   end;

It defines a function next which, starting at any real number, generates the next number in a pseudo-random sequence. By successively calling this function, we can generate a pseudo-random sequence, as follows.

   fun randomlist (n, seed, tail) =
        if n=0 then (seed,tail) 
               else randomlist(n-1,next(seed),seed::tail);

   fun examplelist(n) = 
         let val (_, result) = randomlist(n,1.0,[]) in result end;

The function examplelist takes an integer n and generates a pseudo-random sequence of length n as a list of real numbers.

Run the above sorting algorithm on some pseudo-random sequences.

As the list gets longer, the running time increases (is this always the case?). We show how to compute the `running time' in SML. Here is some code:

   val cycles = Timer.startCPUTimer();

   insertsort(examplelist 200);

   (Time.toMilliseconds)(#usr ((Timer.checkCPUTimer)(cycles)));

The first line sets up and initialises a process (called cycles) which computes the CPU cycles. The next line runs the InsertSort algorithm on a list of length 200. The final line computes and returns an estimate of the CPU time used in milliseconds. Remember to load SML with the option -P full to load the timing units. Each time you run a timing analysis you need both the first line (to initialise the timing process) and the final line (to return the time used). You need not understand timing processes any further but if you wish to the online manual is at http://www.dina.kvl.dk/-sestoft/mosml.html.

Task 1

Run the InsertSort algorithm on a selection of lists (pseudo-random sequences) of increasing size. I suggest lists of length 200, 400, 600, etc. up to 2000. Plot the results of the timing as a graph of time in milliseconds against list length.

You may use any plotting method. A simple one is to use gnuplot. Type gnuplot and at the prompt, type plot "datafile" where datafile is a file that you have created with a list length and running time on each line. For example:

200 830
400 1510
... etc

You may assemble the datafile by hand, or generate it automatically.

Task 2

Look up the two sorting algorithms called MergeSort and QuickSort. Code them up in SML using list recursion (i.e. tail recursion). Call your program file timing.sml. QuickSort operates by splitting a list into two sublists by comparing elements with a so-called pivot element. Choose the head of the list as a pivot element. Plot the timing results for these two algorithms on the same selection of lists as for InsertSort. What do you notice about the performance of these three algorithms as the length of the list increases?

For marking, you need to show the demonstrator your code for these two algorithms, show that they work and explain why. You will also need to generate the timing graphs for the three sorting algorithms, and explain the shape of the curves: Look at the three timing graphs. Clearly some algorithms grow slower than others as the size of the input increases. Simple algorithms are not always the best! Can you see what sorts of rates of growth are involved? For example: are they linear - a straight line, quadratic - grow as $\bgroup\color{red}$N^2$\egroup$ where $\bgroup\color{red}$N$\egroup$ is the length of the list, or cubic ...? Hint: a linear algorithm doubles its running time as the size of the input doubles. A quadratic one quadruples (times 4) its running time as the size of the input doubles, and a cubic is times 8 etc. Also note that for short lists the rate of grow differs from that of longer lists.

Next: CS2011 Lab 3: Intrinsic Up: CS2011. Algorithms and Data Previous: CS2011 Lab 1: List

Graham Gough
2001-11-12