Unlike the Exercise 1, this exercise is to be implemented in Standard ML. So you will need to revise the syntax of SML and elements of functional programming.
For lists, you should use the inbuilt type 'a list, where 'a is a type variable standing for the type of elements in the list. Recall that lists are created from the empty list nil or [], and by adding an element x to the front of a list xs to form the new list x::xs. Recall also that, unlike C, we may define functions by pattern matching.
We consider the (familiar) task of rearranging a list of elements which support an order operation (allowing us to compare elements: a reflexive, antisymmetric and transitive binary relation) into ascending order, a task sometimes called ``sorting''.
There is a wide range of sorting algorithms available (see the textbooks for the course). In this exercise, we code up several of the algorithms and look at their performance in terms of CPU cycles used in processing.
One of the simpler algorithms to code is called InsertSort. We code it below for lists of real numbers. All the code presented in this exercise can be copied from $CS2011/lab2/sorting.
fun insert(x:real,[]) = [x] | insert(x, y::ys) = if x<y then x::y::ys else y::insert(x,ys); fun insertsort([]) = [] | insertsort(x::xs) = insert(x,insertsort(xs));
Make sure you understand this algorithm and run it on several examples of lists of real numbers. In order to run the timing analyses below you will need to load (Moscow) SML with the option -P full to load the timing units.
In order to assess the performance of the algorithm we need to test it on a range of lists, in particular we wish to see what happens on long lists, so that we can assess how fast it is as the size of the input list increases.
To do so we generate some lists of random numbers. Random number generators can be built using arithmetic operations. Strictly speaking we should call these pseudo-random number generators. There is a considerable literature on them (see Knuth's books, or Sedgewick's Algorithms if you are interested).
Here is one random number generator:
local val a = 16807.0 and m = 2147483647.0 in fun next(seed) = let val t = a * seed in t - m * real(floor (t/m)) end end;It defines a function next which, starting at any real number, generates the next number in a pseudo-random sequence. By successively calling this function, we can generate a pseudo-random sequence, as follows.
fun randomlist (n, seed, tail) = if n=0 then (seed,tail) else randomlist(n-1,next(seed),seed::tail); fun examplelist(n) = let val (_, result) = randomlist(n,1.0,[]) in result end;The function examplelist takes an integer n and generates a pseudo-random sequence of length n as a list of real numbers.
Run the above sorting algorithm on some pseudo-random sequences.
As the list gets longer, the running time increases (is this always the case?). We show how to compute the `running time' in SML. Here is some code:
val cycles = Timer.startCPUTimer(); insertsort(examplelist 200); (Time.toMilliseconds)(#usr ((Timer.checkCPUTimer)(cycles)));The first line sets up and initialises a process (called cycles) which computes the CPU cycles. The next line runs the InsertSort algorithm on a list of length 200. The final line computes and returns an estimate of the CPU time used in milliseconds. Remember to load SML with the option -P full to load the timing units. Each time you run a timing analysis you need both the first line (to initialise the timing process) and the final line (to return the time used). You need not understand timing processes any further but if you wish to the online manual is at http://www.dina.kvl.dk/-sestoft/mosml.html.
You may use any plotting method. A simple one is to use gnuplot. Type gnuplot and at the prompt, type plot "datafile" where datafile is a file that you have created with a list length and running time on each line. For example:
200 830 400 1510 ... etcYou may assemble the datafile by hand, or generate it automatically.
For marking, you need to show the demonstrator your code for these two algorithms, show that they work and explain why. You will also need to generate the timing graphs for the three sorting algorithms, and explain the shape of the curves: Look at the three timing graphs. Clearly some algorithms grow slower than others as the size of the input increases. Simple algorithms are not always the best! Can you see what sorts of rates of growth are involved? For example: are they linear - a straight line, quadratic - grow as where is the length of the list, or cubic ...? Hint: a linear algorithm doubles its running time as the size of the input doubles. A quadratic one quadruples (times 4) its running time as the size of the input doubles, and a cubic is times 8 etc. Also note that for short lists the rate of grow differs from that of longer lists.