next up previous
Next: About this document ... Up: CS2011. Algorithms and Data Previous: CS2011 Lab 3: Intrinsic

Subsections

CS2011 Lab 4: Tables (2 lab sessions)


Description

The implementation of a very simple spelling checker uses a Table in which to store all the words in a dictionary (in this Table there is no data associated with a key, just the key itself, a string.) A text is then read and words which are in the text which can not be found in the dictionary are reported to the user, together with the line numbers on which these words occur in the text.

Your tasks for this lab are to implement C data types and functions which provide two different implementations of the Table data type, one using Trees and the other using Hash Tables. These data types will then be used by the driver code which is provided for you, and the results compared.

There is just one deadline for this lab, at the end of the second lab session, when both implementations will be marked together.

In both cases the Table data type requires the following functions


Tree based implementation

The first task is to implement the Table type using ordered binary trees. To simplify your task, you are provided with C header files which define the appropriate data types and function headers.


/* table-tree.h */
/* Header file for tree based implementation of Table */

/* Time-stamp: <November 14, 2000 11:18 graham@cs.man.ac.uk> */

#ifndef TABLE_TREE_H
#define TABLE_TREE_H
/* Type for size of hash table */
typedef unsigned int Table_size;   /* ignored but necessary for consistent  */
                                   /* interface */

typedef int Boolean;

/* The type for keys */
typedef char * Key_Type;

typedef struct tree_node *tree_ptr;

struct tree_node {
  Key_Type element;
  /*   Data_Type data; */  /* we'll forget this */
  tree_ptr left;
  tree_ptr right;
};

/* The table */
typedef tree_ptr Table;

Table initialize_table(Table_size); /* Size of table not used in tree */
                                    /* implementation, arg only present */
                                    /* for consistent interface */

Table insert(Key_Type,Table);

Boolean find(Key_Type, Table);

void print_table(Table);

#endif

You can use a simple insert technique to build the tree that constitutes the Table, although for the usage it has in this example this will give extremely sub-optimal tree shape (Why is this?). The more ambitious amongst you might like to use a more balanced tree representation such as 2-3 trees, AVL trees, or Red-Black trees. See, for example Weiss: Data structures and Algorithms in C, Sedgwick: Algorithms in C, Standish: Data structures, Algorithms and Software Principles or many other Data Structures texts. The code implementing a Table using trees should be in a file tree.c.


Hash table based implementation

The second task is to reimplement the Table data type using hash tables. The hashing strategy to be adopted is that of closed hashing, so that collisions are dealt with by using a collision resolution function.

Your code should keep the num_entries field up to date and the insert function should check for a full table, and exit the program cleanly should this occur. An optional extra could allow the user to increase the hash table size when the table is getting full and then rehash into a larger table.

You should experiment with the various hash functions described in the lectures, and given in the file hash-funs.c, with different Table sizes and different collision resolution functions. You will need to add code to speller.c to report your findings.

The header file for this section is as follows:

/* table-hash.h */
/* Header file for hash table based implementation */

/* Time-stamp: <November 14, 2000 11:18 graham@cs.man.ac.uk> */

#ifndef TABLE_HASH_H
#define TABLE_HASH_H

typedef int Boolean;

/* The type for keys */
typedef char * Key_Type;

/* Type for size of hash table */
typedef unsigned int Table_size;

/* Type for index into hash table */
typedef unsigned int Index;

/* Type for state of hash table entry */
enum kind_of_entry { legitimate, empty, deleted };

/* Type for hash table entry */
struct hash_entry
{
  Key_Type element;
  /* Data_Type data; */  /* we'll forget this */
  enum kind_of_entry state;
};

typedef struct hash_entry cell;

/* The underlying hash table stucture */
struct hash_tbl
{
  Table_size table_size;
  Table_size num_entries;
  cell *the_cells; /* an array of hash_entry cells,
                    * allocated during initialisation */
};

/* The hash table */
/* - a pointer to the hash table stucture */
typedef struct hash_tbl *Table;

/* Function headers */

Table initialize_table(Table_size);

Table insert(Key_Type,Table);

Boolean find(Key_Type, Table);

void print_table(Table);

#endif

The code implementing a Table in this way should be in a file hash.c.


Putting it all together

You are provided with a driver spelling checker program, speller.c, that uses the data types together with a Makefile, to show how everything fits together, and a sample dictionary, sample-dictionary and text, test-text. All the files can be found in $CS2011/lab4/. You should copy all the files from that directory. (Note that the file links on this page will only work from the department's Unix machines)

The Makefile gives rules for building two binaries, spellt, which uses the tree implementation, and spellh, which uses the hash table approach (just use make spellh and make spellt to build them. Of course this requires you to provide the appropriate function definitions in tree.c and hash.c). Unless you modify speller.c these two programs should, of course, produce identical output. For both implementations you should test your program using a larger dictionary, such as /usr/share/dict/words. You can use the -d flag to specify an alternative dictionary, and the -s flag to specify a different hash table size. Try a prime number greater than twice the size of the dictionary (100003 is a suitable size for the file /usr/share/dict/words).

An optional extra (not assessed). Give your spelling checker the capability to read from a user's personal dictionary in addition to the system dictionary, and to add `incorrect' words to this personal dictionary if the user so chooses.


next up previous
Next: About this document ... Up: CS2011. Algorithms and Data Previous: CS2011 Lab 3: Intrinsic
Graham Gough
2001-11-12