Tue May 11 15:45:37 BST 2010

Make my web-site faster! chosen=13
Flexible Mind Maps chosen=10
When Doodle just isn't enough chosen=9
Why am I over quota? chosen=7
Why won't my Regexp work? chosen=6
An Management System for Essay Plagiarism chosen=5
Animating Algorithms students=2 chosen=5
Deriving regular expressions from example texts chosen=4
Deriving BNF from example texts chosen=3
Another Pattern Matching Language chosen=2
Why won't my BNF work? chosen=1
Deriving grammars from parse trees chosen=1

Make my web-site faster!

Difficulty
SH	CM	BM
F	F	F

Supervisor

Peter Jinks

Equipment

None

Pre-Requisites

None

Research by Google has shown that users are turned off by web-pages that take too long to load, and they are now using this as part of their page ranking algorithms: Using site speed in web search ranking. Unfortunately, users also want lots of functionality, and for pages to look good, so web-sites must use every possible trick to speed up loading. One way of doing this is to pre-process HTML files to make them smaller, by removing redundant characters.

This project is about creating a system that can input an HTML page and output an equivalent but compressed page. An example compression would be to get rid of white-space and comments, but there are many more possibilities described in the information provided by Google.

A possible extension would be work with different versions of HTML and XHTML, or perhaps to bring versions up-to-date by eliminating deprecated tags etc. Another would be to minimise the CSS files used by an HTML page (e.g. by omitting styles for tags that aren't used on the particular page). Others might be to try to compress images, or to create imperfect but still useable HTML (e.g. by omitting closing tags).

Flexible Mind Maps

Difficulty
SH	CM	BM
F	F	F

Supervisor

Peter Jinks

Equipment

None

Pre-Requisites

None

"A Mind Map is a diagram used to represent words, ideas, tasks or other items linked to and arranged radially around a central key word or idea. It is used to generate, visualise, structure and classify ideas, and as an aid in study, organisation, problem solving, and decision making." [Wikipedia] Mind-maps are essentially hierarchical, with a tree-like structure, possibly with a few cross-links between branches. A Concept Map is a similar idea, with a similarly hierarchical structure, but typically using many more cross-links: "A well made concept map grows within a context frame defined by an explicit 'focus question,' while a mind map has branches radiating out from a central picture." [Wikipedia]

Mind maps and Concept maps seem like a good way to write things down, but most implementations seem very rigid: they assume a unique, well-defined starting point and centre for each map, and the hierarchical links are treated very differently from cross-links. This is sensible if you have a clear idea of your starting point, but less so e.g. if you want to explore the relationships between a set of ideas that you don't yet fully understand. I would like to be able to build a less rigid structure with many cross-links, and then have different ways of redrawing the network, for example if I select a new central node, or reposition individual nodes, or if I "zoom in" on a subset of the nodes, with the remainder either hidden or indicated by links going "out of view". It would also be nice to be able to search for nodes or links via their text labels. (You can see something similar in e.g. the "Genealogy of Influence", a visualisation of the connections between influential people in Western culture.)

I also want to be able to manipulate multiple maps. For example, I would like to be able to compare and perhaps merge two maps created by different people but representing similar concepts, or two maps that represent different but related concepts. I would also like to be able to partition maps, for example to give different parts of a map to different people to update, and then merge them back together.

This project consists of designing and implementing such a system.

REFERENCES: Mind mapping (Wikipedia)
Concept mapping (Wikipedia)
Genealogy of Influence

When Doodle just isn't enough

Difficulty
SH	CM	BM
F	F	F

Supervisor

Peter Jinks

Equipment

None

Pre-Requisites

None

Doodle is very good for arranging a simple meeting, but sometimes you have a more complex situation, like scheduling the first year project presentations. First of all, there are 40 to 50 groups, each consisting of half-a-dozen students and their tutor. Each group can probably be handled like a single Doodle meeting, initialised by scraping the School timetable pages to find times when the students are likely to be free. However, once a set of possible times has been decided for each group, then the actual events, of 2 to 4 groups meeting together, need to be put together in rooms, with a staff "chairperson" (so we also need to scrape the School room timetables to find free slots in suitable rooms).

This project consists of creating a web application that can obtain all the availability information for students, staff and rooms, support whoever is organising the presentations to put together a complete timetable and update it as availability information changes, book rooms, and notify students and staff about when and where to turn up.

Why am I over quota?

Difficulty
SH	CM	BM
F	F	F

Supervisor

Peter Jinks

Equipment

None

Pre-Requisites

None

When students (or staff) go over quota, if we know about them, we can use tools like "duplus" or "KDirStat" http://kdirstat.sourceforge.net/ to help spot problems but we often need to understand a lot to decide what to do next - for example, we might need to know what each of the "." directories (or the files and subdirectories inside them) is for, or which commands will still work when we aren't allowed to create new files. (I have 56 subdirectories and 86 files in my home directory that start with a "." and I certainly don't know what all of them are for.)

It would be nice to have a system that provided the same sort of information as programs like "duplus", but also knew about directories like ".thunderbird" and ".Trash", and that ".class" files can be deleted if the original ".java" files exist. Even better would be if the system had some sort of configuration information describing these things, that could be easily updated when necessary. This project is about creating such a system to help users decide which files and directories can safely be deleted or moved, and then help them do so.

Why won't my Regexp work?

Difficulty
SH	CM	BM
F	F	F

Supervisor

Peter Jinks

Equipment

None

Pre-Requisites

None

No matter whether you are just learning to use regular expressions to recognise patterns in text, or whether you have been using them all your life, there comes a moment when you think you have got it right, but the stupid thing just isn't working! What you need is a tool that shows you why the regular expression isn't matching the text. This tool should have similar abilities to e.g. egrep or flex, but also be able to show where and why the regular expression stopped matching the text.

There are several different situations that need addressing:

should the regular expression match all the text, or just part of the text?
if it should match all the text, is it failing partway, or is it failing to start matching at all?
if it should match part of the text, is it failing to match the right part - is it matching too much or too little?
are we using just one regular expression, like egrep, or several, like lex?
if the latter, is it that the wrong regular expression has matched the text?

This project will involve creating a regular-expression matching program, similar to, but with more functionality than, e.g. egrep, plus a GUI to help the user make best use of the program.

An Management System for Essay Plagiarism

Difficulty
SH	CM	BM
F	F	F

Supervisor

Peter Jinks

Equipment

None

Pre-Requisites

None

We are required by the University to educate all students about plagiarism, and then to check all essays, reports etc, and deal with any plagiriam detected. This requires us to use Blackboard to educate students and Turnitin to check essays.

This project is about creating a system that gathers information from Blackboard and Turnitin, assists me to decide what students need chasing and track what the outcomes are, and then generates any necessary reports for permanent School and University records.

Animating Algorithms

Difficulty
SH	CM	BM
F	F	F

Supervisor

Peter Jinks

Equipment

None

Pre-Requisites

None

Courses like COMP20010 involve students implementing various algorithms, but there are always many more algorithms mentioned in lectures than there is time to implement. It would be very useful to have tools available for all these algorithms to help students understand, compare and analyse them.

This project would consist of implementing such tools for a range of algorithms. The details still need to be decided, however: which algorithms would you choose to implement, how would you animate and instrument them to help students' understanding, what data-sets would you provide to help students compare their efficiency, and so on.

Deriving regular expressions from example texts

Difficulty
SH	CM	BM
F	F	F

Supervisor

Peter Jinks

Equipment

None

Pre-Requisites

None

Recognising text is a common problem, often best solved by writing a grammar and then using a parser-generator (like lex or yacc) to automatically create a recogniser from the grammar. However, writing the grammar to begin with can be very difficult. Perhaps this process could be simplified by a tool that interacted with the user to derive a grammar from an example piece of text.

A complete grammar for a programming language would probably have to be created in two phases. The first (this project) would be simply to recognise individual "words" such as numbers, names, operators and punctuation for use with e.g. lex. This would then be used to create a BNF linking the words into "sentences" such as expressions, statements and declarations for use with e.g. yacc.

The tool might recognise typical strings, such as numbers consisting of digit characters "0" to "9", and perhaps also using "+" and "-" and ".". The tool might then interact with the user to decide whether these strings really were complete words, or whether we are trying to recognise something more complex e.g. course-unit codes like "COMP30900".

A significant problem will be preventing the tool from over-using an incomplete grammar. For example, if it is told that "for" is the start of an for-statement, it shouldn't try to use the "for" inside "before".

Another problem is that of white space (spaces, tabs, newlines etc.). In some circumstances, such as in typical computer languages like Java and C, white space can usually be discarded, although it may separate words. In other circumstances, such as input to a database, white space may be very important as tabs may be used to separate fields and newlines used to separate records. It may be hard to recognise these fundamentally different situations without significant help from the user.

A simple example is a list of symbols such as: abcdabcdddabcabcd. The first step would be to identify boundaries between repeated groups i.e. abcd abcddd abc abcdd. The next step would be to distinguish between special cases and generalisations i.e. do we permit only 0 to 3 instances of "d", or do we permit any number of instances? Must the list of groups contain exactly 4 instances of "abcd...", or any number? Must the instances be in the order "...d ...ddd ... ...dd" or is any order possible?

REFERENCES: Programming By Example
an example of using lex

Deriving BNF from example texts

Difficulty
SH	CM	BM
F	F	F

Supervisor

Peter Jinks

Equipment

None

Pre-Requisites

None

A complete grammar for a programming language would probably have to be created in two phases. The first would be simply to recognise individual "words" such as numbers, names, operators and punctuation for use with e.g. lex. This would then be used to create a BNF (this project) linking the words into "sentences" such as expressions, statements and declarations for use with e.g. yacc. For this project, therefore, we will have to assume that the first phase has already happened, by breaking the given text up into "words" using some simple algorithm.

Perhaps the user would select an example statement, and then interact with the tool to construct a simple grammar rule for it. The tool could then attempt to match other chunks of text with this rule, and the user could agree or disagree, allowing the tool to refine the grammar rule.

A significant problem will be preventing the tool from over-using an incomplete grammar. For example, if it knows how to recognise an expression consisting of names, numbers and operators, it shouldn't assume that the names in a declaration are also expressions.

Another problem is that of dealing with a hierarchical structure, for example trying to recognise if-statements and assignments. We can assume that we have a list of simple symbols, such as: i(n<n){n=n;i(n)n=n;} ("i" stands for "if", "n" for a name, and so on). We need to identify n=...; as an assignment statement, and then more complex statement forms, such as the if-statement: i(n) followed by a statement, and the block: { } containing one or more statements, and so on.

REFERENCES: Programming By Example
an example of using lex + yacc

Another Pattern Matching Language

Difficulty
SH	CM	BM
F	F	F

Supervisor

Peter Jinks

Equipment

None

Pre-Requisites

None

Every so often, we need to transfer textual information from one format to another - for example, copy the set of dates and times for first year project presentations from Moodle into Arcade. We want to automate this (and many similar problems) but the format in Moodle is different every time. We need a system that takes the input text, and grammar-like descriptions of the input and output formats, and generates the necessary output. This should be simple to do using flex and bison, or many other systems, except that it isn't because the set of basic primitives we would want to use (e.g. "the first 3 letters of the day-name" or "the time given as the hour from 9 to 5, so 1 follows 12" or "a list of these separated by blank lines") aren't simply available, and also because systems like flex and bison only process inputs, not outputs.

This project is about building a prototype program, perhaps using flex and bison or any similar set of tools, but then burying as much as possible of the details of the implementation inside a nice user interface.

Why won't my BNF work?

Difficulty
SH	CM	BM
F	F	F

Supervisor

Peter Jinks

Equipment

None

Pre-Requisites

None

No matter whether you are just learning to use BNF grammars to recognise patterns in text, or whether you have been using them all your life, there comes a moment when you think you have got it right, but the stupid thing just isn't working! What you need is a tool that shows you why the BNF isn't matching the text. This tool should have similar abilities to e.g. bison, but also be able to show us where and why the BNF stopped matching the text.

There are several different situations that need addressing:

usually, BNF is intended to match the whole text, but maybe you are still developing the BNF so only partial matches are possible (e.g. you haven't implemented all the different kinds of statement, when recognising a programming language)
if it should match the whole text, is it failing partway, or is it failing to start matching at all?
perhaps only some of the grammar rules are working correctly, but how can we tell which ones?
if it should match part of the text, is it failing to match the right part - is it matching too much or too little?
is it that the wrong grammar rule has matched (part of) the text?

This project will involve creating a simple but flexible parser (perhaps using chart parsing - a slow but simple algorithm) plus a GUI to help the user make best use of the program.

Deriving grammars from parse trees

Difficulty
SH	CM	BM
F	F	F

Supervisor

Peter Jinks

Equipment

None

Pre-Requisites

None

Creating a grammar for a computer language can be quite hard - often, we have a good idea of the sort of program or script we want to be able to write, but much less idea how to represent that in a grammar. The idea would be to create a system that would allow the user to start from such an example text in a window and then interact with the system to draw the "parse tree", and finally derive a grammar from it.

For the first step, for example, one might tell the system that this part of the text is an expression, the next part is a declaration, the next an if statement, and so on. One could then build up (this is a list of statements) and down (this is a sub-expression; this is a number, this is a variable, this is an operator) to create the complete parse tree.

The final step would be to take these parse trees, take each node in the tree as an example use of a grammar rule, and merge and generalise these to build up a complete grammar. For example, we might have examples of expressions containing several different operators, merge them to allow any of the given operators from the example, and then generalise to include all the operators we might actually want to use. We might even look at using concepts like Unification to help automate the merging.

Other facilities might be to provide suggestions or alternatives to choose from, to help the user when creating the parse trees or when converting the parse trees to a grammar (e.g. to help deal with operator precedence).