CS5031 Exercises 2: Lexical Analysis for C

1. Draw a box around each of the lexemes in the following ANSI C program. Ignore any characters that would be discarded and so are not part of any lexeme. (When you get bored with one part of the program, skip on to the next part!)

#include <stdio.h>
int main (void)
{
  printf ("hello world\n");
  return 0;

  cauliflower; "cauliflower"; /*cauliflower*/
  /* if I do this then do you get it wrong? */
  "if I do this then do you get it wrong?";

  {static float i_1234 = 1 + 2.0 * 3 / .4 - 'i';}

  if (I_do_this)
  /*then*/ do_you(get+it-wrong)?or:not;
  else this(is_1.load,of->old*whatsit);

  does<this>>iffy*=expression%cause|problems&&elsewhere;

  "a string"; not-a-string; "another string";

  /* these are not valid operators */
  a#b; a@b;
  /* nor are these - recognise the individual operators they consist of */
  a:=b; a<>b; a**b; a~~b; a^^b;

/* if you can do all the above, try the following: */

	012345; 12345; 0x12345abc;
	012345LU; 12345l; 0x12345abcu;

	123.; 123.123;
	123.e-1; 123.123E99; 123e+23;
	123.e-1f; 123.123E99l; 123e+23F; 123L;

	'a'; '\n'; '\''; '\"'; '\\'; '\0'; '\01'; '\012'; '\xAB';

	/* a comment */ not-a-comment; /* another comment */

	/* a nasty * / *
	/* multi-line comment */

	"a nasty \" string";

	#define a(nasty)\
	multi-line command

/* if you can do all the above, try bigdata.c */

}

2. What other lexemes can occur in ANSI C programs? Classify the lexemes that can occur in ANSI C programs (e.g. identifiers, keywords, strings, numbers, etc.) and write lex patterns (regular expressions) that recognise each class and discard any characters that should be ignored.

Use any available ANSI C documentation. If you don't have anything with you, "Standard C - A Reference" by Plauger and Brodie is available at http://www.cs.man.ac.uk/applhax/C/standard_c/index.html
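
To check your patterns against something concrete, it can help to see the "longest match" rule that lex applies written out by hand. The following is only a rough sketch in C, not a model answer: the names tok_kind, token and next_token are invented for this illustration, and it deliberately covers just identifiers, plain decimal digit sequences, operators/punctuators and /* */ comments. Strings, character constants, floating constants, suffixes, hexadecimal constants and preprocessor lines are all left out.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token classes for this sketch; not a full ANSI C classification. */
enum tok_kind { TOK_EOF, TOK_IDENT, TOK_NUMBER, TOK_PUNCT, TOK_UNKNOWN };

struct token {
    enum tok_kind kind;
    const char *start;   /* first character of the lexeme */
    size_t len;          /* number of characters in the lexeme */
};

/* Multi-character operators, longest first, so the longest match wins. */
static const char *const multi_puncts[] = {
    "<<=", ">>=", "...",
    "->", "++", "--", "<<", ">>", "<=", ">=", "==", "!=", "&&", "||",
    "*=", "/=", "%=", "+=", "-=", "&=", "^=", "|=", "##",
    NULL
};

/* Single-character operators and punctuators. */
static const char single_puncts[] = "[](){}.&*+-~!/%<>^|?:=,;#";

/* Return the next token at *cursor and advance *cursor past it. */
struct token next_token(const char **cursor)
{
    const char *s;
    struct token t;
    size_t i;

    assert(cursor != NULL && *cursor != NULL);
    s = *cursor;

    /* Discard white space and comments: they belong to no lexeme. */
    for (;;) {
        while (isspace((unsigned char)*s))
            s++;
        if (s[0] == '/' && s[1] == '*') {
            const char *end = strstr(s + 2, "*/");
            s = end ? end + 2 : s + strlen(s);  /* unterminated: skip to end */
        } else {
            break;
        }
    }

    t.start = s;
    t.len = 0;
    t.kind = TOK_EOF;
    if (*s == '\0') {
        *cursor = s;
        return t;
    }

    if (isalpha((unsigned char)*s) || *s == '_') {
        /* Identifier (keywords are not distinguished in this sketch). */
        t.kind = TOK_IDENT;
        while (isalnum((unsigned char)s[t.len]) || s[t.len] == '_')
            t.len++;
    } else if (isdigit((unsigned char)*s)) {
        /* Decimal digits only; note there is no sign in the pattern. */
        t.kind = TOK_NUMBER;
        while (isdigit((unsigned char)s[t.len]))
            t.len++;
    } else {
        /* Try the multi-character operators before the single ones. */
        t.kind = TOK_UNKNOWN;
        t.len = 1;
        for (i = 0; multi_puncts[i] != NULL; i++) {
            size_t n = strlen(multi_puncts[i]);
            if (strncmp(s, multi_puncts[i], n) == 0) {
                t.kind = TOK_PUNCT;
                t.len = n;
                break;
            }
        }
        if (t.kind == TOK_UNKNOWN && strchr(single_puncts, *s) != NULL)
            t.kind = TOK_PUNCT;
    }

    *cursor = s + t.len;
    return t;
}
```

Calling next_token repeatedly on the text does<this>>iffy*=expression yields does, <, this, >>, iffy, *=, expression, whereas a<>b yields a, <, >, b because <> is not in the operator table. A character such as @ comes back as TOK_UNKNOWN; how a real scanner should report it is a separate design decision.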

Why does the syntax of ANSI C include only unsigned numbers, rather than signed numbers?