Visual Dominance and The World-Wide Web

Alistair D N Edwards1 and Robert D Stevens2

1 Department of Computer Science
University of York
York
England
YO1 5DD

alistair@minster.york.ac.uk

2 Department of Computer Science
University of Manchester
Oxford Road
Manchester
England
M13 9PL

rstevens@cs.man.ac.uk

Abstract

In principle the World-Wide Web (or simply 'the Web') should afford access to blind people; the pages are composed in HTML, which makes the structure of the document explicit. Browsers can be built which exploit that structure and render the information in a suitable non-visual format. However, in practice, this is not what has happened. There are a number of reasons for this, which can loosely be characterized as `visual dominance'. In other words, individual documents are designed in such a way as to enhance their visual presentation (at the expense of non-visual accessibility) and HTML extensions are being developed which assume a visual presentation. This paper discusses these trends, why they are occurring and whether there are ways of finding a compromise between the (apparently) conflicting requirements of visual and non-visual presentation.

Access to the World-Wide Web

While the Internet has existed (under various names) for over twenty years, it is under the guise of the World-Wide Web that it has reached a level of significance in many people's lives, in work, education and leisure. Yet the World-Wide Web (or simply `Web') is really just an interface to the network. There is essentially nothing that the user can do via the Web that was not possible on the network before its advent; the underlying protocols and facilities always existed.

Nevertheless, the advent of the Web has led to the creation of vast amounts of information in machine-readable form. For those who have access, it is a convenient and powerful source of information. If current trends continue the Web may become invaluable; that is to say, it may become the default source of information in some areas. In that situation, not to have access becomes more than an inconvenience: it amounts to a handicap.

Electronic forms of information have the advantage of flexibility over traditional representations: they can be communicated and transformed in a variety of ways. This is the major reason why, for the most part, the advent of digital information technology has been a boon for many people with disabilities. Compared to traditional forms of representing and communicating information, there has been a much greater ability to adapt to the needs of individuals. This has been most evident in the case of people who are blind. Whereas the traditional format of printed text is all but inaccessible to blind people, if the same information is held in an electronic form then it can be translated with relative ease into the non-visual form of synthetic speech (Edwards, 1991) or braille (Weber, 1995).

Print has become to an extent accessible via the technique of optical character recognition (OCR). Using a scanner, a printed page can be read into a computer; OCR software then analyses the image and translates it into a character-coded representation which can be rendered in speech or braille. In this way some blind people gain access to printed books. The level of access is hardly equivalent to that of a sighted reader, though. One of the problems is that the text, once scanned, is unstructured: there is nothing in the stream of scanned data that identifies what was a section heading in the original text, for instance.

The degree to which the translation from a visual to a non-visual form can be achieved depends to some extent on the degree to which the information is structured in a form that is not tied to its visual presentation. It has to be conceded that some artefacts are inherently visual, so that there is no entirely adequate non-visual representation of a painting, for instance (though there have been attempts to approximate one using speech, and there is an argument that a piece of music is the appropriate alternative). At the other end of the spectrum there are forms of representation which come closer to being `pure information' and might equally be rendered in almost any medium.

The Web has shown the potential to be part of a structure-based world that would solve many of the problems of providing information in non-visual forms. In the visual world there is a natural inclination to make things look good and, since vision is the dominant sense, this seems a reasonable thing to do. However, to achieve a good visual presentation, presentational information has to be embedded within the document itself; thus we lose the separation of structure and presentation and, with it, the predictability of the information's structure. This is all the consequence of visual dominance - or even domination.

HTML

Central to the Web is the format of its pages. That format is usually based on the Hypertext Markup Language, HTML. This is related to the Standard Generalized Markup Language (SGML; ISO, 1993). As its name implies, SGML is a very general language which can be used to represent a wide variety of information. In fact SGML is a meta-language: that is to say, it is a language that can be used to define other languages. Each `language' is represented by a Document Type Definition (or DTD), and as long as you can write a DTD to describe the information you want to represent, you can use SGML. So, while SGML is intended primarily as a means of representing textual information, it can be used for a much wider range of material.

SGML is one of a number of markup languages. The important difference between it and the others is that it is standardized; an ISO standard exists describing it. Unlike proprietary formats, it can be implemented on a variety of platforms and programs. It should also be guaranteed a long existence, so that creating an SGML document now is not a gamble on whether one will still be able to access and read it in ten or twenty years. What the term `markup' implies is that elements of a document are labelled (or tagged) as to their role within the document. For instance, a first-level heading might be surrounded by the tags <H1> and </H1>, thus:

<H1>Visual Dominance and The World-Wide Web</H1>

The bracketing characters `<' and `>' are used to distinguish the tags, which are not part of the text but are signals to the software used to handle it.

HTML is based on SGML, but in its early incarnations the resemblance was only a passing one. It used tags of a similar format, with `<' and `>' brackets and the convention of pairs of tags to mark the start and end of an entity, the labels differing only by the slash character, `/'. Subsequently, however, SGML and HTML have become more closely integrated and the current trend is towards browsers for SGML-based documents.

The syntax of HTML is very much looser than that of SGML. The only constraint on most HTML tags is that they should be balanced by a closing tag (<H1> and </H1>, for instance), and even unbalanced tags are tolerated by most browsers. SGML, by contrast, is used to define rather stricter grammars. Any particular document has to conform to the grammar laid down for it in the DTD. Thus, for instance, a section cannot be preceded by a subsection; all pages have the structure header, body and foot; a body is made up of headings, paragraphs and so on. If one knows the DTD, then one is assured that the document conforms to that structure. The SGML world, being structure-based and having strict DTDs, is a predictable one in which structure is separated from presentation. This separation allows alternative forms of presentation to be based on the same information source.
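By way of illustration, a DTD fragment along the following lines (the element names and content models are invented for this example) shows how SGML constrains structure; a document claiming conformance to such a DTD could not, say, place a section before the chapter title:

<!-- Hypothetical SGML DTD excerpt: a chapter must contain a title followed by at least one section -->
<!ELEMENT chapter - - (title, section+)>
<!ELEMENT section - - (title, para+)>
<!ELEMENT title   - - (#PCDATA)>
<!ELEMENT para    - O (#PCDATA)>

A parser checking a document against such declarations can reject anything that violates the stated structure, which is precisely the predictability that loose HTML gives up.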

HTML gives the creator of a document the ability to describe that document's structure. For instance, in the example above a first-level heading is identified. This would be integrated with other headings at the same and subordinate levels to outline the sectional structure of the document. At the same time, by designating a part of the text as a first-level heading, the creator of the document is saying nothing about how that heading should appear when the document is displayed. It is up to the displaying software how that kind of heading will be signified and distinguished from other headings and other parts of the document (if at all). The display software might centre the text and use a large, bold typeface, or it might use centring and underlining; different software might use different renderings. The point is that the creator of the document can make no assumptions about the rendering; the creator's task is simply to define the structure.
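As a sketch (the content is invented for the example), the fragment below marks up nothing but structure; how the headings and paragraphs are eventually rendered is left entirely to the browser:

<!-- Only the structure is described; no typeface, size or layout is specified -->
<H1>Access to the World-Wide Web</H1>
<P>Introductory paragraph of the first section...</P>
<H2>HTML</H2>
<P>A paragraph within the first subsection...</P>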

One of the significant developments associated with the Web is the degree of platform independence. Whereas other software has suffered from incompatibility between versions for different hardware - or even the unavailability of software for one platform - Web browsers have been developed for the complete range of systems. This is thanks to the standardization which underlies the Web; the formats and protocols are the same on all platforms. The emphasis on tagging structure is part of this. Taking the example of formatting headings, a browser for a character-based display might use all capital letters instead of a bold typeface.

This emphasis on structure rather than rendering is a very good thing as far as the designer of accessible texts is concerned. Unlike OCR-scanned text, information about the structure of the document is available. This means that tools, or browsers, can be developed which allow the user to navigate around the document and extract the required information in a controlled manner (Bookmanager; Full, 1991). There is additional information encoded in the structure of a document. For example, the text of a level-1 heading carries a different meaning from the same words in the middle of a sentence. When available in a structural description, this information too can be communicated in a non-visual form.

Visual dominance

Visual dominance is a well-known psychological phenomenon (Mayes, 1992): if a person receives conflicting information through different senses, it is usually the visual signal that is heeded. More generally, in everyday life we can see that sighted people attach greater value to the visual properties of objects. For instance, even in these days of so-called multi-media computers, it is the visual display which is furthest developed; the sound output of computers is still relatively crude. The fact of a visually dominated world is evident from the language of day-to-day conversation: `I'll see you tomorrow', said the television presenter.

A picture may be worth a thousand words, and it is only right that documents should include illustrations which make the most of the information capacity of diagrams. Yet some printed texts use pictures for purely decorative purposes. The danger with Web page design is that such an essentially decorative picture may also embody information which is otherwise inaccessible.

HTML provides good facilities for the integration of graphical information in Web pages. Page designers have made use of these facilities from the outset, but as the Web has become increasingly popular the efforts to make pages visually pleasing have grown. It was suggested above that the author of an HTML document can and should make no assumptions about the visual rendering of (say) a level-2 heading. However, if one is to draw attractive graphic designs on a screen then there are assumptions that must be made about that screen. Graphic designers are accustomed to designing for pieces of paper of standard sizes and cannot cope so well with computer screens of arbitrary size containing windows that can be resized at the whim of the user. The designer cannot know that what looks good on one screen will look as good on any other.

At first the constraints of HTML forced designers to employ tricks to achieve the designs they desired. Tables provide a greater degree of control over formatting than free text, so designers sometimes put non-tabular information into tables. An example is the Times newspaper, the layout of whose Web version mimics the printed paper - but only through the use of tables.
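The kind of trick involved can be sketched as follows (the content is invented); a borderless table is used purely to force text into newspaper-style columns, even though the information is not tabular at all:

<!-- A layout trick: non-tabular text forced into a two-column table -->
<TABLE BORDER=0>
<TR>
<TD>First column of the story text...</TD>
<TD>Second column of the story text...</TD>
</TR>
</TABLE>

A non-visual browser working through such a page cell by cell has no way of knowing that the two cells are really successive columns of the same narrative.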

Unfortunately the trend is towards allowing the designer increased control over the appearance of pages, at the cost of accessibility. Browser designers have invented their own extensions to HTML (warnings on pages such as `This page can only be viewed using Netscape 3.0 or above' ought to sound alarm bells), and frequently those extensions are visually oriented.

A clear example of this trend is the use of typefaces. Early versions of HTML allowed the author to apply `semantic' tags. For instance, the designer could mark a section of text as requiring emphasis (<EM> </EM>). This is commonly rendered in italics. However, as suggested previously, no assumption should be made about the way that the tagging will be rendered; on a basic terminal an alternative rendering such as underlining might be used instead. This does represent a loss of control for the designer. So it is that more recent versions of HTML allow the page designer to specify the style of letters explicitly (italic or bold, for example).
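The difference can be sketched with a single sentence marked up in the two styles (the wording is invented for the example):

<!-- Structural markup: the browser chooses how to convey the emphasis -->
<P>Guidelines are <EM>not</EM> enforceable.</P>

<!-- Presentational markup: the author prescribes italics, whatever the output medium -->
<P>Guidelines are <I>not</I> enforceable.</P>

A speech-based browser can render <EM> as a change of voice or stress; it can do nothing meaningful with <I> except ignore it or announce a typeface change.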

Indeed, it is possible to specify the typeface to be used. In this way the page designer can have some reassurance as to the appearance of the page - but not complete certainty. Platform independence has been lost in that a particular typeface can be used only if that font is available on the host machine. If it is not, then an unpredictable alternative will be used. Such a prescriptive approach to page design also affects accessibility.
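For instance, a page might contain something like the following (the typefaces named are merely illustrative); if neither face is installed on the reader's machine, the result is unpredictable, and the markup says nothing at all that a non-visual browser can use:

<!-- The author prescribes particular typefaces and a size -->
<FONT FACE="Garamond, Times New Roman" SIZE=4>Visual Dominance and The World-Wide Web</FONT>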

Another visual feature is the image map. An image map allows a picture to be displayed which is also active, in that the user can click on it; depending on where the user clicks, different actions occur. In other words, image maps can be used instead of links, but whereas links at least consist of text that can be interpreted, the image map is simply a bitmap, the meaning of which is apparent only to the sighted viewer. This is a classic example of a visual presentation that may be more aesthetically pleasing without adding functionality. This is not to suggest that image maps should not be used, but where they are there must be a standard textual alternative; it should never be the case that an image map contains information essential to accessing the page which is not available through other routes. For instance, whereas an image map of a geographical map is a good visual interface to information on the different regions of a country, a list of the names of the regions should also be available as textual links.
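A sketch of the arrangement intended (the file names and regions are invented) might look like this; the image map serves the sighted user, while the ALT text and the list of ordinary links carry the same information for everyone else:

<!-- A client-side image map with a textual alternative alongside it -->
<IMG SRC="regions.gif" ALT="Map of the regions" USEMAP="#regions">
<MAP NAME="regions">
<AREA SHAPE="rect" COORDS="0,0,100,100" HREF="north.html" ALT="The North">
<AREA SHAPE="rect" COORDS="0,100,100,200" HREF="south.html" ALT="The South">
</MAP>
<P>Regions: <A HREF="north.html">The North</A> | <A HREF="south.html">The South</A></P>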

There is no doubt that good visual design facilitates access to documents and improves comprehension, but good design means more than aesthetics. It is interesting to note that, while many sets of guidelines on good Web page design exist, many of them do not attempt to give any assistance on the good use of graphics and images.

Tables and frames

One of the components of HTML pages which non-visual browsers handle badly is tables. These have long been a problem for blind people. Braille gives a good representation of the two-dimensional structure of a table, but it is nevertheless difficult to extract the required information from a braille table. For those who access documents through speech (either synthetic speech rendering of machine-readable documents or human readers creating tape recordings), representation of and access to tables is difficult. However, this is not an insurmountable problem: Bufton (1991) developed some simple ideas for speech-based access to tables which could be incorporated into speech-based browsers.
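Bufton's ideas are not reproduced here, but the sort of structural information a speech-based browser could exploit can be sketched as follows (the figures are invented): if the header cells are marked up as such, a browser can announce each data cell together with its row and column headings, rather than simply reading the table from left to right.

<!-- Header cells marked with TH give a speech-based browser something to announce with each value -->
<TABLE>
<TR><TH></TH><TH>Braille</TH><TH>Speech</TH></TR>
<TR><TH>Readers</TH><TD>120</TD><TD>340</TD></TR>
</TABLE>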

Frames represent a rather more difficult problem. The essence of frames is to achieve greater parallelism: the user can have two (or more) sets of information on the screen at a time and move quickly between them. Typically one frame is a map (or table of contents) of the other; the former remains static while the user moves through the contents of the other. Such concurrently available information and quick switching is precisely the sort of interaction which is possible with the speed and accuracy of vision but is difficult to achieve in braille or sound[1].
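As a sketch of the construct in question (the file names are invented): a typical frameset places a contents frame beside a content frame, and HTML does at least provide a NOFRAMES element in which an alternative, linear version of the page can be offered.

<!-- A contents frame alongside a main frame, with a linear alternative for browsers without frames -->
<FRAMESET COLS="25%,75%">
<FRAME SRC="contents.html" NAME="contents">
<FRAME SRC="chapter1.html" NAME="main">
<NOFRAMES>
<P><A HREF="contents.html">Table of contents</A> (non-frames version)</P>
</NOFRAMES>
</FRAMESET>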

Accessibility

There are two components of Web accessibility. The first is the design of the page to be accessed. Pages may be designed in such a way as to facilitate access through non-visual channels. For instance, the Web Page Accessibility Self-Evaluation Test and the Speech-Friendly Ribbon Award present guidelines on good accessible page design. A problem with guidelines is that they are unenforceable (see Edwards, 1997 for a discussion of the question of enforceability). While some accessibility requirements incur no `cost' (adding an alt caption to an image does not detract from that image, for instance), others may interfere with the visual layout and presentation and hence be rejected or ignored by page designers. For instance, there is a direct conflict with guidelines on visual Web design, which often encourage the designer to concentrate on the narrative, to incorporate hypertext links into the text, and to avoid devices such as

Click here for more information

On the other hand, the Speech-Friendly Guidelines state that `Links which are embedded in paragraphs are placed one to a line and clearly labelled', an apparent conflict.
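The two pieces of advice can be sketched side by side (the wording and file names are invented). The ALT caption costs the visual design nothing; the treatment of links is where the visual and speech-oriented guidelines pull in different directions:

<!-- An ALT caption: invisible to most sighted readers, essential to a speech-based browser -->
<IMG SRC="logo.gif" ALT="Department of Computer Science logo">

<!-- Visual guidelines favour a link woven into the narrative... -->
<P>The full <A HREF="report.html">project report</A> describes the evaluation.</P>

<!-- ...while speech-oriented guidelines favour one clearly labelled link per line -->
<P><A HREF="report.html">Project report: evaluation results</A></P>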

One `compromise' is to maintain parallel sets of pages: a graphically oriented set of pages is available, but so is a text-only parallel set containing the same information. The obvious disadvantage of this approach is the work incurred in maintaining two sets of pages and keeping them consistent. This need not be the case if the pages are truly designed in a structurally oriented way with minimal visual assumptions. In that case it would be possible to have tools which automatically present the information in different ways. For example, Zajicek and Powell (1997) have developed a browser, WebChat, which extracts information from a page and presents it through a set of speech-based menus. The user could then choose between a display optimized for visual presentation and one designed for a non-visual interface.

The second component of accessibility is the browser. There are two approaches to non-visual browsers. The first is to use a standard visual browser adapted by a screen reader. The second approach is to use a browsing program developed from the start for non-visual (speech) access. Webspeak is such a browser.

The advantage of the screen reader approach is that the blind person can use a standard browser. It has all the features of that browser and will be the same one as used by colleagues. The screen reader will be the same one the user employs with other applications, and so will be very familiar. The disadvantage is that the screen reader may not be well matched to the browser. This may lead to software incompatibilities, and the user interface may also be less than optimal. For instance, the only indication of a link in a text may be that it is displayed in a different colour or typeface; while screen readers can be configured to be aware of such changes, they are unlikely to signal them to the user in a very useful manner. Other features may be even more awkward: some screen readers will not recognize the presence of a graphical item in the browser, for instance. The essential problem with the screen-reader approach is that the structural information has been lost again, just as it is when an OCR scanner reads from the print of a book.

A specialized speech-based browser will overcome many of these problems. The disadvantage is that the browser will perform only that one function; the user will still need a screen reader for accessing other software. However, the browser has access to the raw HTML and hence to the structural information it encodes.

Conclusions

The advent of the personal computer has been a boon for many blind people. Suitable adaptations such as screen readers made computers accessible and so opened new opportunities. This was true as long as computer interfaces were text-based, but the advent of the graphical user interface meant that blind people were in danger of becoming disenfranchised. It is only comparatively recently that screen readers have started to become available which make this new generation of interfaces accessible (Edwards, 1996, gives more details of this development). Unless steps are taken now, the Web is likely to follow a similar track - except that its graphical interface may prove even more difficult to curb.

The development and use of visually oriented HTML features should be discouraged. Where these do exist they should be used only with alternatives. Web page designers must be encouraged to develop their pages based on the structure of the information to be presented and to be aware of the problems of visual extensions. This can be promulgated through design guidelines and training.

With well designed, structurally-oriented pages, tools can be provided to present the information in a variety of formats, particularly non-visual renderings. Ideally such tools should be integrated with existing visual browsers, rather than being separate stand-alone programs.

If moves to integrate HTML with SGML more closely lead to a stricter language with a natural emphasis on structure over presentation, then so much the better.

If these objectives can be realized then the Web can become a vital resource for blind people (as for others); if they are not then blind people may be cut off from it.

References

Bufton, S. (1991). Reading text tables for blind people. Department of Computer Science, University of York, Final-year Project Report.

Edwards, A. D. N. (1991). Speech Synthesis: Technology for disabled people. London: Paul Chapman.

Edwards, A. D. N. (1996). The rise of the graphical user interface. Library Hi Tech 14(1): pp. 46-50.

Edwards, A. D. N. (1997). Legislation and access to the World-Wide Web. in Web Accessibility '97, (Santa Clara, California).

Full, A. (1991). Text Access for Blind People. Department of Computer Science, University of York, Final-year Project Report.

ISO (1993). Information and Documentation - Electronic Manuscript Preparation and Markup, International Standards Organisation, No. ISO12083.

Mayes, T. (1992). The `M' word: Multimedia interfaces and their role in interactive learning systems. in A. D. N. Edwards and S. Holland (ed.) Multimedia Interface Design in Education. Berlin: Springer-Verlag. pp. 1-22.

Weber, G. (1995). Reading and pointing - New interaction methods for braille displays. in A. D. N. Edwards (ed.) Extra-ordinary Human-Computer Interaction: Interfaces for Users with Disabilities. New York: Cambridge University Press. pp. 183-200.

Zajicek, M. and Powell, C. (1997). Enabling visually impaired people to use the Internet. in Computers in the Service of Mankind: Helping the Disabled, (London), IEE. Digest number 97/117 pp. 11/1-11/3.


[1] It is often suggested that sound is a strictly serial medium. This is not true: there is massive parallelism in the music of an orchestra, for instance. Nevertheless it is rather more difficult to switch attention between multiple auditory sources presented in parallel than it is to move one's eyes between parallel visual displays.