Medical Terminologies: Integrating into the whole system

Using Medical Terminologies

The preceding discussion has focussed mostly on the search for medical terminologies that can work as 'unnatural' languages, intended for recording detail about individual patients. We've looked at how the intrinsic technical properties of existing schemes - particularly those optimised for statistical aggregation - mean that such schemes are not really fit for this purpose.

But medical terminologies don't exist in isolation. As part of a 'total terminology solution' they must interface with a range of other quite different objects in a computer system. A terminology that is very good - very complete and correct - within itself can still fail if these interfaces are poor.

This section gives an overview of the problems encountered at a selection of such interfaces. What is presented here is, perhaps, at the cutting edge of thinking about terminology in medical systems. The important point to take away is that terminology has become a software device and, like all components, it must interface with the other devices in the finished system. Terminology, in the context of an electronic patient record, is part of the software. It is no longer a book.

Terminology and the data model

Terminology, the data model and expressivity

When a medical terminology becomes part of an electronic record, an additional layer of complexity is introduced that provides considerable scope for confusion: the relationship between the schema of the terminology on the one hand, and the information models that underpin the EPR database or EPR architecture on the other.

The fact that there is a relationship is often overlooked. The nature of the relationship, and the problems that arise from not taking it into account, might be best illustrated by the following reductio ad absurdum argument (with thanks to Kent Spackman):

First, imagine a terminology that was completely expressive, both theoretically and in actual use by naive untrained users. Everything that any user ever wanted to say, however detailed and complex, could be constructed within the terminology and then assigned a single unique identifier. There would be gazillions of different identifiers.

In this world, however, the EPR database itself would need only one field - perhaps called 'clinical detail' - in the table for the clinical information. A single identifier for a huge, complex and detailed concept describing the patient would go in this field.

Now imagine a terminology that is extremely unexpressive. In fact, it consists of exactly two terms:

0001 'present'
0002 'absent'

With such a small terminology, the EPR database might attempt to compensate for the lack of expressivity by having zillions of fields in its database table - for example, one field for each of 20,000 different diagnoses you might want to record. Each consultation event would then result in a new row in the table, most of the fields (columns) of which would be empty except one or two that would contain the code for either 'present' or 'absent'. By this method, a small terminology and a huge data model, you could record on one row of the database table that a patient had tonsillitis but did not have a chest infection.

These two extremes demonstrate how the content of the terminology can affect the content of the data model - how the choice of where to put expressivity can be arbitrary. The medical record architecture, and the EPR messaging architecture, are similarly affected. There is obvious scope for chaos and confusion, not least if both the terminology and the data model overlap such that a number of concepts appear in both.
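The two extremes can be sketched in a few lines of Python. This is only an illustration of the thought experiment: the identifier C000001, the column names and the helper `describe` are invented here, and only the codes 0001 and 0002 come from the text.

```python
# Extreme 1: a fully expressive terminology, so the record needs one field.
# The identifier "C000001" and its meaning are invented for illustration.
expressive_terminology = {
    "C000001": "tonsillitis present, chest infection absent",
}
row_one_field = {"clinical_detail": "C000001"}

# Extreme 2: a two-term terminology (codes from the text), so the record
# needs one column per diagnosis. Only three hypothetical columns are shown;
# the thought experiment has 20,000 of them.
PRESENT, ABSENT = "0001", "0002"
diagnosis_columns = ["tonsillitis", "chest_infection", "asthma"]
row_many_fields = {column: None for column in diagnosis_columns}
row_many_fields["tonsillitis"] = PRESENT
row_many_fields["chest_infection"] = ABSENT


def describe(row):
    """Render a wide row as readable text, skipping the empty columns."""
    words = {PRESENT: "present", ABSENT: "absent"}
    return ", ".join(
        f"{column} {words[code]}" for column, code in row.items() if code is not None
    )


print(expressive_terminology[row_one_field["clinical_detail"]])
print(describe(row_many_fields))
```

Both rows carry the same clinical statement. The only difference is whether the expressivity sits in the terminology or in the data model.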

In reality, however, both extremes presented here are undesirable. The second example has obvious drawbacks - not least the enormous size that the database would grow to.

The first example, in which there really is a code for everything, might appear to be what everybody wants. Closer inspection shows that it is not. It's all a question of encapsulation.

Terminology, the data model and encapsulation

The ability to collapse any given concept - especially a compositional one - into a single canonical form, and to be certain that when two concepts collapse to the same canonical form then this means they are conceptually identical, is very useful. You need to be able to do this in order to merge or analyse data coming from different users or sites.

But the fact that you can collapse everything into one code when you come to merge data doesn't mean that collapsed codes are necessarily the best thing to actually store locally. Most local sites would prefer to split the data they store across a few fields in their database, so that they can make important local database queries run more efficiently. For example, a cancer hospital might prefer not to store the single code:

B4700 'Malignant neoplasm of ectopic testis'

...and would instead store the clinical details across three database fields:

FIELD_NAME	VALUE
[!pathology]	neoplasm
[!site]	testis (ectopic)
[!histology]	malignant

Storing the data in this less encapsulated form makes it much quicker to retrieve, for example, the set of patients with malignant disease.
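A minimal sketch of that retrieval, using Python's built-in sqlite3 module, might look like the following. The table name `fragmented`, the column names and the second patient row are assumptions made for illustration; only the three field values for patient 1 come from the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical local table that stores the clinical detail in three fields.
conn.execute(
    "CREATE TABLE fragmented ("
    "patient_id INTEGER, pathology TEXT, site TEXT, histology TEXT)"
)
conn.executemany(
    "INSERT INTO fragmented VALUES (?, ?, ?, ?)",
    [
        (1, "neoplasm", "testis (ectopic)", "malignant"),  # from the text
        (2, "neoplasm", "skin", "benign"),                 # invented row
    ],
)

# Finding all malignant disease is a test on a single column.
malignant_patients = [
    row[0]
    for row in conn.execute(
        "SELECT patient_id FROM fragmented "
        "WHERE histology = ? ORDER BY patient_id",
        ("malignant",),
    )
]
print(malignant_patients)
```

Had the hospital stored only single codes such as B4700, the same question would need a list of every code that implies malignancy, or a lookup back into the terminology for each row.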

This example shows that local users will want different levels of 'encapsulation' - the extent to which the clinical detail is fragmented across a number of data containers - and for perfectly good reasons. The job of the terminology is to allow and support those different choices. It must allow data that is encapsulated locally in one way to be moved to another site where the encapsulation is different. This might be achieved by exchanging the detail in fully canonised, completely encapsulated (not fragmented) form and then fragmenting it again when it arrives at the destination. The options for fragmentation at any one site are, of course, to some extent determined (or, at least, expressed) in terms of the local database information model. This is another example of how terminology and information models are intertwined: in order to un-encapsulate a completely encapsulated concept received from somewhere (so that it can be stored locally), it is necessary to know what sort of un-encapsulation the local data model supports.
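The exchange described above might be sketched as follows. Everything here is hypothetical: the `CANONICAL` lookup stands in for whatever the terminology really provides to expand a code, and `fragment` and `encapsulate` are invented helper names. The point is only that the receiving site's field list decides how the code is split.

```python
# Hypothetical expansion of an encapsulated code into attribute/value pairs.
# A real terminology would supply this; one entry is enough for the sketch.
CANONICAL = {
    "B4700": {
        "pathology": "neoplasm",
        "site": "testis (ectopic)",
        "histology": "malignant",
    },
}


def fragment(code, local_fields):
    """Split a code into the fields the local data model supports.

    Returns (stored, leftover): 'leftover' holds detail that the local
    model has no separate field for and must keep some other way.
    """
    detail = CANONICAL[code]
    stored = {name: detail[name] for name in local_fields if name in detail}
    leftover = {k: v for k, v in detail.items() if k not in local_fields}
    return stored, leftover


def encapsulate(fields):
    """Collapse a complete set of fields back into its single code."""
    for code, detail in CANONICAL.items():
        if detail == fields:
            return code
    raise KeyError("no single code matches this combination of fields")


# A receiving site whose data model only has 'pathology' and 'histology'.
stored, leftover = fragment("B4700", ["pathology", "histology"])
print(stored, leftover)
print(encapsulate({**stored, **leftover}))
```

A site with a different data model would pass a different field list to `fragment` and get a different split of the same code.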

Terminology and other knowledge in the system

Decision Support Criteria

One of the main reasons that we continuously seek to record more clinical detail, and in more consistent and controlled ways, is in order to allow computers to analyse the data we record. Perhaps the most complex form of analysis that we aspire to is decision support, in which the computer navigates its way through a forest of IF..THEN rules in order to reach some recommendation about what to do next.

In many cases, the things being tested for in the 'IF..' part of these rules (also known as the criterion) are whether or not the patient has particular conditions or symptoms or is taking certain kinds of drug. Operationally this means that, when you actually run the decision support program, there must be a way of translating each criterion into a patient record database query for the host system.

For example, the decision support criterion:

If patient taking 'author_term:antianginal' then...

...might become a host system EPR database query of the form:

SELECT * FROM Patient WHERE Patient.Medication IN ('X2001:Propranolol', 'X2002:Atenolol', ...)

The important point in this example is that the person who wrote the decision support rule criterion has used a local term (author_term:antianginal) that was not taken from the terminology used separately to record the EPR itself. This typically happens because the obvious abstractions that the criterion author would like to use - like 'antianginal drug' - often don't actually exist in the terminologies used for EPR capture. This in turn happens because the EPR terminology is optimised for EPR capture (not surprisingly) or for statistical aggregation, but not for writing decision support rules.

The problem is this: if the criterion author develops their own local terminology at author time, this local terminology must still be mapped to the EPR terminology at run time. And, if that author wants to be able to run the rules on lots of different host systems, each of which may be using a different EPR terminology (or the same one, but at different levels of encapsulation) then this mapping problem soon becomes more than a trivial exercise for the interested reader.
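The run-time mapping step could look something like this sketch. The mapping table `AUTHOR_TERM_MAPS`, the host name `host_a`, the function `criterion_to_query`, the `patient_id` column and the second patient row are all assumptions for illustration. Only the two drug codes come from the example above, and a real mapping would list many more.

```python
import sqlite3

# Hypothetical mapping from the rule author's local terms to the codes of
# each host system's EPR terminology. Each new host needs its own entry,
# which is exactly the maintenance burden described above.
AUTHOR_TERM_MAPS = {
    "host_a": {
        "author_term:antianginal": ["X2001:Propranolol", "X2002:Atenolol"],
    },
}


def criterion_to_query(host, author_term):
    """Translate one criterion into a parameterised query for a host."""
    codes = AUTHOR_TERM_MAPS[host][author_term]
    placeholders = ", ".join("?" for _ in codes)
    sql = (
        "SELECT patient_id FROM Patient "
        f"WHERE Medication IN ({placeholders}) ORDER BY patient_id"
    )
    return sql, codes


# A toy host database to run the generated query against.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Patient (patient_id INTEGER, Medication TEXT)")
conn.executemany(
    "INSERT INTO Patient VALUES (?, ?)",
    [(1, "X2002:Atenolol"), (2, "X9999:Invented drug")],  # second row invented
)

sql, params = criterion_to_query("host_a", "author_term:antianginal")
matching = [row[0] for row in conn.execute(sql, params)]
print(matching)
```

The rule itself never mentions a host code. Everything host-specific lives in the mapping table, so the number of tables to maintain grows with the number of host terminologies.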

Interface Terminologies

Recently people have begun to talk of an 'interface' terminology as something separate from aggregation terminologies, and to suggest that the two might be linked via a third entity: a reference terminology.

An interface terminology might be thought of as a set of linked concepts where the links are optimised primarily to help drive data capture interfaces. Links to help later analysis of any data are more or less excluded from interface terminologies.

The more advanced interface terminologies have adopted a different kind of navigational paradigm from the usual 'navigate through a hierarchy' term finding approach that we all know and love. Instead terms are linked such that, once you've chosen the first term that you want to record, the terminology links point you at other terms that - because of the task you are doing - you are likely to also want to record.

So, for example, an interface terminology might take a term directly from the reference or aggregation terminology, such as:

R062. 'Cough'

...and link it to other terms in the same terminology, like:

137.. 'Tobacco consumption'
173.. 'Breathlessness'
R061. 'Stridor'
R063. 'Haemoptysis'
R064. 'Abnormal sputum'
R065. 'Chest pain'
23D.. 'O/E - adventitious sounds'

A user interface data entry program may then be driven by these links in the interface terminology. Once the user had picked 'cough', the program would offer a reduced picking list made up of the other seven options listed above as being things the user might very likely want to record next.
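As a rough sketch, the links can be held as a simple lookup from a chosen code to the codes worth offering next. The codes and terms are those listed above; the names `TERMS`, `LINKS` and `pick_list` are invented for this illustration.

```python
# Terms taken from the example above, keyed by their codes.
TERMS = {
    "R062.": "Cough",
    "137..": "Tobacco consumption",
    "173..": "Breathlessness",
    "R061.": "Stridor",
    "R063.": "Haemoptysis",
    "R064.": "Abnormal sputum",
    "R065.": "Chest pain",
    "23D..": "O/E - adventitious sounds",
}

# Interface links: once the key has been recorded, offer these next.
LINKS = {
    "R062.": ["137..", "173..", "R061.", "R063.", "R064.", "R065.", "23D.."],
}


def pick_list(selected_code):
    """Return the reduced list of terms to offer after a selection."""
    return [TERMS[code] for code in LINKS.get(selected_code, [])]


for term in pick_list("R062."):
    print(term)
```

A term with no outgoing links simply produces an empty list, at which point a data entry program would presumably fall back to ordinary hierarchical term finding.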

The problem with authoring this kind of interface terminology is similar to the problem encountered by the decision support criterion author (above): the reference terminology simply doesn't contain the abstract terms that the author needs in order to write the links most efficiently. This similarity isn't surprising since an interface terminology can also be thought of as a rule base for a decision support program - a data entry expert system that predicts what you're going to want to say next - where the criteria are of the form:

IF user selects 'author_term:cough' THEN display 'Stridor' AND 'Haemoptysis' etc

In the example given, however, the interface terminology author would more likely want to say that the data entry display should prompt with 'breathlessness' if the term originally entered was any kind of respiratory symptom or disease. But abstractions like 'respiratory disease' don't always reliably exist in aggregation or reference terminologies. For example, we've already seen that ICD-10 doesn't actually include a single concept that equates to the notion of 'Respiratory Disease'.

So, again, the interface terminology author is stuck with the task of maintaining mappings from a set of local concepts and abstractions to those in one or more external terminologies.
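One way to picture that maintenance task is a locally held table of abstractions, each listing the external codes it covers. In this sketch the abstraction name `author_term:respiratory_symptom`, the choice of which codes belong to it, and the names `LOCAL_ABSTRACTIONS`, `RULES` and `prompts_for` are all assumptions for illustration. The codes themselves are taken from the cough example.

```python
# Locally maintained abstraction, mapped by hand to external codes.
# Membership is an illustrative guess, not taken from any terminology.
LOCAL_ABSTRACTIONS = {
    "author_term:respiratory_symptom": {"R061.", "R062.", "R063.", "R064."},
}

# One rule written against the abstraction rather than against each code:
# IF the entered term is any respiratory symptom THEN offer Breathlessness.
RULES = [
    ("author_term:respiratory_symptom", ["173.."]),
]


def prompts_for(entered_code):
    """Return the codes to prompt with after 'entered_code' is recorded."""
    prompts = []
    for abstraction, prompt_codes in RULES:
        if entered_code in LOCAL_ABSTRACTIONS[abstraction]:
            for code in prompt_codes:
                if code != entered_code and code not in prompts:
                    prompts.append(code)
    return prompts


print(prompts_for("R062."))  # cough
print(prompts_for("R065."))  # chest pain, not in the local abstraction
```

A single rule now covers every member of the abstraction, but the membership set has to be rebuilt by hand for each external terminology the author wants to support.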