OUTLINE OF A NEW MODEL OF INFORMATION CONTENT AND STRUCTURE
October 7, 1998 Draft
by David Wojick (dwojick@shentel.net)
Introduction
This is the information age, but what is information? Several
years ago, while working on an information system design problem,
I realized that there is no scientific definition of "information
content", as that term is ordinarily used. Traditional information
theory does not provide one, because information theory does not
take into account that information has content. It accepts random
strings of letters as information.
So I set out to do what in mathematical logic is called a "rational
reconstruction". This means to develop a formal definition of
an ordinary language concept. In fact, I have found not one, but
three definitions of information content, nested one within another.
These three definitions turn out to be quite powerful. They provide
the basis for a new model of information content and structure,
one that is applicable to any information design activity. In
fact, the model supports several new design methods, including
actually measuring and optimizing content in various interesting
ways.
In my vision people cruise through the structure of their information
as one would fly a starship. They visit strange and beautiful
worlds, worlds that are in a sense already real. We just can't
see them yet.
Some aspects of this model are merely conjectural at present.
A research program will be required to test and refine its many
facets. However, because the model grows out of 25 years of study
of the structure of complex situations, many of its applications
have already been proven in areas like engineering, management,
and public policy. In what follows the examples will be from my
experience in information system design.
(Historical note: The information system referred to above was
a hypermedia system to support training for the Standard Army
Automated Contracting Software. My goal was to develop a taxonomy
of hyperlink types, so I asked myself "what are all the ways that
two or more pieces of information can be related?" It was at this
point that I realized I did not know what information content
was. The model proposed here is the beginning of a precise answer
to my question.)
What is information content?
The three proposed definitions, or types, of information content are:
Type 1: The propositional content of expressed thought.
Type 2: The propositional content of symbolic expression or display.
Type 3: The propositional content of meaningful expression or
display.
In all three types the information content is propositional in
nature. Propositions are the fundamental objects in mathematical
logic, first postulated by Boole in 1847. At its simplest, a proposition
is the meaning of a statement independent of the language used
to express that meaning. You can say that snow is white in many
languages, so there is something -- the proposition that snow is
white -- that is independent of the words you use.
While unfamiliar to most people, this concept is explained in
detail in most logic texts. In fact, many practitioners of mathematical
logic consider that discipline to be the science of propositions.
So if these definitions are accepted as capturing the ordinary
language concept of information content, then we have a solid
grounding in a well developed discipline.
The basic difference between the three types of information content
is this. Type 1 information requires an expression of thought,
so it can only be produced by thinking things using language.
This is the core definition, as in "he informed us that he was
leaving". Type 2 information includes type 1. It can be produced
by devices as well as by people, but still requires something
like language. This captures the fact that we get information
from speedometers and spreadsheets. Type 3 information allows
for non-symbolic sources of content such as a video.
Relation of my model of information content to information theory
Before proceeding, it is important to explain how my model differs
from what has traditionally been called "information theory".
Simply put, information theory is about the transmission of information,
not its content. This is stated quite nicely by James Gleick in
his book CHAOS: Making a New Science (Viking, New York, 1987,
p. 255). Gleick is referring to fundamental research done at the
University of California at Santa Cruz:
"The most characteristically Santa Cruzian imprint on chaos research
involved a piece of mathematics cum philosophy known as information
theory, invented in the late 1940s by a researcher at the Bell
Telephone Laboratories, Claude Shannon. Shannon called his work
'The Mathematical Theory of Communication', but it concerned a
rather special quantity called information, and the name information
theory stuck. The theory was a product of the electronic age.
Communication lines and radio transmissions were carrying a certain
thing, and computers would soon be storing this same thing on
punch cards or magnetic cylinders, and the thing was neither knowledge
nor meaning. Its basic units were not ideas or concepts or even,
necessarily, words or numbers."
"This thing could be sense or nonsense -- but the engineers and
mathematicians could measure it, transmit it, and test the transmission
for accuracy. Information proved as good a word as any, but people
had to remember that they were using a specialized value-free
term without the usual connotations of facts, learning, wisdom,
understanding, enlightenment."
Unlike the information theory described above, my model of information
content has precisely to do with information as a meaningful thing.
I would argue that something that is not meaningful has no information
content. In fact, propositions are units of meaning. That is why
all three of the proposed information types are defined so as
to be propositional in nature. This is how we use the words "information"
and "information content".
(Caveat: The model has nothing per se to do with issues of the
utility of information. It does not distinguish good information
from bad, true from false, important from trivial. Using the model
should make it easier to make these distinctions, but they are
not specifically addressed. The focus is on the nature and structure
of content, not its quality.)
The power of these definitions derives from the fact that expressing
a proposition is in a way a very simple act. Indeed, I argue below
that any such act can be considered as consisting of just three
basic elements. Understanding these elements leads us to a new
understanding of information content, as well as to the discovery
of a rich world of information structures that underlies all bodies
of information.
The basic elements of information content.
From the point of view of mathematical logic, any body of expressed
propositions is made up of the following three basic elements:
1. The context of the expression.
For type 1 this is typically who said or wrote what (the actual
language used), when, where, why, etc. For type 2 devices this
will include actual displays, printouts, etc. For type 3 non-symbolic
expressions it is the facts about the video, etc.
Note that according to our three definitions information is always
a tangible thing created by a specific act of expression at a
specific time and place. It may be sounds, marks on paper, a dial
reading, a video, even an action, but it is always tangible. Thus
on this view information is never an intangible something in someone's
head. The latter may be knowledge or belief, but it is not information.
We are therefore talking about something that always has a physical
aspect.
Note too that information content has to be expressed. Thus simply
seeing that a tree is in the road, an act of perception, does
not involve information.
2. The propositions expressed.
Already discussed.
3. The things referred to by these propositions.
These things are called in logic the "referents" of the propositions.
Reference is discussed in most logic textbooks. Note that referents
can be activities, properties, or anything that can be talked
about. To say that the snow is white is to refer to snow the stuff
and whiteness the attribute. To say that snow is melting is to
refer to snow the stuff and the activity of melting. Referents
do not have to be real, nor do the propositions have to be true.
Novels and lies have content.
Thus any instance of information content, call it a piece of information,
involves (1) propositions, (2) expressed in a given context, and
(3) referring to certain things. This sounds abstract, but in
any given case these elements are pretty obvious. (Or some of
them are obvious, others are not, but I will not go into that
issue here.) What is important is that different pieces of information
can be fit together according to how their elements are related.
This leads to a rich new science that I call information structures.
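The three elements can be made concrete as a small record type. The
following Python sketch is only an illustration: the class and field
names (Context, Piece, source, medium, and so on) are assumptions
introduced here, not part of the model, and propositions are crudely
approximated by plain strings.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class Context:
    """Facts about the act of expression (hypothetical fields)."""
    source: str   # who or what produced the expression
    medium: str   # e.g. "speech", "dial reading", "video"
    when: str     # date of the expression
    wording: str  # the actual language or display used


@dataclass(frozen=True)
class Piece:
    """One piece of information: propositions, expressed in a context,
    referring to certain things."""
    context: Context
    propositions: Tuple[str, ...]  # meanings, approximated by strings
    referents: FrozenSet[str]      # the things the propositions are about


# Two made-up expressions of the same proposition in different languages.
english = Piece(
    Context("speaker A", "speech", "1998-10-07", "Snow is white"),
    ("snow is white",),
    frozenset({"snow", "whiteness"}),
)
german = Piece(
    Context("speaker B", "speech", "1998-10-08", "Schnee ist weiss"),
    ("snow is white",),
    frozenset({"snow", "whiteness"}),
)

# Same proposition and referents, different context of expression.
same_content = english.propositions == german.propositions
same_context = english.context == german.context
```

In this sketch the two pieces agree on elements (2) and (3) but differ
on element (1), which is the situation described earlier for saying
that snow is white in many languages.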
Information structures.
Given the three basic elements of information content, we note
the following. Any two pieces of information content can be related
to one another by features of one, or a combination, of these
basic elements. The question then becomes: what are the most important
relationships that underlie a given body of information? How do
the pieces fit together? As it turns out, some of the most important
relations are well known, while others are less so.
Moreover, there may be a number of very different relationships
that are important in a given body of information. If so, then
when we try to understand that information we are in fact trying
to do several things at once, that is, to master several relationships
at once. I believe this is one of the chief obstacles to efficient
learning. We try to grasp several different relationships without
differentiating them.
I call the array of information units that are related by a given
relationship an "information structure". I also conjecture that
any body of important information includes a number of different,
yet significant, structures. Certainly this is true for every
body of information I have analyzed so far. By using the model
you can systematically identify the most important information
structures that underlie a given body of information. One can
even quantify and measure, as well as display, these structures
in various useful ways.
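As a rough illustration of a structure as the array of units related
by a given relationship, the Python sketch below models a relationship
as a two-place test and a structure as the list of unit pairs that
pass it. The function names, the sample units, and the use of pair
density as a measure are all assumptions made for this example; they
are not the measurement methods of the model itself.

```python
from itertools import combinations


def structure(units, related):
    """Return every unordered pair of units connected by the relationship."""
    return [(a, b) for a, b in combinations(units, 2) if related(a, b)]


def density(units, related):
    """Fraction of all possible pairs that are related (one crude measure)."""
    possible = len(units) * (len(units) - 1) // 2
    return len(structure(units, related)) / possible if possible else 0.0


# Sample units for illustration: (author, year, topic)
units = [
    ("Shannon", 1948, "communication"),
    ("Boole", 1854, "logic"),
    ("Gleick", 1987, "chaos"),
    ("Frege", 1879, "logic"),
]


def same_topic(a, b):
    return a[2] == b[2]


def same_hundred_years(a, b):
    # Compares the leading digits of the year, e.g. 18xx versus 19xx.
    return a[1] // 100 == b[1] // 100


# Two different structures over the same body of information.
topic_structure = structure(units, same_topic)
period_structure = structure(units, same_hundred_years)
```

The same four units yield two distinct structures, which is the point
of the conjecture: one body of information, several relationships.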
Let us consider briefly some of the more common and typically
important systems of relations, i.e. structures, that obtain among
the three basic elements in a body of information.
a. Context-based relations.
Alphabetical listing is a common way to relate pieces of information
content that is based on their physical form (i.e., the language
used). Chronologies of the events that produce information, such
as speeches or scientific articles, are another example. So are
logs of gauge readings, for that matter.
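Both kinds of context-based ordering can be sketched in a few lines of
Python. The log entries below are invented for illustration; only the
two orderings, alphabetical by wording and chronological by date of
expression, come from the text.

```python
from datetime import date

# Hypothetical log of expressions: (wording, date expressed)
log = [
    ("Snow is white", date(1998, 10, 7)),
    ("Engine speed 3000 rpm", date(1998, 10, 5)),
    ("Contract awarded", date(1998, 10, 6)),
]

# Alphabetical listing: ordered by the physical form of the expression.
alphabetical = sorted(log, key=lambda entry: entry[0].lower())

# Chronology: ordered by when each expression was produced.
chronological = sorted(log, key=lambda entry: entry[1])
```

Neither ordering looks at what the entries mean or refer to, which is
what makes these relations context-based.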
b. Proposition-based relations.
Relations of mathematics and logic are typically proposition based.
The former includes spreadsheet information, business or engineering
calculations, computer software functional designs, etc. (Of course
the referents of these manipulations are also going to be related
in important ways, as discussed below). Logic relations include
implication and contradiction, both of which are important in
computer science, problem solving, the law, etc. Some of these
relations are well known.
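For readers who want to see implication and contradiction operate, here
is a minimal Python sketch using the standard truth-table method from
propositional logic. It is an illustration only: propositions are
modeled as functions over truth assignments, and the variable names
and sample propositions are assumptions chosen for the example.

```python
from itertools import product


def assignments(names):
    """Yield every possible truth assignment to the named variables."""
    for values in product([False, True], repeat=len(names)):
        yield dict(zip(names, values))


def implies(p, q, names):
    """p implies q if q holds in every assignment where p holds."""
    return all(q(a) for a in assignments(names) if p(a))


def contradicts(p, q, names):
    """p contradicts q if no assignment makes both hold."""
    return not any(p(a) and q(a) for a in assignments(names))


names = ["melting", "above_freezing"]


def snow_is_melting(a):
    return a["melting"]


def rule(a):
    # "If the snow is melting, it is above freezing."
    return (not a["melting"]) or a["above_freezing"]


def premises(a):
    return snow_is_melting(a) and rule(a)


def warm(a):
    return a["above_freezing"]


def cold(a):
    return not a["above_freezing"]


premises_imply_warm = implies(premises, warm, names)
premises_contradict_cold = contradicts(premises, cold, names)
melting_alone_implies_warm = implies(snow_is_melting, warm, names)
```

Under these assumptions the two premises together imply that it is
warm and contradict that it is cold, while the first premise alone
implies neither.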
Less well known is the system of propositional relations displayed
by the issue tree diagram I developed at Carnegie Mellon in the
1970s. I now conjecture that this diagram displays the fundamental
relationship between the propositions expressed in most bodies
of information. If our definitions are correct the issue tree
is the fundamental structure of information content.
c. Referent-based relations.
Identical reference (i.e., being about the same thing) is a common
relation among some of the pieces of content in a body or system
of information. However, it is seldom the case that all of the
pieces of information content in a given body of information are
related by identical reference.
Rather, the propositions typically refer to various members of
one or more systems of related referents. This is because things
can be related to one another in so many different and important
ways. Great care is necessary to distinguish the different kinds
of relations between the referents in a body of information, because
they define different information structures. Moreover, in many
cases we are ignorant of the important ways that things are related.
That's what science is about after all.
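The difference between identical reference and reference into a wider
system of referents can be sketched in Python. The three sample
statements extend the snow examples used earlier; the variable names
and the representation of referents as sets of strings are assumptions
made for illustration.

```python
from collections import defaultdict

# Hypothetical pieces of information: wording -> referents
pieces = {
    "Snow is white": {"snow", "whiteness"},
    "Snow is melting": {"snow", "melting"},
    "Milk is white": {"milk", "whiteness"},
}

# Index every piece under each thing it refers to.
by_referent = defaultdict(list)
for wording, referents in pieces.items():
    for referent in sorted(referents):
        by_referent[referent].append(wording)

# Referents shared by more than one piece.
shared = {r: ws for r, ws in by_referent.items() if len(ws) > 1}

# Referents common to every piece (identical reference across the body).
common_to_all = set.intersection(*pieces.values())
```

Here pairs of pieces share a referent, but no referent is common to
all three, which mirrors the observation that identical reference
seldom spans a whole body of information.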
d. Hybrid relations.
There are some important relations among pieces of information
that are hybrids of the above. For example, we sometimes express
propositions that refer to other, previously expressed, propositions
or groups of propositions. In logic the former are called
meta-level expressions. Likewise, poetry seems often to depend on relations
that combine the properties of a physical language with propositional
properties.
The case of information system design.
Our model of information content and structure has major implications
for the design of information systems. For example, because of
a failure to distinguish between the various important structures
that relate information content, many information systems reflect
a jumble of partial structures.
At the other extreme are systems designed around a single structure,
say a telephone book. This design tends to minimize the availability
of the other, often equally important, structures to the user,
such as the kinship or geographical relations of the listed parties.
These structures are often available only through laborious analysis.
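The telephone book case can be sketched in Python to show how the
same entries support more than one structure once each is indexed
explicitly. The entries, field layout, and helper name index_by are
invented for illustration, and grouping by surname is only a crude
stand-in for kinship.

```python
from collections import defaultdict

# Hypothetical directory entries: (surname, given name, town, number)
entries = [
    ("Baker", "Ann", "Northtown", "555-0101"),
    ("Adams", "Joe", "Southtown", "555-0102"),
    ("Baker", "Tom", "Southtown", "555-0103"),
]


def index_by(records, field):
    """Group records by the value found at one field position."""
    index = defaultdict(list)
    for record in records:
        index[record[field]].append(record)
    return dict(index)


# The single structure a printed telephone book offers: alphabetical order.
by_name = sorted(entries)

# Two further structures over the same entries.
by_surname = index_by(entries, 0)  # rough proxy for kinship
by_town = index_by(entries, 2)     # geographical relation
```

A design built only around by_name leaves the other two groupings to
be recovered by scanning every entry, which is the laborious analysis
mentioned above.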
"Information system", as I use it here, is a very broad term, encompassing
such diverse creations as databases, financial systems, management
and executive information systems, office automation systems,
interactive courseware, command and control systems, etc., and even
books and magazines. While many system design issues are different
for these different sorts of systems, two fundamental issues are
always present:
Design issue #1. What information is to be included in the system?
Design issue #2. How should this information be organized?
To vastly oversimplify the matter, using our model these two issues
really come down to the question of which structures to incorporate
(issue #2) and how much of each (issue #1). But what would traditionally
be the second design issue is in fact now the first. Thus we seem
to be designing our information systems in a backward fashion.
Moreover, once the significant structures are selected, the question
of what information to include becomes largely one of level of
detail and unit cost.
Then there is the issue of display of information. I envision
the visual navigation of important information structures in virtual
reality. Indeed I have already started experimenting with such
navigation, using simple 3D navigation software. In my vision
people cruise through the structure of their information as one
would fly a starship. They visit strange and beautiful worlds,
worlds that are in a sense already real.
We just can't see them yet.
David Wojick