Notion System and Topic Maps

Ronald Poell <rapoell@notionsystem.com>
Revised $Date: 2001/06/24 12:00:00$

Introduction Basics Perception Managing knowledge Core & Extensions History Network Topic Maps Data, Information and Knowledge - WMDR About the samples Samples XML-Samples

Introduction

If you are not familiar with Topic Maps, you should read some basic documentation about them before reading this article; you can find some references at the end. If Notion System is new to you, consider reading the other articles on this site and taking a close look at the sample data. In June 2000 I discovered the existence of Topic Maps. My first impression proved correct: Topic Maps are so close to the concepts behind Notion System that we could call them almost identical. Notion System and Topic Maps follow the same ideas on information (or knowledge) representation. Below I describe the correspondences and differences between these concepts. As my knowledge of Topic Maps is quite recent, and surely not complete, I would appreciate it if you sent me your comments on this article. My instinct tells me that this story has not ended yet, so keep watching.

Parallel history?

Topic Maps can be considered a natural evolution of HyTime, which itself was initiated in 1991 by Steve Newcomb. In 1996 the so-called "Topic Navigation Maps" were accepted as a work item in ISO's SGML workshop. In this period Fred Dalrymple, Michel Biezunski and Steve Newcomb worked out the basic ideas behind Topic Maps (for a short history, see the article by Steve Pepper cited below). Finally, in 1998, Topic Maps were presented for ISO certification. I worked out the basic concepts of Notion System in the period 1988-1991. The first version was released in 1991. In 1992 I introduced the distributed data parts, and in 1993-94 the system underwent a major evolution when the implementation of contextual information (meta-data) for relations was changed. In the past six years the core of Notion System has not fundamentally changed. The system was expanded with new functionality, but no changes were made to the concepts behind it. The only things I still want to do are to add meta-data (other than languages) to the notion names and, last but not least, to rebuild the system in Java to make it platform independent.

Comparison

The table below shows how the basic terms of Topic Maps and Notion System correspond. This gives a good idea of how close the two concepts are.

Topic Maps	Notion System
topic	notion
topic type	notion class
topic occurrence	notion
occurrence role	relation
topic association	knowledge
association type	relation
topic name	notion name
name type	doesn't exist
scope	relation meta-data and notion meta-data
theme	runtime context (not in the core)
public subject	notion
facet and facet value	relation meta-data and notion meta-data
doesn't exist?	relation auxiliary data
topic without a name	doesn't exist

Some conceptual differences

The primary goal of a Topic Map is to represent the contents of a document; topics and their associated elements help to achieve that goal. In Notion System the notions themselves are the primary goal. "What do you know about Topic Maps?" might be a good question to ask the system, or you may ask what knowledge is available for a particular document. Most of the differences between Notion System and Topic Maps are more or less directly related to this aspect.

Identification

In Topic Maps the public subject is necessary to identify a particular topic clearly. Within Notion System this distinction doesn't exist: each notion stands for the concept it represents and has its own ID. How clearly a concept (notion or topic) can be communicated from the system to the user depends on how much is known about it. For example, a notion for which the only thing we know is its name is obviously not clearly defined and can't be communicated without ambiguity. The more you know about a notion, the better the system can communicate the concept behind it to the user (human or program). This is also true for Topic Maps. The difference between Notion System and Topic Maps on this point is inherent in the way Topic Maps are created, compared with how the knowledge in Notion System is created.

One or several?

Topic Maps can be defined (in a simplified way) as an index for documents that offers much more power than traditional indexing mechanisms. The separation into two layers (the map and the occurrences) is a direct consequence of this. Within Notion System the notions and their relations together form one semantic network. There is no distinction between things like topics and occurrences, and consequently there can't be two layers in Notion System.

The implication of this difference is quite big. Different Topic Maps can exist alongside each other without any relation between them. When they have to be merged, the topics must be identified with each other in order to present a coherent picture to the user. Notion System, on the other hand, has mechanisms for identifying notions at the moment new information is added. This of course requires universal identifiers and a network connection between the different instances of the application. When a standalone Notion System connects to the network of interrelated Notion Systems at a later moment, the same identification technique is used to find potentially identical notions (identical concepts). A merging mechanism allows two distinct notions that appear to represent the same concept to be fused (the same happens when two Topic Maps are merged).

You might see it in the following way: all the connected Notion Systems together contain the same information (knowledge) as all the merged Topic Maps. Each Notion System is responsible for a particular subset of notions. An "active" notion exists only once in the network. (The term "active" indicates the instance of the notion that should be used. Other instances can be backups or clones.) When a user looks at a notion he can have a "complete" picture of what that notion is about. In different Topic Maps that have not been merged, the same topic can exist several times without being related to a common public subject, although it should be. So as long as a particular topic is not merged with all its other occurrences in other maps, it will give the user only a far from perfect image of itself.

As stated above, in both cases the picture is never entirely complete. It will always be as complete as the information given and as good as the quality of that information. Of course, in both cases information can be filtered for a particular context if necessary. In my experience it is better not to filter things out while we are creating knowledge. You never know which references are needed to decide whether you are handling the right concept (whether "you" is a human or something like an agent). On the other hand, while using the knowledge/information, filtering and clustering are essential. A small example might help here.
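The identification and merging steps described above can be illustrated with a minimal Python sketch. It is not the actual Notion System code: the `Notion` class, the `n-...` identifiers, the name-based matching and the sample data are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Notion:
    notion_id: str                                  # hypothetical universal identifier
    names: set = field(default_factory=set)         # all names are equivalent
    relations: set = field(default_factory=set)     # (relation name, target notion id)


def potential_matches(new, known):
    """Return known notions sharing at least one name with `new` (assumed heuristic)."""
    return [n for n in known if n.names & new.names]


def merge(active, duplicate):
    """Fuse `duplicate` into `active`; the active instance keeps its own ID."""
    active.names |= duplicate.names
    active.relations |= duplicate.relations
    return active


# Invented sample data: two notions that appear to represent the same concept.
granada_a = Notion("n-001", {"Granada"}, {("is located in", "n-spain")})
granada_b = Notion("n-917", {"Granada", "Grenade"}, {("hosted", "n-xml-europe-99")})

matches = potential_matches(granada_b, [granada_a])
merged = merge(matches[0], granada_b)
```

After the merge the single active notion carries the names and relations of both instances, which is what gives the user the more complete picture. Deciding that two candidates really are the same concept is left out here; in practice that decision would need more than a shared name.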

The expert agent

One of the agents in Notion System is like a small expert system specialized in analyzing web-page content. Let's think of it as a male robot. He has some rules at his disposal, which are notions too. These are grouped in sets for convenience. He is not yet very clever, because he doesn't have many rules, and in particular he is not very good at semantic and syntactic analysis. But he has the whole knowledge base with "facts" available.

While looking at the different text elements (words, phrases) he will notice that some of them correspond to one or more notions in the knowledge base. Based on pure text correspondence the system will give him a probability factor. But at a position where he expects, for example, an author, he tells the system to look in particular for notions of the super type "person" (man, woman, boy or girl). There will be better matches, and some of them will have a higher probability. A man will have a higher probability than a boy, and a person will have a higher probability than a car (though the car might be the right one, because he shouldn't have asked for an author there; remember he is not very good at semantic and syntactic analysis). He will then ask the system what is known about the candidate concepts for that particular page. This is where it is important that as much information as possible is available. He will compare the relations and the so-called target notions with the other text elements on the page. In some cases this comparison will reinforce the probability for a particular notion. Eventually a particular candidate will get the green flag (its probability is high enough; his owner told him where the limit is), and the candidate is no longer a candidate but has become a fact. A fact means here that the agent will create a relation. In the case of an author he will create the relation "has been written by/has written". In Topic Maps terms there will be an occurrence of a particular topic within the document with an occurrence type of "has been written by/has written".

What if there are no candidates? This means that the subject doesn't exist in the knowledge base or that the language of the text is not well represented. He might then decide, for example, to have a look at other pages on the same site. If there are two identical documents in different languages he might discover a new name (in a different language) for a particular notion. Perhaps he can create a new concept by using virtual notions and relations. This last step is quite similar to "classic" indexing. Human intervention is almost always appreciated for the final decision.

Two particular aspects come to the surface here. If our little agent looks at a page at some time T0 and again some time later at T1, he might not reach the same conclusions if in the meantime the knowledge base has gained more "facts". And none of the conclusions he draws ever reaches "certainty". The probabilities he gets back from the system depend on the contextual factor "how sure" a particular piece of information is. And after a long period in which the knowledge base and this kind of agent are left alone, no new knowledge will be added anymore, because none of the available facts will be "good enough". Human intervention is surely a way to get around this, but other techniques might work as well.

Compared to Topic Maps there are two things that are possible in Notion System and not in Topic Maps (as far as I can see at the moment). The first one has already been indicated: the more information the agent can get, the better his conclusions might be. Topics in different topic maps that have not been merged, and that have not been identified with each other, only increase the number of candidates. The second one is a little more hidden, and I'm not sure that it isn't possible with Topic Maps. When our agent is short of candidates and goes looking for other documents to compare with, he performs, in terms of Topic Maps, a virtual merging of the topic maps of several documents before the topics have been defined. I think that whether this is possible might depend on the way Topic Maps are implemented.
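The way the agent promotes a candidate to a fact can be sketched roughly as follows. This is only an illustration in Python under stated assumptions: the function name, the dictionary layout, the boost values (0.2 and 0.1) and the threshold of 0.8 are all invented here and do not come from Notion System.

```python
def score_candidates(candidates, page_terms, expected_type, threshold=0.8):
    """Split candidates into facts and still-open candidates.

    Assumed scoring: start from the pure text-match probability, add 0.2 when
    the candidate has the expected super type, and add 0.1 for every related
    notion whose name also appears on the page.
    """
    facts, still_open = [], []
    for cand in candidates:
        p = cand["text_match"]
        if expected_type in cand["types"]:
            p = min(1.0, p + 0.2)
        supporting = len(cand["related_names"] & page_terms)
        p = min(1.0, p + 0.1 * supporting)
        result = (cand["name"], round(p, 2))
        if p >= threshold:
            facts.append(result)        # green flag: the agent creates a relation
        else:
            still_open.append(result)
    return facts, still_open


# Invented sample data: two notions match the same word on the page.
candidates = [
    {"name": "Steve Pepper", "types": {"man", "person"},
     "text_match": 0.5, "related_names": {"Topic Maps", "XML"}},
    {"name": "pepper (spice)", "types": {"food"},
     "text_match": 0.5, "related_names": {"cooking"}},
]
page_terms = {"Topic Maps", "XML", "Granada"}

facts, still_open = score_candidates(candidates, page_terms, "person")
```

With this data only the person passes the limit, because both the expected type and two related notions on the page reinforce it. Running the same function later against a richer knowledge base could give different results, which is the T0 versus T1 effect mentioned above.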

Names

Notion System only knows language meta-data for names. Apart from that, all names are equivalent. Topic Maps go much further on this point. Names are typed: base name, display name and sort name. In Notion System only the "display name" exists; it corresponds to the "best name", which is a runtime property depending on the user's context. In Topic Maps the language information is part of the scope of a topic. Another point on which the two concepts differ is that in Topic Maps a topic might not have a name. In Notion System this is not allowed. "A simple cross reference, such as see page 97, is considered to be a link to a topic that has no explicit name" [Pepper, 1999]. In Notion System this kind of information is stored in relations that have no target notion and only auxiliary data. Each relation can take this form. For example, the relation <person>[is born]<geographic area>(date and time) might also be <person>[is born](date and time), where <person> and <geographic area> are notions, [is born] is a relation and (date and time) is auxiliary data. The missing notion is simply missing; it is not an abstract notion without a name.
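A relation whose target notion may be missing can be modelled with an optional field. The Python sketch below is only an assumption about how such a structure could look; the `Relation` class and its field names are hypothetical, and only the bracket notation is taken from the text above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Relation:
    source: str                       # source notion
    name: str                         # relation name
    target: Optional[str] = None      # target notion, may simply be missing
    auxiliary: Optional[str] = None   # auxiliary data attached to the relation

    def render(self) -> str:
        """Write the relation in the <notion>[relation]<notion>(auxiliary) notation."""
        text = f"<{self.source}>[{self.name}]"
        if self.target is not None:
            text += f"<{self.target}>"
        if self.auxiliary is not None:
            text += f"({self.auxiliary})"
        return text


full = Relation("person", "is born", "geographic area", "date and time")
no_target = Relation("person", "is born", auxiliary="date and time")
```

Here `no_target.target` is `None` rather than a placeholder notion, which mirrors the point that the missing notion is simply missing.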

Relations versus association types

Both concepts define relations/associations as bi-directional. In Notion System the only relations that are not bi-directional are the ones that don't have a target notion. More precisely, I should say a particular instance of a relation, because potentially every relation might have a missing target notion. Topic Maps distinguish between associations that are symmetrical or not and those that are transitive or not. In Notion System it is of no importance whether a relation is symmetrical or not. Neither the concept nor the core of Notion System defines the transitivity of a relation. It is the user or agent that knows whether he should use a particular relation as a transitive one or not. Look at the following example of a notion-relation chain: <person>[is born]<geographic area>[is located in]<geographic area>[is located on]<planet>[is a planet of]<star system>. An agent who has to determine in which star system a person was born will use 3 different relations, all in a transitive way. If the same agent has to give the address of that same person he will use only one, and will stop at a notion that has a type of town or country (depending on the context of the address).
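The idea that the agent, and not the relation, decides how far to follow a chain can be sketched as follows. This Python example is an illustration under assumptions: the notion names, the one-outgoing-relation-per-notion layout and the `follow` function are invented here and are not part of Notion System.

```python
# Invented sample chain; each notion has at most one outgoing relation here.
EDGES = {
    "person-1": ("is born", "town-1"),
    "town-1": ("is located in", "country-1"),
    "country-1": ("is located on", "planet-1"),
    "planet-1": ("is a planet of", "star-system-1"),
}
TYPES = {
    "person-1": "person",
    "town-1": "town",
    "country-1": "country",
    "planet-1": "planet",
    "star-system-1": "star system",
}


def follow(start, stop_types, edges=EDGES, types=TYPES):
    """Walk the chain from `start` until a notion of a wanted type is reached.

    Returns the notion found (or None) and the list of relations used.
    """
    current, used, seen = start, [], {start}
    while current in edges:
        relation, target = edges[current]
        used.append(relation)
        current = target
        if types[current] in stop_types:
            return current, used
        if current in seen:           # guard against cycles in the network
            break
        seen.add(current)
    return None, used


birth_star_system, long_path = follow("person-1", {"star system"})
birth_town, short_path = follow("person-1", {"town", "country"})
```

The same data answers both questions: the stopping condition supplied by the agent determines whether the located-in relations are treated transitively.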

Scope, themes and facets

With the scopes of Topic Maps things get a bit more complicated. In Notion System we don't need a scope to distinguish between two distinct notions with the same name. Using the example Steve Pepper used [Pepper, 1999], all the Granadas mentioned there would clearly identify themselves in Notion System by showing their relations. The user (human or agent) must have some contextual information in order to be able to choose one of them. In Notion System this contextual information could be distilled from something like the following chain: <Steve Pepper>[has written]<SGML Buyer's Guide>[handles]<XML products>[uses]<XML>[is handled in]<XML Europe '99 conference>[was held in]<Granada>.

There are, though, meta-data in Notion System (comparable to scopes in Topic Maps) that might invalidate a notion or a relation in a particular context. If your Knowledge-date and time, that is, the period for which you want the existing knowledge, is before my birthday, you will never see me. The system acts as if I don't exist. So again it is the user's context that defines which information is valid or not (context matching). The same context can also provide a filtering of knowledge (relevance). The same thing happens for relations: four years ago my household was, among others, composed of four cats; today there are no cats anymore. All the sample pages are generated with a Knowledge-date and time of the moment of generation (real time), so you will not see these cats. In Topic Maps the same thing is done by the scope of a topic characteristic.

A theme in Topic Maps corresponds most closely to one of the user's context elements (the glasses through which the knowledge is looked at, that is, the scope of a particular Topic Map). But themes are topics and are used to specify the scope of topics. So on one side the scope is associated with the information (the scope of a topic, association, etc.) and on the other side it represents a user's context (a particular view on a map).

The facets of Topic Maps provide a way to filter the information resources. As the information resources are notions in Notion System, facets are quite comparable to meta-data. In some cases I think they will be represented by relations. In Notion System there is a clear difference between the meta-data belonging to a notion or a relation and the context of the user, which causes invalidation and filtering. In Topic Maps, a map seems to provide a specific view on the available information. It might be that my perception of Topic Maps is not quite correct here and has been shaped by artifacts of the articles I read.
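The invalidation by Knowledge-date and time can be illustrated with a small Python sketch. It assumes that each relation instance carries a validity period as meta-data; the tuple layout, the `visible` function and all dates are invented for illustration and are not taken from Notion System.

```python
from datetime import date

# (relation, target notion, valid from, valid until); None means still valid.
# All dates are made up for this example.
HOUSEHOLD = [
    ("is composed of", "other members", date(1990, 1, 1), None),
    ("is composed of", "four cats", date(1994, 1, 1), date(1999, 1, 1)),
]


def visible(relations, knowledge_date):
    """Keep only the relations whose validity period contains `knowledge_date`."""
    return [
        (name, target)
        for name, target, start, end in relations
        if start <= knowledge_date and (end is None or knowledge_date < end)
    ]


four_years_ago = visible(HOUSEHOLD, date(1997, 6, 24))
today = visible(HOUSEHOLD, date(2001, 6, 24))
before_everything = visible(HOUSEHOLD, date(1980, 1, 1))
```

The stored relations never change; only the user's context (the knowledge date) decides which of them are shown, so the cats appear in `four_years_ago` but not in `today`.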

Short bibliography

Here are some reference articles on topic maps (they are linked to the corresponding notions):

Steve Pepper, Euler, Topic Maps, and Revolution

Michel Biezunski, Topic Maps at a glance

ISO/IEC FCD 13250:1999 - Topic Maps
