- Jun 26, 2008
Combining Closed and Open Data Classification Mechanisms in an Extended Thesaurus

At the other end of the spectrum, among open data classification mechanisms, we have social tagging. Tagging (in general) means that a user assigns labels to content items. The advantage here is that content is classified immediately; tagging is therefore an easy way to provide metadata for content, in particular because the user does not have to think about (arbitrary, system-dictated) structures. However, this leads to problems when singulars and plurals are used side by side, when synonyms are used, when spelling mistakes occur, and so on. With tags, exactly the same spelling has to be used if items are to be assigned to the same group. But if tagging is done collectively (and that is what social tagging is about), the wisdom of crowds can improve the signal-to-noise ratio significantly: see the miracle of the tag cloud.
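To make the exact-spelling problem concrete, here is a minimal sketch in Python. The item names and tags are made up for illustration, and the grouping function is only an assumption about how a naive tagging system might bucket items; it is not taken from Rolf's thesis.

```python
from collections import defaultdict


def group_by_tag(assignments):
    """Group item ids by the exact tag string, as a naive tagging system would."""
    groups = defaultdict(set)
    for item, tag in assignments:
        groups[tag].add(item)
    return dict(groups)


# Hypothetical tag assignments: plural, capitalised, and misspelled variants
# of what a human reader would consider the same tag.
assignments = [
    ("photo1", "jaguar"),
    ("photo2", "jaguars"),
    ("photo3", "Jaguar"),
    ("photo4", "jagaur"),
]

groups = group_by_tag(assignments)
for tag, items in groups.items():
    print(tag, sorted(items))
```

Each variant ends up in its own group of one item, although all four photos arguably belong together. This is the kind of fragmentation that vocabulary control is meant to reduce.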
What Rolf proposed in his thesis was to combine the two approaches. His design uses an extended thesaurus as the instrument for vocabulary control. It is "extended" because it is not simply built around a taxonomy, but is expanded by tags that were assigned by users and integrated using a vocabulary management tool.
This extended thesaurus can be applied in multiple ways. While tagging, for instance, the user can be assisted by questions like “Did you mean…” if a term is ambiguous.
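One way such a prompt could work is sketched below. This is an illustration only: the vocabulary list, the function name did_you_mean and the similarity cutoff are assumptions, and fuzzy string matching (via Python's standard difflib module) is a stand-in technique, not necessarily what Rolf's design uses.

```python
from difflib import get_close_matches

# Hypothetical controlled vocabulary taken from the extended thesaurus.
VOCABULARY = ["jaguar", "leopard", "car", "cat"]


def did_you_mean(tag, vocabulary=VOCABULARY, cutoff=0.8):
    """Return vocabulary terms to offer the user for a newly entered tag.

    An empty list means the tag is already a known term, or that nothing
    in the vocabulary is similar enough to be worth suggesting.
    """
    normalized = tag.strip().lower()
    if normalized in vocabulary:
        return []
    # get_close_matches ranks candidates by string similarity (0.0 to 1.0)
    # and keeps only those at or above the cutoff.
    return get_close_matches(normalized, vocabulary, n=3, cutoff=cutoff)


for entered in ["jagaur", "Jaguars", "jaguar"]:
    suggestions = did_you_mean(entered)
    if suggestions:
        print(f"'{entered}': did you mean {', '.join(suggestions)}?")
    else:
        print(f"'{entered}': accepted as entered")
```

The user stays free to ignore the suggestion, so the open character of tagging is preserved while the vocabulary still nudges people towards shared spellings.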
Search can be improved, too: if a user issues a search query, related terms drawn from the thesaurus can be suggested. For example, the term ‘jaguar’ would call up similar terms, allowing the user to refine the query and clarify that they are looking for the predatory animal (i.e. not the car).
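The jaguar example might look roughly like this in code. The thesaurus entries, the sense labels and the function name suggest_refinements are all invented for illustration; a real extended thesaurus would be maintained with a vocabulary management tool rather than hard-coded.

```python
# Hypothetical slice of an extended thesaurus: each term maps its senses
# to a list of related terms (illustrative data only).
THESAURUS = {
    "jaguar": {
        "animal": ["big cat", "predator", "panthera onca"],
        "car": ["automobile", "sports car"],
    },
}


def suggest_refinements(query, thesaurus=THESAURUS):
    """Return (sense, related_terms) pairs the user can pick from.

    An empty list means the thesaurus has no entry for the query.
    """
    senses = thesaurus.get(query.strip().lower(), {})
    return [(sense, list(related)) for sense, related in senses.items()]


for sense, related in suggest_refinements("Jaguar"):
    print(f"jaguar ({sense}): {', '.join(related)}")
```

Picking one of the offered senses lets the search narrow the query to the animal or the car without the user having to know the vocabulary in advance.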
In the long term, using an extended thesaurus as a light-weight ontology can reduce the amount of work needed to maintain a vocabulary. What’s special in Rolf’s proposal is that the controlled vocabulary also contains the terminology of the community. The user is thus able to navigate within the communal information space and, as a result, problems with homonyms, synonyms and different languages would be reduced.
A paper in which Rolf and two of his colleagues explain this approach in more detail is currently being prepared for publication: Güntner, G., Sint, R., Westenthaler, R. (2008): “Ein Ansatz zur Unterstützung traditioneller Klassifikation durch Social Tagging”. Tagungsband des ExpertInnenworkshops “Social Tagging in der Wissensorganisation – Perspektiven und Potenziale”, 2008 (im Druck). Further details about the publication can be obtained from Rolf.



