Packing my bags for VoCamp Oxford


(by Matthias Samwald)

I am packing my bags once again: The first VoCamp (hosted at Oxford University, UK) is about to start this week. So, what is a VoCamp supposed to be? The official definition reads like this: “A VoCamp is a series (hopefully) of informal events where people can spend some dedicated time creating lightweight vocabularies/ontologies for the Semantic Web/Web of Data. The emphasis of the event(s) is not on creating the perfect ontology in a particular domain, but on creating vocabs that are good enough for people to start using for publishing data on the Web.”

I have always thought that the lack of widely established vocabularies/ontologies has been very damaging to the development of the Semantic Web. The VoCamp initiative could help change this situation for the better, so I really hope that this is the start of a long series of events.

My main topics of interest are: 1) Associative Tags; 2) Agreement, Disagreement, discourse; 3) Corporate Semantic Web; 4) “Are upper level ontologies/vocabularies not so bad after all?”; 5) “Cleaner schemas and ontologies”. These interests are motivated partly by use-cases from the “KiWi – Knowledge in a Wiki” EU project, and partly by developments in the area of biomedical research at DERI Galway and the W3C Interest Group for Health Care and Life Science. Details below.

__Associative Tags__

Tagging is one of the key components of the ‘Web 2.0’, and Semantic Web technologies will help to make tagging even more powerful. Schemas such as SCOT or MOAT have already been established, and make it possible to ‘tag’ not only with simple strings, but with entities. These entities (such as concepts described in SKOS) can be associated with clear semantics and can be further described with RDF statements, to describe hierarchies of entities, or to link entities to rich data sources such as DBpedia. This enables sophisticated data integration and queries across data sources that would not have been possible with simple, string-based tags.

On the other hand, Semantic Web developers can learn from the simplicity that has made tagging so successful. Creating useful tags is very simple, and good user interfaces can make it simpler still with features such as autocompletion and tag recommendation. This simplicity should serve as a role model for many Semantic Web applications.

Specifically, I am interested in what I call ‘associative tags’ (aTags): bundles of tags/entities/concepts that can be used for the simple representation of facts. The primary intention of creating aTags is not the categorization of the document, but the representation of the key facts inside the document. Key facts in the biomedical domain might be, for example,

“Protein A interacts with protein B” (which can be represented with an aTag comprising the three entities “Protein A”, “Molecular interaction” and “Protein B”) or

“Overexpression of protein A in tissue B is the cause of disease C” (an aTag comprising the four entities “Overexpression”, “Protein A”, “Tissue B” and “Disease C”).

Once the aTags from these different sources are aggregated, it is possible to pose a query such as “show me molecules that are associated with molecules that are associated with disease C”, yielding “protein B” as an answer (protein B interacts with protein A, which in turn is linked to disease C). Hierarchies (in the form of rdfs:subClassOf and skos:narrower) can be used to expand queries based on background knowledge (e.g., that “disease D” is a subclass of “disease C”).
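The aggregation and query described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not an existing aTag implementation: plain strings stand in for entity URIs, and the names `ATAGS`, `MOLECULES`, `BROADER`, `expand`, `associated_molecules` and `two_hop` are all hypothetical.

```python
from typing import Dict, FrozenSet, List, Set

# Each aTag is a bundle of entities (plain labels stand in for URIs here).
ATAGS: List[FrozenSet[str]] = [
    frozenset({"Protein A", "Molecular interaction", "Protein B"}),
    frozenset({"Overexpression", "Protein A", "Tissue B", "Disease C"}),
]

# Assumed background knowledge: which entities are molecules, and a
# narrower -> broader map (what rdfs:subClassOf or skos:narrower would provide).
MOLECULES: Set[str] = {"Protein A", "Protein B"}
BROADER: Dict[str, str] = {"Disease D": "Disease C"}


def expand(entity: str) -> Set[str]:
    """Return the entity together with everything below it in the hierarchy."""
    result = {entity}
    changed = True
    while changed:
        changed = False
        for narrower, broader in BROADER.items():
            if broader in result and narrower not in result:
                result.add(narrower)
                changed = True
    return result


def associated_molecules(targets: Set[str], atags: List[FrozenSet[str]]) -> Set[str]:
    """Molecules that share at least one aTag with any of the target entities."""
    found: Set[str] = set()
    for tag in atags:
        if tag & targets:
            found |= (tag & MOLECULES) - targets
    return found


def two_hop(disease: str, atags: List[FrozenSet[str]]) -> Set[str]:
    """Molecules associated with molecules that are associated with the disease."""
    first_hop = associated_molecules(expand(disease), atags)
    return associated_molecules(first_hop, atags)


if __name__ == "__main__":
    print(two_hop("Disease C", ATAGS))
```

With the two example aTags, the two-hop query for disease C reaches the interaction partner of the protein that is linked to the disease. Because `expand` is applied first, an aTag that mentions only “Disease D” would be picked up by the same query.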

In many cases (especially with some ontologies in the biomedical domain), creating such associative tags can be much simpler than the creation of ‘real’ statements, i.e., relations between individuals and property restrictions of classes.

__Agreement, Disagreement, discourse__

Many people in the Semantic Web community are interested in the representation of argumentation structures on the web. For example: stating that one snippet of text contains statements that are in disagreement with another snippet of text, which is in agreement with yet another snippet of text. This can be of use for many knowledge domains, such as news articles, biomedical publications or reports submitted to a software bug tracker. Of special interest in this context are extensions of established schemas, especially SIOC. There is also another ontology called SWAN that is specifically tailored to the biomedical domain, and efforts to align SWAN with SIOC have started recently.
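To make the snippet example concrete, here is a minimal sketch of how such links could be indexed. The relation names `disagreesWith` and `agreesWith` and the snippet identifiers are hypothetical placeholders; they are not taken from SIOC or SWAN.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Tuple

# (source snippet, relation, target snippet); all names are illustrative only.
Link = Tuple[str, str, str]

LINKS: List[Link] = [
    ("snippet-1", "disagreesWith", "snippet-2"),
    ("snippet-2", "agreesWith", "snippet-3"),
]


def stances_toward(links: List[Link]) -> Dict[str, Dict[str, List[str]]]:
    """Group links by target, so we can ask who agrees or disagrees with a snippet."""
    index: DefaultDict[str, DefaultDict[str, List[str]]] = defaultdict(
        lambda: defaultdict(list)
    )
    for source, relation, target in links:
        index[target][relation].append(source)
    # Convert to plain dicts so that lookups of unknown keys do not create entries.
    return {target: dict(by_relation) for target, by_relation in index.items()}


if __name__ == "__main__":
    for target, by_relation in stances_toward(LINKS).items():
        print(target, by_relation)
```

The same index works whether the snippets come from news articles, publications or bug reports, since only the identifiers and the relation between them matter.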

__Corporate Semantic Web__

As Semantic Web technologies are finally getting mature enough to allow industrial uptake, it is becoming clear that ontologies for describing organization structures and business processes are still lacking maturity. FOAF allows us to represent basic information about persons, organizations and their relationships, but lacks vocabulary for stating that one person is the boss of another person, that a project consists of several subtasks, et cetera. While there are some small projects that try to create such schemas/ontologies, a solution of widespread acceptance does not seem to be in sight at the moment.

__Are upper level ontologies/vocabularies not so bad after all?__

FOAF seemingly tried it a long time ago: foaf:Person is a subclass of “http://xmlns.com/wordnet/1.6/Person”, foaf:Document is a subclass of “http://xmlns.com/wordnet/1.6/Document”, and so on. Linking to external schemas/ontologies (or making use of their classes and properties directly) can definitely help to facilitate semantic interoperability. For a long time, many web developers were very skeptical about such ‘top-down’ approaches to data integration, but recently the recognition of the potential value of such resources seems to be increasing. In parallel, the last one to two years have brought us some very large upper ontologies that are available as linked data, such as:

Wordnet 2.0, hosted by the W3C
Yago/DBpedia
OpenCyc (now with new URIs)
UMBEL (derived from OpenCyc and others).

I think the practice of re-using and linking to such upper ontologies should become popular (again). It helps in creating a highly interlinked Semantic Web, and helps to avoid re-inventing the wheel for each new schema/ontology. This linking should not be done post-hoc, but should be a central part of the early stages of vocabulary/ontology/data creation.
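Stating such a link costs one triple per class. The sketch below writes the two FOAF-to-WordNet links mentioned earlier as N-Triples lines; the helper name `subclass_link` is hypothetical, and a real vocabulary would of course be authored in whatever RDF tooling you already use.

```python
from typing import List, Tuple

RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"

# (local class, upper-ontology class) pairs, taken from the FOAF example above.
LINKS: List[Tuple[str, str]] = [
    ("http://xmlns.com/foaf/0.1/Person", "http://xmlns.com/wordnet/1.6/Person"),
    ("http://xmlns.com/foaf/0.1/Document", "http://xmlns.com/wordnet/1.6/Document"),
]


def subclass_link(local_class: str, upper_class: str) -> str:
    """Format one rdfs:subClassOf statement as an N-Triples line."""
    return f"<{local_class}> <{RDFS_SUBCLASS_OF}> <{upper_class}> ."


if __name__ == "__main__":
    for local_class, upper_class in LINKS:
        print(subclass_link(local_class, upper_class))
```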

__Cleaner schemas and ontologies__

Working with established ontologies and schemas in ontology editors can be a chore. Most have dependencies on other ontologies, but don’t use owl:imports. Most use an awkward mix of OWL statements and RDF(S), resulting in ontologies that are OWL Full. Many require some OWL reasoning to make use of sameAs statements and inverse properties, but at the same time reasoning is complicated because the ontologies are OWL Full or even contain logical inconsistencies. Often enough, there seems to be no practical reason for the design choices that caused the trouble: some minor changes can turn a messy OWL Full ontology into an OWL Lite or OWL DL ontology. To work around the problem, many different working groups have by now created their own local versions of schemas such as FOAF or Dublin Core that are valid OWL DL.

It doesn’t have to be this way.

Trying to adhere to OWL Lite/DL and adding owl:imports statements can help to build cleaner, more modular and more sustainable ontologies, and does not require significant additional effort during the creation of ontologies. Maybe we can find a consensus that this would be a worthwhile goal, and develop plans towards reaching that goal.
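A check for missing owl:imports statements is easy to automate. The following is a rough sketch under simplifying assumptions: triples are plain tuples of strings, a namespace is whatever precedes the last “#” or “/” of an IRI, and an imported ontology IRI is assumed to match the namespace it defines (which is not always true in practice). The function names and the `http://example.org/onto` ontology are hypothetical.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
OWL_ONTOLOGY = "http://www.w3.org/2002/07/owl#Ontology"
OWL_IMPORTS = "http://www.w3.org/2002/07/owl#imports"

# Namespaces that never need an explicit import.
BUILTIN_NAMESPACES = {
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "http://www.w3.org/2000/01/rdf-schema#",
    "http://www.w3.org/2002/07/owl#",
    "http://www.w3.org/2001/XMLSchema#",
}


def namespace_of(iri: str) -> str:
    """Everything up to and including the last '#' or '/' of the IRI."""
    cut = max(iri.rfind("#"), iri.rfind("/"))
    return iri[: cut + 1]


def missing_imports(ontology_iri: str, own_namespace: str, triples: List[Triple]) -> Set[str]:
    """External namespaces that are used but not covered by an owl:imports statement."""
    imported = {
        o.rstrip("#/") for s, p, o in triples if s == ontology_iri and p == OWL_IMPORTS
    }
    used: Set[str] = set()
    for triple in triples:
        for term in triple:
            # Skip literals and the ontology IRI itself.
            if term.startswith("http://") and term != ontology_iri:
                used.add(namespace_of(term))
    return {
        ns
        for ns in used
        if ns != own_namespace
        and ns not in BUILTIN_NAMESPACES
        and ns.rstrip("#/") not in imported
    }


# Hypothetical ontology that imports FOAF but also uses Dublin Core without importing it.
ONTO = "http://example.org/onto"
NS = "http://example.org/onto#"
EXAMPLE: List[Triple] = [
    (ONTO, RDF_TYPE, OWL_ONTOLOGY),
    (ONTO, OWL_IMPORTS, "http://xmlns.com/foaf/0.1/"),
    (NS + "Employee", RDFS_SUBCLASS_OF, "http://xmlns.com/foaf/0.1/Person"),
    (NS + "Employee", "http://purl.org/dc/elements/1.1/title", "Employee"),
]

if __name__ == "__main__":
    print(missing_imports(ONTO, NS, EXAMPLE))
```

Run over the example, the check flags the Dublin Core namespace, which is used but never imported. It says nothing about whether the result is OWL DL; that still requires a proper species validator.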
