What's Wrong with Linked Data?
Earlier this year, we posted a special call for Linked Dataset descriptions, to be published in the Semantic Web journal. This kind of call and paper type is a novelty in the Semantic Web community. We did this to provide another outlet for research-enabling work (as opposed to research work as such), because Linked Data is currently one of the drivers of the Semantic Web effort. However, creators and curators of datasets can rarely get acknowledgement for their contributions by publishing in high-profile conference proceedings or journals.
We expected a very good response to this call, and indeed we received 27 submissions. Consequently, we have now made Linked Dataset descriptions a standing paper type for the Semantic Web journal, which means that papers of this type can now be submitted to the journal at any time. In addition to the submissions, we also received very encouraging communication in response to our new paper type. Some researchers even reported that “the call already prompts some people to improve their datasets” (we are in no position to verify this, though).
Following our own policy for the journal, the calls include a very crisp formulation of the review criteria to be applied to papers of this type. We strongly recommend that reviewers reference these criteria directly in their reviews. For Linked Dataset papers, the criteria are as follows.
- Quality of the dataset
- Usefulness (or potential usefulness) of the dataset
- Clarity and completeness of the descriptions
When we set these criteria, we thought that they should be easy to meet. The papers are supposed to be short (the recommended length is 6 pages), and the third criterion is really only about doing a good job when writing the paper. Assuming that the publishers of a Linked Dataset were doing a good job, we thought that there should be no “quality of the dataset” problem. Assuming that people would not go to the trouble of publishing a dataset unless it was useful (or at least potentially useful), we also thought that the “usefulness” condition should be easy to meet.
However, we were in for a bit of a surprise. At this point in time, we have completed the first-round reviews for 26 of the 27 papers, and half of them had to be rejected due to issues with dataset quality or usefulness. For 5 of the datasets, the reviewers even indicated that “they are in fact not linked data.”
Clearly, our sample is not necessarily representative of all of Linked Data. For some of the most prominent datasets we have not received papers (most likely because they are already published elsewhere). However, it may not be unreasonable to take our findings as an indication that Linked Datasets often have substantial issues with quality and usefulness.
We will know more after the second round of reviews. And we’re looking forward to receiving more submissions of Linked Dataset descriptions.
We would be more than happy if the state of Linked Data turned out to be better than our limited sample indicates, and we hope that our call and the paper type will contribute to the effort of improving the quality and especially the applicability of Linked Data. We are optimistic that with more experience, best practice, and application focus, Linked Data will become more than just more data.
Pascal Hitzler, http://www.pascal-hitzler.de/
Krzysztof Janowicz, http://geog.ucsb.edu/~jano/
Editors-in-Chief, Semantic Web journal, http://www.semantic-web-journal.net/