Taxonomy-enhanced Document Retrieval with Dense Representations
Authors: Victor Mireles, Artem Revenko, Ioanna Lytra, Anna Breit, Julia Klezl
Presenter: Artem Revenko
Time: 2.5.2023 11:30am
Document retrieval is a task that powers several downstream applications such as search and question answering. One way to approach this task is to take embeddings of the documents to be retrieved, and of the query, and use a similarity function to rank results. In this work, we extend this approach by incorporating knowledge about entities mentioned in either the document or the query, in the form of taxonomic relations and canonical labels of said entities. The method, when applied to a domain-specific corpus, improves retrieval recall over a state-of-the-art method trained on a general-domain corpus. It does so without requiring any further retraining of the machine learning models involved, thus making it applicable for use cases where training is not feasible because of data or infrastructure limitations.
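The baseline approach mentioned in the abstract (embed the documents and the query, then rank by a similarity function) can be illustrated with a minimal sketch. Cosine similarity is assumed here as the similarity function, the three-dimensional vectors are toy values invented for illustration, and the names `cosine_similarity` and `rank_documents` are hypothetical. The taxonomy-based extension presented in the talk is not shown.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs):
    """Return document ids ordered from most to least similar to the query."""
    scores = {
        doc_id: cosine_similarity(query_vec, vec)
        for doc_id, vec in doc_vecs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Toy embeddings (assumption: a real system would obtain these from a
# pretrained text encoder, which is not part of this sketch).
doc_vecs = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.7, 0.7, 0.0],
    "doc_c": [0.0, 0.0, 1.0],
}
query_vec = [0.9, 0.1, 0.0]

ranking = rank_documents(query_vec, doc_vecs)
print(ranking)
```

Because ranking only requires comparing precomputed vectors, a pipeline of this shape needs no retraining of the encoder, which is the property the abstract highlights.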
Exploratory Analysis of the Applicability of Formalised Knowledge to Personal Experience Narration
Authors: Victor Mireles, Stephanie Billib, Artem Revenko, Stefan Jänicke, Frank Uiterwaal, Pavel Pecina
Presenter: Victor Mireles
Time: 3.5.2023 12:00pm
Some of the victims of Nazi persecution recorded their personal experiences in diaries written during their internment in concentration camps. Such human-centric texts may contrast with the way historians and archivists, for example, organise knowledge about these events. In this work, we analyse six such narrations using Entity Extraction and Named Entity Recognition techniques, present the results of the corresponding exploration, and discuss the suitability of such tools for this corpus. We show that knowledge tools that have been successfully used to organise documents can fall short when describing personal accounts, and we suggest ways to alleviate this.