Our main research contribution is a novel mathematical model that incorporates the semantics of similarity links in the entity graph into the ranking mechanism in a scalable way. Part of this work is presented in the 4-page poster paper Zhiltsov, N., Agichtein, E., "Improving Entity Search over Linked Data by Modeling Latent Semantics", accepted for CIKM 2013. The remaining part will be published (hopefully!) as a full paper early next year.
From the engineering perspective, this project gave birth to a bunch of open-source spin-offs that might be of interest to researchers and developers from related communities (e.g., machine learning, information retrieval, Semantic Web):
- Ext-RESCAL is a memory-efficient and scalable implementation of the sparse tensor factorization algorithm RESCAL
- Anduin processes RDF/N-Quads data on Apache Hadoop
- Lucene-MLM adds support for mixtures of language models to Apache Lucene 4.0
- a large data set of labeled entity search queries and entities from Yahoo! SemSearch Challenge.
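
For readers who have not met the mixture of language models idea mentioned above, here is a minimal sketch of how such a score can be computed. It is not Lucene-MLM's API: the function names, the example entity, the field weights, and the `floor` constant (a crude stand-in for proper smoothing) are all assumptions made for illustration.

```python
import math
from collections import Counter


def field_lm(tokens):
    """Maximum-likelihood unigram model for one field: term -> probability."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()} if total else {}


def mlm_log_score(query, entity_fields, field_weights, floor=1e-9):
    """Query log-likelihood under a weighted mixture of per-field models.

    P(t | e) = sum_f w_f * P(t | field f of e)
    `floor` avoids log(0) for unseen terms; it is an assumption of this
    sketch, not a recommended smoothing method.
    """
    models = {f: field_lm(toks) for f, toks in entity_fields.items()}
    score = 0.0
    for term in query:
        p = sum(field_weights[f] * models[f].get(term, 0.0) for f in models)
        score += math.log(max(p, floor))
    return score


# Hypothetical entity with two fields and hand-picked weights.
entity = {
    "names": ["emory", "university"],
    "attributes": ["private", "research", "university", "atlanta"],
}
weights = {"names": 0.7, "attributes": 0.3}

score = mlm_log_score(["emory", "university"], entity, weights)
print(score)
```

The point of the mixture is that a term matching a heavily weighted field (here, the hypothetical `names` field) contributes more to the score than the same term matching a lightly weighted one.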