Exploring a world of networked information built from free-text metadata
Most current interfaces to digital libraries are built on keyword-based search and list-based presentation. For users who do not have specific items in mind but would rather explore unfamiliar topics, it is not easy to figure out to what extent and in which respects the returned records match the query. Users have to try different combinations of keywords to narrow or broaden the search space in the hope of eventually getting useful results. In this talk, we will present a web interface that lets users explore the context of their queries interactively and visually. After a user enters a query, the interface visualises a contextual view of it, in which the most related journals, authors, subject headings, publishers, topical terms, etc. are positioned in 2D based on their relatedness to the query and to each other. Clicking any of these nodes presents a new visualisation centred on the selected entity. With this click-through style, users can get visual contexts for their selected entities (journals, authors, topical terms, etc.) and shift their focus by choosing entities, or types of entities, of interest to investigate further. At any point, a search in WorldCat.org with the currently focused entity (a topical term, an author or a journal) will return the best-matching results (as judged by the standard WorldCat search engine).
We implemented this interface, available at http://thoth.pica.nl/demo/relate, over WorldCat, the world's largest bibliographic database. To guarantee the responsiveness of this interactive interface, we adopt a two-step approach: an off-line preparation phase followed by an on-line process. Off-line, we build the semantic representation of each entity, using Random Projection to drastically reduce dimensionality (from 6 million to 600). On-line, the terms of a query are compared to entities in the reduced semantic matrix, where reciprocal relatedness is used to select genuine matches. The number of hits is then further reduced so that the rendered network layout is easy to overview and navigate. In the end, we can investigate the relations between roughly 6 million topical terms, 5 million authors, 1 million subject headings, 1000 Dewey decimal codes and 1.7 million publishers.
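The two-step approach can be illustrated with a small sketch. It is not the implementation behind the demo: the abstract does not say which Random Projection variant is used or how reciprocal relatedness is defined, so the sketch assumes a dense Gaussian projection and reads "reciprocal" as "each entity appears in the other's top-k list by cosine similarity". The entity names, the toy sizes (1000 dimensions reduced to 64 instead of 6 million to 600) and all function names are hypothetical.

```python
import math
import random


def random_projection_matrix(n_in, n_out, seed=0):
    """Gaussian projection matrix with n_in rows and n_out columns (assumed variant)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(n_out)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n_out)] for _ in range(n_in)]


def project(sparse_vec, matrix):
    """Project a sparse vector {dimension: weight} into the reduced space."""
    n_out = len(matrix[0])
    out = [0.0] * n_out
    for dim, weight in sparse_vec.items():
        row = matrix[dim]
        for j in range(n_out):
            out[j] += weight * row[j]
    return out


def cosine(u, v):
    """Cosine similarity of two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def top_k(name, vectors, k):
    """The k entities most similar to `name`, excluding `name` itself."""
    scored = [(cosine(vectors[name], vec), other)
              for other, vec in vectors.items() if other != name]
    scored.sort(reverse=True)
    return [other for _, other in scored[:k]]


def reciprocal_matches(name, vectors, k):
    """Keep only neighbours that also list `name` among their own top k."""
    return [other for other in top_k(name, vectors, k)
            if name in top_k(other, vectors, k)]


# Off-line step: toy co-occurrence vectors, reduced once and stored.
entities = {
    "author:A": {0: 3, 1: 2, 2: 1},
    "journal:J": {0: 2, 1: 3, 2: 1},
    "term:t": {500: 1, 501: 4},
    "term:u": {500: 2, 501: 3, 3: 1},
}
matrix = random_projection_matrix(n_in=1000, n_out=64, seed=42)
vectors = {name: project(vec, matrix) for name, vec in entities.items()}

# On-line step: look up mutual neighbours in the reduced space.
for name in vectors:
    print(name, "->", reciprocal_matches(name, vectors, k=1))
```

Because the projection is computed once off-line, the on-line step only compares short dense vectors, which is what keeps an interactive interface responsive. The mutual top-k filter discards one-sided matches, for example a rare term whose nearest neighbour is a very common entity that has many closer neighbours of its own.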
Shenghui Wang – Research scientist at OCLC (Leiden office)
Rob Koopman – Innovation agent at OCLC (Leiden office)