
Welcome To LD Connect, Our Linked Data Portal

Note: This project is in BETA. If you have any feedback, please don't hesitate to get in touch with us.

IOS Press is pleased to welcome you to the BETA release of LD Connect, its linked data portal. This portal contains linked metadata of all IOS Press books and journals.

We invite you to explore the data through our data browser, download our entire dataset or subsets of it, download our word embeddings trained on the full text of all IOS Press publications, or query the data through the SPARQL endpoint search box on this platform.

Would you like to see our data connected to yours or use our dataset to power one of your applications? We are very open to collaborations and would love to hear about your project. Please email us or fill in the form below and we will get back to you at our earliest convenience.

What Is Linked Data?

Linked data is a method of publishing structured data on the Web in a human- and machine-readable way, thereby breaking down data silos and fostering the interlinking of data.

It builds upon standardized Web technologies such as HTTP, RDF and URIs, but rather than using them to serve web pages for human readers, it extends them to share information in a way that can be read and understood automatically by computers.

Tim Berners-Lee coined the term in 2006 and described the following principles:

  1. Use URIs as names for things.
  2. Use HTTP URIs so that people can look up those names.
  3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL).
  4. Include links to other URIs so that they can discover more things.

Semantic Web technologies and ontologies, i.e., formal vocabularies, provide the means to query that data, draw inferences using vocabularies, and combine data across different sources. To make the Web of Data a reality, data must be available in a standard format, be reachable and manageable by Semantic Web tools, and be linked to other data. This collection of interrelated datasets on the Web is also referred to as linked data.

For example, metadata about research articles can be published as linked data together with data about authors, their affiliations, and so forth. Links between these data enable queries such as one for all authors, across different journals, who work on Alzheimer's disease. Furthermore, establishing links between different datasets makes it possible to integrate these bibliographic data with a wide variety of other data, such as demographic datasets, e.g., to establish a relationship between the affiliation of a research team, the research topic, and the region studied in their work.
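To make the Alzheimer's example concrete, the following Python sketch stores a handful of hypothetical triples in a plain list and answers "which authors, in which journals, published on Alzheimer's disease?" by following the links between articles, keywords, journals and authors. The `ex:` identifiers and predicates are invented for illustration and are not the vocabulary used by LD Connect; against the real portal the same question would be posed as a SPARQL query.

```python
# Hypothetical triples (subject, predicate, object); the "ex:" names are
# illustrative and are not the portal's actual vocabulary.
TRIPLES = [
    ("ex:article1", "ex:publishedIn", "ex:journalA"),
    ("ex:article1", "ex:keyword", "Alzheimer's disease"),
    ("ex:article1", "ex:author", "ex:alice"),
    ("ex:article2", "ex:publishedIn", "ex:journalB"),
    ("ex:article2", "ex:keyword", "Alzheimer's disease"),
    ("ex:article2", "ex:author", "ex:bob"),
    ("ex:article3", "ex:publishedIn", "ex:journalB"),
    ("ex:article3", "ex:keyword", "Parkinson's disease"),
    ("ex:article3", "ex:author", "ex:carol"),
]


def objects(subject, predicate):
    """All o such that (subject, predicate, o) is a triple."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def subjects(predicate, obj):
    """All s such that (s, predicate, obj) is a triple."""
    return [s for s, p, o in TRIPLES if p == predicate and o == obj]


def authors_on_topic(keyword):
    """Map each author to the set of journals in which they published on `keyword`.

    Follows the links article -> keyword, article -> journal and
    article -> author, i.e. the join a SPARQL query would express.
    """
    result = {}
    for article in subjects("ex:keyword", keyword):
        journals = objects(article, "ex:publishedIn")
        for author in objects(article, "ex:author"):
            result.setdefault(author, set()).update(journals)
    return result


print(authors_on_topic("Alzheimer's disease"))
# → {'ex:alice': {'ex:journalA'}, 'ex:bob': {'ex:journalB'}}
```

Because every statement is an explicit link, the cross-journal question reduces to matching triple patterns; no journal-specific schema is involved.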

Why Linked Data?

As a science publishing house operating in an era of digital transformation, we felt it was imperative for us to apply best practices to all aspects of our workflow. The potential of linked data is not lost on us. By offering our datasets in machine-readable form to third parties and semantic tools, we hope to contribute in a meaningful way to scientific progress.

Dr. Einar Fredriksson, founder and director of IOS Press

The un-siloing of data leads to improved retrieval, accessibility, reusability, and interoperability. Structured data can be searched, shared, reused, mined, and linked to other data sources. Contextual relations between authors, institutions and research areas can be made visible. Downstream applications such as abstracting and indexing databases can use the data portal to keep their own datastores up to date with the latest research published by IOS Press. Furthermore, authors who publish their work with IOS Press can do so in the knowledge that their work is disseminated through both human- and machine-accessible channels, following web-friendly standards.

The portal currently contains millions of triples, i.e., individual statements, and maps connections between journal articles, book chapters, authors, affiliations, keywords and other bibliographic metadata to provide a complete ecosystem of IOS Press scholarly relationships.

Further statistics on the current size of the dataset are given below. New data are added continuously, and new data points will be introduced to further enrich the portal. Tools that visualize the data for human consumption and support knowledge mining include a visual linked data browser, a SPARQL query endpoint, and a visual query interface. Further tools, e.g., for semantic search, are in development.
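As a rough illustration of how a program might talk to a SPARQL endpoint, the sketch below builds an HTTP GET request following the SPARQL 1.1 Protocol (the query goes in the `query` parameter, JSON results are requested through the `Accept` header) and flattens a JSON result document. The endpoint URL is a placeholder, and the assumption that articles are typed as `bibo:AcademicArticle` with a `dcterms:title` is ours; check the portal's data before relying on either.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder endpoint URL (hypothetical); substitute the portal's real SPARQL endpoint.
ENDPOINT = "https://example.org/sparql"

# Assumption: articles are typed as bibo:AcademicArticle and carry a dcterms:title.
QUERY = """PREFIX bibo: <http://purl.org/ontology/bibo/>
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?article ?title WHERE {
  ?article a bibo:AcademicArticle ;
           dcterms:title ?title .
}
LIMIT 10"""


def build_request(endpoint, query):
    """Build (without sending) a SPARQL 1.1 Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def rows(results_json):
    """Flatten a SPARQL 1.1 JSON results document into a list of {variable: value} dicts."""
    doc = json.loads(results_json)
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in doc["results"]["bindings"]
    ]


request = build_request(ENDPOINT, QUERY)
# Sending it would look like:
#   from urllib.request import urlopen
#   with urlopen(request) as response:
#       print(rows(response.read().decode("utf-8")))
```

Requesting `application/sparql-results+json` keeps the response easy to parse with the standard library; the protocol also allows POST requests, which suit long queries better than a URL parameter.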

The linked data portal was developed in collaboration with STKO Lab in Santa Barbara, CA, USA.

Information About Our Data

We spent a considerable amount of time cleaning up our data and constructing a conversion pipeline to transform all IOS Press article and book metadata to RDF-based linked data.

We make extensive use of the Bibo Ontology and Web standards to describe our data, in order to make it more discoverable, accessible, linkable and interoperable with other datasets. The affiliations are geocoded, and both authors and affiliations are disambiguated using our co-reference resolution script. With the help of machine learning techniques, the data conversion pipeline keeps improving as more data are added. Co-reference resolution was developed in collaboration with the DaSe Lab at Wright State University.

Our datasets contain, among other things, metadata of journal articles, volumes, issues, book chapters, publication dates, ISSNs, DOIs, authors, affiliations, keywords, pages and abstracts.

The most recent release of the dataset is from January 2018; the next release is scheduled for April 2018.

The compressed dataset is around 45MB, unpacked around 470MB.

Pre-trained Doc2Vec Models

The two files linked below contain pre-trained Doc2Vec models of all English-language journal articles and book chapters published by IOS Press over the years, trained on their full-text content rather than just their abstracts. In total, the training corpus comprised more than 120,000 papers, all of which are also matched to entities in the IOS Knowledge Graph. The corresponding word embedding model has a vocabulary of 95,888 words. Both models have an embedding dimension of 200. The Doc2Vec model was trained using the Python library gensim (version 3.3.0).

  1. "IOS-Doc2Vec.zip" can be loaded directly into the gensim library.
  2. "IOS-Doc2Vec-TXT.zip" contains the Doc2Vec model and its corresponding Word2Vec model as plain-text files (“doc2vec.txt”, “w2v.txt”). The “doc2vec_voc.txt” file contains a list of all the paper entity URLs of the Doc2Vec model, and “w2v_voc.txt” contains the word vocabulary of the corresponding Word2Vec model. This version can therefore be used for work that requires direct integration with the IOS Knowledge Graph.
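The sketch below shows one way the plain-text files could be used without gensim. It assumes that "doc2vec.txt" follows the plain-text word2vec layout that gensim writes (a header line with the number of vectors and their dimension, then one key followed by its values per line); that assumption, the example URLs and the toy 3-dimensional vectors are ours. It ranks the other documents by cosine similarity to a given one.

```python
import math


def load_text_vectors(lines):
    """Parse plain-text word2vec-style vectors.

    Assumed layout: a "<count> <dim>" header, then one
    "<key> <v1> ... <v_dim>" line per vector.
    """
    it = iter(lines)
    count, dim = map(int, next(it).split())
    vectors = {}
    for line in it:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    if len(vectors) != count:
        raise ValueError(f"header announced {count} vectors, parsed {len(vectors)}")
    return vectors


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / norm if norm else 0.0


def most_similar(vectors, key, topn=5):
    """Return the `topn` other keys closest to `key`, most similar first."""
    query = vectors[key]
    scored = [(k, cosine(query, v)) for k, v in vectors.items() if k != key]
    scored.sort(key=lambda kv: kv[1], reverse=True)
    return scored[:topn]


# Toy stand-in for doc2vec.txt: three documents, 3 dimensions instead of 200.
SAMPLE = """3 3
https://example.org/paper/1 1.0 0.0 0.0
https://example.org/paper/2 0.9 0.1 0.0
https://example.org/paper/3 0.0 0.0 1.0
"""

vectors = load_text_vectors(SAMPLE.splitlines())
print(most_similar(vectors, "https://example.org/paper/1", topn=2))
```

For the real file, an open handle can be passed instead, e.g. `with open("doc2vec.txt", encoding="utf-8") as f: vectors = load_text_vectors(f)`. With 200-dimensional vectors for more than 120,000 papers, a NumPy matrix would be much faster than these pure-Python lists; the lists only keep the sketch dependency-free.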


For questions, please contact Krzysztof Janowicz at janowicz-at-ucsb.edu.

The compressed Doc2Vec archive is around 230MB, unpacked around 270MB. The compressed Doc2Vec-TXT archive is around 180MB, unpacked around 410MB.

IOS Knowledge Graph Embedding

The IOS Knowledge Graph (KG) Embedding files were trained on the IOS Knowledge Graph using the TransE algorithm. The algorithm uses every triple whose predicate is an object property to train an embedding for each entity and each predicate in the KG. For a triple <s, p, o>, TransE learns k-dimensional embeddings for the entities s and o as well as for the relation p such that s + p is approximately equal to o, i.e., s + p - o is approximately zero.
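The idea can be illustrated with a small Python sketch of the TransE distance, the margin ranking loss TransE is usually trained with, and the negative sampling step that corrupts the head or tail of a true triple. The L2 norm, the margin of 1.0, the 2-dimensional vectors and the `ex:` names are illustrative assumptions, not the settings used to produce the IOS KG Embedding files.

```python
import math
import random


def transe_distance(s, p, o):
    """TransE dissimilarity ||s + p - o|| (L2 norm); near 0 for a plausible triple."""
    return math.sqrt(sum((si + pi - oi) ** 2 for si, pi, oi in zip(s, p, o)))


def margin_loss(pos, neg, margin=1.0):
    """Margin ranking loss: a true triple should be at least `margin` closer than a corrupted one."""
    return max(0.0, margin + transe_distance(*pos) - transe_distance(*neg))


def corrupt(triple, entities, rng):
    """Negative sampling: replace either the head or the tail with a random entity name.

    This can occasionally reproduce a true triple; real pipelines usually filter those out.
    """
    s, p, o = triple
    if rng.random() < 0.5:
        return (rng.choice(entities), p, o)
    return (s, p, rng.choice(entities))


# Hypothetical 2-dimensional embeddings (the real files use k dimensions).
emb = {
    "ex:alice": [0.0, 0.0],
    "ex:ucsb": [1.0, 0.0],
    "ex:wsu": [0.0, 1.0],
    "ex:affiliatedWith": [1.0, 0.0],
}


def vecs(triple):
    """Look up the embedding of each element of a (s, p, o) triple of names."""
    return tuple(emb[name] for name in triple)


true_triple = ("ex:alice", "ex:affiliatedWith", "ex:ucsb")
fake_triple = ("ex:alice", "ex:affiliatedWith", "ex:wsu")
print(transe_distance(*vecs(true_triple)))                # 0.0
print(transe_distance(*vecs(fake_triple)))                # sqrt(2) ≈ 1.414
print(margin_loss(vecs(true_triple), vecs(fake_triple)))  # 0.0: margin already satisfied

rng = random.Random(0)
print(corrupt(true_triple, ["ex:alice", "ex:ucsb", "ex:wsu"], rng))
```

A full training loop would repeatedly pair a true triple with a corrupted one and adjust the embeddings by gradient descent to reduce this loss; that part is omitted here.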

Note that TransE_ent.txt and TransE_relation.txt follow the word embedding format defined by Python’s gensim package.
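Once the two files are loaded (for example with gensim's `KeyedVectors.load_word2vec_format`, assuming they use gensim's plain-text word2vec layout), the embeddings can support simple link prediction: for a head entity s and relation p, candidate tails o are ranked by how close they lie to s + p. The sketch below uses plain dictionaries with hypothetical 2-dimensional vectors and invented `ex:` names in place of the loaded files.

```python
import math


def rank_tails(entity_vecs, relation_vecs, head, relation, topn=3):
    """Rank candidate tails for (head, relation, ?) by TransE distance ||s + p - o||."""
    s, p = entity_vecs[head], relation_vecs[relation]
    target = [si + pi for si, pi in zip(s, p)]  # where a plausible tail should lie
    scored = [
        (name, math.dist(target, o))
        for name, o in entity_vecs.items()
        if name != head
    ]
    scored.sort(key=lambda kv: kv[1])  # smallest distance = most plausible
    return scored[:topn]


# Hypothetical stand-ins for the contents of TransE_ent.txt and TransE_relation.txt.
entity_vecs = {
    "ex:alice": [0.0, 0.0],
    "ex:org1": [1.0, 0.0],
    "ex:org2": [0.0, 1.0],
    "ex:org3": [0.9, 0.2],
}
relation_vecs = {"ex:affiliatedWith": [1.0, 0.0]}

print(rank_tails(entity_vecs, relation_vecs, "ex:alice", "ex:affiliatedWith"))
```

Excluding the head itself from the candidates is a simplification; evaluation protocols for TransE typically also filter out tails already known to be true.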

For questions, please contact Krzysztof Janowicz at janowicz-at-ucsb.edu.

The compressed KG Embedding archive is around 162MB, unpacked around 470MB.

Unleashing The Potential

Providing machine-readable, interlinked metadata that is publicly available opens up a wide range of opportunities.

On the one hand, we offer our linked data to the public so that it can enrich third-party datasets, further un-silo research data, and incentivize new discoveries.

On the other hand, we are currently working on services and tools built on top of our linked data. Our tools can be used to:

The possibilities are endless, and we are only at the beginning. Connect your dataset to other datasets out there, and new potential is unleashed time and again.


LD Connect currently contains more than 120,000 articles and chapters, more than 250,000 authors, and more than 6,100,000 triples.

Feedback form

Please let us know what you think, so we can further improve our linked data platform!

