Open Science, Research Infrastructure, Scientific Data Management
Technology Readiness Level
Multimedia objects, especially images and figures, are essential for visualizing and interpreting research findings. Open access conditions significantly improve the distribution and reuse of these scientific objects, for instance in Wikipedia articles, in research literature, and in education and knowledge dissemination, where image licensing often represents a serious barrier.
Whereas scientific publications are retrievable through library portals or other online search services thanks to standardized indices, there is not yet any targeted retrieval of, or access to, the accompanying images and figures. Consequently, there is great demand for standardized indexing methods for these open access multimedia objects in order to improve their accessibility. Meeting this objective requires a process for the automatic harvesting and indexing of open access multimedia objects. Wikimedia Commons, operated by the Wikimedia Foundation, is one of the most important open access media collections and serves as a platform and service for harvesting and distributing such media. It collects freely licensed objects, mainly images and figures, together with their metadata and provides them to Wikipedia and all other web users.
The aim of this project is to develop a process for the automatic harvesting and indexing of articles, including their images and figures, from quality-controlled open access journals, using the infrastructure of Wikimedia Commons and Wikidata. To reach this goal, the project plans to harvest scientific publications and then extract and enrich the metadata of the accompanying images and figures in order to generate a standardized index.
The NOA search engine is the entry point for finding the corresponding images. Each search result includes the metadata, caption, and license of the image.
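
The steps described above (harvest figure records, enrich their metadata, build an index, and search it) can be illustrated with a minimal Python sketch. It is not the project's actual implementation: the record fields, the function names (`FigureRecord`, `keywords_of`, `enrich`, `build_index`, `search`), and the simple caption-keyword enrichment are all assumptions made for illustration, and harvesting from journals, Wikimedia Commons, or Wikidata is left out entirely.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set


@dataclass
class FigureRecord:
    """Hypothetical record for one figure harvested from an article."""
    article_doi: str
    figure_id: str
    caption: str
    license: str
    keywords: List[str] = field(default_factory=list)


def keywords_of(text: str) -> List[str]:
    """Lowercase the text and keep unique words longer than three characters."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    # dict.fromkeys removes duplicates while preserving first-seen order.
    return list(dict.fromkeys(t for t in tokens if len(t) > 3))


def enrich(record: FigureRecord) -> FigureRecord:
    """Stand-in for metadata enrichment: derive keywords from the caption."""
    record.keywords = keywords_of(record.caption)
    return record


def build_index(records: Iterable[FigureRecord]) -> Dict[str, Set[str]]:
    """Build an inverted index mapping each keyword to figure identifiers."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for record in records:
        for keyword in record.keywords:
            index[keyword].add(record.figure_id)
    return index


def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Return the figures whose keywords contain every keyword in the query."""
    terms = keywords_of(query)
    if not terms:
        return set()
    # index.get avoids creating empty entries for unknown terms.
    matches = [index.get(term, set()) for term in terms]
    return set.intersection(*matches)
```

In this sketch a search returns only figure identifiers; a result page like the one described above would additionally look up each record to display its metadata, caption, and license.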