The Image Data Resource: a scalable resource for FAIR biological imaging data

Abstract number
73
Presentation Form
Poster
DOI
10.22443/rms.elmi2021.73
Corresponding Email
[email protected]
Session
Poster Session 2
Authors
Frances Wong (1), Sebastien Besson (1), Jean-Marie Burel (1), Dominik Lindner (1), Josh Moore (1), Will Moore (1), Petr Walczysko (1), Ugis Sarkans (2), Alvis Brazma (2), Jason Swedlow (1)
Affiliations
1. Division of Computational Biology, University of Dundee, Dundee, United Kingdom
2. EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, United Kingdom
Abstract text

Access to primary research data is fundamental for the advancement of science. Much of the published research in the life sciences is based on image datasets that sample 3D space, time, and the spectral characteristics of the detected signal to provide quantitative measures of cell, tissue and organismal processes and structures. However, the sheer size and heterogeneity of original image datasets (multi-dimensional image stacks combined with experimental metadata and analytic results) make image data handling and publication extremely complex and, in practice, rarely achieved.

To address this challenge, we have built a next-generation imaging database, the Image Data Resource (IDR; http://idr.openmicroscopy.org). IDR is an added-value resource that combines and integrates data from multiple independent imaging experiments and many different imaging modalities into a single public resource. IDR supports browsing, search, visualisation and computational processing within and across datasets acquired from a wide variety of imaging domains. IDR stores, publishes and integrates more than 260 TB of super-resolution, high-content screening, time-lapse and histological whole-slide imaging data, together with metadata describing experimental design, image acquisition, downstream analysis and interpretation. Data from more than 85 studies are available for search and query through a user-friendly web interface, with links from imaging data to reagents, methods and phenotypes via published ontologies. Cloud-based re-analysis of IDR data is enabled through JupyterHub. Reference image data submitted to IDR are also published in EMBL-EBI's BioImage Archive, ensuring sustainability and long-term data availability.

We will show recent updates to the IDR, including the separation of image data by domain, new cloud-optimised file formats for reading large datasets, and the emergence of independent, federated IDR instances at the national level.