As data analysis is commonplace in the life sciences, we need to establish scalable ways to develop and share analysis workflows and to train researchers to use them. The latter requires an end-to-end approach: from accessing data, through selecting and correctly using the appropriate workflow, to deploying it on available (cloud) resources.
Recent progress in sequencing technologies has produced several large-scale data sets for crops. The insights gained from these data have been published in high-profile scientific articles, but the underlying raw genotype data and the associated sample and population metadata have not been routinely submitted to appropriate archives. The aim of this implementation study is to make this wealth of data available according to the FAIR principles. It will also ensure an interoperable link with the phenotypic data stored in distributed institutional repositories, a link that is crucial for accelerated crop breeding.
This Strategic Implementation Study on Container Orchestration aims to coordinate the expertise of the ELIXIR Platforms (Compute and Tools) within the Nodes, related projects and resources. Its goal is to establish ELIXIR-wide standards, protocols and processes for orchestrating the containerised applications provided by ELIXIR Communities.
Software containers are a key element of Open Science and Open Source, which ELIXIR strongly supports and advocates. When described as part of scientific workflows, software containers guarantee data provenance and are an important step towards reproducibility. This study is divided into three complementary work packages:
Bioschemas builds on Schema.org, a widely adopted community effort supported by the major search engines that provides a way to add semantic markup to webpages. When webpages are enriched with Bioschemas annotations, other resources can harvest and reuse independently published content without the need for APIs. As such, Bioschemas has the potential to boost Open and FAIR science.
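To illustrate what "harvested without the need for APIs" can mean in practice, here is a minimal sketch. It assumes the markup is embedded as JSON-LD inside a `<script type="application/ld+json">` element, one common way of publishing Schema.org/Bioschemas annotations. The harvester uses only Python's standard library. The example page, dataset name and URL are hypothetical and not taken from any real resource.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; only JSON-LD script tags are of interest.
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script contents arrive as raw text; buffer them until the closing tag.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False


def harvest(html):
    """Return all JSON-LD records embedded in an HTML page."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.blocks


# Hypothetical page carrying a Schema.org Dataset description.
PAGE = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Dataset",
 "name": "Example crop genotype panel",
 "url": "https://example.org/datasets/1"}
</script>
</head><body><h1>Example</h1></body></html>
"""

records = harvest(PAGE)
for record in records:
    print(record["@type"], record["name"])
# prints: Dataset Example crop genotype panel
```

A real harvester would also need to handle pages that publish their markup as microdata or RDFa, or that embed several records in a single JSON-LD array. This sketch deliberately leaves those cases out.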
ELIXIR-CONVERGE is a project funded by the European Commission to help standardise life science data management across Europe. To achieve this standardisation, the project will develop a data management toolkit for life scientists. The toolkit will help ensure more research data is in the public domain, which will give scientists access to more data. This will allow them to discover new insights into the challenges facing society, such as food security and health in old age, and help stimulate innovation in biomedicine and biotechnology.
The Structural Bioinformatics Community (3D-BioInfo) aims to better integrate protein structure-based data and tools across Europe and to improve standardisation through better ontologies and agreed benchmarking. It will also strengthen ties with the structural biology research communities in Europe and undertake dedicated training and outreach efforts. Four major topics form the basis of this study:
WP1: Infrastructure for FAIR structural and functional annotations
Following previous work by the ELIXIR Proteomics Community, public proteomics datasets in PRIDE are starting to become available, as are some open proteomics data analysis pipelines. This follow-on ELIXIR implementation study will build on those results to develop a set of open, user-friendly analysis pipelines. These pipelines will be used to assess how far public datasets can be re-analysed automatically using their SDRF-encoded metadata annotations. Additionally, common ideas in this context and in other overlapping topics of interest will be explored, e.g.
The increasingly well-documented role of intrinsic disorder in protein behaviour and function calls for infrastructure improvements that give researchers better access to related tools and data. A key existing infrastructure is MobiDB, which provides sequence-based disorder predictions for all UniProtKB proteins, generated by a number of different prediction tools.
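Having predictions from several tools for the same protein raises the question of how to combine them. The sketch below shows one generic approach, a per-residue majority vote. It is an illustration only, not necessarily how MobiDB derives its own consensus annotations. The per-residue encoding ('D' for disordered, 'O' for ordered), the tool names and the `majority_consensus` helper are all hypothetical.

```python
def majority_consensus(predictions, threshold=0.5):
    """Combine per-residue disorder calls from several tools by majority vote.

    predictions: dict mapping a (hypothetical) tool name to a string of 'D'
    (disordered) / 'O' (ordered) calls, one character per residue.
    A residue is called disordered when the fraction of tools calling it 'D'
    is strictly greater than `threshold`.
    """
    lengths = {len(p) for p in predictions.values()}
    if len(lengths) != 1:
        raise ValueError("need at least one prediction, all of the same length")
    (length,) = lengths
    n_tools = len(predictions)
    consensus = []
    for i in range(length):
        votes = sum(1 for p in predictions.values() if p[i] == "D")
        consensus.append("D" if votes / n_tools > threshold else "O")
    return "".join(consensus)


# Hypothetical predictions for a six-residue sequence from three tools.
predictions = {
    "tool_a": "DDDOOO",
    "tool_b": "DDOOOD",
    "tool_c": "DOOOOD",
}
print(majority_consensus(predictions))
# prints: DDOOOD
```

A simple majority treats every predictor as equally reliable. A weighted vote, or a higher `threshold`, would be the natural next step if some tools are known to over- or under-predict disorder.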
The ELIXIR Tools Platform Ecosystem, initiated by the ELIXIR Tools Platform, is a diverse and open initiative focused on metadata exchange across registries and repositories. Its goals are to:
- facilitate coordination among registries and repositories;
- enhance interoperability;
- reduce inconsistent information;
- promote good practices for resources that enable community curation;
- contribute to the long-term sustainability of each of its components.

This project therefore seeks to sustain and support this "Ecosystem" through three complementary work packages: