As data analysis is commonplace in the life sciences, we need to establish scalable ways to develop and share analysis workflows and to train researchers to make use of them. The latter entails an end-to-end approach, from accessing data, through selecting and properly using the appropriate workflow, to deploying it on available (cloud) resources.
Recent progress in sequencing technologies has produced several large-scale data sets for crops. The insights gained from these data have been published in high-profile scientific articles, but the underlying raw genotype data and the associated sample and population metadata have not been routinely submitted to appropriate archives. The aim of this implementation study is to provide this wealth of data according to FAIR principles, ensuring an interoperable link with the phenotypic data stored in distributed institutional repositories. This link is crucial for accelerated crop breeding.
This Strategic Implementation study around Container Orchestration aims to coordinate the ELIXIR Platforms (Compute & Tools) expertise within the Nodes, related projects and resources to establish ELIXIR-wide standards, protocols and processes for the orchestration of containerised applications provided by ELIXIR Communities.
Software containers are a key element in the context of Open Science & Open Source, which ELIXIR strongly supports and advocates. When described as part of scientific workflows, software containers guarantee data provenance and are an important element towards reproducibility. This study is divided into three work packages that complement each other:
Bioschemas leverages Schema.org, a widely implemented community effort supported by the main search engines that provides a way to add semantic markup to webpages. By enriching webpages with Bioschemas annotations, independently published content can be harvested and used by other resources without the need for APIs. As such, Bioschemas has the potential to boost Open and FAIR science.
The Structural Bioinformatics Community (3D-BioInfo) has the mission to better integrate protein structure-based data and tools across Europe, and to improve standardisation through better ontologies and agreed benchmarking. The ties with the structural biology research communities in Europe will be strengthened, and dedicated training and outreach efforts will be undertaken. Four major topics form the basis of this study:
WP1: Infrastructure for FAIR structural and functional annotations
Following previous work by the ELIXIR Proteomics Community, PRIDE public proteomics datasets, as well as some open proteomics data analysis pipelines, are starting to become available. This follow-on ELIXIR implementation study will build on those results to develop a set of open and user-friendly analysis pipelines, which will be applied to assess the possibilities for performing more automated re-analyses using the SDRF-encoded metadata annotations of public datasets. Additionally, common ideas in this context and in other overlapping topics of interest will be explored, e.g.
The increasingly well-documented role of intrinsic disorder in protein behavior and function requires infrastructure improvements to enable enhanced researcher access to related tools and data. A key existing infrastructure is MobiDB, which provides sequence-based predictions for the entire set of UniProtKB proteins from a number of different prediction tools.
The ELIXIR Tools Platform Ecosystem, initiated by the ELIXIR Tools Platform, is a diverse and open initiative focused on metadata exchange across registries and repositories. Its goal is to facilitate coordination among them, enhance interoperability, reduce mismatching information, promote good practices for resources that enable community curation, and contribute towards the sustainability of each of its components over time. Hence, this project seeks to sustain and support this “Ecosystem” through three complementary work packages:
This project focuses on enhancing Galaxy's data management features to provide additional provenance information and improve the integration of Galaxy into the existing data management ecosystem. Existing technologies and services in ELIXIR will be supported and ongoing international projects (ELIXIR-CONVERGE, the COVID-19 Data Portal, EOSC-Life, etc.) will be complemented while building on national initiatives (German NFDI, ELIXIR Belgium strategy, UK BioFAIR, etc.).