ELIXIR Implementation Study

Learning paths

With the growing number of life-science training resources available through TeSS, it can be a struggle to choose those that are most relevant to a user's learning needs or best matched to their existing skills and competency level.

Guidance is therefore needed to help users to identify which competencies they need to acquire, which training resources will allow them to acquire such competencies and which learning path they should follow in order to move from their existing competency level to a higher one.

Therefore, the aims of this implementation study are to: 1) identify competency frameworks that are relevant to the ELIXIR user community, in order to derive a set of core competencies and a curriculum for bioinformatics and data science, 2) map ELIXIR training resources to such competencies and expose this information through TeSS, and 3) build learning paths for selected use cases, and make them available in TeSS, to help guide users in moving from one competency level to another by following such learning paths.
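The third aim, moving a user from one competency level to a higher one, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Resource` fields, the integer competency levels, the sample catalogue and the greedy selection rule are hypothetical and do not describe how TeSS models or exposes learning paths.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """A training resource mapped to one competency (hypothetical model)."""
    title: str
    competency: str
    from_level: int  # highest level a learner may start from
    to_level: int    # level reached after completing the resource


# Illustrative catalogue; titles and levels are invented for this sketch.
CATALOGUE = [
    Resource("Intro to sequence formats", "sequence analysis", 0, 1),
    Resource("Read alignment basics", "sequence analysis", 1, 2),
    Resource("Variant calling workshop", "sequence analysis", 2, 3),
    Resource("Intro to R", "statistics", 0, 1),
]


def build_learning_path(resources, competency, current_level, target_level):
    """Greedily chain resources until the target level is reached.

    Returns a list of resources, or None if no chain of resources
    covers the gap between the current and target levels.
    """
    path = []
    level = current_level
    while level < target_level:
        # Resources the learner can start now and that raise their level.
        candidates = [
            r for r in resources
            if r.competency == competency and r.from_level <= level < r.to_level
        ]
        if not candidates:
            return None
        best = max(candidates, key=lambda r: r.to_level)
        path.append(best)
        level = best.to_level
    return path


if __name__ == "__main__":
    for step in build_learning_path(CATALOGUE, "sequence analysis", 0, 3):
        print(f"{step.from_level} -> {step.to_level}: {step.title}")
```

A real service would probably need richer competency descriptions than a single integer level, but the sketch shows why mapping resources to competencies (aim 2) is a precondition for building paths (aim 3).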

The study is a joint effort of ELIXIR UK (Gabriella Rustici, Terri Attwood, Niall Beard), ELIXIR Netherlands (Celia Van Gelder), ELIXIR France (Victoria Dominguez del Angel), ELIXIR Luxembourg (Roland Krause), ELIXIR Estonia (Hedi Peterson), ELIXIR Switzerland (Patricia Palagi), EMBL-EBI (Cath Brooksbank, Sarah Morgan), ELIXIR Italy (Allegra Via), ELIXIR Finland (Eija Korpelainen) and ELIXIR Belgium (Alexander Botzki).

The study will also interact with ELIXIR Industry (to develop learning paths that specifically address the needs of industry users), global players in bioinformatics training (BioExcel, CORBEL, BD2K, H3ABioNet, GOBLET) and data-science training (CODATA-RDA, EDISON, Software/Data Carpentry), and the Bioschemas project (to improve the metadata that are exposed through the Bioschemas specifications for training materials).

1/6/2018 - 31/5/2019

Beacon and Beacon Network as a Service

This study builds on the Beacon-2016 and Beacon-2017 ELIXIR implementation studies. The current Beacon-2018 study aims to evolve the ELIXIR Beacons into a GA4GH Driver project with full alignment to GA4GH Technical Work Streams, and to consolidate and establish a fully costed process of “lighting a Beacon” for any ELIXIR node. Moreover, this study has the ambition to deliver Beacon/Beacon Network as an established ELIXIR infrastructure service. Improvements to the Beacon APIs and security will also be a core focus. A further priority for 2018 is to increase strategic partnering with national data owners to enable data flow to the Beacon service.

By the end of 2019, a Beacon Reference Implementation with an updated v2.0 that supports enhanced metadata response types and query interfaces will be available. The project will also develop and deploy the Beacon Network to address more ‘real world’ use cases. This will ensure that the Beacon, Registry and Network APIs evolve and expand to address the needs of clinical and clinical research users.

Ten ELIXIR nodes are involved in this study: Spain (Jordi Rambla), Finland (Ilkka Lappalainen, Tommi Nyrönen), Switzerland (Michael Baudis), EMBL-EBI (Dylan Spalding), Sweden (Niclas Jareborg), Belgium (Yves Moreau), France (Macha Nikolski), the Netherlands (Morris Swertz), Italy (David Horner), ELIXIR Hub (Serena Scollen, Susheel Varma).  

1/1/2018 - 31/12/2018

ELIXIR integration from a User perspective

The goal of this study is to provide life scientists with a powerful tool to find and use ELIXIR resources across the spectrum, based on intuitive graphical diagrams of the most prevalent scientific workflows.

Currently, a scientist can use TeSS to find training events and materials and then, in separate searches, use bio.tools to find relevant tools and BioSharing to find standards and databases. Linking TeSS and bio.tools to ELIXIR’s computer resources via common workflow diagrams would enable end-users to discover and learn about the prevalent bioinformatics workflows. This study will link TeSS and bio.tools via the most prevalent bioinformatics workflows and will lay the foundation for incorporating other ELIXIR platforms at a later stage, thereby providing an even more useful service for the researcher.

Eight ELIXIR nodes are involved in this study: Belgium (Frederik Coppens, lead), UK (Terri Attwood), Estonia (Hedi Peterson), Denmark (Jon Ison), Switzerland (Heinz Stockinger), EMBL-EBI (Sarah Morgan), Norway (Matus Kalas), France (Hervé Menager).

1/10/2017 - 30/6/2018

Development of Architecture for Software Containers at ELIXIR and its use by EXCELERATE use-case communities

The aim of this study is to provide a stable infrastructure for unifying software container solutions within ELIXIR. This infrastructure will provide an access point for end-users to find, generate, store, monitor and even benchmark software container solutions.

Software containers are a key technology that enables both the rapid deployment of software resources, including workflows, across a variety of systems (e.g. HPC, cloud environments and local computers) and the connection with existing database repositories. This technology will also be used to support ELIXIR training activities, allowing trainers to focus on the content rather than on the technical framework of the training.

The study follows four lines of development, plus a validation on two selected ELIXIR use cases: 1) development of a Bioinformatics Containers Central Service (BCCS) to support the use of software containers in ELIXIR, 2) BioContainers integration with bio.tools and OpenEBench, 3) BioContainers Registry 2.0 and command-line tool, 4) BioContainers for training and support, and 5) demonstration of the use of software containers in selected ELIXIR use cases (EGA integration and a proof-of-concept implementation of a human genomics variant calling pipeline using BioContainers and Galaxy).

The study is a joint effort of 3 ELIXIR Platforms and 7 ELIXIR nodes: EMBL-EBI (Yasset Perez Riverol, Steven Newhouse), Germany (Björn Grüning), Spain (Salvador Capella, Josep Gelpi, Sergi Beltran, Jordi Rambla), Belgium (Frederik Coppens), France (Francois Moreews, Olivier Collin, Victoria Dominguez), Denmark (Jon Ison), Italy (Rita Casadio, Giuseppe Profiti).

1/1/2018 - 31/12/2018

Extending open proteomics data analysis pipelines in the cloud: Additional tools and focus on scalability, supporting the dramatic growth of public proteomics data

This study builds on the ELIXIR implementation study started in February 2017 as a collaboration between EMBL-EBI and ELIXIR Germany. The initial study aimed at developing open, robust, scalable and reproducible proteomics data analysis workflows based on OpenMS, directly connected to the PRIDE database and deploying these pipelines in the EMBL-EBI "Embassy Cloud" as a proof of concept.

The current study, involving significantly more partners across Europe, has three objectives: (i) the inclusion of additional open tools developed by other ELIXIR nodes; (ii) the improvement of the overall infrastructure supporting the implementation of proteomics data analysis pipelines; and (iii) the inclusion of quality control pipelines. The overarching goal is that these tools can be deployed in other cloud infrastructures and easily reused by anyone in the community, thus bringing the users closer to the tools, and the tools closer to the data.

It is a joint effort of ELIXIR Belgium (Lennart Martens, lead), ELIXIR Germany (Oliver Kohlbacher), ELIXIR France (David Bouyssié), ELIXIR Spain (Fernando Corrales, Eduard Sabido) and EMBL-EBI (Juan Antonio Vizcaíno, Steven Newhouse).

1/8/2018 - 31/7/2019

Crowd-sourcing the annotation of public proteomics datasets to improve data reusability

This study aims to improve the re-usability of public proteomics datasets. This is achieved by substantially increasing the amount and quality of technical and biological annotations for datasets stored in the PRIDE database.

The PRIDE database is the world-leading repository for mass spectrometry proteomics data and is one of ELIXIR’s core data resources. PRIDE received 2,443 dataset submissions in 2017. The large amount of data downloaded from PRIDE in 2017, 295 TB, illustrates the growing reuse and reanalysis of these data. However, proteomics data reuse is still constrained by the limited technical and biological annotations that are available for PRIDE datasets. A user-friendly mechanism is required that allows users to improve the quality of dataset annotation.

The current implementation study has four objectives. First, the ELIXIR nodes will develop an a-posteriori annotation system for PRIDE, for technical and biological metadata, which will leverage the unique synergies of already existing tools and pipelines developed by different ELIXIR nodes. Second, they will create data structures that can capture the most-frequently used experimental designs in proteomics studies. Third, an appropriate API will be built to allow annotation tools to be developed easily. Fourth, they will reach out to actively involve the whole proteomics community in the annotation process. Taken together, this should dramatically improve the reusability of public proteomics datasets.

Ten ELIXIR nodes are involved in this study: EMBL-EBI (Juan Antonio Vizcaíno), Belgium (Lennart Martens), Germany (Oliver Kohlbacher, Martin Eisenacher), the Netherlands (Magnus Palmblad, Peter Horvatovich), Denmark (Veit Schwammle, Jon Ison), Switzerland (Lydie Lane, Frederique Lisacek), France (David Bouyssié, Christophe Bruley), Sweden (Fredrik Levander), Spain (Fernando Corrales, Eduard Sabido), Norway (Harald Barsnes).

1/6/2018 - 31/5/2019

Data validation

The purpose of this study is to determine the requirements for validation in order to build prototype open validation services for databases and knowledgebases. Existing domain-specific validators and generic validation services will be assessed for their utility for interoperability.

Validation has many contexts in interoperability; the specific scopes for this study are: 1) content validation according to minimum information checklists present in primary archives, 2) syntactic format validation according to a standard format in conjunction with the GA4GH file formats team, 3) syntactic format validation for phenotyping data and 4) semantic validation according to a publicly available ontology. Overall, this will improve data quality and interoperability for ELIXIR resources.
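Two of these scopes, content validation against a minimum information checklist (1) and semantic validation against an ontology (4), can be sketched in a few lines. This is only an illustration under stated assumptions: the required fields, the `EX:` term identifiers and the `validate_record` function are hypothetical placeholders, not an actual checklist, ontology or validation service from the study.

```python
# Hypothetical minimum information checklist: fields every record must fill in.
REQUIRED_FIELDS = {"sample_id", "organism", "tissue"}

# Placeholder term identifiers standing in for a publicly available ontology.
ALLOWED_TISSUE_TERMS = {"EX:0001", "EX:0002"}


def validate_record(record):
    """Return a list of human-readable validation errors (empty if valid)."""
    errors = []

    # Content validation: every checklist field must be present and non-empty.
    missing = sorted(f for f in REQUIRED_FIELDS if not record.get(f))
    errors += [f"missing required field: {f}" for f in missing]

    # Semantic validation: the tissue value must be a known ontology term.
    tissue = record.get("tissue")
    if tissue and tissue not in ALLOWED_TISSUE_TERMS:
        errors.append(f"unknown ontology term for tissue: {tissue}")

    return errors


if __name__ == "__main__":
    record = {"sample_id": "S2", "tissue": "leaf"}
    for problem in validate_record(record):
        print(problem)
```

Returning a list of errors rather than stopping at the first one lets a submitter fix all problems in a record in a single pass.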

Four ELIXIR nodes are involved in this study: Belgium (Frederik Coppens, co-lead), EMBL-EBI (Thomas Keane, Helen Parkinson), UK (Philippe Rocca-Serra, Alasdair Gray), France (Sarah Cohen-Boulakia, Cyril Pommier).

1/1/2018 - 31/12/2018

Apple as a Model for Genomic Information Exchange

This study aims to integrate the high-quality apple reference genome and its associated catalogue of genetic diversity, representing the most widely cultivated apple varieties around the world. Apple will be used as a case study for managing the growing number of ‘multi-genome’ fruit projects, testing and, where necessary, improving tools to streamline data import and exchange between ELIXIR-supported resources, specifically BioSamples, ENA, EVA, ORCAE and Ensembl Plants.

The study is run by ELIXIR Italy (Alessandro Cestaro), ELIXIR Belgium (Lieven Sterck) and EMBL-EBI (Paul Kersey).

1/6/2018 - 31/3/2019