Increasing the translational value of public proteomics datasets: Automatic metadata-driven reanalysis in cloud infrastructures

31 May 2021 - 30 May 2023

Following previous work by the ELIXIR Proteomics Community, PRIDE public proteomics datasets, as well as some open proteomics data analysis pipelines, are starting to become available. This follow-on ELIXIR implementation study will use the previous results as a base to develop a set of open and user-friendly analysis pipelines, which will be applied to assess the possibilities for performing more automated re-analyses using the SDRF-encoded metadata annotations of public datasets. Additionally, common ideas in this context and in other overlapping topics of interest will be explored, e.g. in a joint gap analysis performed by the ELIXIR Proteomics, Intrinsically Disordered Proteins (IDP) and 3D-BioInfo Communities, to further serve the overall ELIXIR goals.

This study therefore intends to provide a use case that will motivate users to perform SDRF annotation of public datasets. By developing and providing the community with data processing and analysis pipelines, as well as by helping to standardise data management and annotation, two goals of the proteomics community will be addressed through the study's four work packages:

  • WP1: Metadata provision and processing
  • WP2: Workflow adaptation and development
  • WP3: Automating workflows for cloud environments
  • WP4: Intercommunity ‘gap analysis’ between the IDP, 3D-BioInfo and Proteomics Communities

This is a joint study between EMBL-EBI, ELIXIR Denmark, ELIXIR Belgium (Lennart Martens), ELIXIR Czech, ELIXIR Germany, ELIXIR France, ELIXIR Finland, ELIXIR Hungary, ELIXIR Italy, ELIXIR The Netherlands, and ELIXIR Sweden.