This study aims to improve the reusability of public proteomics datasets. It will do so by substantially increasing the amount and quality of technical and biological annotation available for datasets stored in the PRIDE database.
The PRIDE database is the world-leading repository for mass spectrometry proteomics data and is one of ELIXIR’s core data resources. PRIDE received 2,443 dataset submissions in 2017. The large amount of data downloaded from PRIDE in 2017, 295 TB, illustrates the growing reuse and reanalysis of these data. However, proteomics data reuse is still constrained by the limited technical and biological annotation available for PRIDE datasets. A user-friendly mechanism is needed that lets users improve the annotation quality of these datasets.
The current implementation study has four objectives. First, the ELIXIR nodes will develop an a posteriori annotation system for PRIDE, covering both technical and biological metadata, which will build on existing tools and pipelines developed by different ELIXIR nodes. Second, the nodes will create data structures that can capture the most frequently used experimental designs in proteomics studies. Third, an API will be built so that annotation tools can be developed easily. Fourth, the nodes will reach out to actively involve the whole proteomics community in the annotation process. Taken together, these steps should substantially improve the reusability of public proteomics datasets.
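As an illustration of the second objective, the following is a minimal sketch of what a data structure for a simple experimental design might look like. It is not the study's actual data model: the class names, the fields (condition, biological replicate, fraction, label) and the placeholder accession are all assumptions chosen for this example.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class RunAnnotation:
    """Annotation for one raw file (hypothetical fields, for illustration only)."""
    raw_file: str
    sample_id: str
    condition: str              # e.g. "control" or "treated"
    biological_replicate: int
    fraction: int = 1
    label: str = "label free"


@dataclass
class ExperimentalDesign:
    """Maps the raw files of one dataset to samples and conditions (hypothetical)."""
    dataset_accession: str      # placeholder value used below, not a real dataset
    runs: List[RunAnnotation] = field(default_factory=list)

    def add_run(self, run: RunAnnotation) -> None:
        # Each raw file may be annotated only once within a design.
        if any(existing.raw_file == run.raw_file for existing in self.runs):
            raise ValueError(f"{run.raw_file} is already annotated")
        self.runs.append(run)

    def files_by_condition(self) -> Dict[str, List[str]]:
        # Group raw file names by the condition they were annotated with.
        grouped: Dict[str, List[str]] = defaultdict(list)
        for run in self.runs:
            grouped[run.condition].append(run.raw_file)
        return dict(grouped)


if __name__ == "__main__":
    design = ExperimentalDesign("PXD000000")
    design.add_run(RunAnnotation("ctrl_1.raw", "S1", "control", 1))
    design.add_run(RunAnnotation("ctrl_2.raw", "S2", "control", 2))
    design.add_run(RunAnnotation("trt_1.raw", "S3", "treated", 1))
    print(design.files_by_condition())
```

A flat per-file record of this kind is one possible starting point; designs with labelled multiplexing or many fractions per sample would need additional fields.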
Ten ELIXIR nodes are involved in this study: EMBL-EBI (Juan Antonio Vizcaíno), Belgium (Lennart Martens), Germany (Oliver Kohlbacher, Martin Eisenacher), the Netherlands (Magnus Palmblad, Peter Horvatovich), Denmark (Veit Schwämmle, Jon Ison), Switzerland (Lydie Lane, Frédérique Lisacek), France (David Bouyssié, Christophe Bruley), Sweden (Fredrik Levander), Spain (Fernando Corrales, Eduard Sabidó), and Norway (Harald Barsnes).
1 June 2018 - 31 May 2019