Focus on... Laura Fancello and proteogenomics

February 15, 2021

Laura Fancello is a bioinformatician in the “Exploring the Dynamics of Proteomes” (Edyp) team (Univ. Grenoble Alpes, CEA, Inserm, IRIG-BGE). In 2019, she obtained a Junior Chair from the Data Institute, and she is now investigating the use of transcriptome information to enhance protein identification in proteomics.
During her PhD, she characterized viral diversity using metagenomics, i.e. the high-throughput sequencing of the set of genomes present in a sample. This was her first contact with the “omics”, the comprehensive characterization and quantification by high-throughput assays of various sets of biological molecules, such as DNA (genomics), RNA (transcriptomics) or proteins (proteomics). In her postdoc, she studied the role of some ribosomal protein mutations in cancer, using transcriptomics and translatomics (the study of the set of all RNAs associated with ribosomes for protein synthesis). At Edyp, she was able to integrate her expertise in genomics and transcriptomics with proteomics, and she started to investigate how transcriptomics can enhance protein identification in proteomics, in particular by disambiguating the source proteins of shared peptides.

Shared peptides, that is, peptides that might originate from different proteins due to sequence homology, different transcript isoforms or post-translational modifications, are a major issue in proteomics. The most widely used strategy, bottom-up proteomics, consists of the enzymatic digestion of proteins into peptides, their separation by liquid chromatography and their characterization by mass spectrometry. Peptides are identified from their mass spectra by comparison with the theoretical spectra of all possible peptide sequences derived from the reference protein database. They are then mapped back to the most plausible set of proteins from which they could originate. This protein identification step is not trivial, due to several factors, including the presence of shared peptides.
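
To make the notion of a shared peptide concrete, here is a minimal Python sketch of an in-silico digestion. It is only an illustration, not the pipeline used at Edyp: the three protein sequences and their names are invented, and the cleavage rule is a simplified trypsin rule (cut after K or R unless the next residue is P, with no missed cleavages), which is an assumption about the enzyme since the article does not name one.

```python
from collections import defaultdict

# Hypothetical toy sequences, invented for illustration only.
PROTEINS = {
    "PROT_A": "MAGTLLKAVDEGSRQLNPEWK",
    "PROT_A_ISO": "MAGTLLKAVDEGSRHHFDIYK",  # isoform: same first two peptides as PROT_A
    "PROT_B": "MSPEELRGGYTVNKQLNPEWK",      # homolog: same last peptide as PROT_A
}


def tryptic_digest(sequence, min_length=5):
    """Simplified trypsin rule: cleave after K or R unless followed by P."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):  # trailing peptide without a cleavage site
        peptides.append(sequence[start:])
    return [p for p in peptides if len(p) >= min_length]


def map_peptides(proteins):
    """Return a dict mapping each peptide to the set of proteins containing it."""
    peptide_to_proteins = defaultdict(set)
    for name, sequence in proteins.items():
        for peptide in tryptic_digest(sequence):
            peptide_to_proteins[peptide].add(name)
    return dict(peptide_to_proteins)


peptide_map = map_peptides(PROTEINS)
# A peptide is "shared" when more than one protein could be its source.
shared = {p: s for p, s in peptide_map.items() if len(s) > 1}
unique = {p: s for p, s in peptide_map.items() if len(s) == 1}

for peptide, sources in sorted(shared.items()):
    print(peptide, "->", sorted(sources))
```

In this toy example, half of the peptides are compatible with two source proteins, so observing them alone does not tell us which of those proteins is actually present in the sample.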

Proteogenomics, the combined analysis of genomes and/or transcriptomes with proteomes, has been widely used to identify variant or novel proteins. These candidate proteins, inferred from transcriptomics or genomics, are added to the canonical protein sequence database used for the mass spectra search, to allow their identification. Other studies have used proteogenomics to better estimate the probability of a protein’s presence in a sample given the identified peptides and the presence of the corresponding transcript. Indeed, although it is debatable to what extent protein levels can be predicted from transcript levels, we know that, according to the “central dogma of biology”, no protein can be translated without the corresponding transcript. While several studies have used a proteogenomic approach to enhance sensitivity in peptide and protein identification, little is known about its potential to reduce the ambiguity of shared peptide identifications. In her project, Laura Fancello is investigating how to reduce uncertainty in the source proteins of shared peptides, based on the presence or absence of the corresponding transcript.
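
The idea of using transcript presence or absence to narrow down the source proteins of a shared peptide can be sketched in a few lines of Python. This is a deliberately naive illustration and should not be read as the method developed in the project: the peptide and protein names, the transcript abundances (in TPM) and the detection threshold of 1.0 are all hypothetical.

```python
# Hypothetical candidate source proteins for each identified peptide.
PEPTIDE_TO_PROTEINS = {
    "QLNPEWK": {"PROT_A", "PROT_B"},      # shared between two homologs
    "MAGTLLK": {"PROT_A", "PROT_A_ISO"},  # shared between two isoforms
    "GGYTVNK": {"PROT_B"},                # unique to PROT_B
}

# Hypothetical transcript abundances (TPM) measured in the same sample.
TRANSCRIPT_TPM = {"PROT_A": 12.4, "PROT_A_ISO": 3.1, "PROT_B": 0.0}

TPM_THRESHOLD = 1.0  # assumed cutoff for calling a transcript "present"


def expressed_proteins(tpm, threshold=TPM_THRESHOLD):
    """Return the proteins whose transcript is detected at or above the threshold."""
    return {name for name, value in tpm.items() if value >= threshold}


def filter_by_transcript(peptide_to_proteins, expressed):
    """Keep, for each peptide, only candidates supported by a detected transcript.

    If no candidate is supported, the original candidates are kept unchanged,
    so that a peptide identification is never discarded by this filter alone.
    """
    resolved = {}
    for peptide, candidates in peptide_to_proteins.items():
        supported = candidates & expressed
        resolved[peptide] = supported if supported else set(candidates)
    return resolved


expressed = expressed_proteins(TRANSCRIPT_TPM)
resolved = filter_by_transcript(PEPTIDE_TO_PROTEINS, expressed)

for peptide, sources in sorted(resolved.items()):
    status = "resolved" if len(sources) == 1 else "still ambiguous"
    print(peptide, "->", sorted(sources), f"({status})")
```

Here the peptide shared between the two homologs is assigned to a single protein because the transcript of the other is absent, whereas the peptide shared between two expressed isoforms remains ambiguous. The sketch also shows why a hard presence/absence filter is only a starting point: the unique peptide of a protein with no detected transcript has to be handled separately rather than silently dropped.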
