One of the main goals of the Human Genome Project was to identify all the protein-coding genes. There are ~20,500 protein-coding genes annotated in human reference databases. However, in the last few years, proteogenomics studies have predicted thousands of novel protein-coding regions, including low molecular weight proteins encoded by small open reading frames (ORFs) in untranslated regions of messenger RNAs and non-coding RNAs. Most of these predictions are based on bioinformatics analysis and ribosome footprints. The validity of some of these small ORF (sORF)-encoded proteins (SEPs) has been established following functional characterization. With the growing number of predicted novel proteins, a strategy to identify reliable candidates that warrant further study is needed. We developed an integrated proteogenomics workflow to identify a reliable set of novel protein-coding regions in the human genome based on their recurrent observation across multiple samples. Publicly available ribo...
Authors | Kore, H; Okano, S; Datta, KK; Thorp, J; Periasamy, P; Divate, M; Liyanage, U; Hartel, G; Nagaraj, SH; Gowda, H |
---|---|
Journal | Genomics, proteomics & bioinformatics |
Pages | |
Volume | |
Date | 7/03/2025 |
Grant ID | |
Funding Body | |
URL | http://www.ncbi.nlm.nih.gov/pubmed/?term=10.1093/gpbjnl/qzaf004 |