
Some proteins (artist’s illustration) are being reclassified in databases as a result of the latest findings. Credit: Christoph Burgstedt/Getty
The human genome contains around 20,000 genes that hold instructions for making working proteins, as most genetic databases now indicate. However, some scientists say there might be thousands more ‘dark proteins’ with unknown but potentially important roles in cells.
These proteins are translated from portions of the genome that weren’t thought to produce proteins, and so were excluded from official genome and protein counts.
An effort announced today in Nature (ref. 1) gives thousands of these molecules encoded by the human genome a new official name, peptideins, and marks their inclusion in major gene and protein databases used by the life-sciences community.
Researchers say the rebranding will bring much-needed attention and effort to working out what different peptideins do in cells. Some have been implicated in diseases including childhood cancers, as well as in basic cellular functions.
But what most of them do is unknown, although there is some evidence that many peptideins — previously called microproteins or non-canonical, ‘dark’ proteins — are cellular by-products without a clear function.
“This is a major breakthrough,” says Christoph Dietrich, a bioinformatician at the University of Heidelberg, Germany. “These microproteins have the potential to really open up a new wave of research.”
Short and mysterious
Dark proteins tend to be very short and to lack evolutionary relatives in other organisms, which is part of the reason they have been omitted from databases of protein-coding genes and proteomes. In many cases, they are encoded by genes that lie very near to, or even overlap with, known protein-coding genes.
In today’s Nature paper, a new effort called the TransCODE Consortium analysed experimental data on thousands of potential dark proteins. Starting with a list of 7,264 DNA sequences suspected to encode dark proteins, the consortium found that just 15 had enough experimental support to be considered for official catalogues of protein-coding genes.
Portions of thousands more could be detected in cells, but the experimental evidence was less strong; their functions were almost entirely unknown. These were dubbed peptideins, a portmanteau of peptide — a short stretch of amino acids — and protein.
“It’s made of amino acids, but we don’t know what it does in terms of function. We don’t necessarily know that it does anything at this point. But we know it exists,” says Jonathan Mudge, a bioinformatician who works on the GENCODE database of protein-coding genes at the European Molecular Biology Laboratory’s European Bioinformatics Institute in Hinxton, UK, and a consortium member.
Biology’s dwarf planets
