Domain of unknown function

A domain of unknown function (DUF) is a protein domain that has no characterised function. These families are collected in the Pfam database under the prefix DUF followed by a number, for example DUF2992 and DUF1220. As of 2019, the Pfam database contained almost 4,000 DUF families, representing over 22% of known families. Some families keep a name established by popular usage rather than the DUF nomenclature, but are nevertheless DUFs.[1]

The DUF designation is tentative: such families tend to be renamed to a more specific name (or merged into an existing domain) once a function is identified.[2][3]
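
The naming convention described above (the prefix DUF followed by a number) can be checked mechanically. The following Python sketch is an illustration only, not part of Pfam or any Pfam tool: the function names and the example list are hypothetical, and the pattern assumes nothing beyond the literal prefix and a run of digits. Because some DUFs carry popular names and others are renamed once characterised, a name-based check of this kind cannot decide whether a family's function is actually known.

```python
import re

# Assumption: a DUF-style name is the literal prefix "DUF" followed only by
# digits, as in the article's examples DUF2992 and DUF1220.
DUF_NAME = re.compile(r"DUF(\d+)")


def duf_number(name):
    """Return the numeric part of a DUF-style name, or None if it does not match."""
    match = DUF_NAME.fullmatch(name.strip())
    return int(match.group(1)) if match else None


def split_by_naming(names):
    """Partition family names into DUF-style names and all other names."""
    duf_style = [n for n in names if duf_number(n) is not None]
    other = [n for n in names if duf_number(n) is None]
    return duf_style, other


# Hypothetical input: family names that appear in this article.
example_names = ["DUF2992", "DUF1220", "GGDEF", "EAL", "DUF143"]
duf_style, other = split_by_naming(example_names)
```

Here GGDEF and EAL fall into the second list, which matches their history: they began as DUF1 and DUF2 and were renamed after their functions were identified.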

History

The DUF naming scheme was introduced by Chris Ponting, through the addition of DUF1 and DUF2 to the SMART database.[4] These two domains were found to be widely distributed in bacterial signaling proteins. Subsequently, the functions of these domains were identified and they have since been renamed as the GGDEF domain and EAL domain respectively.[2]

Characterisation

Structural genomics programmes have attempted to understand the function of DUFs through structure determination. The structures of over 250 DUF families have been solved. A 2009 study showed that about two thirds of the DUF families with solved structures had a structure similar to a previously solved one, and were therefore likely to be divergent members of existing protein superfamilies, whereas about one third possessed a novel protein fold.[5]

Some DUF families share remote sequence homology with domains of characterised function. Computational methods can be used to detect these relationships. A 2015 study assigned 20% of the DUFs to characterised structural superfamilies.[6] Pfam also continuously performs the (manually verified) assignment of DUFs to "clan" superfamily entries.[1]

Frequency and conservation

More than 20% of all protein domains were annotated as DUFs in 2013. About 2,700 DUFs are found in bacteria compared with just over 1,500 in eukaryotes. Over 800 DUFs are shared between bacteria and eukaryotes, and about 300 of these are also present in archaea. A total of 2,786 bacterial Pfam domains even occur in animals, including 320 DUFs.

Role in biology

Many DUFs are highly conserved, indicating an important role in biology. However, many such DUFs are not essential, so their biological role often remains unknown. For instance, DUF143 is present in most bacterial and eukaryotic genomes.[7] However, when it was deleted in Escherichia coli, no obvious phenotype was detected. It was later shown that the proteins containing DUF143 are ribosomal silencing factors that block the assembly of the two ribosomal subunits. While this function is not essential, it helps cells adapt to low-nutrient conditions by shutting down protein biosynthesis. As a result, these proteins and the DUF only become relevant when the cells starve. It is thus believed that many DUFs (or proteins of unknown function, PUFs) are only required under certain conditions.

Essential DUFs

Goodacre et al. identified 238 DUFs in 355 essential proteins (in 16 model bacterial species), most of which are single-domain proteins, establishing that some DUFs are biologically essential. These DUFs are called "essential DUFs" or eDUFs.

Notes and References

  1. El-Gebali S, Mistry J, Bateman A, Eddy SR, Luciani A, Potter SC, Qureshi M, Richardson LJ, Salazar GA, Smart A, Sonnhammer EL, Hirsh L, Paladin L, Piovesan D, Tosatto SC, Finn RD . The Pfam protein families database in 2019 . Nucleic Acids Research . 47 . D1 . D427–D432 . January 2019 . 30357350 . 6324024 . 10.1093/nar/gky995 .
  2. Bateman A, Coggill P, Finn RD . DUFs: families in search of function . Acta Crystallographica. Section F, Structural Biology and Crystallization Communications . 66 . Pt 10 . 1148–52 . October 2010 . 20944204 . 2954198 . 10.1107/S1744309110001685 .
  3. Punta M, Coggill PC, Eberhardt RY, Mistry J, Tate J, Boursnell C, Pang N, Forslund K, Ceric G, Clements J, Heger A, Holm L, Sonnhammer EL, Eddy SR, Bateman A, Finn RD . The Pfam protein families database . Nucleic Acids Research . 40 . Database issue . D290-301 . January 2012 . 22127870 . 3245129 . 10.1093/nar/gkr1065 .
  4. Schultz J, Milpetz F, Bork P, Ponting CP . SMART, a simple modular architecture research tool: identification of signaling domains . Proceedings of the National Academy of Sciences of the United States of America . 95 . 11 . 5857–64 . May 1998 . 9600884 . 34487 . 10.1073/pnas.95.11.5857 . 1998PNAS...95.5857S . free .
  5. Jaroszewski L, Li Z, Krishna SS, Bakolitsa C, Wooley J, Deacon AM, Wilson IA, Godzik A . Exploration of uncharted regions of the protein universe . PLOS Biology . 7 . 9 . e1000205 . September 2009 . 19787035 . 2744874 . 10.1371/journal.pbio.1000205 . free .
  6. Mudgal R, Sandhya S, Chandra N, Srinivasan N . De-DUFing the DUFs: Deciphering distant evolutionary relationships of Domains of Unknown Function using sensitive homology detection methods . Biology Direct . 10 . 1 . 38 . July 2015 . 26228684 . 4520260 . 10.1186/s13062-015-0069-2 . free .
  7. Häuser R, Pech M, Kijek J, Yamamoto H, Titz B, Naeve F, Tovchigrechko A, Yamamoto K, Szaflarski W, Takeuchi N, Stellberger T, Diefenbacher ME, Nierhaus KH, Uetz P . RsfA (YbeB) proteins are conserved ribosomal silencing factors . PLOS Genetics . 8 . 7 . e1002815 . 2012 . 22829778 . 3400551 . 10.1371/journal.pgen.1002815 . free .