The phi X 174 (or ΦX174) bacteriophage is a single-stranded DNA (ssDNA) virus that infects Escherichia coli. This virus was isolated in 1935 by Nicolas Bulgakov[1] in Félix d'Hérelle's laboratory at the Pasteur Institute, from samples collected in Paris sewers. Its characterization and the study of its replication mechanism were carried out from the 1950s onwards. It was the first DNA-based genome to be sequenced, work completed by Fred Sanger and his team in 1977.[2] In 1962, Walter Fiers and Robert Sinsheimer had already demonstrated the physical, covalently closed circularity of ΦX174 DNA.[3] Nobel Prize winner Arthur Kornberg used ΦX174 as a model to first prove that DNA synthesized in a test tube by purified enzymes could produce all the features of a natural virus, ushering in the age of synthetic biology.[4][5] In 1972–1974, Jerard Hurwitz, Sue Wickner, and Reed Wickner, with collaborators, identified the genes required to produce the enzymes that catalyze conversion of the single-stranded form of the virus to the double-stranded replicative form.[6] In 2003, Craig Venter's group reported that the genome of ΦX174 was the first to be completely assembled in vitro from synthesized oligonucleotides.[7] The ΦX174 virus particle has also been successfully assembled in vitro.[8] In 2012, it was shown that its highly overlapping genome can be fully decompressed and still remain functional.[9]
This bacteriophage has a [+] sense circular single-stranded DNA genome of 5,386 nucleotides. The genome GC-content is 44%, and 95% of nucleotides belong to coding genes. Because of the balanced base composition of its genome, it is used as control DNA for Illumina sequencers.
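The GC-content figure above is a simple per-base tally. The following is a minimal sketch of how such a figure could be computed from a sequence string; the function names and the short test sequence are illustrative assumptions, not part of any cited pipeline, and the toy string is not ΦX174 sequence.

```python
from collections import Counter


def base_composition(seq: str) -> dict[str, float]:
    """Return the fraction of A, C, G and T among the unambiguous bases of seq."""
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    # Counter returns 0 for bases that never occur, so all four keys are present.
    return {base: counts[base] / total for base in "ACGT"}


def gc_content(seq: str) -> float:
    """Return the G+C fraction of seq (0.0 to 1.0)."""
    comp = base_composition(seq)
    return comp["G"] + comp["C"]


# Made-up 10-nt string used only to exercise the functions (NOT phage sequence).
toy_sequence = "ATGGCTTTAA"
print(base_composition(toy_sequence))
print(f"GC fraction: {gc_content(toy_sequence):.2f}")
```

Applied to the full 5,386-nucleotide genome sequence, the same tally would be expected to give a G+C fraction close to the value quoted above.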
ΦX174 encodes 11 genes, named with consecutive letters of the alphabet in the order they were discovered, with the exception of A*, which is translated from an alternative start codon within the larger A gene. Only genes A* and K are thought to be non-essential, although there is some doubt about A* because its start codon could be changed to ATT but not to any other sequence.[10] ATT is now known to be likely still capable of initiating protein production[11] in E. coli, so this gene may in fact be essential.
The first half of the ΦX174 genome features high levels of gene overlap,[12] with eight of the 11 genes overlapping by at least one nucleotide. These overlaps have been shown to be non-essential, although the refactored phage with all gene overlaps removed had lower fitness than the wild type.
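To illustrate what "overlapping by at least one nucleotide" means on a circular genome, here is a minimal sketch that finds overlapping gene pairs from start and end coordinates. The coordinate convention (1-based, inclusive, with end < start meaning the gene wraps past the origin), the function names, and the toy genes are all assumptions made for illustration; the coordinates are not the real ΦX174 annotation.

```python
from itertools import combinations


def positions(start: int, end: int, genome_len: int) -> set[int]:
    """1-based inclusive positions covered by a feature on a circular genome.

    If end < start, the feature is assumed to wrap past the origin.
    """
    if end >= start:
        return set(range(start, end + 1))
    return set(range(start, genome_len + 1)) | set(range(1, end + 1))


def overlapping_pairs(genes: dict[str, tuple[int, int]], genome_len: int) -> dict[tuple[str, str], int]:
    """Map each pair of genes sharing at least one nucleotide to the shared length."""
    covered = {name: positions(s, e, genome_len) for name, (s, e) in genes.items()}
    result = {}
    for a, b in combinations(sorted(covered), 2):
        shared = len(covered[a] & covered[b])
        if shared > 0:
            result[(a, b)] = shared
    return result


# Hypothetical genes on a 100-nt circle (NOT real ΦX174 coordinates).
toy_genes = {"geneX": (90, 20), "geneY": (15, 40), "geneZ": (50, 60)}
print(overlapping_pairs(toy_genes, genome_len=100))
```

Expanding each gene into a set of positions is wasteful for large genomes, but for a genome of a few thousand nucleotides it keeps the wrap-around case easy to reason about.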
Phage ΦX174 has been used to try to establish the absence of undiscovered genetic information through a "proof by synthesis" approach.[13]
In 2020, the transcriptome of ΦX174 was generated.[14] Notable features of the ΦX174 transcriptome are a series of up to four relatively weak promoters, in series with up to four Rho-independent (intrinsic) terminators and one Rho-dependent terminator.
ΦX174 encodes 11 proteins.
Protein | Copies | Function[15]
---|---|---
A | — | Nicks RF DNA to initiate rolling circle replication; ligates ends of linear phage DNA to form single-stranded circular DNA
A* | — | Inhibits host cell DNA replication; blocks superinfecting phage; not essential
B | 60 in procapsid | Internal scaffolding protein involved in procapsid assembly
C | — | DNA packaging
D | 240 in procapsid | External scaffolding protein involved in procapsid assembly
E | — | Host cell lysis
F | 60 in virion | Major capsid protein
G | 60 in virion | Major spike protein
H | 12 in virion | DNA pilot protein (or minor spike protein)
J | 60 in virion | Binds to new single-stranded phage DNA; accompanies phage DNA into procapsid
K | — | Optimizes burst size; not essential
Identification of all ΦX174 proteins using mass spectrometry has recently been reported.[16]
Infection begins when G protein binds to lipopolysaccharides on the bacterial host cell surface. H protein (the DNA pilot protein) pilots the viral genome through the bacterial membrane of E. coli,[17] most likely via a predicted N-terminal transmembrane helix.[18] However, it has become apparent that H protein is multifunctional.[19] It is the only viral capsid protein of ΦX174 that lacks a crystal structure, for two reasons. First, its low aromatic content and high glycine content make the protein structure very flexible, and individual hydrogen atoms (the R group of glycine) are difficult to detect in protein crystallography. Second, H protein induces lysis of the bacterial host at high concentrations, as the predicted N-terminal transmembrane helix easily pokes holes through the bacterial wall. Bioinformatic analysis predicts four coiled-coil domains in this protein, which have significant homology to known transcription factors. Additionally, de novo H protein was found to be required for optimal synthesis of other viral proteins.[20] Mutations in H protein that prevent its incorporation into the virion can be overcome when excess amounts of protein B, the internal scaffolding protein, are supplied.
The DNA is ejected through a hydrophilic channel at the 5-fold vertex.[21] H protein is thought to reside in this area, but experimental evidence has not verified its exact location. Once inside the host bacterium, replication of the [+] ssDNA genome proceeds via a negative-sense DNA intermediate. The phage genome supercoils, and the secondary structure formed by this supercoiling attracts a primosome protein complex. The primosome translocates once around the genome and synthesizes a [−] ssDNA from the positive-sense original genome. The [+] ssDNA genomes to be packaged into virions are then produced from this double-stranded form by a rolling circle mechanism: the double-stranded supercoiled genome is nicked on the positive strand by the virus-encoded A protein, which also attracts a bacterial DNA polymerase (DNAP) to the site of cleavage. DNAP uses the negative strand as a template to make positive-sense DNA. As it translocates around the genome, it displaces the outer strand of already-synthesized DNA, which is immediately coated by single-stranded DNA-binding proteins (SSBs). The A protein cleaves off a complete genome every time it recognizes the origin sequence.
As the D gene transcript is the most abundant, D protein is the most abundant protein in the viral procapsid. Similarly, gene transcripts for F, J, and G are more abundant than those for H, as the stoichiometry of these structural proteins is 5:5:5:1. Primosomes are protein complexes that bind the helicase enzyme to the template. They also supply the RNA primers needed for DNA synthesis on the strands.
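The 5:5:5:1 stoichiometry follows directly from the virion copy numbers given in the protein table (60 each of F, G, and J, and 12 of H). As a small worked check, the sketch below reduces those copy numbers by their greatest common divisor; the helper name `reduce_ratio` is an illustrative assumption.

```python
from functools import reduce
from math import gcd


def reduce_ratio(copies: dict[str, int]) -> dict[str, int]:
    """Divide every copy number by the greatest common divisor of all of them."""
    divisor = reduce(gcd, copies.values())
    return {name: n // divisor for name, n in copies.items()}


# Copy numbers per virion, taken from the protein table in this article.
virion_copies = {"F": 60, "J": 60, "G": 60, "H": 12}
print(reduce_ratio(virion_copies))  # {'F': 5, 'J': 5, 'G': 5, 'H': 1}
```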
ΦX174 is closely related to other Microviridae, especially the NC phages (e.g. NC1, NC7, NC11, NC16, NC37, NC5, NC41, NC56, NC51, etc.), more distantly related to the G4-like phages, and even more distantly related to the α3-like phages. Rokyta et al. 2006 presented a phylogenetic tree of their relationships.[22]
ΦX174 has been used as a model organism in many evolution experiments.[23]
ΦX174 is regularly used as a positive control in DNA sequencing because of its small genome size in comparison to other organisms and its relatively balanced nucleotide content: about 23% G, 22% C, 24% A, and 31% T, i.e., 45% G+C and 55% A+T (see accession NC_001422.1 for its 5,386-nucleotide sequence). Illumina's sequencing instruments use ΦX174 as a positive control,[24] and a single Illumina sequencing run can cover the ΦX174 genome several million times over, making this very likely the most heavily sequenced genome in history.
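The coverage claim above rests on simple arithmetic: mean coverage is the total number of sequenced control bases divided by the genome length. The sketch below makes that calculation explicit. The function name, its parameters, and the run figures in the usage line are hypothetical placeholders rather than specifications of any real instrument; only the 5,386-nucleotide genome length comes from the text.

```python
PHIX_GENOME_LEN = 5386  # nucleotides, as stated in the text


def mean_coverage(n_reads: int, read_len: int, phix_fraction: float,
                  genome_len: int = PHIX_GENOME_LEN) -> float:
    """Estimate mean fold-coverage of the control genome in one run.

    n_reads       -- total reads produced by the run (hypothetical input)
    read_len      -- read length in nucleotides (hypothetical input)
    phix_fraction -- fraction of reads derived from the control (0.0 to 1.0)
    """
    if not 0.0 <= phix_fraction <= 1.0:
        raise ValueError("phix_fraction must be between 0 and 1")
    return n_reads * read_len * phix_fraction / genome_len


# Hypothetical run: 400 million reads of 100 nt with a 5% control spike-in.
print(f"{mean_coverage(400_000_000, 100, 0.05):,.0f}x mean coverage")
```

Because the genome is so short, even a small control fraction yields very deep coverage, and larger runs scale the figure proportionally.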
ΦX174 is also used to test the resistance of personal protective equipment to bloodborne viruses.[25]
ΦX174 has also been modified to enable peptide display (phage display) from the viral capsid G protein.[26]
ΦX174 was the first phage whose genome was cloned in yeast,[9] which provides a convenient drydock for genome modifications.[27] ΦX174 was also the first genome to be fully decompressed, with all gene overlaps removed. These changes resulted in significantly reduced host attachment, protein expression dysregulation, and heat sensitivity.[16]