Rolling hairpin replication explained

Rolling hairpin replication (RHR) is a unidirectional, strand displacement form of DNA replication used by parvoviruses, a group of viruses that constitute the family Parvoviridae. Parvoviruses have linear, single-stranded DNA (ssDNA) genomes in which the coding portion of the genome is flanked by telomeres at each end that form hairpin loops. During RHR, these hairpin loops repeatedly unfold and refold to change the direction of DNA replication so that replication progresses in a continuous manner back and forth across the genome. RHR is initiated and terminated by an endonuclease encoded by parvoviruses that is variously called NS1 or Rep, and RHR is similar to rolling circle replication, which is used by ssDNA viruses that have circular genomes.

Before RHR begins, a host cell DNA polymerase converts the genome to a duplex form in which the coding portion is double-stranded and connected to the terminal hairpins. From there, messenger RNA (mRNA) that encodes the viral initiator protein is transcribed and translated to synthesize the protein. The initiator protein commences RHR by binding to and nicking the genome in a region adjacent to a hairpin called the origin and establishing a replication fork with its helicase activity. Nicking leads to the hairpin unfolding into a linear, extended form. The telomere is then replicated and both strands of the telomere refold back in on themselves to their original turn-around forms. This repositions the replication fork to switch templates to the other strand and move in the opposite direction. Upon reaching the other end, the same process of unfolding, replication, and refolding occurs.

Parvoviruses vary in whether both hairpins are the same or different. Homotelomeric parvoviruses such as adeno-associated viruses (AAV), i.e. those that have identical or similar telomeres, have both ends replicated by terminal resolution, the previously described process. Heterotelomeric parvoviruses such as minute virus of mice (MVM), i.e. those that have different telomeres, have one end replicated by terminal resolution and the other by an asymmetric process called junction resolution. During asymmetric junction resolution, the duplex extended form of the telomere reorganizes into a cruciform-shaped junction, and the correct orientation of the telomere is replicated off the lower arm of the cruciform. As a result of RHR, a replicative molecule that contains numerous copies of the genome is synthesized. The initiator protein periodically excises progeny ssDNA genomes from this replicative concatemer.

Background

Parvoviruses are a family of DNA viruses that have single-stranded DNA (ssDNA) genomes enclosed in rugged, icosahedral protein capsids 18–26 nanometers (nm) in diameter.^[1] Unlike most other ssDNA viruses, which have circular genomes that form a loop, parvoviruses have linear genomes with short terminal sequences at each end of the genome. These termini are capable of being formed into structures called hairpins or hairpin loops and consist of short, imperfect palindromes. Varying from virus to virus, the coding region of the genome is 4–6 kilobases (kb) in length, and the termini are 116–550 nucleotides (nt) in length each. The hairpin sequences provide most of the cis-acting information needed for DNA replication and packaging.^[1] ^[2]

Parvovirus genomes may be either positive-sense or negative-sense. Some species, such as adeno-associated viruses (AAV) like AAV2, package a roughly equal number of positive-sense and negative-sense strands into virions; others, such as minute virus of mice (MVM), show a preference toward packaging negative-sense strands; and others have varying proportions.^[2] Because of this disparity, the 5′-end (usually pronounced "five prime end") of the strand that encodes the non-structural proteins is called the "left end", and the 3′-end (usually pronounced "three prime end") is called the "right end". In reference to the negative-sense strand, the 3′-end is the left side and the 5′-end is the right side.^[2] ^[3]

Parvoviruses replicate their genomes through a process called rolling hairpin replication (RHR), which is a unidirectional, strand displacement form of DNA replication. Before replication, the coding portion of the ssDNA genome is converted to a double-stranded DNA (dsDNA) form, which is then cleaved by a viral protein to initiate replication. Sequential unfolding and refolding of the hairpin termini acts to reverse the direction of synthesis, which allows replication to go back and forth along the genome to synthesize a continuous duplex replicative form (RF) DNA intermediate. Progeny ssDNA genomes are then excised from the RF intermediate.^[2] While the general aspects of RHR are conserved across genera and species, the exact details likely vary.^[4]

Parvovirus genomes have distinct starting points of replication that contain palindromic DNA sequences. These sequences are able to alternate between inter- and intrastrand basepairing throughout replication, and they serve as self-priming telomeres at each end of the genome. They also contain two key sites necessary for replication used by the initiator protein: a binding site and a cleavage site.^[5] Telomere sequences have significant complexity and diversity, suggesting that they perform additional functions for many species.^[1] In MVM, for example, the left-end hairpin contains binding sites for transcription factors that modulate gene expression from an adjacent promoter. For AAV, the hairpins can bind to MRE11/Rad50/NBS1 (MRN) complexes and Ku70/80 heterodimers, which are involved in sensing and repairing DNA.^[3] In general, however, they have the same basic structure: imperfect palindromes in which a fully or primarily basepaired region terminates into an axial symmetry. These palindromes can fold into a variety of structures such as a Y-shaped structure and a cruciform-shaped structure. During replication, the termini act as hinges in which the imperfectly basepaired or partial cruciform regions surrounding the axis provide a favorable environment for unfolding and refolding of the hairpin.^[2]

Some parvoviruses, such as AAV2, are homotelomeric, meaning the two palindromic telomeres are similar or identical and form part of larger (inverted) terminal repeat sequences. Replication at each terminus is therefore similar. Other parvoviruses, such as MVM, are heterotelomeric, meaning they have two physically different telomeres. As a result, heterotelomeric parvoviruses tend to have a more complex replication process since the two telomeres have different replication processes.^[2] In general, homotelomeric parvoviruses replicate both ends via a process called terminal resolution, whereas heterotelomeric parvoviruses replicate one end by terminal resolution and the other end by an asymmetric process called junction resolution.^[2] ^[3] Whether a genus is hetero- or homotelomeric, along with other genomic characteristics, is shown in the following table.^[2]


Subfamily	Genus	Genome length	Hetero- / Homotelomeric	Terminal repeat (TR) and hairpin (HP) length
Densovirinae^[6]	Aquambidensovirus	~6 kb	homotelomeric	~550 nt TRs; ~139 nt HPs
	Blattambidensovirus	~6 kb	homotelomeric	~550 nt TRs; ~139 nt HPs
	Hemiambidensovirus	~6 kb	homotelomeric	~550 nt TRs; ~139 nt HPs
	Iteradensovirus	~5 kb	homotelomeric	~250 nt TRs; ~165 nt HPs
	Miniambidensovirus^[7]	~4.9 kb	homotelomeric	~199 nt TRs; ~113 nt HPs
	Pefuambidensovirus	~6 kb	homotelomeric	~550 nt TRs; ~139 nt HPs
	Protoambidensovirus	~6 kb	homotelomeric	~550 nt TRs; ~139 nt HPs
	Scindoambidensovirus	~6 kb	homotelomeric	~550 nt TRs; ~139 nt HPs
Hamaparvovirinae	Brevihamaparvovirus^[8]	~4.2 kb	heterotelomeric	~135 nt (L); ~180 nt (R)
	Chaphamaparvovirus^[9]	~4.4 kb	probably heterotelomeric	~145 nt (L); ~117 nt (R)
	Hepahamaparvovirus^[10]	~6.3 kb	heterotelomeric	~0.2 kb HPs
	Ichthamaparvovirus^[11]		probably heterotelomeric
	Penstylhamaparvovirus^[12]	~4 kb	unknown	unknown
Parvovirinae	Amdoparvovirus	~4.8 kb	heterotelomeric	~116 nt (L); ~240 nt (R)
	Artiparvovirus^[13]	~5 kb	unknown	unknown
	Aveparvovirus	~5.3 kb	homotelomeric	~206 nt TRs; ~39 nt HPs
	Bocaparvovirus	~5.5 kb	heterotelomeric	~140–180 nt (L); ~120–200 nt (R)
	Copiparvovirus	~5.6 kb	probably homotelomeric	unknown
	Dependoparvovirus	~4.7 kb	homotelomeric	~145–454 nt TRs; ~125–415 nt HPs
	Erythroparvovirus	~5.6 kb	homotelomeric	~383 nt TRs; ~365 nt HPs
	Loriparvovirus^[14]	~4.8 kb	unknown	unknown
	Protoparvovirus	~5.1 kb	heterotelomeric	~140–180 nt (L); ~180–200 nt (R)
	Tetraparvovirus	~5.3 kb	unknown	unknown

General process

The entire process of rolling hairpin replication, which has distinct, sequential stages, can be summarized as follows:^[2] ^[3] ^[4]

1. The coding portion of the genome is replicated, starting from the 3′-end of the 3′ hairpin, which acts as a primer, and continues until the newly synthesized strand is connected to the 5′-end of the 5′ hairpin, producing a duplex DNA molecule that has two strands of the coding portion of the genome.
2. mRNA that encodes the viral replication initiator protein is transcribed and subsequently translated to synthesize the protein.
3. The initiator protein binds to and cleaves the DNA within a region called the origin, which results in the hairpin unfolding into a linear, extended form. At the same time, the initiator protein establishes a replication fork with its helicase activity.
4. The extended-form hairpin is replicated to create an inverted copy of the telomere on the newly synthesized strand.
5. The two strands of that end refold back into two hairpins, which repositions the replication fork to switch templates and move in the opposite direction.
6. DNA replication continues in a linear manner from one end to the other using the opposite strand as a template.
7. Upon reaching the other end, that end's hairpin is unfolded and refolded to replicate the terminus and once again swap templates and change the direction of replication. This back-and-forth replication is continually repeated, producing a concatemer of multiple copies of the genome.
8. The viral initiator protein periodically excises individual genomic strands of DNA from the replicative concatemer.
9. Excised ssDNA genomes are packaged into newly constructed viral capsids.

Preparation for replication

Upon cell entry, a tether about 24 nucleotides in length, which attaches the viral protein NS1 (essential for replication) to the virion, is cleaved off the virion to be reattached later. After cell entry, virions accumulate in the cell nucleus while the genome is still contained within the capsid. These capsids may be reconfigured to an open or transitioned state during entry. The exact mechanism by which the genome leaves the capsid is unclear. For AAV, it has been suggested that nuclear factors disassemble the capsid, whereas for MVM, it appears as if the genome is ejected in a 3′-to-5′ direction from an opening in the capsid called a portal.^[3]

Parvoviruses lack genes capable of inducing resting cells to enter their DNA synthesis phase (S-phase). Additionally, naked ssDNA is likely to be unstable, perceived as foreign by the host cell, or improperly replicated by host DNA repair. For these reasons, the genome must either be converted rapidly to its less obstructive, more stable duplex form or retained within the capsid until it is uncoated during S-phase. Typically, the latter occurs and the virion remains silent in the nucleus until the host cell enters S-phase by itself. During this waiting period, virions may make use of certain strategies to evade host defense mechanisms to protect their hairpins and DNA to reach S-phase, though it is unclear how this occurs.^[2] Since the genome is packaged as ssDNA, creation of a complementary strand is necessary before gene expression.^[3]

DNA polymerases are only able to synthesize DNA in a 5′ to 3′ direction, and they require a basepaired primer to begin synthesis. Parvoviruses address these limitations by using their termini as primers for complementary strand synthesis. A 3′ hydroxyl end of the left-hand (3′) terminus pairs with an internal base to prime initial DNA synthesis, resulting in the conversion of the ssDNA genome to its first duplex form.^[1] ^[4] This is a monomeric double-stranded DNA molecule in which the two strands are covalently cross-linked to each other at the left end by a single copy of the viral telomere. Synthesis of the duplex form precedes NS1 expression so that when the replication fork during initial complementary strand synthesis reaches the right (5′) end, it does not displace and copy the right-end hairpin. This allows the 3′-end of the new DNA strand to be covalently ligated to the 5′-end of the right hairpin by a host ligase, thereby creating the duplex molecule. During this step, the tether sequence that was present before viral entry into the cell is resynthesized.

Essential viral proteins and initiation

Once an infected cell enters S-phase, parvovirus genomes are converted to their duplex form by host replication machinery, and mRNA that encodes non-structural (NS) proteins is transcribed starting from a viral promoter (P4 for MVM).^[2] ^[3] One of these NS proteins is usually called NS1 but is also called Rep1 or Rep68/78 for the genus Dependoparvovirus, to which AAV belongs.^[2] NS1 is a site-specific DNA binding protein that acts as the replication initiator protein via nickase activity. It also mediates excision of both ends of the genome from duplex RF intermediates via a transesterification reaction that introduces a nick into specific duplex origin sequences.^[2] Key components of NS1 include an HUH endonuclease domain toward the N-terminus of the protein and a superfamily 3 (SF3) helicase toward the C-terminus,^[15] as well as ATPase activity.^[1] It binds to ssDNA, RNA, and site-specifically on duplex DNA at reiterations of the tetranucleotide sequence 5′-ACCA-3′.^[1] These sequences are present in the viral replication origin sites and repeated at multiple sites throughout the genome in more or less degenerate forms.
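As an illustration of what "reiterations" of the 5′-ACCA-3′ tetranucleotide on duplex DNA means in practice, the following minimal sketch scans both strands of a duplex for tandem ACCA runs. The function names and the toy sequence are invented for this example and are not taken from any parvovirus genome, and exact matching is a simplification, since real sites also occur in degenerate forms.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def acca_runs(top_strand: str, min_copies: int = 2):
    """Find tandem 5'-ACCA-3' runs on either strand of a duplex.

    Returns (strand, start, copies) tuples, where start is a 0-based index
    into the scanned strand read 5' to 3'. Exact matches only (assumption).
    """
    pattern = re.compile(r"(?:ACCA){%d,}" % min_copies)
    hits = []
    for name, seq in (("top", top_strand.upper()),
                      ("bottom", reverse_complement(top_strand))):
        for match in pattern.finditer(seq):
            hits.append((name, match.start(), len(match.group()) // 4))
    return hits


# Hypothetical toy duplex: three ACCA copies on the top strand and two
# copies (written as TGGTTGGT on the top strand) on the bottom strand.
toy = "GGTACCAACCAACCATTGCATGGTTGGTCC"
print(acca_runs(toy))  # [('top', 3, 3), ('bottom', 2, 2)]
```

Scanning the reverse complement is what lets the sketch report sites regardless of which strand carries the motif, which matters because NS1 recognizes the site on duplex DNA.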

NS1 nicks the covalently closed right-end telomere via a transesterification reaction that liberates a basepaired 3′ nucleotide as a free hydroxyl (-OH).^[2] This reaction is assisted by a host DNA-binding protein from the high mobility group 1/2 (HMG1/2) family and is made in the replication origin, OriR, which is created by sequences in and immediately adjacent to the right hairpin. The left-end telomere of MVM, a heterotelomeric parvovirus, contains sequences that can give rise to replication origins in higher-order duplex intermediates, but these sequences are inactive in the hairpin terminus of the monomeric molecule, so NS1 always initiates replication at the right end.

The 3′-OH that is freed by nicking acts as a primer for the DNA polymerase to start complementary strand synthesis^[5] while NS1 remains covalently attached to the 5′-end via a tyrosine residue.^[1] Consequently, a copy of NS1 remains attached to the 5′-end of all RF and progeny DNA throughout replication, packaging, and virion release.^[2] NS1 is only able to bind to this specific site by assembling into homodimers or higher order multimers, which happens naturally with the addition of adenosine triphosphate (ATP) that is likely mediated by NS1's helicase domain. In vivo studies have shown that NS1 can form into a variety of oligomeric states, but it most likely assembles into hexamers to fulfill the functions of both the endonuclease domain and helicase domain.

Starting from the location at the nick, it is thought that NS1 organizes a replication fork and acts as the replicative 3′-to-5′ helicase. Near its C-terminus, NS1 contains an acidic transcriptional activation domain. This domain acts to upregulate transcription starting from a viral promoter (P38 for MVM) when NS1 is bound to a series of 5′-ACCA-3′ motifs, called the tar sequence, positioned upstream (toward the 5′-end) of the promoter unit, and via interaction with NS1 and various transcription factors. NS1 also recruits the cellular replication protein A (RPA) complex, which is essential for establishing the new replication fork and for binding and stabilizing displaced single strands.

While NS1 is the only non-structural protein essential for all parvoviruses, some have other individual proteins that are essential for replication. For MVM, NS2 appears to reprogram the host cell for efficient DNA amplification, single-strand progeny synthesis, capsid assembly, and virion export, though it seems to lack direct involvement in these processes. NS2 initially accumulates up to three times more quickly than NS1 in the early S-phase but is turned over rapidly by a proteasome-mediated pathway. As the infectious cycle progresses, NS2 becomes less common as P38-driven transcription becomes more prominent. Another example is the nuclear phosphoprotein NP1 of bocaviruses, which, if not synthesized, results in non-viable progeny genomes.^[3]

As viral NS proteins accumulate, they commandeer the host cell replication apparatus, terminating host cell DNA synthesis and causing viral DNA amplification to begin. Interference with host DNA replication may be due to direct effects on host replication proteins that are not essential for viral replication, to extensive nicking of host DNA, or to the restructuring of the nucleus during viral infection. Early in infection, parvoviruses establish replication foci in the nucleus that are termed autonomous parvovirus-associated replication (APAR) bodies. NS1 co-localizes with replicating viral DNA in these structures with other cellular proteins necessary for viral DNA synthesis, while other complexes not required for replication are sequestered from APAR bodies. The exact manner by which proteins are included or excluded from APAR bodies is unclear and appears to vary from species to species and between cell types.^[3] As infection progresses, APAR microdomains begin to coalesce with other, formerly distinct, nuclear bodies to form progressively larger nuclear inclusions where viral replication and virion assembly occur. After S-phase begins, the host cell is forced to synthesize viral DNA and cannot leave S-phase.

MVM right-end origin

The right-end hairpin of MVM contains 248 nucleotides organized into a cruciform shape.^[1] This region is almost perfectly basepaired, with just three unpaired bases at the axis and a mismatched region positioned 20 nucleotides from the axis. A three-nucleotide insertion, AGA or TCT, on one strand separates opposing pairs of NS1 binding sites, creating a 36-basepair palindrome that can assume an alternate cruciform configuration. This configuration is expected to destabilize the duplex, which facilitates its ability to function as a hinge. The mismatch of the unpaired bases, rather than the three-nucleotide sequence itself, may help to promote instability of duplex DNA.

Fully-duplex linear forms of the right-end hairpin sequence also function as NS1-dependent origins. For many parvoviral telomeres, however, only an initiator binding site next to the nick site is required for the origin function so that the minimal sequences required for nicking are less than 40 basepairs in length. For MVM, the minimal right-end origin is around 125 basepairs in length and includes most of the hairpin sequence because at least three recognition elements are involved: the nick site 5′-CTWWTCA-3′ (element 1), positioned seven nucleotides upstream from a duplex NS1-binding site (element 2) that is oriented to have the attached NS1 complex extending over the nick site, and a second NS1-binding site (element 3), which is adjacent to the hairpin axis.
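The nick-site consensus 5′-CTWWTCA-3′ uses the IUPAC ambiguity code W, which stands for A or T. A minimal sketch of how such a consensus could be turned into a search pattern is given below; the helper names and the toy sequence are hypothetical, only one strand is scanned, and a sequence match alone does not imply a functional nick site, since the text notes that the NS1-binding elements are also required.

```python
import re

# Subset of the IUPAC nucleotide codes needed for this consensus.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "[AT]"}


def consensus_to_regex(consensus: str) -> "re.Pattern[str]":
    """Translate an IUPAC consensus string into a compiled regular expression."""
    return re.compile("".join(IUPAC[base] for base in consensus.upper()))


NICK_SITE = consensus_to_regex("CTWWTCA")


def find_nick_site_matches(seq: str):
    """Return 0-based start positions where seq matches 5'-CTWWTCA-3'."""
    return [m.start() for m in NICK_SITE.finditer(seq.upper())]


# Hypothetical toy sequence: CTATTCA and CTTATCA match the consensus,
# whereas CTGATCA does not, because G is not covered by W.
toy = "GGCTATTCAGGCTGATCAGGCTTATCA"
print(find_nick_site_matches(toy))  # [2, 20]
```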

The second binding site is over 100 basepairs away from the nick site but is required for NS1-mediated cleavage. In vivo, there is slight variation in the position of the nick, plus or minus one nucleotide, with one position preferred. During nicking, this site is likely exposed as a single strand and is potentially stabilized as a minimal stem-loop by the tetranucleotide inverted repeats to the sides of the site. Optimal forms of the NS1-binding site contain at least three tandem copies of the 5′-ACCA-3′ sequence. Modest alterations to these motifs only have a small effect on affinity, which suggests that each tetranucleotide motif is recognized by different molecules in the NS1 complex. The NS1-binding site that positions NS1 over the nick site in the right-end origin is a high affinity site.

With ATP, NS1 binds asymmetrically over the aforementioned sequence, protecting a region 41 basepairs in length from digestion. This footprint extends just five nucleotides beyond the 3′-end of the ACCA repeat but 22 nucleotides beyond the 5′-end so that the footprint ends 15 nucleotides beyond the nick site, placing NS1 in position to nick the origin. Nicking only occurs if the second, distant NS1-binding site is also present in the origin and the entire complex is activated by addition of HMG1.
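The distances quoted for the footprint can be checked against one another with simple arithmetic. The sketch below assumes that all distances are measured along the same strand from the same boundaries. The variable names are illustrative, and the length of the ACCA repeat region is a value implied by the other figures rather than one stated in the text.

```python
# Figures quoted in the text for the MVM right-end origin.
FOOTPRINT_BP = 41        # region protected from digestion by NS1
BEYOND_3PRIME_END = 5    # footprint extension past the 3'-end of the ACCA repeat
BEYOND_5PRIME_END = 22   # footprint extension past the 5'-end of the ACCA repeat
NICK_TO_BINDING_SITE = 7 # nick site lies seven nucleotides from the binding site

# Length left over for the ACCA repeat region itself (implied, not stated).
repeat_span = FOOTPRINT_BP - BEYOND_3PRIME_END - BEYOND_5PRIME_END

# How far the footprint reaches past the nick site.
beyond_nick_site = BEYOND_5PRIME_END - NICK_TO_BINDING_SITE

print(repeat_span)       # 14
print(beyond_nick_site)  # 15
```

The second result agrees with the statement that the footprint ends 15 nucleotides beyond the nick site.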

In the absence of NS1, HMG1 binds the hairpin sequence independently, causing it to bend, without protecting any region from digestion. HMG1 can also directly bind to NS1 and mediates interactions between NS1 molecules bound to their recognition elements in the origin, so it is essential for formation of the cleavage complex. The ability of the axis region to reconfigure into a cruciform does not appear to be important in this process. Cleavage is dependent on the correct spacing of the elements of the origin, so additions and deletions can be lethal, whereas substitutions can be tolerated. Addition of HMG1 appears to only slightly adjust the sequences protected by NS1, but the conformation of the intervening DNA changes, folding into a double helical loop that extends about 30 basepairs through a guanine-rich element in the hairpin stem. Between this element and the nick site there are five thymidine residues included in the loop, and the site has a region to its side containing many alternating adenine and thymine residues, which likely increases flexibility. The creation of the loop likely allows the terminus to assume a specific 3-dimensional structure required to activate the nickase since origins that fail to reconfigure into a double-helical loop once HMG1 is added are not nicked.

Terminal resolution

Following nicking, a replication fork is established at the newly exposed 3′ nucleotide that proceeds to unfold and copy the right-end hairpin through a series of melting and reannealing reactions. This process begins once NS1 nicks the inboard end of the original hairpin. The terminal sequence is then copied in the opposite direction, which produces an inverted copy of the original sequence. The end result is a duplex extended-form terminus that contains two copies of the terminal sequence. While NS1 is required for this, it is unclear if unfolding is mediated by its helicase activity in front of the fork or by destabilization of the duplex following DNA binding at one of its 5′-(ACCA)-3′ recognition sites. This process is usually called terminal resolution but also hairpin transfer or hairpin resolution. Terminal resolution occurs with each round of replication, so progeny genomes contain an equal number of each terminal orientation. The two orientations are termed "flip" and "flop",^[3] and may be represented as R and r, or B and b, for the flip and flop of the right-end telomere and L and l, or A and a, for the flip and flop of the left-end telomere.^[4] Since parvoviral terminal palindromes are imperfect, it is easy to identify which orientation is which.^[1]
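The relationship between the two orientations can be illustrated with a toy sequence. Because terminal resolution produces an inverted copy, one orientation is the reverse complement of the other, and the two differ only at the positions where the palindrome is imperfect. The sequence below is invented for this sketch, and which orientation is labeled flip is an arbitrary choice here.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def inverted_copy(terminus: str) -> str:
    """Return the inverted (reverse-complement) copy of a terminal sequence."""
    return terminus.upper().translate(COMPLEMENT)[::-1]


def asymmetric_positions(terminus: str):
    """Return 0-based positions at which a terminus differs from its inverted copy.

    A perfect palindrome yields an empty list; an imperfect palindrome
    yields the positions that distinguish the two orientations.
    """
    other = inverted_copy(terminus)
    return [i for i, (a, b) in enumerate(zip(terminus.upper(), other)) if a != b]


# Hypothetical imperfect palindrome (a perfect one would be GCGAATTCGC).
flip = "GCGAATTAGC"
flop = inverted_copy(flip)

print(flop)                        # GCTAATTCGC
print(inverted_copy(flop) == flip) # True
print(asymmetric_positions(flip))  # [2, 7]
```

Inverting twice restores the original sequence, which is consistent with successive rounds of terminal resolution alternating between the two orientations.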

The extended-form duplex telomeres generated during terminal resolution are melted, mediated by NS1 with ATP hydrolysis, causing individual strands to fold back on themselves to create hairpin "rabbit ear" structures that have the flip and flop of the termini. This requires the NS1 helicase activity as well as its site-specific binding activity, the latter of which enables NS1 to bind to symmetrical copies of NS1-binding sites that surround the axis of the extended-form terminus.

Rabbit ear formation allows the 3′ nucleotide of the newly synthesized DNA strand to pair with an internal base, which repositions the replication fork in a strand-switching maneuver that primes synthesis of additional linear sequences. Switching from DNA synthesis to rabbit-ear formation at the end of terminal resolution may require different types of NS1 complexes. Alternatively, the NS1 complex may remain intact during this switch, being ready to start strand displacement synthesis following refolding into rabbit ears. After the replication fork is repositioned, replication continues toward the left end, using the newly synthesized DNA strand as a template.^[4]

At the left end of the genome, NS1 is probably required to unfold the hairpin. NS1 appears to be directly involved in melting-out and reconfiguring the resulting extended-form left-end duplexes into rabbit ear structures, though this reaction seems to be less efficient than at the right-end terminus. Dimeric and tetrameric concatemers of the genome are generated successively for MVM. In these concatemers, alternating unit-length genomes are fused through a palindromic junction in left-end to left-end and right-end to right-end orientations.^[1] In total, RHR results in coding sequences of the genome being copied twice as often as the termini.^[1] ^[4] Both linear and hairpin configurations of the right-end telomere support initiation of RHR, so resolution of duplex right-end to right-end junctions can occur symmetrically on the basepaired duplex sequence or after this complex is melted and reconfigured into two hairpins. It is unclear which of these two reactions is more common since both appear to produce identical results.
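The arrangement of unit-length genomes in these concatemers can be sketched symbolically. The model below is a deliberate simplification: each genome is reduced to its two ends, "L" and "R", and successive units are assumed to alternate in orientation, with the first unit carrying its right end on the outside. The function names are hypothetical.

```python
def concatemer(n_units: int):
    """Model a concatemer as a list of (first_end, second_end) units.

    Assumption: unit-length genomes alternate in orientation, and the first
    unit has its right end ("R") on the outside, so that a dimer is joined
    left-end to left-end.
    """
    return [("R", "L") if i % 2 == 0 else ("L", "R") for i in range(n_units)]


def junctions(units):
    """List the pair of genome ends that meet at each internal junction."""
    return [f"{a[1]}-{b[0]}" for a, b in zip(units, units[1:])]


print(junctions(concatemer(2)))  # ['L-L']
print(junctions(concatemer(4)))  # ['L-L', 'R-R', 'L-L']
```

In this model every junction joins two like ends, which corresponds to the left-end to left-end and right-end to right-end palindromic junctions described for MVM.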

For AAV, each telomere is 125 bases in length and capable of folding into a T-shaped hairpin. AAV contains a Rep gene that encodes four Rep proteins, two of which, Rep68 and Rep78, act as replication initiator proteins and fulfill the same functions as NS1, such as the nickase and helicase activities. They recognize and bind to a (GAGC) sequence in the stem region of the terminus and nick a site 20 bases away termed trs. AAV undergoes the same process of terminal resolution as MVM, but at both ends. The other two Rep proteins, Rep52 and Rep40, are not involved in DNA replication but are implicated in synthesis of progeny. AAV replication is dependent on a helper virus, either an adenovirus or a herpesvirus, that coinfects the cell. In the absence of coinfection, the AAV genome is integrated into the host cell's DNA until coinfection occurs.^[1]

A general rule is that parvoviruses with identical termini, i.e. homotelomeric parvoviruses such as AAV and B19, replicate both ends by terminal resolution, generating equal numbers of flips and flops of each telomere.^[1] ^[2] Parvoviruses that have different termini, i.e. heterotelomeric parvoviruses like MVM, replicate one end by terminal resolution and the other end by asymmetric junction resolution, which conserves a single-sequence orientation and requires different structural arrangements and cofactors to activate NS1's nickase.^[2] AAV DNA intermediates containing covalently linked sense and antisense strands yield genomic concatemers under denaturing conditions, indicating that AAV replication also synthesizes duplex concatemers that require some form of junction resolution.

MVM left-end origin

In negative-sense MVM genomes, the left-end hairpin is 121 nucleotides in length and exists in a single flip sequence orientation. This telomere is Y-shaped and contains small internal palindromes that fold into the "ears" of the Y, a duplex stem region 43 nucleotides in length that is interrupted by an asymmetric thymidine residue, and a mismatched "bubble" sequence in which the 5′-GAA-3′ sequence on the inboard arm is opposite of 5′-GA-3′ in the outboard strand.^[1] Sequences in this hairpin are involved in both replication and regulation of transcription. The elements involved in these two functions separate the two arms of the hairpin.

The left-end telomere of MVM, and likely of all heterotelomeric parvoviruses, cannot function as a replication origin in its hairpin configuration. Instead, a single origin on the lower strand is created when the hairpin is unfolded, extended, and copied to form a duplex basepaired sequence that spans adjacent genomes in the dimer RF. Within this structure, the sequence from the outboard arm that surrounds a GA/TC^[1] dinucleotide serves as an origin, OriLTC. The equivalent GAA/TTC sequence on the inboard arm that contains the bubble trinucleotide, called OriLGAA, does not serve as an origin. The inboard arm and hairpin configuration of the terminus instead appear to function as upstream control elements for the viral transcriptional promoter P4. Additionally, the ability to segregate one arm from nicking appears essential for replication.

The minimal linear left-end origin is about 50 basepairs long and extends from two 5′-ACGT-3′ motifs, spaced five nucleotides apart at one end, to a position seven basepairs beyond the nick site. The bubble's GA sequence itself is relatively unimportant, but the space that it occupies is necessary for the origin to function.^[1] Within the origin, there are three recognition sequences: an NS1-binding site that orients the NS1 complex over the nick site 5′-CTWWTCA-3′, which is located 17 nucleotides downstream (toward the 3′-end), and the two ACGT motifs. These motifs bind a heterodimeric cellular factor called either parvovirus initiation factor (PIF) or glucocorticoid modulating element-binding protein (GMEB).

PIF is a site-specific DNA-binding heterodimeric complex that contains two subunits, p96 and p79, and functions as a transcription modulator in the host cell. It binds DNA via a KDWK fold and recognizes two ACGT half-sites. The spacing between these sites can vary significantly for PIF, from one to nine nucleotides, with an optimal spacing of six. PIF stabilizes the binding of NS1 on the active form of the left-end origin, OriLTC, but not on the inactive form, OriLGAA, because the two complexes are able to establish contact over the bubble dinucleotide. The left-end hairpins of all other species in the Protoparvovirus genus,^[16] to which MVM belongs, have bubble asymmetries and PIF-binding sites, though with slight variation in spacing. This suggests that they all share a similar origin segregation mechanism.
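The spacing rule for PIF half-sites lends itself to a short sketch. The code below looks for pairs of ACGT half-sites separated by one to nine nucleotides and ranks them by how close the gap is to the optimal spacing of six. The function names and the toy sequence are hypothetical, and sequence spacing alone is not claimed to predict binding. Because ACGT is its own reverse complement, scanning one strand is sufficient.

```python
import re

HALF_SITE = "ACGT"


def pif_candidate_sites(seq: str, min_gap: int = 1, max_gap: int = 9):
    """Return (start, gap) for each pair of ACGT half-sites with an allowed gap.

    start is the 0-based position of the first half-site; gap is the number
    of nucleotides between the two half-sites.
    """
    starts = [m.start() for m in re.finditer(HALF_SITE, seq.upper())]
    pairs = []
    for i, first in enumerate(starts):
        for second in starts[i + 1:]:
            gap = second - (first + len(HALF_SITE))
            if min_gap <= gap <= max_gap:
                pairs.append((first, gap))
    return pairs


def rank_by_optimal_gap(pairs, optimal_gap: int = 6):
    """Sort candidate pairs so that gaps closest to the optimum come first."""
    return sorted(pairs, key=lambda pair: abs(pair[1] - optimal_gap))


# Hypothetical toy sequence with half-site pairs spaced 5 and 6 nucleotides
# apart, separated by a 12-nucleotide stretch that exceeds the allowed gap.
toy = "TT" + "ACGT" + "GGCCA" + "ACGT" + "T" * 12 + "ACGT" + "AAGGAA" + "ACGT"

candidates = pif_candidate_sites(toy)
print(candidates)                       # [(2, 5), (27, 6)]
print(rank_by_optimal_gap(candidates))  # [(27, 6), (2, 5)]
```

The five-nucleotide spacing in the toy sequence mirrors the spacing described for the two ACGT motifs in the MVM left-end origin.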

Asymmetric junction resolution

Due to the location of the active origin OriLTC in the dimer junction, synthesis of new copies of the left-end hairpin in the correct, i.e. flip, orientation is not straightforward since a replication fork moving from this site through the linear bridge structure should synthesize new DNA in the flop orientation. Instead, the left-hand MVM dimer junction is resolved asymmetrically in a process that creates a cruciform intermediate. This maneuver accomplishes two things: it allows synthesis of the new DNA in the correct sequence orientation, and it creates a structure that can be resolved by NS1. This "heterocruciform" model of synthesis suggests that resolution is driven by the NS1 helicase activity and depends on the inherent instability of the duplex palindrome, a property that allows it to switch between its linear and cruciform configurations.

NS1 initially introduces a single-strand nick in OriLTC in the B ("right") arm of the junction and becomes covalently attached to the DNA on the 5′ side of the nick, exposing a basepaired 3′ nucleotide. Two outcomes can then occur, depending on the speed with which a replication fork is assembled. If assembly is rapid, then while the junction is in its linear configuration, "read-through" synthesis copies the upper strand, which regenerates the duplex junction and displaces a positive-sense strand that feeds back into the replicative pool. This promotes MVM DNA amplification but does not lead to synthesis of new terminal sequences in the correct orientation or to junction resolution.

To create a resolvable structure, the initial nicking must be followed by melting and rearrangement of the dimer junction into a cruciform. This is driven by the 3′-to-5′ helicase activity of the 5′-linked NS1 complex. Once this cruciform extends to include sequences beyond the nick site, the exposed primer at the nick site in OriL undergoes template switching by annealing with its complement in the lower arm of the cruciform. If a fork assembles after this point, then the subsequent synthesis unfolds and copies the lower cruciform arm. This creates a heterocruciform intermediate that contains the newly synthesized telomere in the flip sequence orientation that is attached to the lower strand of the B arm. This modified junction is called MJ2.

The lower arm of MJ2 is an extended-form duplex palindrome that is essentially identical to those generated during terminal resolution. Once MJ2 is synthesized, the lower arm becomes susceptible to rabbit-ear formation. This repositions the 3′ nucleotide of the newly synthesized copy of the lower arm so that it pairs with inboard sequences on the junction's B arm to prime strand displacement synthesis. If a replication fork is created at this 3′ nucleotide, then the lower strand of the B arm is copied, creating an intermediate junction called MJ1 and progressively displacing the upper strand. This leads to the release of the newly synthesized B turn-around (B-ta) sequence. The residual cruciform, called δJ, is partially single-stranded at the upper part of the B arm and contains the intact upper strand of the junction paired to the lower strand of the A ("left") arm, with an intact copy of the left-end hairpin, ending in a 5′ NS1 complex. Since δJ carries the NS1 helicase, it is presumed to periodically alter configuration.

The next step is less certain but can be inferred based on what is known about the process thus far. The NS1 helicase is expected to create a dynamic structure in which the nick site in δJ in the normally inactive A side is temporarily but repeatedly exposed in a single-stranded form during duplex-to-hairpin rearrangements, which allows NS1 to engage the nick site in the origin OriL without the help of a cofactor. The nick would leave NS1 covalently attached to the positive-sense "B" strand of δJ and lead to the release of this strand. Nicking also leaves open a basepaired 3′ nucleotide on the "A" strand of δJ to prime DNA synthesis. If a replication fork is established here, then the A strand is unfolded and copied to create its duplex extended form.

When MVM genomes replicate in vivo, the aforementioned nick may not occur because both ends of the dimer replicative form contain efficient right-end hairpin origins. Therefore, replication forks may progress back toward the dimer junction from the genome's right end, copying the top strand of the B arm before the final resolution nick. This bypasses dimer bridge resolution and recycles the top strand into a replicating duplex dimer pool. In a closely related virus, LuIII, the single-strand nick releases a positive-sense strand with its left-end hairpin in the flop orientation. Unlike MVM, LuIII packages strands of both senses with equal frequency. In the negative-sense strands, the left-end hairpins are all in the flip orientation, while in the positive-sense strands, flip and flop orientations occur in equal numbers. Compared to MVM, LuIII contains a two-base insertion immediately 3′ of the nick site in the right origin, which impairs the origin's efficiency. The resulting reduction in the efficiency of replication fork assembly at the genome's right end may favor single-strand nicking by giving it more time to occur.

Synthesis of progeny

Individual progeny genomes are excised from genomic replicative concatemers, beginning with the introduction of breaks in replication origins, usually by the replication initiator protein. This results in the establishment of new replication forks that replicate the telomeres through a combination of terminal resolution and junction resolution and displace individual ssDNA genomes from the replicative molecule.^[4] At the end of this process, the telomeres are folded back inwards to form hairpins on excised genomes. The extended-form termini created during excision resemble the extended-form molecules prior to terminal resolution, so they can be melted out and refolded into rabbit ears for additional rounds of replication.^[1] Within an infected cell, numerous replicative concatemers are therefore able to arise.^[4]

Displacement of progeny ssDNA genomes occurs predominantly or exclusively during active DNA replication while cells are assembling viral particles. Displacement of single strands may therefore be associated with packaging viral DNA into capsids. Earlier research suggested that the preassembled viral particle may sequester the genome in a 5′-to-3′ direction as it is displaced from the fork, but more recent research suggests that packaging is performed in a 3′-to-5′ direction driven by the NS1 helicase using newly synthesized single strands.

It is not clear whether these single strands are released into the nucleoplasm, so that packaging complexes are physically separate from replication complexes, or whether the replication intermediates serve as both replication and packaging substrates. In the latter case, newly displaced progeny genomes would be kept in the replication complex via interactions between their 5′-linked NS1 molecules and NS1 or capsid proteins that are physically associated with replicating DNA. Genomes are inserted into the capsid via an entrance called a portal situated at one of the icosahedral 5-fold axes of the capsid,^[2] which is possibly opposite the opening from which genomes are expelled early in the replication cycle.^[3]

Strand selection for encapsidation likely does not involve specific packaging signals but may be predictable by the Kinetic Hairpin Transfer (KHT) mathematical model, which explains the distribution of the strands and terminal conformations of packaged genomes in terms of the efficiency with which each terminus type can undergo reactions that allow it to be copied and reformed. In other words, the KHT model postulates that the relative efficiency with which two genomic termini are resolved and replicated determines the distribution of amplified replication intermediates created during infection and ultimately the efficiency with which ssDNAs of characteristic polarity and terminal orientations are excised, which will then be packaged with equal efficiency.^[2]

Preferential excision of particular genomes is only apparent during packaging. Therefore, among parvoviruses that package strands of one sense, replication appears to be biphasic. At early times, strands of both senses are excised. This is followed by a switch in the replication mode that allows for exclusive synthesis of a single sense for packaging. A modified form of the KHT model, called the preferential strand displacement model, proposes that the aforementioned switch in replication is caused by the onset of packaging because the substrate for packaging is probably a newly displaced DNA molecule. For heterotelomeric parvoviruses, imbalance of origin firing leads to preferential displacement of negative-sense strands from the right-end origin. The relative frequency of sense strands in packaged virions can therefore be used to infer the type of resolution mechanism used during excision.^[3]

Shortly after the start of S-phase, translation of viral mRNA leads to the accumulation of capsid proteins in the nucleus. These proteins form into oligomers that are assembled into intact empty capsids. After encapsidation, complete virions may be exported from the nucleus to the exterior of the cell before disintegration of the nucleus. Disruption of the host cell environment may also occur later on in infection. This results in cell lysis via necrosis or apoptosis, which releases virions to the outside of the cell.^[2]

Comparison to rolling circle replication

Many small replicons that have circular genomes, such as circular ssDNA viruses and circular plasmids, replicate via rolling circle replication (RCR), which is a unidirectional, strand displacement form of DNA replication similar to RHR. In RCR, successive rounds of replication, which proceed in a loop around the genome, are initiated and terminated by site-specific single-strand nicks made by a replicon-encoded endonuclease, variously called the nickase, relaxase, mobilization protein (mob), transesterase, or replication protein (Rep). The replication initiator protein of parvoviruses is genetically related to these other endonucleases.

RCR initiator proteins contain three motifs considered to be important for replication. Two of these are retained within parvovirus initiator proteins: an HUHUUU cluster, which is presumed to bind a metal ion required for nicking, and a YxxxK motif that contains the active-site tyrosine residue that attacks the phosphodiester bond of target DNA. In contrast to RCR initiator proteins, which can join together DNA strands, RHR initiator proteins have only vestigial traces of being able to perform ligation.
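The two conserved motifs named above can be expressed as simple patterns for scanning a protein sequence. The sketch below is illustrative only. It assumes that "U" stands for a bulky hydrophobic residue and uses a hypothetical residue set for it (the text does not define one), treats "x" as any residue, and is run on a made-up peptide rather than a real NS1 or Rep sequence.

```python
import re

# Assumption: residues counted as "U" (bulky hydrophobic); adjust as needed.
HYDROPHOBIC = "VILMFYW"

# Lookaheads are used so that overlapping occurrences are all reported.
MOTIFS = {
    "HUHUUU": re.compile(rf"(?=H[{HYDROPHOBIC}]H[{HYDROPHOBIC}]{{3}})"),
    "YxxxK": re.compile(r"(?=Y.{3}K)"),
}


def find_motifs(protein):
    """Return the 0-based start positions of each motif in a protein sequence."""
    protein = protein.upper()
    return {name: [m.start() for m in pattern.finditer(protein)]
            for name, pattern in MOTIFS.items()}


# Made-up peptide containing one copy of each motif.
print(find_motifs("MAHVHILVGGYAAAKQ"))  # → {'HUHUUU': [2], 'YxxxK': [10]}
```

A pattern hit is only a candidate: short motifs such as YxxxK occur by chance in many proteins, so real analyses rely on alignment and structural context.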

RCR begins when the initiator protein nicks a DNA strand at a specific sequence in the replication origin region. This is done through a transesterification reaction that forms a 5′-phosphate bond that connects the DNA to the active-site tyrosine and frees the 3′-end hydroxyl (3′-OH) adjacent to the nick site. The 3′-end is then used as a primer for the host DNA polymerase to begin replication while the initiator protein remains attached to the 5′-end of the "original" strand. After one loop of replication around the circular genome, the initiator protein returns to the nick site, i.e. the original initiator complex, while still attached to the parent strand and attacks the regenerated duplex nick site, or a nearby second site in some cases, by means of a topoisomerase-like nicking-joining reaction.

During the aforementioned reaction, the initiator protein cleaves a new nick site and is transferred across the analogous phosphodiester bond. It thereby becomes attached to the new 5′-end while ligating the 5′-end of the first strand to which it was originally attached to the 3′-end of the same strand. This second mechanism varies depending on the replicon. Some replicons such as the virus ΦX174 contain a second active tyrosine residue in the initiator protein. Others use the analogous active-site tyrosine in a second initiator protein that is present as part of a multimeric nickase complex.

This second nicking reaction may occur after a single loop, or several successive loops may occur first, creating a concatemer that contains multiple copies of the genome. The result of this nick is that displaced genomes become detached from the replicative molecule. These copies of the genome are ligated and may either be encapsidated into progeny capsids, provided they are monomeric, or converted to a covalently closed double-stranded form by a host DNA polymerase for further replication. While RHR generally involves replication of both sense strands in a continuous process, in RCR complementary strand synthesis and genomic strand synthesis occur separately.^[4]

The strategies used in RHR to engage the nick site are also present in RCR. Most RCR origins are in the form of duplex DNA that has to be melted before nicking. RCR initiators accomplish this by binding to specific DNA-binding sequences in the origin next to the initiation site. The latter site is then melted in a process that consumes ATP and which is assisted by the ability of the separated strands to reconfigure into stem-loop structures. In these structures, the nick site is presented on an exposed loop. Like RHR initiator proteins, many RCR initiator proteins contain helicase activity, which allows them to melt the DNA prior to nicking and serve as the 3′-to-5′ helicase in the replication fork.

References

Bibliography

Book: Kerr J, Cotmore S, Bloom ME . 25 November 2005 . Parvoviruses . CRC Press . 171–185 . 9781444114782.

Notes and References

Cotmore SF, Tattersall P . 1996 . Parvovirus DNA replication . Cold Spring Harbor Monograph Archive . 31 . 799–813 . 10.1101/0.799-813 . 1 November 2024 . 14 January 2021.
Cotmore SF, Agbandje-McKenna M, Canuti M, Chiorini JA, Eis-Hubinger AM, Hughes J, Mietzsch M, Modha S, Ogliastro M, Pénzes JJ, Pintel DJ, Qiu J, Soderlund-Venermo M, Tattersall P, Tijssen P . March 2019 . ICTV Virus Taxonomy Profile: Parvoviridae . J Gen Virol . 100 . 3 . 367–368 . 10.1099/jgv.0.001212 . 6537627 . 30672729 . 14 January 2021.
Cotmore SF, Tattersall P . 1 February 2013 . Parvovirus diversity and DNA damage responses . Cold Spring Harb Perspect Biol . 5 . 2 . a012989 . 10.1101/cshperspect.a012989 . 3552509 . 23293137.
Martin DP, Biagini P, Lefeuvre P, Golden M, Roumagnec P, Varsani A . September 2011 . Recombination in eukaryotic single stranded DNA viruses . Viruses . 3 . 9 . 1699–1738 . 10.3390/v3091699 . 3187698 . 21994803.
Wawrzyniak P, Plucienniczak G, Bartosik D . 30 November 2017 . The Different Faces of Rolling-Circle Replication and Its Multifunctional Initiator Proteins . Front Microbiol . 8 . 2353 . 10.3389/fmicb.2017.02353 . 5714925 . 29250047.
In 2019, the genus Ambidensovirus was split into six genera of the same name prefixed with Aqu-, Blatt-, Hemi-, Pefu-, Proto-, and Scindo-. These genera are included in Cotmore, et al. (2019) under the name of the former genus.
This genus is included in Cotmore, et al. (2019) as the species Orthopteran densovirus 1, which was renamed and assigned as the sole species of this genus.
This genus is included in Cotmore, et al. (2019) under its former name Brevidensovirus.
Lee Q, Padula MP, Pinello N, Williams SH, O'Rourke MB, Fumagalli MJ, Orkin JD, Song R, Shaban B, Brenner O, Pimanda JE, Weninger W, Souza WM, Melin AD, Wong JJ, Crim MJ, Monette S, Roediger B, Jolly CJ . 23 January 2020 . Murine and related chapparvoviruses are nephro-tropic and produce novel accessory proteins in infected kidneys . PLOS Pathog . 16 . 1 . e1008262 . 10.1371/journal.ppat.1008262 . 6999912 . 31971979.
This genus is included in Cotmore, et al. (2019) under its former name Hepadensovirus.
Pénzes JJ, de Souza WM, Agbandje-McKenna M, Gifford RJ . 6 June 2019 . An Ancient Lineage of Highly Divergent Parvoviruses Infects both Vertebrate and Invertebrate Hosts . Viruses . 11 . 6 . 525 . 10.3390/v11060525 . 6631224 . 31174309.
This genus is included in Cotmore, et al. (2019) under its former name Penstyldensovirus.
Canuti M, Eis-Huebinger AM, Deijs M, de Vries M, Drexler JF, Oppong SK, Müller MA, Klose SM, Wellinghausen N, Cottontail VM, Kalko EK, Drosten C, van der Hoek L . 2011 . Two novel parvoviruses in frugivorous New and Old World bats . PLOS ONE . 6 . 12 . e29140 . 10.1371/journal.pone.0029140 . 3246463 . 22216187 . 2011PLoSO...629140C.
Canuti M, Williams CV, Gadi SR, Jebbink MF, Oude Munnink BB, Jazaeri Farsani SM, Cullen JM, van der Hoek L . 1 December 2014 . Persistent viremia by a novel parvovirus in a slow loris (Nycticebus coucang) with diffuse histiocytic sarcoma . Front Microbiol . 5 . 655 . 10.3389/fmicb.2014.00655 . 4249460 . 25520709.
Web site: Koonin EV, Dolja VV, Krupovic M, Varsani A, Wolf YI, Yutin N, Zerbini M, Kuhn JH. Create a megataxonomic framework, filling all principal taxonomic ranks, for ssDNA viruses. International Committee on Taxonomy of Viruses. 14 January 2021. en. docx. 18 October 2019.
This genus is included in Kerr, et al. under its former name Parvovirus.