AIDS ("acquired immune deficiency syndrome") is caused by the human immunodeficiency virus (HIV). Individuals with HIV have what is referred to as an "HIV infection". When infected semen, vaginal secretions, or blood come in contact with the mucous membranes or broken skin of an uninfected person, HIV may be transferred to that person ("horizontal transmission"), causing another infection. HIV can also be passed from an infected pregnant woman to her baby during pregnancy and/or delivery ("vertical transmission"), or via breastfeeding. A portion of individuals with HIV infection will go on to develop clinically significant AIDS.
HIV is a retrovirus, a member of a large and diverse family of RNA viruses that make a DNA copy of their RNA genome after infecting a host cell. An essential step in the replication cycle of HIV-1 and other retroviruses is the integration of this viral DNA into the host DNA. The RNA genome of progeny virions and the template for translation of viral proteins are made when the integrated viral DNA is transcribed.
The integration of HIV DNA into the host DNA is a critical step in the HIV life cycle. Understanding the integration process will provide a framework for gaining insight into multiple potential sites of therapeutic intervention for HIV infection and AIDS. HIV's enzyme for inserting the DNA version of its genome into the host cell DNA is called its "integrase". HIV-1 integrase catalyzes the “cut-and-paste” action of clipping the host DNA and joining the proviral genome to the clipped ends. This protein, which is 288 amino acids in length, contains three “domains”, in this order:
1. Amino (N)-terminal domain: Sometimes referred to as a "zinc finger", the N-terminal domain contains the conserved HHCC motif, made up of His and Cys residues, which serves to bind zinc. The function of the N-terminal domain is not completely clear, but it is thought to assist the integrase in forming multimers (stable assemblies of multiple integrase molecules).
2. The central catalytic domain (or "catalytic core"): The catalytic core contains the DDE catalytic triad of acidic amino acid residues, which coordinate a divalent metal ion (usually Mg2+ or Mn2+) to form the active catalytic site. In the case of HIV-1 integrase, the residues are Asp64, Asp116, and Glu152. This domain is also well conserved in evolution.
The HIV-1 catalytic domain appears dimeric in solution and in crystal structures. The extensive surface area of the dimer interface indicates that it is biologically significant. The insertion sites on each strand of target DNA are separated by 5 base pairs, which corresponds to approximately 15 Å for helical B-form DNA, implying that the catalytic domain (or the functional unit) of integrase should contain a pair of active sites separated by a similar spacing. However, the spacing between the active sites in the nearly spherical dimer does not appear to match the spacing between the insertion sites on the two strands of target DNA: crystal structures indicate that the active sites in the dimer are separated by more than 30 Å when measured in a straight line through the protein, and by an even greater distance when measured around the circumference of the dimer. Assuming that the dimer interface is preserved in the functional integrase multimer, at least a tetramer of integrase must be required for the complete integration reaction to proceed.
3. The carboxy (C)-terminal domain: The C-terminal domain binds DNA non-specifically. Since the sites of integration into the target DNA are relatively non-specific, it is thought that this domain may interact with the target DNA. Data from experiments with chimeric integrases show that recognition of the target site is controlled by the core domain. Cross-linking studies also suggest that the C-terminal domain interacts with a subterminal region just inside the very ends of the viral DNA.
During the integration process, the HIV integrase enzyme performs two key catalytic reactions: first the 3’ processing of the HIV DNA, then strand transfer of the HIV DNA into the host DNA. The integration of HIV DNA can occur in either dividing or resting cells, and the HIV integrase enzyme can exist as a monomer, dimer, tetramer, and possibly even higher-order forms (such as octamers). Each HIV particle has an estimated 40 to 100 copies of the integrase enzyme.
Integrase functions are unique to retroviruses; human cells do not need to cut and paste pieces of DNA into the genome. For this reason, integrase is a prime target for developing drug therapies for HIV infection and AIDS, since inhibiting integrase should not hamper the normal operations of human cells.
HIV integration is the insertion of HIV genetic material into the genome of the infected cell.[1] The process of HIV integration involves six sequential steps:
The first step of the integration process occurs in the cytoplasm of the host cell following the completion of reverse transcription of the HIV RNA into cDNA. This step involves the binding of integrase (most likely in the dimer form) to each end of the newly formed HIV cDNA. The binding takes place at specific sequences in the long terminal repeat regions. The integrase-HIV DNA complex is part of an intracellular nucleoprotein particle known as the "preintegration complex" (PIC). This complex consists of linear HIV DNA, viral proteins, and host proteins. The viral proteins include integrase, nucleocapsid, matrix, viral protein R (Vpr), and reverse transcriptase. Several host proteins can also form part of this complex, although it is unclear whether some or all of them join the preintegration complex prior to nuclear transport.
In the second step of the integration process, which also takes place in the host cytoplasm, the integrase dimer cleaves the viral DNA at each 3’ end. This cleavage reaction removes GT dinucleotides on the 3’-side of a conserved CA dinucleotide region. The cleavage of the dinucleotide at each viral DNA 3’-end generates a dinucleotide 5’ "overhang" and a reactive intermediate that contains a 3’-hydroxyl group. This 3’ processing step is the first of two key catalytic reactions performed by the integrase enzyme, and it prepares the viral DNA for integration into the host DNA. In an alternative view of the DNA binding and 3’-processing reaction, the tetramer form of integrase (not the dimer) binds to the ends of the HIV DNA, and then cleaves the 3’ ends.
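The 3’ processing step can be illustrated with a minimal Python sketch. It treats one viral DNA strand as a string written 5’ to 3’ and removes the dinucleotide that follows the conserved CA. The function name `three_prime_process` and the toy sequence are hypothetical, chosen only for illustration; the sequence is not a real LTR end.

```python
from typing import Tuple


def three_prime_process(strand: str) -> Tuple[str, str]:
    """Model 3' processing of one viral DNA strand (written 5'->3').

    Returns the processed strand, which now ends in the conserved CA
    (carrying the reactive 3'-hydroxyl), and the released dinucleotide.
    """
    strand = strand.upper()
    # The conserved CA must sit two bases in from the 3' end.
    if len(strand) < 4 or strand[-4:-2] != "CA":
        raise ValueError("expected a conserved CA two bases in from the 3' end")
    return strand[:-2], strand[-2:]


# Hypothetical toy sequence ending in ...CAGT (not a real LTR end).
processed, released = three_prime_process("ACTGGAAGGGCTAGCAGT")
print(processed)  # ACTGGAAGGGCTAGCA
print(released)   # GT
```

Because the complementary strand keeps its full length, the two bases opposite the removed dinucleotide are left as the 5’ overhang described above.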
In the third step of the integration process, the preintegration complex is transported into the nucleus of the host cell, entering through one of the nuclear pore complexes.
Inside the nucleus, the host protein lens epithelium-derived growth factor/p75, commonly abbreviated as LEDGF/p75, binds to the preintegration complex and the host DNA. LEDGF/p75 serves as a tethering protein (or bridge) between the preintegration complex and the host DNA. The order in which LEDGF/p75, the host DNA, and the preintegration complex bind remains unclear. In one version, LEDGF/p75 binds first to the preintegration complex and then to the host DNA. Alternatively, LEDGF/p75 may bind first to the host DNA and then to the preintegration complex. Regardless of the order, it is believed that the presence of LEDGF/p75 results in the integrase dimers approaching each other to form a tetramer.
The next step, the strand transfer reaction, takes place inside the host cell nucleus and involves the critical step of inserting the HIV DNA into a selected region of the host DNA. The region of insertion contains a weakly conserved palindromic sequence. This strand transfer reaction is initiated as the HIV integrase catalyzes the HIV DNA 3’-hydroxyl group attack on the host DNA. The attack by the HIV DNA occurs on opposite strands of the host DNA in a staggered fashion, typically 4-6 base pairs apart. This reaction leads to separation of the bonds in the host DNA base pairs located between the staggered cuts, and the joining of the HIV 3’-hydroxyl groups with the host DNA 5’ phosphate ends. At this point, the newly joined viral-host DNA region unfolds.
Following the strand transfer process, the HIV-DNA and host DNA junctions have unpaired regions of DNA, referred to as DNA "gaps". In addition, the two nucleotides at each 5’ end of the viral DNA remain unpaired after the strand transfer. The insertion of the new HIV DNA and the remaining gaps that flank the integration site are currently thought to induce a host cellular DNA damage response, but much of this mechanism remains speculative. The host DNA damage response is thought to be critical in the final step of integration, known as "gap repair", and may require at least three host enzymes: a polymerase, a nuclease, and a ligase. In the first step of gap repair, it is thought that the polymerase extends the host DNA on each end and thus fills in the gaps. Next, it is possible that host nuclease activity removes the 5’ dinucleotide "flaps" on the HIV DNA. Lastly, it is thought that DNA ligase joins the remaining unjoined segments of the HIV and host DNA strands. This mechanism has largely not been validated experimentally and remains an area under investigation. The gap repair process completes the integration of the HIV DNA into the host DNA, with the fully integrated HIV DNA now being referred to as "proviral DNA".
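The net effect of staggered strand transfer followed by gap repair can be sketched in Python, under several simplifying assumptions. Only the top strand is modeled, as a string written 5’ to 3’. The stagger is fixed at 5 base pairs, a value within the range given above. The function name `integrate` and both sequences are hypothetical. The sketch shows one consequence that the description above implies but does not spell out: once the gaps are filled in, the host bases between the two attack points appear on both sides of the provirus.

```python
def integrate(host: str, provirus: str, site: int, stagger: int = 5) -> str:
    """Return the top strand after strand transfer and gap repair.

    `site` is the 0-based index of the first host base lying between the
    two staggered attack points; `stagger` is the number of host bases
    between them.
    """
    if stagger < 0 or not 0 <= site <= len(host) - stagger:
        raise ValueError("integration site does not fit in the host sequence")
    between_cuts = host[site:site + stagger]
    # Left copy: synthesized on the top strand when the gap is filled in.
    left = host[:site] + between_cuts
    # Right copy: the original host bases, joined to the viral 3' end.
    right = between_cuts + host[site + stagger:]
    return left + provirus + right


# Hypothetical toy sequences; lower case marks the inserted viral DNA.
result = integrate("AAAACCGTATTTT", "tggaagca", site=4)
print(result)  # AAAACCGTAtggaagcaCCGTATTTT
```

In the toy output, the five host bases CCGTA flank the insert on both sides, and the result is longer than the host plus the provirus by exactly the stagger length.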
A recent study suggested that HIV-1 prefers to integrate into highly spliced genes, or genes with more introns.[2] This preference depends on the host chromatin-binding protein LEDGF/p75, which interacts with many splicing factors. Another study showed that the HIV-1 preference for highly spliced genes depends on another host factor, CPSF6, a polyadenylation factor.[3] Cancer genes with a high number of introns are highly targeted by HIV-1.