Satellite DNA consists of very large arrays of tandemly repeating, non-coding DNA. It is the main component of functional centromeres and forms the main structural constituent of heterochromatin.[1]
The name "satellite DNA" refers to the observation that repetitions of a short DNA sequence tend to have a base composition (frequency of adenine, cytosine, guanine, and thymine) different from that of bulk DNA, and therefore a different density, so that they form one or more additional "satellite" bands when genomic DNA is separated along a cesium chloride density gradient using buoyant density centrifugation.[2] Sequences with a greater proportion of A+T display a lower density than the bulk of genomic DNA, while those with a greater proportion of G+C display a higher density. Some repetitive sequences have roughly equal G+C and A+T content (~50% each) and thus have the same buoyant density as bulk genomic DNA. These satellites are called "cryptic" satellites because they form a band hidden within the main band of genomic DNA. "Isopycnic" is another term used for cryptic satellites.[3]
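The link between base composition and banding position can be illustrated with a short calculation. The sketch below computes the G+C fraction of a sequence and converts it to an approximate buoyant density using a commonly cited empirical linear relation for DNA in cesium chloride, ρ ≈ 1.660 + 0.098 × (G+C fraction) g/cm³. That relation, the function names, and the example sequences are assumptions introduced here for illustration; they are not taken from the cited sources.

```python
def gc_fraction(seq: str) -> float:
    """Return the fraction of bases in seq that are G or C."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return sum(base in "GC" for base in seq) / len(seq)


def estimated_buoyant_density(seq: str) -> float:
    """Approximate CsCl buoyant density in g/cm^3.

    Assumption: the empirical linear relation
    rho = 1.660 + 0.098 * (G+C fraction); real DNA can deviate from it.
    """
    return 1.660 + 0.098 * gc_fraction(seq)


# Hypothetical example sequences, made up for illustration only.
examples = {
    "AT-rich repeat": "AATAT" * 20,
    "balanced repeat": "ACGT" * 25,
    "GC-rich repeat": "CCGCG" * 20,
}

for name, seq in examples.items():
    print(f"{name}: GC={gc_fraction(seq):.2f}, "
          f"density={estimated_buoyant_density(seq):.3f} g/cm^3")
```

Under this approximation, an AT-rich repeat would band above (lighter than) the main band, a GC-rich repeat below it, and a repeat whose composition matches that of bulk DNA would be hidden within the main band.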
Satellite DNA, together with minisatellite and microsatellite DNA, constitutes the tandem repeats. The size of satellite DNA arrays varies greatly between individuals.[4]
The major satellite DNA families in humans are called:
Satellite family | Size of repeat unit (bp) | Location in human chromosomes
---|---|---
α (alphoid DNA) | 170[5] | All chromosomes
β | 68 | Centromeres of chromosomes 1, 9, 13, 14, 15, 21, 22, and Y
Satellite 1 | 25-48 | Centromeres and other regions in heterochromatin of most chromosomes
Satellite 2 | 5 | Most chromosomes
Satellite 3 | 5 | Most chromosomes
A repeated pattern can range from 1 base pair (bp) long (a mononucleotide repeat) to several thousand base pairs long, and the total size of a satellite DNA block can be several megabases without interruption. Long repeat units have been described that contain domains of shorter repeated segments and mononucleotides (1-5 bp) arranged in clusters of microsatellites; differences among individual copies of the longer repeat units were concentrated in these clusters. Most satellite DNA is localized to the telomeric or the centromeric region of the chromosome. The nucleotide sequence of the repeats is fairly well conserved across species. However, variation in the length of the repeat is common.
Low-resolution sequencing-based studies have demonstrated variation in satellite array lengths within human populations, as well as in the frequency of certain sequence and structural variations (11–13, 29). However, due to a lack of full centromere assemblies, base-level understanding of satellite array variation and evolution has remained weak. Shorter tandem repeats are classified by the length of the repeat unit: minisatellite DNA is a short region (1-5 kb) of repeating elements with length >9 nucleotides, whereas microsatellites are considered to have repeat units of 1-8 nucleotides. The difference in how many repeats are present in a region (the length of the region) is the basis for DNA profiling.
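The length thresholds and the idea of comparing repeat counts can be sketched in a few lines. The example below classifies a repeat by unit length using only the ranges stated above (a 9-nucleotide unit falls between the two stated ranges, so it is reported as unclassified here, which is an assumption of this sketch), and counts the longest run of tandem copies of a unit at a locus. The function names, the GATA unit, and the two allele sequences are hypothetical and chosen purely for illustration.

```python
def classify_repeat_unit(unit_length: int) -> str:
    """Classify a tandem repeat by repeat unit length in nucleotides."""
    if unit_length < 1:
        raise ValueError("unit length must be at least 1")
    if unit_length <= 8:
        return "microsatellite"
    if unit_length > 9:
        return "minisatellite"
    # Assumption: exactly 9 nt is not covered by the stated ranges.
    return "unclassified"


def count_tandem_copies(seq: str, unit: str) -> int:
    """Return the longest run of back-to-back copies of unit in seq."""
    if not unit:
        raise ValueError("repeat unit must not be empty")
    best = 0
    for start in range(len(seq)):
        copies = 0
        pos = start
        # Walk forward one unit at a time while the unit keeps matching.
        while seq.startswith(unit, pos):
            copies += 1
            pos += len(unit)
        best = max(best, copies)
    return best


# Hypothetical locus: two alleles differing only in repeat count.
allele_a = "TT" + "GATA" * 3 + "CC"
allele_b = "TT" + "GATA" * 5 + "CC"

print(classify_repeat_unit(len("GATA")))
print(count_tandem_copies(allele_a, "GATA"), count_tandem_copies(allele_b, "GATA"))
```

Profiling methods compare repeat counts of this kind across several loci; the sketch only shows the counting step for a single, known repeat unit.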
Microsatellites are thought to have originated by polymerase slippage during DNA replication. This inference comes from the observation that microsatellite alleles are usually length polymorphic; specifically, the length differences observed between microsatellite alleles are generally multiples of the repeat unit length.[6]
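The observation behind the slippage hypothesis amounts to a simple arithmetic check: if alleles arise by gaining or losing whole repeat units, every allele length should differ from the shortest one by a multiple of the unit length. A minimal sketch of that check follows; the function name and the allele lengths are hypothetical and serve only as an illustration.

```python
def consistent_with_slippage(allele_lengths: list[int], unit_length: int) -> bool:
    """Return True if all allele lengths differ by whole repeat units."""
    if unit_length < 1:
        raise ValueError("unit length must be at least 1")
    if not allele_lengths:
        raise ValueError("at least one allele length is required")
    shortest = min(allele_lengths)
    return all((length - shortest) % unit_length == 0
               for length in allele_lengths)


# Hypothetical alleles of a tetranucleotide repeat with 8, 10 and 11 copies.
unit_length = 4
alleles = [unit_length * copies for copies in (8, 10, 11)]
print(consistent_with_slippage(alleles, unit_length))

# A 1 bp difference is not a whole-unit change for a 4 bp unit.
print(consistent_with_slippage([32, 33], unit_length))
```

A result of False does not rule out slippage; it only indicates that some other change, such as a point insertion or deletion, also contributed to the length difference.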
Satellite DNA can adopt higher-order three-dimensional structures, as shown for a naturally occurring complex satellite DNA from the land crab Gecarcinus lateralis, whose genome includes a GC-rich satellite band (3% of the genome) consisting of a ~2100 bp "repeat unit" sequence motif called RU.[7] [8] The RU was arranged in long tandem arrays with approximately 16,000 copies per genome. Several RU sequences were cloned and sequenced to reveal conserved regions of conventional DNA sequences over stretches greater than 550 bp, interspersed with five "divergent domains" within each copy of RU.
Four divergent domains consisted of microsatellite repeats, biased in base composition, with purines on one strand and pyrimidines on the other. Some contained mononucleotide repeats of C:G base pairs approximately 20 bp in length. These strand-biased microsatellite domains ranged in length from approximately 20 bp to greater than 250 bp. The most prevalent repeated sequences in the embedded microsatellite regions were CT:AG, CCT:AGG, CCCT:AGGG, and CGCAC:GTGCG. These repeating sequences were shown to adopt altered structures including triple-stranded DNA, Z-DNA, stem-loop, and other conformations under superhelical stress.[9] [10] [11]
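The paired notation used for these repeats (for example CT:AG) can be read as a motif on one strand followed by the complementary strand written 5' to 3'; this reading is an interpretation adopted for the sketch below, which checks it for the four listed motifs and reports the purine fraction of each strand as a simple measure of strand bias. The function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
PURINES = frozenset("AG")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def purine_fraction(seq: str) -> float:
    """Return the fraction of bases in seq that are purines (A or G)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return sum(base in PURINES for base in seq) / len(seq)


# Motifs as listed in the text; the partner strand is computed, not assumed.
for motif in ["CT", "CCT", "CCCT", "CGCAC"]:
    partner = reverse_complement(motif)
    print(f"{motif}:{partner}  purine fraction "
          f"{purine_fraction(motif):.1f} vs {purine_fraction(partner):.1f}")
```

By this measure CT, CCT, and CCCT are entirely pyrimidine on one strand and entirely purine on the other, while CGCAC:GTGCG carries both base classes on each strand.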
Between the strand-biased microsatellite repeats and C:G mononucleotide repeats, all sequence variations retained one or two base pairs with A (purine) interrupting the pyrimidine-rich strand and T (pyrimidine) interrupting the purine-rich strand. These interruptions in compositional bias adopted highly distorted conformations as shown by their response to structural nuclease enzymes including S1, P1, and mung bean nucleases.[9]
The most complex compositionally-biased microsatellite domain of RU included the sequence TTAA:TTAA as well as a mirror repeat. It produced the strongest signal in response to nucleases compared to all other altered structures in experimental observations. That particular strand-biased divergent domain was subcloned and its altered helical structure was studied in greater detail.[9]
A fifth divergent domain in the RU sequence was characterized by variations of a symmetrical DNA sequence motif of alternating purines and pyrimidines shown to adopt a left-handed Z-DNA or stem-loop structure under superhelical stress. The conserved symmetrical Z-DNA was abbreviated Z4Z5NZ15NZ5Z4, where Z represents alternating purine/pyrimidine sequences. A stem-loop structure was centered in the Z15 element at the highly conserved palindromic sequence CGCACGTGCG:CGCACGTGCG and was flanked by extended palindromic Z-DNA sequences over a 35 bp region. Many RU variants showed deletions of at least 10 bp outside the Z4Z5NZ15NZ5Z4 structural element, while others had additional Z-DNA sequences lengthening the alternating purine and pyrimidine domain to over 50 bp.[12]
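Two sequence properties mentioned here, palindromic symmetry and strict alternation of purines and pyrimidines, can each be tested directly on a sequence. The sketch below does so for the conserved CGCACGTGCG element; the function names are hypothetical, and the checks describe sequence features only, not whether a given molecule actually forms Z-DNA or a stem-loop.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
PURINES = frozenset("AG")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_palindromic(seq: str) -> bool:
    """True if seq reads the same on both strands (equals its reverse complement)."""
    return seq.upper() == reverse_complement(seq)


def is_alternating_purine_pyrimidine(seq: str) -> bool:
    """True if every adjacent pair of bases switches between purine and pyrimidine."""
    seq = seq.upper()
    return all((a in PURINES) != (b in PURINES) for a, b in zip(seq, seq[1:]))


element = "CGCACGTGCG"
print(is_palindromic(element))
print(is_alternating_purine_pyrimidine(element))
```

Both checks pass for CGCACGTGCG, consistent with its description as a palindromic sequence within an alternating purine/pyrimidine domain. A sequence can satisfy one property without the other: TTAA is palindromic but does not alternate.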
One extended RU sequence (EXT) was shown to have six tandem copies of a 142 bp amplified (AMPL) sequence motif inserted into a region bordered by inverted repeats where most copies contained just one AMPL sequence element. There were no nuclease-sensitive altered structures or significant sequence divergence in the relatively conventional AMPL sequence. A truncated RU sequence (TRU), 327 bp shorter than most clones, arose from a single base change leading to a second EcoRI restriction site in TRU.[7]
Another crab, the hermit crab Pagurus pollicaris, was shown to have a family of AT-rich satellites with inverted repeat structures that comprised 30% of the entire genome. Another cryptic satellite from the same crab with the sequence CCTA:TAGG[13] [14] [15] was found inserted into some of the palindromes.[16]