Subsequence Explained

In mathematics, a subsequence of a given sequence is a sequence that can be derived from the given sequence by deleting some or no elements without changing the order of the remaining elements. For example, the sequence

\langle A,B,D \rangle

is a subsequence of

\langle A,B,C,D,E,F \rangle

obtained after removal of elements C, E, and F.

The relation of one sequence being the subsequence of another is a partial order.

Subsequences can contain consecutive elements which were not consecutive in the original sequence. A subsequence which consists of a consecutive run of elements from the original sequence, such as

\langle B,C,D \rangle,

from

\langle A,B,C,D,E,F \rangle,

is a substring. The substring is a refinement of the subsequence.
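The difference between the two notions can be checked mechanically. The following is a minimal Python sketch; the function names `is_subsequence` and `is_substring` are illustrative choices, not part of any standard library.

```python
def is_subsequence(candidate, sequence):
    """Return True if `candidate` can be obtained from `sequence` by deleting elements."""
    remaining = iter(sequence)
    # `item in remaining` advances the iterator until it finds `item`,
    # so the matches are forced to occur in left-to-right order.
    return all(item in remaining for item in candidate)


def is_substring(candidate, sequence):
    """Return True if `candidate` appears in `sequence` as a consecutive run."""
    candidate, sequence = list(candidate), list(sequence)
    width = len(candidate)
    return any(
        sequence[start:start + width] == candidate
        for start in range(len(sequence) - width + 1)
    )


full = "ABCDEF"
print(is_subsequence("ABD", full), is_substring("ABD", full))  # True False
print(is_subsequence("BCD", full), is_substring("BCD", full))  # True True
```

Every substring passes both checks, while a subsequence such as A, B, D passes only the first.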

The list of all subsequences for the word "apple" would be "a", "ap", "al", "ae", "app", "apl", "ape", "ale", "appl", "appe", "aple", "apple", "p", "pp", "pl", "pe", "ppl", "ppe", "ple", "pple", "l", "le", "e", "" (empty string).
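The list above can be reproduced by brute force. The sketch below (the helper name `distinct_subsequences` is hypothetical) tries every subset of positions and collects the resulting strings in a set, so that duplicates caused by the repeated "p" are counted once. It is only practical for short words, because a word of length n has 2^n position subsets.

```python
from itertools import combinations


def distinct_subsequences(word):
    """Collect every distinct subsequence of `word`, including the empty string."""
    found = set()
    for length in range(len(word) + 1):
        # Each increasing tuple of positions keeps the original order of letters.
        for positions in combinations(range(len(word)), length):
            found.add("".join(word[i] for i in positions))
    return found


subs = distinct_subsequences("apple")
print(len(subs))  # 24
print(sorted(subs, key=lambda s: (len(s), s)))
```

For "apple" there are 32 position subsets but only 24 distinct subsequences, matching the 24 entries listed.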

Common subsequence

Given two sequences X and Y, a sequence Z is said to be a common subsequence of X and Y if Z is a subsequence of both X and Y. For example, if

X = \langle A,C,B,D,E,G,C,E,D,B,G \rangle, \qquad \text{and}

Y = \langle B,E,G,J,C,F,E,K,B \rangle, \qquad \text{and}

Z = \langle B,E,E \rangle,

then Z is a common subsequence of X and Y. It is not the longest common subsequence, since Z has length only 3, while the common subsequence

\langle B,E,E,B \rangle

has length 4. The longest common subsequence of X and Y is

\langle B,E,G,C,E,B \rangle.
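A longest common subsequence can be computed with the textbook dynamic-programming table. The article does not prescribe an algorithm, so the following Python sketch is one standard approach rather than the only one; the function name `longest_common_subsequence` is an illustrative choice.

```python
def longest_common_subsequence(x, y):
    """Return one longest common subsequence of `x` and `y` as a list."""
    rows, cols = len(x), len(y)
    # table[i][j] holds the LCS length of the prefixes x[:i] and y[:j].
    table = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            if x[i - 1] == y[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])

    # Walk back from the bottom-right corner to recover the elements.
    result = []
    i, j = rows, cols
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:
            result.append(x[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return result[::-1]


X = list("ACBDEGCEDBG")
Y = list("BEGJCFEKB")
print(longest_common_subsequence(X, Y))  # ['B', 'E', 'G', 'C', 'E', 'B']
```

In this example the answer is unique: J, F and K do not occur in X, and deleting them from Y leaves exactly B, E, G, C, E, B, which is itself a subsequence of X.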

Applications

Subsequences have applications to computer science,^[1] especially in the discipline of bioinformatics, where computers are used to compare, analyze, and store DNA, RNA, and protein sequences.

Take two sequences of DNA containing 37 elements, say:

SEQ₁ = ACGGTGTCGTGCTATGCTGATGCTGACTTATATGCTA

SEQ₂ = CGTTCGGCTATCGTACGTTCTATTCTATGATTTCTAA

The longest common subsequence of sequences 1 and 2 is:

LCS(SEQ₁,SEQ₂) = CGTTCGGCTATGCTTCTACTTATTCTA

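This can be illustrated by highlighting the 27 elements of the longest common subsequence in the initial sequences. The 10 elements of each sequence that remain unhighlighted, because they are not part of the longest common subsequence, are, in order:

SEQ₁: AGGTGAGGAG

SEQ₂: CTAGTTAGTA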

Another way to show this is to align the two sequences, that is, to position the elements of the longest common subsequence in the same column (indicated by a vertical bar) and to introduce a special character (here, a dash) to pad the gaps that arise between them:

SEQ₁ = ACGGTGTCGTGCTAT-G--C-TGATGCTGA--CT-T-ATATG-CTA-
        | || ||| ||||| |  | |  | || |  || | || |  |||
SEQ₂ = -C-GT-TCG-GCTATCGTACGT--T-CT-ATTCTATGAT-T-TCTAA
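The row of vertical bars can be derived from the two padded rows. The Python sketch below assumes the rows are already aligned as shown above; the helper name `match_line` and the variable names are illustrative. It marks each column where both rows carry the same non-gap character and reads the common subsequence off those columns.

```python
def match_line(row1, row2, gap="-"):
    """Return a string with '|' where two aligned rows agree on a non-gap character."""
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have the same length")
    return "".join(
        "|" if a == b and a != gap else " "
        for a, b in zip(row1, row2)
    )


row1 = "ACGGTGTCGTGCTAT-G--C-TGATGCTGA--CT-T-ATATG-CTA-"
row2 = "-C-GT-TCG-GCTATCGTACGT--T-CT-ATTCTATGAT-T-TCTAA"

bars = match_line(row1, row2)
# Reading the matched columns from either row yields the common subsequence.
common = "".join(a for a, bar in zip(row1, bars) if bar == "|")

print(row1)
print(bars)
print(row2)
print(len(common), common)  # 27 CGTTCGGCTATGCTTCTACTTATTCTA
```

Removing the dashes from each row recovers the original 37-element sequences, which is a quick sanity check on any alignment of this form.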

Subsequences are used to determine how similar the two strands of DNA are, using the DNA bases: adenine, guanine, cytosine and thymine.

Theorems

Every infinite sequence of real numbers has an infinite monotone subsequence (this is a lemma used in the proof of the Bolzano–Weierstrass theorem).

Every infinite bounded sequence in

\mathbb{R}^n

has a convergent subsequence (this is the Bolzano–Weierstrass theorem).

For all integers r and s, every finite sequence of length at least

(r-1)(s-1)+1

contains a monotonically increasing subsequence of length r or a monotonically decreasing subsequence of length s (this is the Erdős–Szekeres theorem).
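The Erdős–Szekeres bound can be checked exhaustively for small cases. The Python sketch below makes two assumptions that the statement above leaves open: the elements are distinct integers, and "monotonically" is read as strictly increasing or decreasing. The helper name `longest_monotone` is illustrative. It verifies the case r = s = 3 over all orderings of 5 distinct values, then shows a sequence of length 4 for which neither length-3 subsequence exists.

```python
from itertools import permutations


def longest_monotone(values, increasing=True):
    """Length of the longest strictly increasing (or decreasing) subsequence, in O(n^2)."""
    best = []  # best[i] = longest monotone subsequence ending at values[i]
    for i, v in enumerate(values):
        longest_before = 0
        for j in range(i):
            extends = values[j] < v if increasing else values[j] > v
            if extends and best[j] > longest_before:
                longest_before = best[j]
        best.append(longest_before + 1)
    return max(best, default=0)


r, s = 3, 3
n = (r - 1) * (s - 1) + 1  # 5 elements suffice when r = s = 3

holds = all(
    longest_monotone(p, True) >= r or longest_monotone(p, False) >= s
    for p in permutations(range(n))
)
print(holds)  # True

short = [2, 1, 4, 3]  # length (r-1)(s-1) = 4
print(longest_monotone(short, True), longest_monotone(short, False))  # 2 2
```

The second result indicates that, at least for r = s = 3, the length (r-1)(s-1)+1 cannot be lowered.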

A metric space

(X,d)

is compact if every sequence in X has a convergent subsequence whose limit is in X.

Notes and References

[1] In computer science, "string" is often used as a synonym for "sequence", but "substring" and "subsequence" are not synonyms. Substrings are consecutive parts of a string, while subsequences need not be. Every substring of a string is therefore a subsequence of that string, but a subsequence of a string is not always a substring of it. See Gusfield, Dan (1997, 1999). Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology. USA: Cambridge University Press. ISBN 0-521-58519-8. p. 4.