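Sequence Dot Plot
Sequence alignments visualize individually matched nucleotides, but they can obscure some large-scale features of DNA or RNA sequences, including repeats and inversions. A dot plot displays a sequence alignment on a two-dimensional plot, with one sequence placed along the X axis and the other along the Y axis. The analysis aligns portions of the sequences based on a window size (by default, 10 bases at a time), and when the number of mismatches does not exceed the limit (the default limit is 0), the tool places a dot at the aligned X and Y coordinates. Because each set of 10 bases is compared to the query sequence independently, more complex relationships can be highlighted. For example, reverse complements can be visualized as green dots, and repeats appear as multiple stacked diagonal lines. Dot plots are often used to align a sequence to itself in order to identify direct or inverted repeats, frameshifts, inversions, and regions of low complexity within the sequence.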
How to set and visualize dot plots
When studying differences between genes, proteins, or organisms, sequence comparisons can help to predict structural relationships, functions, and evolutionary changes. Standard sequence alignments compare each nucleotide to the corresponding position on the query sequence, which makes it possible to see mutations, insertions, and deletions on the scale of individual nucleotides. However, other changes, including inversions, repeats, and translocations, cannot readily be identified using this approach.
Dot plots are a form of alignment that provides a more global perspective using a matrix output. One sequence is placed along the x axis, and the other along the y axis. Regions of each sequence are compared to the entire query sequence, based on the window size. VectorBuilder’s Dot Plot tool has a default window size of 10, so each set of 10 bases is aligned to each region on the query sequence. The mismatch limit determines what is considered “aligned,” and our default setting is 0. If the set of 10 bases has no more mismatches with a section of the query sequence than the limit allows (0 by default), then a dot is placed at the appropriate x and y coordinates. When aligning a sequence to itself, you will typically see a straight diagonal line (Figure 1).
Figure 1. Sequence aligned to itself.
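The windowed comparison described above can be sketched in a few lines of Python. This is only an illustrative sketch, not VectorBuilder's implementation: the function names `count_mismatches` and `dot_plot` and the toy sequence are assumptions made for the example, and only forward matches are considered.

```python
def count_mismatches(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def dot_plot(seq_x, seq_y, window=10, mismatch_limit=0):
    """Return the set of (x, y) window start positions that receive a dot.

    A dot is recorded when the window starting at x in seq_x and the window
    starting at y in seq_y differ at no more than mismatch_limit positions.
    """
    seq_x, seq_y = seq_x.upper(), seq_y.upper()
    dots = set()
    for x in range(len(seq_x) - window + 1):
        win_x = seq_x[x:x + window]
        for y in range(len(seq_y) - window + 1):
            if count_mismatches(win_x, seq_y[y:y + window]) <= mismatch_limit:
                dots.add((x, y))
    return dots


# Hypothetical toy sequence, aligned to itself with the default settings.
seq = "ATGGCGTACGTTAGCCTAGGATCCGATTACA"
dots = dot_plot(seq, seq)
main_diagonal = {(i, i) for i in range(len(seq) - 10 + 1)}
print("dots on the main diagonal only:", dots == main_diagonal)
```

For this toy sequence every dot has x equal to y, which corresponds to the single straight diagonal of a self-alignment. The nested loop compares every window pair, so it is suitable only for short sequences.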
Adjusting the window size and/or the mismatch limit will change the stringency of the alignment. For instance, reducing the window size to 5 makes a match at any given point more likely, because shorter windows match by chance more often (Figure 2). This will increase the background in the output, but may highlight more subtle or divergent changes.
Figure 2. Sequence aligned to itself with window size of 5.
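To get a rough feel for why a smaller window produces more background, the chance of two unrelated windows matching can be estimated. The sketch below assumes a deliberately simplified model in which every base is independent and equally likely (so two bases agree with probability 0.25); real sequences deviate from this, and the function name `chance_match_probability` is hypothetical.

```python
from math import comb


def chance_match_probability(window, mismatch_limit=0, p_match=0.25):
    """Probability that two unrelated windows differ at no more than
    mismatch_limit positions, assuming independent bases that agree
    with probability p_match (a simplifying assumption)."""
    return sum(
        comb(window, k) * (1 - p_match) ** k * p_match ** (window - k)
        for k in range(mismatch_limit + 1)
    )


for w in (10, 5):
    p = chance_match_probability(w)
    print(f"window={w}: about 1 chance dot per {round(1 / p):,} window pairs")
```

Under this model a window of 10 matches by chance about once per 1,048,576 window pairs, while a window of 5 does so about once per 1,024. Raising the mismatch limit increases the probability in the same way.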
Changes that can be observed in sequence alignments can also be seen in this wider perspective, though in less detail. Individual mutations that exceed the mismatch limit will appear as a blank space in the line (a), while deletions and insertions will cause the line to shift (b and c, respectively) (Figure 3).
Figure 3. Sequence with mutations and indels.
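The gap and the shift can be reproduced on a toy example by recording, for each dot, which diagonal it lies on (x minus y). This is a sketch under stated assumptions: the `dot_plot` helper, the window size of 8, and the sequences are all hypothetical and chosen so that no window matches by chance.

```python
from collections import Counter


def dot_plot(seq_x, seq_y, window=8, mismatch_limit=0):
    """Set of (x, y) window starts whose windows differ at no more than
    mismatch_limit positions (forward matches only)."""
    dots = set()
    for x in range(len(seq_x) - window + 1):
        for y in range(len(seq_y) - window + 1):
            pairs = zip(seq_x[x:x + window], seq_y[y:y + window])
            if sum(a != b for a, b in pairs) <= mismatch_limit:
                dots.add((x, y))
    return dots


ref = "ATGGCGTACGTTAGCCTAGGATCCGATTACAGTCAA"
substituted = ref[:16] + "C" + ref[17:]  # single base change (T to C) at index 16
deleted = ref[:16] + ref[19:]            # three bases (indices 16-18) removed

results = {}
for label, variant in (("substitution", substituted), ("deletion", deleted)):
    dots = dot_plot(ref, variant)
    # x - y identifies the diagonal a dot lies on; 0 is the main diagonal.
    results[label] = Counter(x - y for x, y in dots)
    print(label, sorted(results[label].items()))
```

With the substitution, all dots stay on diagonal 0 but the windows that overlap the changed base drop out, leaving a gap. With the deletion, the dots after the deleted bases move to a diagonal offset by 3, the length of the deletion.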
Why use dot plots?
A major benefit when using dot plots for alignment is the ability to observe changes that occur across sections of the sequence. Repeats within a sequence will not be highlighted in a standard sequence alignment, but because dot plots align a section of the sequence to the entire query, all areas of alignment are noted. Regions that contain repeats appear as stacked diagonal lines (Figure 4).
Figure 4. Alignment of sequence with itself, containing internal repeats.
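As a small illustration, aligning a sequence that contains two copies of the same segment to itself yields dots away from the main diagonal. The sketch below uses a hypothetical `dot_plot` helper, a window of 8, and a made-up sequence in which a 12-base unit occurs twice, separated by a spacer; it is not the tool's own code.

```python
from collections import Counter


def dot_plot(seq_x, seq_y, window=8, mismatch_limit=0):
    """Set of (x, y) window starts whose windows differ at no more than
    mismatch_limit positions (forward matches only)."""
    dots = set()
    for x in range(len(seq_x) - window + 1):
        for y in range(len(seq_y) - window + 1):
            pairs = zip(seq_x[x:x + window], seq_y[y:y + window])
            if sum(a != b for a, b in pairs) <= mismatch_limit:
                dots.add((x, y))
    return dots


unit = "ATGGCGTACGTT"    # repeated segment (hypothetical)
spacer = "AGCCTAGGATCC"  # unrelated sequence between the two copies
seq = unit + spacer + unit

dots = dot_plot(seq, seq)
diagonals = Counter(x - y for x, y in dots)
off_diagonal = sorted((x, y) for x, y in dots if x != y)
print("dots per diagonal:", sorted(diagonals.items()))
print("off-diagonal dots:", off_diagonal)
```

Besides the main diagonal, the plot contains two short parallel lines, one on each side, offset by the distance between the two copies (24 bases here). A sequence with several tandem copies would produce a stack of such lines.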
Other events that appear only as divergence in a standard alignment can be recognized using dot plots. A sequence translocation will show no relationship between the corresponding regions in a sequence alignment (Figure 5A), but will be highlighted on a dot plot (Figure 5B).
Figure 5. Sequences with translocation compared using (A) standard sequence alignment and (B) dot plot.
In addition to “cut and paste” movement, sequences can exhibit inversions or inverted repeats. The latter are utilized in a variety of cloning techniques, including shRNA design. As with translocations, this change appears primarily as mismatches in the Sequence Alignment tool (Figure 6A). However, dot plots allow visualization not only of the forward sequence alignment, but also of the reverse complement. Red lines show forward alignment, and green lines show the reverse complement. Here, the green line highlights where an inversion has occurred (Figure 6B).
Figure 6. Sequences with inversion compared using (A) standard sequence alignment and (B) dot plot.
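One way to picture the two colors is to run the window comparison twice: once against each window as written, and once against its reverse complement. The following is a sketch under stated assumptions, not the tool's implementation; `reverse_complement`, `dot_plot_both_strands`, the window size of 8, and the sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def dot_plot_both_strands(seq_x, seq_y, window=8):
    """Return (forward, reverse) sets of (x, y) window starts with exact
    matches. Reverse dots mark windows of seq_x that equal the reverse
    complement of a window of seq_y."""
    forward, reverse = set(), set()
    for x in range(len(seq_x) - window + 1):
        win_x = seq_x[x:x + window]
        for y in range(len(seq_y) - window + 1):
            win_y = seq_y[y:y + window]
            if win_x == win_y:
                forward.add((x, y))
            if win_x == reverse_complement(win_y):
                reverse.add((x, y))
    return forward, reverse


ref = "ATGGCGTACGTTAGCCTAGGATCCGATTACAGTCAA"
# Invert the middle segment (indices 12-23) by reverse-complementing it.
inverted = ref[:12] + reverse_complement(ref[12:24]) + ref[24:]

forward, reverse = dot_plot_both_strands(ref, inverted)
print("forward dots:", sorted(forward))
print("reverse-complement dots:", sorted(reverse))
```

In this toy case the forward dots cover only the unchanged flanks, while the reverse-complement dots run along an anti-diagonal (x + y is constant) across the inverted segment, which is the kind of line the tool draws in green.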
- Sequences in both GenBank and FASTA formats can be recognized.
- Decreasing window size or increasing mismatch limit can reduce stringency to reveal more divergent relationships, but this will increase background noise.