What happens during seed stitching when STAR initially gets the wrong MMP?
6.9 years ago
eric.kern13

Suppose I have the following read:

AAGGAAGGAAGGAAGGACTTCCTT

I want to align it to one of these two reference sequences:

AAGGAAGGAAGGAAGGACAAGGAA
AAGGAAGGAAGGAAGGCCTTCCTT

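Clearly, the second reference sequence is where the read belongs: there is only one mismatch. The maximum mappable prefix (MMP), though, lies in the first reference sequence (AAGGAAGGAAGGAAGGAC, 18 nt, because the read's C at position 18 also matches the first reference). A new MMP search on the remainder of the read (TTCCTT) would then map it correctly, to the second reference sequence.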
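The seeding described above can be checked on this toy example with a brute-force MMP search. This is a minimal sketch under stated assumptions: the helper names `mmp` and `seed_read` are hypothetical, STAR itself performs the search with uncompressed suffix arrays, and restarting the search right after the previous MMP is a simplification of how STAR handles the unmapped part of the read.

```python
# Hypothetical brute-force MMP search; STAR uses suffix arrays, not this loop.
READ = "AAGGAAGGAAGGAAGGACTTCCTT"
REFS = {
    "ref1": "AAGGAAGGAAGGAAGGACAAGGAA",
    "ref2": "AAGGAAGGAAGGAAGGCCTTCCTT",
}


def mmp(read, start, refs):
    """Return (length, hits) for the longest prefix of read[start:] that occurs
    exactly in some reference; hits lists every (ref_name, ref_pos) of that length."""
    best_len, hits = 0, []
    for name, ref in refs.items():
        for pos in range(len(ref)):
            k = 0
            while (start + k < len(read) and pos + k < len(ref)
                   and read[start + k] == ref[pos + k]):
                k += 1
            if k > best_len:
                best_len, hits = k, [(name, pos)]
            elif k == best_len and k > 0:
                hits.append((name, pos))
    return best_len, hits


def seed_read(read, refs):
    """Repeat the MMP search on whatever part of the read is still unmapped
    (simplified restart rule: continue right after the previous MMP)."""
    seeds, start = [], 0
    while start < len(read):
        length, hits = mmp(read, start, refs)
        if length == 0:  # base occurs in no reference: skip it
            start += 1
            continue
        seeds.append((start, read[start:start + length], hits))
        start += length
    return seeds


for read_start, seq, hits in seed_read(READ, REFS):
    print(read_start, seq, hits)
```

On this example the first seed is the 18-nt prefix in `ref1` and the second is the 6-nt tail in `ref2`. Because the two seeds sit on different references, they cannot be stitched into a single alignment as they stand; some extension or rescoring step is needed to recover the correct placement.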

The STAR paper says "If the MMP search does not reach the end of a read because of the presence of one or more mismatches, the MMPs will serve as anchors in the genome that can be extended, allowing for alignments with mismatches." Does that mean STAR would search over full alignments implied by each seed and thus align the entire read correctly? If so, does this step of STAR tolerate indels?
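One way to read "anchors ... that can be extended, allowing for alignments with mismatches" is to project each seed along its diagonal so that it covers the whole read, then count mismatches. The sketch below does exactly that and nothing more. It is hypothetical and not STAR's extension code: the seed tuples are hard-coded for this example (the 18-nt MMP in the first reference and the 6-nt tail TTCCTT in the second), and because the extension is ungapped it cannot model indels.

```python
# Hypothetical ungapped seed extension; not STAR's actual implementation.
READ = "AAGGAAGGAAGGAAGGACTTCCTT"
REFS = {
    "ref1": "AAGGAAGGAAGGAAGGACAAGGAA",
    "ref2": "AAGGAAGGAAGGAAGGCCTTCCTT",
}
# Seed hits as (read_start, length, ref_name, ref_start), assumed from an MMP search.
SEEDS = [(0, 18, "ref1", 0), (18, 6, "ref2", 18)]


def extend_ungapped(read, ref, read_start, ref_start):
    """Project the whole read onto the seed's diagonal and return the number of
    mismatches, or None if the read would run off either end of the reference."""
    offset = ref_start - read_start  # diagonal: ref_pos = read_pos + offset
    if offset < 0 or offset + len(read) > len(ref):
        return None
    return sum(a != b for a, b in zip(read, ref[offset:offset + len(read)]))


best = None
for read_start, _length, name, ref_start in SEEDS:
    mm = extend_ungapped(READ, REFS[name], read_start, ref_start)
    print(name, "mismatches:", mm)
    if mm is not None and (best is None or mm < best[1]):
        best = (name, mm)
print("best:", best)
```

Extending the long `ref1` seed gives 6 mismatches, while extending the short `ref2` seed gives only 1, so the shorter seed yields the better full-length alignment.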

6.9 years ago

"Given a read sequence R, read location i and a reference genome sequence G, the MMP(R,i,G) is defined as the longest substring (R_i, R_{i+1}, …, R_{i+MML−1}) that matches exactly one or more substrings of G"

If I understand correctly, the MMP is defined per genome. So STAR should find a different MMP for each of them, then extend each one according to the gap and mismatch penalties. Finally, the best-scoring alignment should be reported, which is the one to the second sequence.
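The "extend with gap/mismatch penalties, then report the best score" idea can be sketched with a generic glocal dynamic-programming alignment: the read is aligned end to end, with free overhangs on the reference. This is a textbook technique swapped in for illustration, not STAR's stitching algorithm. The scores `MATCH = +1`, `MISMATCH = -1` and `GAP = -2` per base are assumptions; STAR sets its own penalties through options such as `--scoreDelOpen`, `--scoreDelBase`, `--scoreInsOpen` and `--scoreInsBase`.

```python
# Hypothetical scoring sketch; penalties are illustrative, not STAR's defaults.
MATCH, MISMATCH, GAP = 1, -1, -2

READ = "AAGGAAGGAAGGAAGGACTTCCTT"
REFS = {
    "ref1": "AAGGAAGGAAGGAAGGACAAGGAA",
    "ref2": "AAGGAAGGAAGGAAGGCCTTCCTT",
}


def glocal_score(read, ref):
    """Best score for aligning the whole read to any substring of ref,
    allowing mismatches and indels with a linear gap penalty."""
    m = len(ref)
    # prev[j]: best score of the current read prefix ending at ref position j;
    # the empty read prefix may start anywhere in ref for free.
    prev = [0] * (m + 1)
    for i in range(1, len(read) + 1):
        cur = [prev[0] + GAP] + [0] * m  # read bases before the ref start are insertions
        for j in range(1, m + 1):
            diag = prev[j - 1] + (MATCH if read[i - 1] == ref[j - 1] else MISMATCH)
            cur[j] = max(diag,               # match or mismatch
                         prev[j] + GAP,      # insertion in the read
                         cur[j - 1] + GAP)   # deletion from the read
        prev = cur
    return max(prev)  # the alignment may end anywhere in ref for free


scores = {name: glocal_score(READ, ref) for name, ref in REFS.items()}
best = max(scores, key=scores.get)
print(scores, "->", best)
```

With these assumed penalties the read scores 12 against the first sequence and 22 against the second, so the second is reported. Because the recurrence also has gap moves, a read carrying an extra base relative to the second sequence (for example `AAGGAAGGAAGGAAGGCCTTGCCTT`) still aligns, at the cost of one gap penalty.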
