How does bowtie treat a masked 'N' nucleotide?
3.4 years ago
johnsonn573 ▴ 10

If a read overlaps a region perfectly except for a single masked base pair position, does bowtie consider that position a mismatch in the read? Or does bowtie ignore that position, and consider it a perfect alignment?

I'm thinking about aligning to a reference genome that is hard-masked at SNP positions, and I'm wondering whether this will cause fewer reads to align because of the extra mismatches.


How long are your reads?


The reads are 50 base pairs.

3.4 years ago
ATpoint 81k

From the manual: http://bowtie-bio.sourceforge.net/manual.shtml

Alignments involving one or more ambiguous reference characters (N, -, R, Y, etc.) are considered invalid by Bowtie. This is true only for ambiguous characters in the reference; alignments involving ambiguous characters in the read are legal, subject to the alignment policy. Ambiguous characters in the read mismatch all other characters. Alignments that “fall off” the reference sequence are not considered valid.

So, as I understand it, any ambiguous character in the reference that overlaps a potential alignment location makes that alignment invalid, and a read whose only candidate location covers such a character ends up unmapped.
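To make the two rules in the quoted passage concrete, here is a toy bash model of how they combine for one candidate location. It only illustrates the manual's wording and is not bowtie's implementation: it ignores seeds, quality-aware policies, indexing and lowercase bases, assumes both strings are uppercase and equally long, and `score_alignment` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Toy model (NOT bowtie's code) of the rule quoted from the manual:
#  - any non-ACGT character in the reference window invalidates the alignment;
#  - any non-ACGT character in the read mismatches every reference base.
# Usage: score_alignment REF_WINDOW READ   (uppercase, same length)
score_alignment() {
  local ref_seq="$1" read_seq="$2" i r g mismatches=0
  if [[ ${#ref_seq} -ne ${#read_seq} ]]; then
    echo "length-mismatch"
    return 1
  fi
  # Rule 1: an ambiguous reference character makes the alignment invalid outright.
  if [[ "$ref_seq" =~ [^ACGT] ]]; then
    echo "invalid"
    return 0
  fi
  # Rule 2: count mismatches; an ambiguous read character never matches.
  for (( i = 0; i < ${#read_seq}; i++ )); do
    r="${read_seq:i:1}"
    g="${ref_seq:i:1}"
    if [[ "$r" != [ACGT] || "$r" != "$g" ]]; then
      mismatches=$((mismatches + 1))
    fi
  done
  echo "mismatches=$mismatches"
}

score_alignment ACGTNACGT ACGTCACGT   # N in the reference -> invalid
score_alignment ACGTCACGT ACGTNACGT   # N in the read -> mismatches=1
score_alignment ACGTCACGT ACGTCACGT   # perfect match -> mismatches=0
```

Under this model, the dummy read from the example below scores `invalid` against `dummy_N.fa` but aligns with one mismatch (its N against the reference C) against `dummy_no_N.fa`, which matches what bowtie reports.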

You can easily explore this by simply making a dummy reference genome, e.g.:

#/ Dummy genomes (identical except for position 38: N vs. C):
echo ">dummy TAGCTGCGCGCTACGATCGATCGACTGATCAGCGGCTNTAGCTGTACATGCA" | tr " " "\n" > dummy_N.fa
echo ">dummy TAGCTGCGCGCTACGATCGATCGACTGATCAGCGGCTCTAGCTGTACATGCA" | tr " " "\n" > dummy_no_N.fa

#/ Index both genomes (the index basename is the FASTA file name):
for i in dummy_N.fa dummy_no_N.fa; do bowtie-build "$i" "$i"; done

#/ Dummy read (same as the dummy_N.fa sequence):
echo ">dummy_read TAGCTGCGCGCTACGATCGATCGACTGATCAGCGGCTNTAGCTGTACATGCA" | tr " " "\n" > dummy_read.fa

#/ Align against the reference containing N (-f: reads are FASTA):
bowtie -f dummy_N.fa dummy_read.fa
# reads processed: 1
# reads with at least one alignment: 0 (0.00%)
# reads that failed to align: 1 (100.00%)
# No alignments

#/ Align against the reference without N (the read's N counts as one mismatch):
bowtie -f dummy_no_N.fa dummy_read.fa
# reads processed: 1
# reads with at least one alignment: 1 (100.00%)
# reads that failed to align: 0 (0.00%)
# Reported 1 alignments

My interest in this question comes from allele-specific expression: I was wondering whether I could mask SNPs in my reference so that the alignment of reads is not biased toward the reference allele. But I see that if I replace SNPs with N in the reference, no reads overlapping the SNPs will align.
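For reference, "replacing a SNP with N" amounts to the following bash sketch, which hard-masks one 1-based position of a single-line sequence like the dummy genomes above. `mask_position` is a hypothetical helper; masking a real genome from a SNP list would need to handle multi-line FASTA and many positions, which this sketch does not attempt.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace the base at a 1-based position with N.
# Assumes the whole sequence is a single line (as in the dummy FASTA files).
mask_position() {
  local seq="$1" pos="$2"
  if (( pos < 1 || pos > ${#seq} )); then
    echo "position out of range" >&2
    return 1
  fi
  # Keep bases 1..pos-1, insert N, then keep everything after pos.
  printf '%s\n' "${seq:0:pos-1}N${seq:pos}"
}

ref=TAGCTGCGCGCTACGATCGATCGACTGATCAGCGGCTCTAGCTGTACATGCA
masked=$(mask_position "$ref" 38)   # position 38 holds the C that dummy_N.fa replaces
printf '>dummy\n%s\n' "$masked" > dummy_masked.fa
```

The masked sequence is identical to `dummy_N.fa`, so per the manual any read overlapping position 38 would no longer have a valid alignment there.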

When bowtie aligns a read that overlaps a SNP site, is there a way to make it accept A, C, G, or T at that site without counting any of them as a mismatch?
