Bowtie2 alignment scores for reads that "go beyond" the end of the reference
0
0
Entering edit mode
5.0 years ago
matthias • 0

I'd like to map metagenomes (2 x 150 bp reads) against selected genes (500 - 3000 bp in length) using Bowtie2. I expect it might happen often then that reads only map partially as they "go beyond" the reference gene as illustrated in the example below.

Read:                                 CTCGACTGGGGGATCTCGATTTCGTAGACGGCTTGCGGTT...
                                         ||||||| |||||||| |||||||
Reference Gene:   ...CGTTGACGTTAGAAGGGAAGGACTGGGCGATCTCGACTTCGTAG (end of gene)

I'm wondering if bases of the read that are beyond the end of the gene are treated as mismatches, i.e. penalties are added to the alignment score for these?

My desired behavior would be that the read is treated as if it would be only as long as the overlapping region (27 bases in the above example). Bases on the right end of the read in the example shall be clipped, but not the three bases on the left end of the read.

Does anyone now if this can be set in Bowtie2, or possibly if this is even the default behavior with --end-to-end mode?

I know that there is the --local mode, but this is not exactly what I want as it may clip also the left three bases of the read. Maybe there are settings I could use in combination with --local to achieve the desired behavior?

alignment mapping Bowtie2 • 1.4k views
ADD COMMENT
0
Entering edit mode

Part of the read that is not mapped would be soft-clipped. Sounds like that is what you want?

ADD REPLY
0
Entering edit mode

I think my question was unclear so I amended it a bit. So yes, I'd like the read to be clipped. However, referring to the above example, only it's right end but not it's left end shall be clipped.

ADD REPLY
0
Entering edit mode

Left-end should be soft-clipped as well. Have you looked at the CIGAR strings in your alignment file?

ADD REPLY
0
Entering edit mode

Actually I haven't run the mapping yet. I'd just like to find out which settings are reasonable for mapping against single genes, and start the mapping then.

ADD REPLY
0
Entering edit mode

How many genes are you going to map against? Is this a targeted/capture experiment? You always run the risk of forcing reads to align in places where they should not if you used a reduced reference and the original experiment was done with full genome.

ADD REPLY
0
Entering edit mode

I'm planning to use about 100 single copy genes from a reference genome (bacterial isolate) and map different environmental metagenomes against them. If I use stringent mapping conditions (or eventually filter mapped reads afterwards), I don't expect reads that would e.g. align better to a different region of the genome to map to the reference gene.

ADD REPLY
0
Entering edit mode

Proof is going to be in the pudding so to speak. Let us know how you fare on the alignments. Results would be hard to predict since we have no idea what is in your samples and what fraction of the data will align to the genes you start with. Do keep track of multi-mapping reads just in case.

ADD REPLY
0
Entering edit mode

Sure. My question was not really about predicting results but rather how Bowtie2 scores an alignment as shown above.

ADD REPLY
0
Entering edit mode

If you are looking for specifics on how that particular alignment would be scored (in terms of MAPQ) then you may need to dive into code. You will want to read this blog post about mapping qualities.

I think what you are thinking of is a score like blast+ alignment. That concept is not directly transferable to NGS alignments.

ADD REPLY
0
Entering edit mode

I was not referring to the mapping quality but the alignment score (AS tag in the sam file). I really should have been more specific in my question, I changed it a bit further. To calculate the alignment score, Bowtie2 (default) uses e.g. the following penalties:

mismatch: 2 - 6, gap open: 5, gap extend: 3

The minimum alignment score needed for an alignment to be considered “valid” can be set using --score-min. Now my question is if Bowtie2 scores all the bases at the right end of the read in my example as mismatches? If so, the alignment will never be considered valid with a reasonable score-min. If I use --local mode these bases will be clipped, but then I also loose the penalties of the three left-end bases, which I'd like to avoid. Anyway, I’ll probably just try around with a small set of made-up sequences to see what happens.

ADD REPLY

Login before adding your answer.

Traffic: 2616 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6