bowtie: poor mapping with high quality reads
6.8 years ago

Apologies for my inexperience with bowtie.

I have a series of FASTQ files, all containing reads with very consistent base quality scores: ~35-40.

[Image: read quality plot, available at https://ibb.co/j1fOza]

However, when I map them with a fairly generic bowtie command:

bowtie -t -v 2 -p 8 --solexa-quals hg19 -1 end1.fastq -2 end2.fastq out.map

I get consistently poor alignment rates:

the highest is: reads with at least one reported alignment: 537070 (11.58%)

the lowest is: reads with at least one reported alignment: 53707 (0.01%)

There is no documentation on the experiment specifying whether a primer is present in each of these reads, and I am certain the reference is hg19.

As you can see from the above picture, there is a dip in quality in the first 5 base pairs of the read. This dip is present in all of the reads that I am studying. I thought to get rid of these bases using the FASTX-Toolkit:

fastq_quality_trimmer -t 36 -i end1.fastq -o end1_trim.fastq
fastq_quality_trimmer -t 36 -i end2.fastq -o end2_trim.fastq

However, the mappings that resulted from these trimmed reads were consistently even poorer than the originals.

Can anyone critique my use of bowtie to see if I can fix this?


--solexa-quals

Unless this data is ancient (in NGS terms), it is unlikely to be in Solexa (ASCII-64) format. You are also using an aligner that does not allow gapped alignments. I suggest that you give bbmap.sh from the BBMap suite a try instead of bowtie.
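One way to check which encoding a file uses is to look at the range of ASCII characters on its quality lines. The following is a minimal Python sketch of that heuristic and is not part of bowtie or BBMap. The function name `guess_quality_encoding` and the cutoffs (59, 64, 74) are assumptions based on the usual character ranges of each encoding, and a file that contains only high-quality bases cannot be told apart this way.

```python
def guess_quality_encoding(fastq_lines, max_reads=10000):
    """Guess the quality encoding from the ASCII range of the quality lines.

    fastq_lines is any iterable of lines in 4-line FASTQ layout
    (for example an open file handle). This is a heuristic only.
    """
    lowest, highest = 255, 0
    reads_seen = 0
    for line_number, line in enumerate(fastq_lines):
        if line_number % 4 != 3:      # the 4th line of each record holds the qualities
            continue
        quals = line.rstrip("\r\n")
        if not quals:
            continue
        codes = [ord(ch) for ch in quals]
        lowest = min(lowest, min(codes))
        highest = max(highest, max(codes))
        reads_seen += 1
        if reads_seen >= max_reads:
            break

    if reads_seen == 0:
        return "unknown"
    if lowest < 59:
        return "phred33"      # characters below ';' (59) only fit an ASCII-33 offset
    if highest <= 74:
        return "ambiguous"    # could be high-quality ASCII-33 or low-quality ASCII-64
    if lowest < 64:
        return "solexa64"     # Solexa scores can be negative, down to ';' (59)
    return "phred64"
```

Calling it on an open file handle, e.g. `guess_quality_encoding(open("end1.fastq"))`, should be enough. A result of `phred33` would suggest dropping `--solexa-quals` from the bowtie command.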


Try taking some of the unmapped reads and running blastn on them. After all, it could be any of a lot of issues. I've been given data from someone else's samples before, so rule out that possibility first.
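blastn takes FASTA input, so a handful of reads has to be converted from FASTQ first. Below is a minimal Python sketch for that step. The function name `fastq_to_fasta` and the default of 20 reads are assumptions made for illustration, and the input is assumed to be plain 4-line FASTQ records (bowtie's `--un` option can write the unaligned reads to a file; check the manual for your version).

```python
from itertools import islice


def fastq_to_fasta(fastq_lines, max_reads=20):
    """Return FASTA text for the first max_reads records of a FASTQ stream."""
    fasta = []
    stripped = (line.rstrip("\r\n") for line in fastq_lines)
    while len(fasta) < max_reads:
        record = list(islice(stripped, 4))   # header, sequence, '+', qualities
        if len(record) < 4:
            break                            # end of input (or a truncated record)
        header, sequence = record[0], record[1]
        if not header.startswith("@"):
            raise ValueError(f"not a FASTQ header: {header!r}")
        fasta.append(f">{header[1:]}\n{sequence}")
    return "\n".join(fasta) + ("\n" if fasta else "")
```

The returned text can be written to a file or pasted into the web BLAST form, for example `print(fastq_to_fasta(open("unmapped.fastq")))`, where `unmapped.fastq` is a hypothetical file name.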


The "dip" is expected on Illumina machines, since the Phred score of a base depends on the preceding bases, which don't exist at the beginning of a read. Try local alignment with bowtie2 (--local) instead, and play with --score-min if needed. Do blast a few reads too, though, as suggested by mforde84.

6.8 years ago

There are lots of potential problems here. For one thing, how did you get 123bp reads? Are they preprocessed in some way? What platform are they from, and what year? What kind of experiment is it? And why are you using Bowtie1 on such long reads?

You do not need to trim the first 5bp; the dip in claimed quality scores for those bases is not real. You may or may not need to do trimming, but the first thing you need to do is use the proper aligner; bowtie1 is fairly good for really short reads (30bp and less), but not for longer reads. Try bowtie2 instead. Also, pairs should never be trimmed independently, only together (e.g., using BBDuk), or the pairing gets broken. Also, you are probably setting the quality score flag incorrectly. All modern reads use Sanger (ASCII-33) quality scores, but you specified an old ASCII-64 encoding (--solexa-quals), so yeah, the trimming is butchering the data.
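To illustrate both points (trimming mates together, and what a wrong quality offset does), here is a minimal Python sketch. It is not BBDuk's algorithm, just a simple 3' quality trim; the function names, the `(sequence, qualities)` tuples and the default threshold and minimum length are all assumptions made for the example.

```python
def trim_3prime(sequence, qualities, threshold, offset=33):
    """Cut bases off the 3' end while their quality score is below threshold."""
    end = len(qualities)
    while end > 0 and ord(qualities[end - 1]) - offset < threshold:
        end -= 1
    return sequence[:end], qualities[:end]


def trim_pair(mate1, mate2, threshold=20, min_length=30, offset=33):
    """Trim both mates of a pair together.

    Each mate is a (sequence, qualities) tuple. Returns the trimmed pair,
    or None if either mate ends up shorter than min_length, so that both
    mates are kept or dropped together and the two files stay in sync.
    """
    trimmed = []
    for sequence, qualities in (mate1, mate2):
        seq, qual = trim_3prime(sequence, qualities, threshold, offset)
        if len(seq) < min_length:
            return None
        trimmed.append((seq, qual))
    return tuple(trimmed)
```

With the correct offset, a read whose qualities are all `I` (Q40 in ASCII-33) is left untouched at a threshold of 20. With `offset=64` the same characters decode to Q9 and the whole read is trimmed away, which is the kind of damage described above.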


Marked accepted since OP didn't.

6.8 years ago

As per Brian Bushnell's suggestion, switching to bowtie2 greatly increased the alignment rate:

bowtie2 -x hg19 --very-fast -p 8 -1 end1.fastq -2 end2.fastq -S out.sam

Thank you all for your suggestions.


I have moved his reaction to an answer so you can accept it and mark this question as resolved.
