Question

bowtie: poor mapping with high quality reads

0

6.8 years ago

chrisclarkson100 ▴ 150

Apologies for my inexperience with bowtie.

I have a series of FASTQ files, all containing reads with very consistent base quality: ~35-40.

[Image: read quality plot, available at https://ibb.co/j1fOza]

However, when I map them with a fairly generic bowtie command:

bowtie -t -v 2 -p 8 --solexa-quals hg19 -1 end1.fastq -2 end2.fastq out.map

I get consistently poor alignment rates:

the highest is: reads with at least one reported alignment: 537070 (11.58%)

the lowest is: reads with at least one reported alignment: 53707 (0.01%)

There is no documentation on the experiment specifying whether a primer is present in each of these reads, and I am certain the genome is hg19.

As you can see from the picture above, there is a dip in quality in the first 5 base pairs of the read shown. This dip is present in all of the reads that I am studying, so I tried to get rid of it using fastq_quality_trimmer from the FASTX-Toolkit:

fastq_quality_trimmer -t 36 -i end1.fastq -o end1_trim.fastq
fastq_quality_trimmer -t 36 -i end2.fastq -o end2_trim.fastq

However, the alignment rates from the trimmed reads were consistently even poorer than those from the originals.

Can anyone critique my use of bowtie to see if I can fix this?

alignment bowtie • 2.3k views

1

--solexa-quals

Unless this data is ancient (in NGS terms) it is unlikely to be in solexa (phred+64) format. You are also using an aligner that does not allow gapped alignments. I suggest that you give bbmap.sh from BBMap suite a try instead of bowtie.
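A quick way to check the encoding before picking a quality flag is to look at the lowest quality character in the file. The sketch below is only a heuristic and not part of bowtie: the function name `guess_quality_offset` and the labels it returns are made up for illustration, and it assumes plain four-line FASTQ records. Characters below `;` (ASCII 59) can only occur in Phred+33 data; if nothing below `@` (ASCII 64) shows up, the file may be Phred+64, or it may just be Phred+33 data with uniformly high qualities, so treat that result as inconclusive.

```python
def guess_quality_offset(fastq_lines, max_records=10000):
    """Guess the FASTQ quality encoding from the lowest quality character.

    fastq_lines: any iterable of lines (an open file handle works).
    Assumes four lines per record; wrapped FASTQ is not handled.
    """
    lowest = None
    records = 0
    for i, line in enumerate(fastq_lines):
        if i % 4 != 3:          # the quality string is the 4th line of each record
            continue
        qual = line.rstrip("\r\n")
        if qual:
            low = min(map(ord, qual))
            lowest = low if lowest is None else min(lowest, low)
        records += 1
        if records >= max_records:
            break
    if lowest is None:
        return "unknown"
    if lowest < 59:             # below ';' only happens with Phred+33
        return "phred33"
    if lowest < 64:             # ';' to '?' only happens with Solexa+64
        return "solexa64"
    return "phred64-or-ambiguous"

# Example use on one of the files from the question:
# with open("end1.fastq") as handle:
#     print(guess_quality_offset(handle))
```

If this reports `phred33`, the `--solexa-quals` flag should simply be dropped, since Phred+33 is what bowtie assumes by default.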

Reply by GenoMax (141k), 6.8 years ago

0

Try taking some of the unmapped reads and running blastn on them. After all, it could be any of a lot of issues. I've received data from someone else's samples before, so rule out that possibility first.
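To get a handful of reads into a form that can be pasted into blastn, something like the following would do. It is a minimal sketch assuming four-line FASTQ records; the function name `fastq_to_fasta` and the default of 20 reads are arbitrary choices for illustration. The unmapped reads themselves can be collected with bowtie's `--un` option.

```python
from itertools import islice

def fastq_to_fasta(fastq_lines, n_reads=20):
    """Convert the first n_reads FASTQ records to a FASTA-formatted string."""
    lines = (line.rstrip("\r\n") for line in fastq_lines)
    out = []
    for _ in range(n_reads):
        record = list(islice(lines, 4))   # header, sequence, '+', qualities
        if len(record) < 4:               # end of input or truncated record
            break
        header, seq = record[0], record[1]
        out.append(">" + header[1:])      # swap the leading '@' for '>'
        out.append(seq)
    return "\n".join(out)

# Example use (unmapped.fastq is a hypothetical file name):
# with open("unmapped.fastq") as handle:
#     print(fastq_to_fasta(handle, n_reads=10))
```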

Reply by mforde84 (1.4k), 6.8 years ago

0

The "dip" is expected in Illumina machines, since the phred score of a base depends on that of the preceding bases, and those don't exist at the beginning of reads. Try local alignment with bowtie2 instead, and play with --score-min if needed. Do blast a few reads too, as suggested by mforde84.
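To get a feel for what --score-min controls, it helps to compute the threshold for a given read length. The bowtie2 manual gives the defaults as `L,-0.6,-0.6` in end-to-end mode (linear: constant + coefficient × length) and `G,20,8` in `--local` mode (natural log: constant + coefficient × ln(length)). The helper names below are made up for illustration; this is just the arithmetic, not bowtie2 code.

```python
import math

def min_score_end_to_end(read_len, const=-0.6, coeff=-0.6):
    """Minimum alignment score for the linear function L,const,coeff."""
    return const + coeff * read_len

def min_score_local(read_len, const=20.0, coeff=8.0):
    """Minimum alignment score for the natural-log function G,const,coeff."""
    return const + coeff * math.log(read_len)

# For the 123 bp reads mentioned in this thread:
print(round(min_score_end_to_end(123), 1))  # -74.4
print(round(min_score_local(123), 1))       # 58.5
```

In end-to-end mode the best possible score is 0 and penalties are subtracted, so a more negative threshold is more permissive. In local mode matches earn a bonus, so lowering the threshold likewise lets weaker alignments through.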

Reply by Devon Ryan (104k), 6.8 years ago

0

6.8 years ago

chrisclarkson100 ▴ 150

As per Brian Bushnell's suggestion, bowtie2 greatly increased the alignment rate.

bowtie2 -x hg19 --very-fast -p 8 -1 end1.fastq -2 end2.fastq -S out.sam

Thank you all for your suggestions.

0

I have moved his reply to an answer so you can accept it and mark this question as resolved.

Reply by WouterDeCoster (47k), 6.8 years ago

score 2 · Accepted Answer · 2017-07-05

There are lots of potential problems here. For one thing, how did you get 123bp reads? Are they preprocessed in some way? What platform are they from, and what year? What kind of experiment is it? And why are you using Bowtie1 on such long reads?

You do not need to trim the first 5bp; the dip in claimed quality scores for those bases is not real. You may or may not need to do trimming, but the first thing you need to do is use the proper aligner; bowtie1 is fairly good for really short reads (30bp and less), but not for longer reads. Try bowtie2 instead. Also, pairs should never be trimmed independently, only together (e.g., using BBDuk); otherwise the pairing gets broken. Finally, you are probably setting the quality score flag incorrectly. All modern reads use Sanger (ASCII-33) quality scores, but you specified old Illumina (ASCII-64), so yeah, the trimming is butchering the data.
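If the two files have already been trimmed separately, it is worth checking whether they are still in sync before aligning. The sketch below walks both files in parallel and reports the first position where the read names disagree. It assumes four-line FASTQ records and that mates share a name up to the first whitespace, optionally followed by `/1` or `/2`; the function names are made up for illustration.

```python
from itertools import zip_longest

def read_names(fastq_lines):
    """Yield read names from FASTQ header lines, without '@' or a /1, /2 suffix."""
    for i, line in enumerate(fastq_lines):
        if i % 4 != 0:                 # headers are the 1st line of each record
            continue
        fields = line.split()
        if not fields:                 # tolerate a trailing blank line
            continue
        name = fields[0][1:]           # drop the leading '@'
        if name.endswith("/1") or name.endswith("/2"):
            name = name[:-2]
        yield name

def first_pairing_mismatch(fq1_lines, fq2_lines):
    """Return (record_index, name1, name2) at the first mismatch, else None.

    A missing mate (one file shorter than the other) shows up as a None name.
    """
    pairs = zip_longest(read_names(fq1_lines), read_names(fq2_lines))
    for idx, (name1, name2) in enumerate(pairs):
        if name1 != name2:
            return idx, name1, name2
    return None

# Example use on the trimmed files from the question:
# with open("end1_trim.fastq") as f1, open("end2_trim.fastq") as f2:
#     print(first_pairing_mismatch(f1, f2))
```

A result of `None` means every record lined up; anything else means the files need to be re-paired or, more simply, re-trimmed together from the original FASTQ files.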