Extremely low mapping rates with bowtie2
5.0 years ago
Sachin ▴ 10

Hi, we have sequenced some Drosophila DNA. I ran FastQC and the results looked good except for the duplicate sequences section. I did not trim the reads before alignment. The two datasets give very different mapping rates. The first is very low:

54378379 reads; of these:
54378379 (100.00%) were unpaired; of these:
51703307 (95.08%) aligned 0 times
1724019 (3.17%) aligned exactly 1 time
951053 (1.75%) aligned >1 times
4.92% overall alignment rate


Better with the other:

64029342 reads; of these:
64029342 (100.00%) were unpaired; of these:
16392556 (25.60%) aligned 0 times
40232444 (62.83%) aligned exactly 1 time
7404342 (11.56%) aligned >1 times
74.40% overall alignment rate


What could the reason for this be?
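
For reference, the overall alignment rate that bowtie2 reports for unpaired reads is the number of reads aligned exactly once plus the number aligned more than once, divided by the total. A minimal sketch that recomputes the two percentages from the counts in the summaries above (the function name overall_rate is made up for illustration):

```shell
# overall_rate TOTAL ALIGNED_ONCE ALIGNED_MULTI
# Prints the overall alignment rate as a percentage with two decimals.
# (Illustrative helper, not part of bowtie2.)
overall_rate() {
    awk -v total="$1" -v once="$2" -v multi="$3" \
        'BEGIN { printf("%.2f\n", 100 * (once + multi) / total) }'
}

overall_rate 54378379 1724019 951053     # first dataset
overall_rate 64029342 40232444 7404342   # second dataset
```

Both summaries are internally consistent, so the low rate reflects the reads themselves rather than a reporting problem.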

alignment next-gen • 6.2k views

One other trick you can try:

samtools view -f 4 mybam.bam | cut -f 10 | sort | uniq -c | sort -nr | head


This will show you the 10 most common unmapped read sequences, each with its count. The command will take some time to finish, but those sequences are likely to be more useful for BLAST than randomly chosen unmapped reads.
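
If you want those sequences in a form that can be pasted into BLAST, the counts can be turned into FASTA records. A minimal sketch, assuming SAM records arrive on stdin (for example from samtools view -f 4); the function name top_unmapped_fasta and the header format are made up for illustration:

```shell
# top_unmapped_fasta [N]
# Reads SAM records on stdin and prints the N most frequent read sequences
# (SAM column 10) as FASTA. Headers carry the rank and the read count.
# (Illustrative helper, not part of samtools.)
top_unmapped_fasta() {
    local n="${1:-10}"
    grep -v '^@' \
        | cut -f 10 \
        | sort \
        | uniq -c \
        | sort -k1,1nr -k2,2 \
        | head -n "$n" \
        | awk '{ printf(">rank_%d_count_%d\n%s\n", NR, $1, $2) }'
}

# Intended use (assumes samtools is installed and mybam.bam exists):
#   samtools view -f 4 mybam.bam | top_unmapped_fasta 10 > top_unmapped.fa
```

The grep step only guards against SAM header lines in case the input was produced with headers included.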

Did you use the correct reference genome for the first dataset? Could you please share the command you used for the analysis?

Yes, I used the same genome for both datasets. Here is the command I used:

bowtie2 -p 12 -x /bowtie2index/dm6 -U File1.fq -S File1.sam

Any time the mapping rate is unexpectedly low, take a small random selection of reads and BLAST them at NCBI. If you have a problem with contamination of some sort, it will quickly become apparent.
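
One way to pull such a random selection is sketched below. It assumes an uncompressed FASTQ file with exactly four lines per record and that shuf (GNU coreutils) is available; the function name sample_fastq_to_fasta is made up for illustration:

```shell
# sample_fastq_to_fasta FASTQ [N]
# Picks N random reads from a 4-line-per-record FASTQ file and prints them
# as FASTA, ready to paste into BLAST. (Illustrative helper.)
sample_fastq_to_fasta() {
    local fastq="$1" n="${2:-20}"
    awk 'NR % 4 == 1 { name = substr($1, 2) }   # header line, drop leading "@"
         NR % 4 == 2 { print name "\t" $0 }     # sequence line
        ' "$fastq" \
        | shuf -n "$n" \
        | awk -F '\t' '{ printf(">%s\n%s\n", $1, $2) }'
}

# Intended use (File1.fq is the file name from the command in this thread):
#   sample_fastq_to_fasta File1.fq 20 > sample.fa
```

Sampling from the whole file rather than taking the first few reads avoids the low-quality reads that often sit at the start of a FASTQ file.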

I did this recently with a mapping rate of 40%. The data was supposed to be mouse, but BLAST matched one subset of reads to mouse and another to human. After I raised the issue with the sequencing company, they confirmed the sample was contaminated with human DNA (don't even get me started on why they didn't check for this before sending us results!)

Great suggestion genomax!

The alignment for the first sample is pretty shocking (i.e. poor). It's as if the DNA was from a different genus. In fact, I have aligned human DNA to a mouse genome in the past and achieved better alignment.

• Are you using the correct genome version?
• Did you index the genome with the same version of Bowtie that you are using for re-alignment?
• What are your read lengths?
• Which library preparation protocol did you use?
• Are your FASTQ files formatted correctly?
• What is the average base quality in your reads (use FastQC)?
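
Two of the checks in the list above (read lengths and FASTQ formatting) can be done quickly from the command line. A minimal sketch, assuming an uncompressed FASTQ file with four lines per record; the function name fastq_check and its output format are made up for illustration:

```shell
# fastq_check FASTQ
# Reports the number of records, the number of structural problems found,
# and the minimum, maximum and mean read length. (Illustrative helper.)
fastq_check() {
    awk '
        NR % 4 == 1 { if (substr($0, 1, 1) != "@") bad++ }   # header line
        NR % 4 == 2 { len = length($0); total += len; n++
                      if (min == "" || len < min) min = len
                      if (len > max) max = len }
        NR % 4 == 3 { if (substr($0, 1, 1) != "+") bad++ }   # separator line
        NR % 4 == 0 { if (length($0) != len) bad++ }         # quality length
        END {
            if (NR % 4 != 0) bad++                           # truncated record
            printf("records=%d problems=%d min_len=%d max_len=%d mean_len=%.1f\n",
                   n, bad, min, max, (n ? total / n : 0))
        }' "$1"
}

# Intended use (File1.fq is the file name from the command in this thread):
#   fastq_check File1.fq
```

A non-zero problems count, or a read-length range very different from what the library preparation should give, would be worth sorting out before looking at the aligner settings.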

I'm getting similar mapping results against dm6 too: some samples are below 5% and some are above 70%. Did you solve your mapping problem? Any suggestions? Thank you!

What was sequenced: genome or transcriptome?

Also, what are you mapping against: a genome or a transcriptome?

For mapping RNA-seq data onto a genome, a splice-aware aligner such as HISAT, TopHat or STAR is recommended.

I am getting poor alignment rates too (less than 30% against a congeneric species!). Is this what you would expect, considering that my data is GBS (short reads) with a maximum length of 90 bp (but mostly shorter)?

Thanks for the help

Probably not. Have you checked some of the non-mapping reads via blast to see what they are?
