Question: Extremely low mapping rates with bowtie2
Sachin wrote (2.5 years ago):

Hi, we have sequenced some Drosophila DNA. I ran FastQC and the results were good except for the duplicate sequences section. I did not trim the reads before alignment. The two datasets give very different mapping rates, and one is very low:

54378379 reads; of these:
  54378379 (100.00%) were unpaired; of these:
    51703307 (95.08%) aligned 0 times
    1724019 (3.17%) aligned exactly 1 time
    951053 (1.75%) aligned >1 times
4.92% overall alignment rate

The other is much better:

64029342 reads; of these:
  64029342 (100.00%) were unpaired; of these:
    16392556 (25.60%) aligned 0 times
    40232444 (62.83%) aligned exactly 1 time
    7404342 (11.56%) aligned >1 times
74.40% overall alignment rate

What could the reason for this be?
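
The overall alignment rate bowtie2 reports is the number of reads aligned at least once (exactly 1 time plus >1 times) divided by all input reads: (1724019 + 951053) / 54378379 ≈ 4.92% for the first sample. As a minimal sketch, assuming the single-end summary layout shown above (paired-end runs print extra concordant/discordant lines that this ignores), the Python below parses such a summary, recomputes the rate, and checks that the three categories add up to the total read count. The function name and dictionary keys are illustrative choices, not part of bowtie2.

```python
import re


def parse_bowtie2_unpaired(summary: str) -> dict:
    """Pull read counts out of a single-end bowtie2 alignment summary (sketch)."""

    def count(pattern: str) -> int:
        m = re.search(pattern, summary, flags=re.MULTILINE)
        if m is None:
            raise ValueError(f"line not found: {pattern!r}")
        return int(m.group(1))

    total = count(r"^(\d+) reads; of these:")
    aligned_0 = count(r"(\d+) \([\d.]+%\) aligned 0 times")
    aligned_1 = count(r"(\d+) \([\d.]+%\) aligned exactly 1 time")
    aligned_multi = count(r"(\d+) \([\d.]+%\) aligned >1 times")

    # The three categories should partition the input reads; a mismatch
    # usually means the summary was truncated or mis-copied.
    observed = aligned_0 + aligned_1 + aligned_multi
    if observed != total:
        raise ValueError(f"categories sum to {observed}, expected {total}")

    return {
        "total": total,
        "aligned_0": aligned_0,
        "aligned_1": aligned_1,
        "aligned_multi": aligned_multi,
        # Overall rate = reads aligned at least once / all reads, in percent.
        "overall_pct": 100.0 * (aligned_1 + aligned_multi) / total,
    }
```

bowtie2 prints this summary to standard error, so it can be saved by redirecting stderr (for example to a hypothetical File1.log) and then passed to the function as a string.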

Tags: next-gen, alignment
written 2.5 years ago by Sachin; modified 2.4 years ago by Biostar

One other trick you can try:

samtools view -f 4 mybam.bam | cut -f 10 | sort | uniq -c | sort -nr | head

This shows the 10 most common unmapped read sequences. The command takes some time to finish, but those sequences may be more useful for BLASTing than randomly chosen unmapped reads.

written 2.5 years ago by swbarnes2
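
The samtools pipeline above can be approximated directly on the SAM file that bowtie2 wrote (the -S output): skip header lines, keep records whose FLAG has the 0x4 (segment unmapped) bit set, and count the SEQ column (field 10). This is a minimal sketch assuming plain-text SAM input; a BAM file would still need samtools or a library such as pysam. The function name is illustrative, and ties between equally frequent sequences may be ordered differently than with sort -nr.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_unmapped_sequences(sam_lines: Iterable[str], n: int = 10) -> List[Tuple[str, int]]:
    """Return the n most frequent SEQ strings among unmapped SAM records."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line (@HD, @SQ, @PG, ...)
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        if flag & 0x4:  # 0x4 = segment unmapped
            counts[fields[9]] += 1  # field 10 (0-based index 9) is SEQ
    return counts.most_common(n)
```

For example, opening File1.sam and passing the file handle to top_unmapped_sequences gives (sequence, count) pairs, most frequent first.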

Did you use the correct reference genome for the first dataset? Can you please share the command you used for the analysis?

written 2.5 years ago by Renesh

Yes, I used the same genome for both datasets. Here is the command I used: bowtie2 -p 12 -x /bowtie2index/dm6 -U File1.fq -S File1.sam

written 2.5 years ago by Sachin

Any time there is an unexpectedly low mapping percentage, take a small random selection of reads and BLAST them at NCBI. If you have a contamination problem of some sort, it will quickly become apparent.

written 2.5 years ago by genomax
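
One way to draw the small random selection suggested above is reservoir sampling (Algorithm R), which picks k reads uniformly at random in a single pass without knowing the file size in advance. The sketch below assumes an uncompressed FASTQ with exactly four lines per record and writes the sample as FASTA, a format the NCBI BLAST web form accepts. The function names, the sample size and the seed are illustrative assumptions.

```python
import random
from itertools import islice
from typing import Iterable, Iterator, List, Optional, Tuple

Record = Tuple[str, str]  # (header without the leading '@', sequence)


def fastq_records(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from a plain 4-line-per-record FASTQ."""
    it = iter(lines)
    while True:
        chunk = [line.rstrip("\r\n") for line in islice(it, 4)]
        if not chunk:
            return  # clean end of input
        if len(chunk) < 4 or not chunk[0].startswith("@"):
            raise ValueError("malformed or truncated FASTQ record")
        yield chunk[0][1:], chunk[1]


def reservoir_sample(records: Iterable[Record], k: int,
                     seed: Optional[int] = None) -> List[Record]:
    """Algorithm R: keep a uniform random sample of k records in one pass."""
    rng = random.Random(seed)
    sample: List[Record] = []
    for i, rec in enumerate(records):
        if i < k:
            sample.append(rec)  # fill the reservoir first
        else:
            j = rng.randint(0, i)  # uniform over 0..i inclusive
            if j < k:
                sample[j] = rec  # record i is kept with probability k/(i+1)
    return sample


def to_fasta(records: Iterable[Record]) -> str:
    """Format records as FASTA text, e.g. for pasting into NCBI BLAST."""
    return "".join(f">{header}\n{seq}\n" for header, seq in records)
```

For example, reservoir_sample(fastq_records(fh), k=100, seed=1) on an open File1.fq handle, followed by to_fasta, gives 100 reads ready to paste into BLAST; fixing the seed makes the draw reproducible.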

I did this recently with a mapping rate of 40%. The data was supposed to be mouse, but BLAST matched a subset of reads to mouse and also to human. After I raised the issue with the sequencing company, they confirmed the sample was contaminated with human DNA (don't even get me started on why they didn't check for this before sending us results!)

written 2.4 years ago by YaGalbi

Great suggestion genomax!

written 2.5 years ago by Kevin Blighe

The alignment for the first sample is pretty shocking (i.e. poor). It's as if the DNA was from a different genus. In fact, I have aligned human DNA to a mouse genome in the past and achieved better alignment.

  • Are you using the correct genome version?
  • Did you index the genome with the same version of Bowtie that you are using for re-alignment?
  • What are your read lengths?
  • Which library preparation protocol did you use?
  • Are your FASTQ files formatted correctly?
  • What is the average base quality in your reads (use FastQC)?

written 2.5 years ago by Kevin Blighe (modified 2.5 years ago)
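
The last two checks in the list above (FASTQ formatting and average base quality) can be roughed out without FastQC. A minimal sketch, assuming four-line FASTQ records and Phred+33 quality encoding (Sanger / Illumina 1.8+): it raises an error on a truncated record, a missing '@' header or '+' separator, or a sequence/quality length mismatch, and otherwise reports the read count, mean read length and mean quality over all bases. This is only a coarse aggregate; FastQC's per-base and per-sequence plots remain more informative. The function and key names are illustrative.

```python
from itertools import islice
from typing import Dict, Iterable


def fastq_quality_summary(lines: Iterable[str], phred_offset: int = 33) -> Dict[str, float]:
    """Validate 4-line FASTQ records and summarise length and base quality."""
    it = iter(lines)
    n_reads = n_bases = qual_sum = 0
    while True:
        # rstrip("\r\n") also tolerates Windows line endings.
        record = [line.rstrip("\r\n") for line in islice(it, 4)]
        if not record:
            break
        n_reads += 1
        if len(record) < 4:
            raise ValueError(f"record {n_reads}: truncated (fewer than 4 lines)")
        header, seq, plus, qual = record
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"record {n_reads}: bad '@' header or '+' separator")
        if len(seq) != len(qual):
            raise ValueError(f"record {n_reads}: sequence and quality lengths differ")
        n_bases += len(seq)
        # Phred score of each base = ASCII code minus the encoding offset.
        qual_sum += sum(ord(c) - phred_offset for c in qual)
    return {
        "reads": n_reads,
        "mean_length": n_bases / n_reads if n_reads else 0.0,
        "mean_quality": qual_sum / n_bases if n_bases else 0.0,
    }
```

Running it on each input file (for example File1.fq) and comparing the two samples side by side can show whether the poorly mapping dataset also stands out in read length or quality.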

I'm getting similar mapping rates against dm6 too: some samples are below 5% and others above 70%. Did you solve your mapping problem? Any suggestions? Thank you!

written 11 months ago by Jingyue

What has been sequenced: genome or transcriptome?

Also, what are you mapping to: a genome or a transcriptome?

For mapping RNA-seq data to a genome, a splice-aware aligner such as HISAT, TopHat or STAR is recommended.

written 2.5 years ago by lakhujanivijay

Powered by Biostar version 2.3.0