Question: Extremely low mapping rates with bowtie2
19 months ago, Sachin10 wrote:

Hi, we have sequenced some Drosophila DNA. I ran FastQC and the results were good except in the duplicate sequences section. I did not trim the reads before alignment. With two sets of data we are getting very different mapping rates. Very low with one:

54378379 reads; of these:
  54378379 (100.00%) were unpaired; of these:
    51703307 (95.08%) aligned 0 times
    1724019 (3.17%) aligned exactly 1 time
    951053 (1.75%) aligned >1 times
4.92% overall alignment rate

Better with the other:

64029342 reads; of these:
  64029342 (100.00%) were unpaired; of these:
    16392556 (25.60%) aligned 0 times
    40232444 (62.83%) aligned exactly 1 time
    7404342 (11.56%) aligned >1 times
74.40% overall alignment rate

What could the reason for this be?

next-gen alignment • 1.9k views
ADD COMMENTlink modified 18 months ago by Biostar ♦♦ 20 • written 19 months ago by Sachin10

One other trick you can try:

samtools view -f 4 mybam.bam | cut -f 10 | sort | uniq -c | sort -nr | head

This will show you the 10 most common unmapped read sequences. The command will take some time to finish, but those sequences might be more useful for blasting than randomly chosen unmapped reads.
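To paste those sequences into BLAST they need to be in FASTA format. The following is a minimal sketch, assuming SAM records arrive on standard input with the read sequence in column 10, as in the command above. `top_unmapped_fasta` is a hypothetical helper name, not part of samtools.

```shell
# top_unmapped_fasta [N]
# Hypothetical helper (not a samtools feature).
# Reads SAM records on stdin, counts identical read sequences (column 10),
# and prints the N most common ones (default 10) as FASTA records.
top_unmapped_fasta() {
    cut -f 10 \
        | sort \
        | uniq -c \
        | sort -k1,1nr -k2,2 \
        | head -n "${1:-10}" \
        | awk '{ printf(">unmapped_rank%d_count%d\n%s\n", NR, $1, $2) }'
}
```

Possible usage, with the same file name as above: `samtools view -f 4 mybam.bam | top_unmapped_fasta 10 > top_unmapped.fa`. Each FASTA header records the rank and how many reads shared that sequence.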

ADD REPLYlink written 19 months ago by swbarnes25.5k

Did you use the correct reference genome for the first dataset? Could you please share the command used in the analysis?

ADD REPLYlink written 19 months ago by Renesh1.6k

Yes, I used the same genome for both datasets. Here is the command I used: bowtie2 -p 12 -x /bowtie2index/dm6 -U File1.fq -S File1.sam

ADD REPLYlink written 19 months ago by Sachin10

Any time the mapping rate is unexpectedly low, you need to take a small random selection of reads and blast them at NCBI. If you have a problem with contamination of some sort, it will quickly become apparent.
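One way to pull such a selection is sketched below, assuming an uncompressed FASTQ file with exactly four lines per record. `sample_fastq_as_fasta` is a hypothetical helper name. It uses reservoir sampling in awk, so the file is read only once and every read has the same chance of being kept.

```shell
# sample_fastq_as_fasta FILE [N] [SEED]
# Hypothetical helper. Picks N reads (default 20) uniformly at random from a
# 4-line-per-record FASTQ file and prints them as FASTA, ready for BLAST.
# SEED (default 1) makes the selection repeatable with the same awk.
sample_fastq_as_fasta() {
    awk -v n="${2:-20}" -v seed="${3:-1}" '
        BEGIN { srand(seed) }
        NR % 4 == 1 { name = substr($1, 2) }   # header line: drop the leading "@"
        NR % 4 == 2 {                          # sequence line
            seen++
            if (seen <= n) {
                slot = seen                    # fill the reservoir first
            } else {
                slot = int(rand() * seen) + 1  # uniform in 1..seen
                if (slot > n) next             # this read is not kept
            }
            names[slot] = name
            seqs[slot] = $0
        }
        END {
            kept = (seen < n) ? seen : n
            for (i = 1; i <= kept; i++) printf(">%s\n%s\n", names[i], seqs[i])
        }
    ' "${1:--}"
}
```

For example, `sample_fastq_as_fasta File1.fq 20 > sample.fa` would write 20 reads that can be pasted into the NCBI BLAST web form. If the file has fewer than N reads, all of them are printed in their original order.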

ADD REPLYlink written 19 months ago by genomax67k

I did this recently with a mapping rate of 40%. The data were supposed to be from mouse, but BLAST matched a subset of the reads to mouse and also to human. After I raised the issue with the sequencing company, they confirmed the sample was contaminated with human DNA (don't even get me started on why they didn't check for this before sending us results!).

ADD REPLYlink written 18 months ago by YaGalbi1.4k

Great suggestion genomax!

ADD REPLYlink written 19 months ago by Kevin Blighe42k

The alignment for the first sample is pretty shocking (i.e. poor). It's as if the DNA was from a different genus. In fact, I have aligned human DNA to a mouse genome in the past and achieved better alignment.

  • Are you using the correct genome version?
  • Did you index the genome with the same version of Bowtie that you are using for re-alignment?
  • What are your read lengths?
  • Which library preparation protocol did you use?
  • Are your FASTQ files formatted correctly?
  • What is the average base quality in your reads (use FastQC)?
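The read-length and FASTQ-format questions above can be answered quickly from the command line. This is a rough sketch, assuming an uncompressed FASTQ file with four lines per record (no wrapped sequences). `fastq_check` is a hypothetical helper name, and it only checks basic structure, not quality encoding.

```shell
# fastq_check FILE
# Hypothetical helper. Verifies that every record has an "@" header, a "+"
# separator, and sequence and quality strings of equal length, then prints
# the number of reads and the min/mean/max read length.
# Returns non-zero if the file is malformed or empty.
fastq_check() {
    awk '
        NR % 4 == 1 && substr($0, 1, 1) != "@" {
            printf("line %d: header does not start with @\n", NR); bad = 1; exit
        }
        NR % 4 == 2 { seq_len = length($0) }
        NR % 4 == 3 && substr($0, 1, 1) != "+" {
            printf("line %d: separator does not start with +\n", NR); bad = 1; exit
        }
        NR % 4 == 0 {
            if (length($0) != seq_len) {
                printf("line %d: quality and sequence lengths differ\n", NR); bad = 1; exit
            }
            reads++
            total += seq_len
            if (reads == 1 || seq_len < min) min = seq_len
            if (seq_len > max) max = seq_len
        }
        END {
            if (bad) exit 1
            if (NR % 4 != 0) { print "truncated record at end of file"; exit 1 }
            if (reads == 0) { print "no reads found"; exit 1 }
            printf("reads=%d min_len=%d mean_len=%.1f max_len=%d\n", reads, min, total / reads, max)
        }
    ' "${1:--}"
}
```

For instance, `fastq_check File1.fq` prints a one-line summary, and a compressed file can be piped in with `gzip -dc File1.fq.gz | fastq_check`. Very short or highly variable read lengths in the poorly mapping sample would be worth following up.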
ADD REPLYlink modified 19 months ago • written 19 months ago by Kevin Blighe42k

I'm getting similar mapping results with dm6 too: some samples are below 5% and some are above 70%. Did you solve your mapping problem? Any suggestions? Thank you!

ADD REPLYlink written 4 weeks ago by Ellie20

What has been sequenced: genome or transcriptome?

Also, what are you mapping against: genome or transcriptome?

For mapping RNA-seq data onto a genome, a splice-aware aligner such as HISAT, TopHat or STAR is recommended.

ADD REPLYlink written 19 months ago by Vijay Lakhujani4.1k