What does the Bowtie2 "overall alignment rate" mean?
5.7 years ago

Is the Bowtie2 "overall alignment rate"...

  1. The percentage of reads that actually aligned to the reference genome? By that I mean, is this percentage the same as saying that "only 75% of the reference genome was covered by your reads"?

or

  2. The percentage of fastq data that aligned to the reference genome? So let's say that "100% of the reference genome is covered by the reads, but only 75% of the fastq data was used to cover the reference genome. So the remainder of the fastq data must belong to something else."

I recently purchased a known strain of Candida albicans from a vendor and sequenced its genome using an Illumina MiSeq. I aligned the paired-end fastq files to the strain's reference genome and got an overall alignment rate of ~43%.

I viewed the BAM results in IGV, and from a visual perspective there appeared to be heavy coverage across the reference genome, which is why I am questioning the meaning of this "overall alignment rate." Does that mean that 100% of the reference could be covered by the paired-end fastq data and the other ~57% could be other DNA? Or does that mean that our sequencing run didn't capture ~57% of the Candida albicans genome?

alignment • 9.4k views
5.7 years ago
h.mon 35k

Does that mean that 100% of the reference could be covered by the paired-end fastq data and the other ~ 57% could be other DNA? Or does that mean that our sequencing run didn't capture ~57% of the Candida albicans genome?

Neither: the final line of Bowtie2's output just states the percentage of reads from the fastq files that mapped to the reference genome. So 43% of your reads mapped to the reference genome, and 57% didn't map. That 57% could be contaminants and/or adapters, or maybe your strain is somewhat divergent from the reference strain's genome. With just the mapping percentage, we can't know. Did you check for adapters? Did you blast some of the unmapped reads?
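To make the arithmetic concrete, here is a minimal sketch of how that percentage can be reproduced from the counts in a paired-end Bowtie2 summary. The function name, argument names, and the counts are illustrative assumptions, and the formula reflects my understanding of how the summary is tallied (every mate that aligned in any way, divided by all mates), not code taken from Bowtie2 itself.

```python
def overall_alignment_rate(total_pairs, concordant_pairs, discordant_pairs,
                           aligned_single_mates):
    """Fraction of mates (individual reads) that aligned at least once.

    Hypothetical helper: each aligned pair contributes two mates, and
    mates from pairs that failed to align as a pair are counted one by one.
    """
    total_mates = 2 * total_pairs
    aligned_mates = (2 * concordant_pairs
                     + 2 * discordant_pairs
                     + aligned_single_mates)
    return aligned_mates / total_mates


# Illustrative counts (assumed, not taken from the run discussed here):
# 10000 pairs, 9350 concordant, 34 discordant, 572 mates aligned alone.
rate = overall_alignment_rate(10000, 9350, 34, 572)
print(f"{rate:.2%} overall alignment rate")
```

Note that the denominator is the number of reads in the fastq files, not the length of the reference, which is why this number says nothing about how much of the genome is covered.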

Bowtie2's "overall alignment rate" message doesn't say anything about the proportion of the reference genome that is covered by reads. The fact that the whole reference genome appeared to be covered reflects your library prep type, which was probably Nextera DNA on a whole-genome DNA extraction. If you mapped an amplicon library instead, only a tiny proportion of the reference genome would be covered, even if nearly all reads aligned.

To get coverage statistics, there are several programs available, for example mosdepth.
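As a rough illustration of what such a coverage statistic measures, the sketch below computes breadth of coverage (the fraction of reference positions with at least a minimum depth) from per-base depth lines. It assumes tab-separated input with three columns (contig, position, depth) and one line per reference position, which is the layout I would expect from `samtools depth -a`; the function name and the toy data are hypothetical.

```python
def breadth_of_coverage(depth_lines, min_depth=1):
    """Fraction of listed positions whose depth is >= min_depth.

    Assumes each line is 'contig<TAB>position<TAB>depth' and that
    zero-depth positions are present in the input.
    """
    total = 0
    covered = 0
    for line in depth_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        _contig, _pos, depth = line.split("\t")[:3]
        total += 1
        if int(depth) >= min_depth:
            covered += 1
    return covered / total if total else 0.0


# Toy input (made up): four positions, two of them with no reads.
toy = ["chr1\t1\t12", "chr1\t2\t0", "chr1\t3\t7", "chr1\t4\t0"]
print(breadth_of_coverage(toy))                # fraction covered at >= 1x
print(breadth_of_coverage(toy, min_depth=10))  # fraction covered at >= 10x
```

This number is independent of the alignment rate: a run can have a low alignment rate and still cover the whole reference, as long as the reads that do align are spread across it.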
