Question: What does the Bowtie2 "overall alignment rate" mean?
12 weeks ago, laurenkleine18 wrote:

Is the Bowtie2 "overall alignment rate"...

  1. The percentage of reads that actually aligned to the reference genome? By that I mean, is this percentage the same as saying that "only 75% of the reference genome was covered by your reads"?

or

  2. The percentage of fastq data that aligned to the reference genome? So let's say that 100% of the reference genome is covered by the reads, but only 75% of the fastq data was used to cover it, so the remainder of the fastq data must belong to something else.

I recently purchased a known strain of Candida albicans from a vendor and sequenced its genome on an Illumina MiSeq. I aligned the paired-end fastq files to the strain's reference genome and got an overall alignment rate of ~43%.

I viewed the BAM results in IGV, and visually there appeared to be heavy coverage across the reference genome, which is why I am questioning the meaning of this "overall alignment rate." Does that mean that 100% of the reference could be covered by the paired-end fastq data and the other ~57% could be other DNA? Or does that mean that our sequencing run didn't capture ~57% of the Candida albicans genome?

12 weeks ago, h.mon (Brazil) wrote:

Does that mean that 100% of the reference could be covered by the paired-end fastq data and the other ~ 57% could be other DNA? Or does that mean that our sequencing run didn't capture ~57% of the Candida albicans genome?

Neither. Bowtie2's final output message just states the percentage of reads from the fastq files that mapped to the reference genome. So 43% of your reads mapped to the reference genome, and 57% didn't. That 57% could be contaminants and/or adapters, or maybe your strain is somewhat divergent from the reference strain. From the mapping percentage alone, we can't know. Did you check for adapters? Did you BLAST some of the unmapped reads?
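To make the arithmetic concrete, here is a minimal Python sketch of how the rate can be reproduced from the counts in a paired-end Bowtie2 summary, as I understand that summary: every read (mate) that aligned at least once is counted, whether it aligned as part of a pair or on its own, and the sum is divided by the total number of reads. The function name, parameter names, and the counts are illustrative assumptions, not the asker's data.

```python
def overall_alignment_rate(total_pairs, conc_once, conc_multi,
                           discordant, mates_once, mates_multi):
    """Percentage of individual reads (mates) that aligned at least once."""
    # Every aligned pair, concordant or discordant, contributes two reads.
    reads_in_aligned_pairs = 2 * (conc_once + conc_multi + discordant)
    # Mates from pairs that did not align as a pair but aligned on their own.
    reads_aligned_alone = mates_once + mates_multi
    # The denominator is reads, not pairs, and never the genome length.
    total_reads = 2 * total_pairs
    return 100.0 * (reads_in_aligned_pairs + reads_aligned_alone) / total_reads


# Illustrative counts only (hypothetical run of 10,000 read pairs).
rate = overall_alignment_rate(
    total_pairs=10000, conc_once=8823, conc_multi=527,
    discordant=34, mates_once=571, mates_multi=1,
)
print(f"{rate:.2f}% overall alignment rate")
```

Note that the reference genome's length never enters the calculation, which is why this number cannot tell you how much of the genome is covered.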

Bowtie2's "overall alignment rate" message doesn't say anything about the proportion of the reference genome that is covered by reads. The fact that 100% of your reference genome appeared to be covered reflects your library prep type, which was probably Nextera DNA on a whole-genome DNA extraction. If you map an amplicon library, just a tiny proportion of the reference genome will be covered.

Several programs can compute coverage statistics, for example mosdepth.
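If you just want a quick number for breadth of coverage (the fraction of reference positions covered by at least one read), the idea can be sketched in a few lines of Python. This is not mosdepth; it assumes you have per-base depth as three tab-separated columns (contig, position, depth) with zero-depth positions included, which is what `samtools depth -a` writes. The function name and the example rows are hypothetical.

```python
def breadth_of_coverage(depth_lines, min_depth=1):
    """Fraction of listed reference positions with depth >= min_depth."""
    covered = 0
    total = 0
    for line in depth_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        total += 1
        if int(fields[2]) >= min_depth:
            covered += 1
    return covered / total if total else 0.0


# Tiny made-up example: four positions, two of them with zero depth.
example = ["chr1\t1\t12\n", "chr1\t2\t0\n", "chr1\t3\t7\n", "chr1\t4\t0\n"]
print(f"{100 * breadth_of_coverage(example):.1f}% of positions covered")
```

On real data you would pass an open file instead of a list, for example the output of `samtools depth -a aln.bam > depth.txt` (file names here are placeholders). Zero-depth positions must be present in the input, otherwise the denominator is too small and the breadth is overestimated.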
