Calculate percentage mapped reads from Bowtie2/RSEM
Entering edit mode
7.0 years ago
ando.kelli ▴ 60

Hey all,

I used the following script to align my reads to a reference transcriptome (bowtie2), and calculate transcript abundance (RSEM).

$TRINITY_HOME/util/ --transcripts location_of_data/assembly.fasta --seqType fq --est_method RSEM -aln_method bowtie2 --trinity_mode --thread_count 32 --output_dir location_of_output_dir --left left_reads.fq.gz --right right_reads.fq.gz --output_prefix sample_name

It all ran without any problems. However, when I tried to calculate the percentage of reads that mapped to the reference transcriptome, the output stated that 100% of reads were mapped which is unusual.

I used samtools flagstat to generate statistics for the output .bam file for each sample, and it said 100% of reads were mapped.

I then used the RSEM script on the contents of the .stat output directory to generate statistics for the bowtie2 mapping. It said the same. 100% of reads mapped, with ~65% unique and 35 multi-aligned.

Does the .stat output directory and the output .bam file from bowtie2 only contain data for mapped reads? Is it possible that 100% of the reads did map, since the reference transcriptome was generated from those reads to begin with?

How else can I generate the statistic?

Cheers :-)

RSEM Bowtie2 Mapped Reads resm-plot-model flagstat • 4.2k views
Entering edit mode

I've never seen an instance in which 100% of a nontrivial library mapped to anything. But when mapping to an assembly of those reads, it's possible. Normally that would mean you have excellent library-creation and read preprocessing procedures.

That said - you should inspect the command used in "". By default bowtie2 does output unmapped reads, but perhaps they are being suppressed or redirected.

Entering edit mode

Thanks Brian, I thought it was strange but technically not impossible. I'll have a look at the script :-)

Entering edit mode
3.8 years ago

I think using Samtools Flagstat is not the best way to quantify the reads successfully mapped. As mentioned elsewhere by Devon Ryan, samtools flagstat will report more reads than actually present in fastq file also considering the secondary alignments. In such case the stats generated are merely representative and doesn't report the actual % or reads mapped (multimapping reads are counted multiple times), which could exaggerate the mapping percentages.

Please correct me if I am missing something.

For more reference, refer the discussion here


Login before adding your answer.

Traffic: 1169 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6