Question: Bowtie output statistics in RSEM
5.4 years ago
chunxuan wrote:

I used RSEM to map paired-end RNA-seq data to a transcriptome, and I am really confused by the Bowtie output and by the final mapped BAM file.

The command is:

rsem-calculate-expression -p 6 --paired-end $fastqP_1_fa,$fastqP_2_fa  $fastqP_3_fa,$fastqP_4_fa $rsemIndex $output

$fastqP_1_fa and $fastqP_3_fa are the two mate files of one lane; $fastqP_2_fa and $fastqP_4_fa are the two mate files of the other lane. It is the same library sequenced in two lanes.

The Bowtie output statistics are:

# reads processed: 31414684
# reads with at least one reported alignment: 15977643 (50.86%)
# reads that failed to align: 15437041 (49.14%)
Reported 89378248 paired-end alignments to 1 output stream(s)

However, the two lanes together contain 15577283 * 2 + 15837401 * 2 = 62829368 reads. So what does "reads processed: 31414684" mean here? Why were only half of the total reads processed? And what is 89378248?

The output BAM file has 209630578 lines and 46851725 total reads (unique reads, i.e., each counted only once even if it mapped to multiple locations), and 15977643 mapped reads, which is the only number I understand.

Could someone give me a clue on this?

5.4 years ago
Istvan Albert wrote:

Well, 2 * 31414684 = 62829368, so it should probably say "read pairs processed". One read pair may align to multiple locations, hence the 89378248 reported alignments.
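To double-check the arithmetic, here is a small sketch using only the numbers quoted in the question. The variable names are made up for illustration, and reading "reads processed" as read pairs is an assumption consistent with the numbers, not something the log states itself.

```python
# Numbers copied from the question; the variable names are illustrative only.
lane1_pairs = 15577283          # read pairs in lane 1
lane2_pairs = 15837401          # read pairs in lane 2
processed = 31414684            # "# reads processed" from the Bowtie summary
aligned = 15977643              # "# reads with at least one reported alignment"
reported_alignments = 89378248  # "Reported ... paired-end alignments"

# If Bowtie counts read pairs, the processed count equals the sum of the lanes.
total_pairs = lane1_pairs + lane2_pairs
total_reads = 2 * total_pairs

# Fraction of pairs with at least one alignment, as a percentage.
aligned_pct = round(100 * aligned / processed, 2)

# Average number of reported alignments per aligned pair (multi-mapping).
alignments_per_aligned_pair = reported_alignments / aligned

print(total_pairs == processed)   # True
print(total_reads)                # 62829368
print(aligned_pct)                # 50.86
print(round(alignments_per_aligned_pair, 2))
```

So each aligned pair is reported at between five and six locations on average, which is not surprising when aligning to a transcriptome where isoforms of the same gene share sequence.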


