I used RSEM to map paired-end RNA-seq data to a transcriptome, and I am confused by the bowtie output and by the final mapped BAM file.
The command is:
rsem-calculate-expression -p 6 --paired-end $fastqP_1_fa,$fastqP_2_fa $fastqP_3_fa,$fastqP_4_fa $rsemIndex $output
$fastqP_1_fa and $fastqP_3_fa are the two mate files of one lane; $fastqP_2_fa and $fastqP_4_fa are the two mate files of the other lane. Both lanes come from the same library.
The bowtie output statistics are:
# reads processed: 31414684
# reads with at least one reported alignment: 15977643 (50.86%)
# reads that failed to align: 15437041 (49.14%)
Reported 89378248 paired-end alignments to 1 output stream(s)
However, the two lanes contain 15577283 * 2 + 15837401 * 2 = 62829368 reads in total. So what does "reads processed: 31414684" mean here? Why is it only half of the total number of reads? And what is 89378248?
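A quick sanity check of these numbers, as a minimal sketch. It only redoes the arithmetic with the counts quoted above. That bowtie counts one read pair as one "read" in paired-end mode is an assumption here, not something the output states; the variable names are just for illustration.

```python
# Counts copied from the post; each lane has this many read pairs.
lane1_pairs = 15577283
lane2_pairs = 15837401

total_pairs = lane1_pairs + lane2_pairs   # pairs across both lanes
total_reads = 2 * total_pairs             # individual reads (two mates per pair)

# Numbers reported by bowtie.
bowtie_processed = 31414684
bowtie_aligned = 15977643
bowtie_failed = 15437041

# ASSUMPTION: bowtie reports pairs, not individual mates, as "reads".
print(total_reads)                                         # individual reads in both lanes
print(total_pairs == bowtie_processed)                     # does "processed" equal the pair count?
print(bowtie_aligned + bowtie_failed == bowtie_processed)  # do aligned + failed add up?

aligned_pct = 100 * bowtie_aligned / bowtie_processed
print(round(aligned_pct, 2))                               # compare with the 50.86% in the log
```

If both comparisons come out true, "reads processed" would be consistent with a count of pairs rather than of individual reads.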
In the output BAM file there are 209630578 lines and 46851725 total reads (unique reads, i.e., each read is counted only once even if it maps to multiple locations), of which 15977643 are mapped. That last number is the only one I understand.
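One way to test a guess about the BAM line count, again only as a sketch of the arithmetic. It assumes (not confirmed by anything in the output) that the BAM holds two records for every reported paired-end alignment plus two records for every pair that failed to align. All names are illustrative.

```python
# Counts copied from the bowtie log and from the BAM file.
paired_alignments = 89378248   # "Reported ... paired-end alignments"
aligned_pairs = 15977643       # pairs with at least one reported alignment
failed_pairs = 15437041        # pairs that failed to align
bam_lines = 209630578          # lines in the output BAM
bam_unique_reads = 46851725    # "total reads" counted from the BAM

# ASSUMPTION: each paired alignment contributes two records (one per mate),
# and each unaligned pair also contributes two unmapped records.
expected_lines = 2 * paired_alignments + 2 * failed_pairs
print(expected_lines == bam_lines)

# Average number of reported alignments per aligned pair (multi-mapping).
alignments_per_pair = paired_alignments / aligned_pairs
print(round(alignments_per_pair, 2))

# Purely arithmetic observation: this sum reproduces the "total reads" figure.
# Which counting method produces it is not something this check can tell.
print(aligned_pairs + 2 * failed_pairs == bam_unique_reads)
```

If the first comparison holds, the large line count would simply reflect pairs that align to several transcripts, each reported alignment adding two more records.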
Could someone give me a clue on this?