I used RSEM to map paired-end RNA-seq data to the transcriptome, and I am really confused by the Bowtie output and by the final mapped BAM file.
The command is:
rsem-calculate-expression -p 6 --paired-end $fastqP_1_fa,$fastqP_2_fa $fastqP_3_fa,$fastqP_4_fa $rsemIndex $output
Here $fastqP_1_fa and $fastqP_3_fa are the two mate files from one lane, and $fastqP_2_fa and $fastqP_4_fa are the two mate files from the other lane: the same library sequenced in two lanes.
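RSEM's paired-end usage is `--paired-end upstream_read_file(s) downstream_read_file(s) reference_name sample_name`, where each read-file argument is a comma-separated list, so the n-th file in the second list has to hold the mates of the n-th file in the first. Below is a dry-run sketch that builds the command in that order. The file names (lane1_R1.fq and so on) and the index/sample placeholders are hypothetical and not from my actual run.

```shell
#!/usr/bin/env bash
# Hypothetical file names: one library sequenced in two lanes, with mate 1
# (R1) and mate 2 (R2) of each lane stored in separate files.
lane1_R1=lane1_R1.fq
lane1_R2=lane1_R2.fq
lane2_R1=lane2_R1.fq
lane2_R2=lane2_R2.fq
rsemIndex=rsem_ref   # placeholder reference name
output=sample        # placeholder sample name

# All upstream (mate 1) files go in one comma-separated list, and all
# downstream (mate 2) files in a second list, in the same lane order.
mate1_list="${lane1_R1},${lane2_R1}"
mate2_list="${lane1_R2},${lane2_R2}"

cmd=(rsem-calculate-expression -p 6 --paired-end
     "$mate1_list" "$mate2_list" "$rsemIndex" "$output")

# Dry run: print the assembled command instead of executing it.
printf '%s ' "${cmd[@]}"
echo
```

Replacing the printf/echo lines with `"${cmd[@]}"` would actually run RSEM with the same arguments.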
The Bowtie output statistics are:

    # reads processed: 31414684
    # reads with at least one reported alignment: 15977643 (50.86%)
    # reads that failed to align: 15437041 (49.14%)
    Reported 89378248 paired-end alignments to 1 output stream(s)
However, the two lanes contain 15577283 * 2 + 15837401 * 2 = 62829368 reads in total. So what does "reads processed: 31414684" mean here? Why were only half of the total reads processed?
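The arithmetic can be redone in a few lines of shell. This minimal sketch uses only the lane counts from above; the variable names are illustrative, and it assumes each lane count is a number of read pairs (i.e. reads per mate file).

```shell
#!/usr/bin/env bash
# Per-lane pair counts from the post; each pair contributes two reads,
# one in each mate file.
lane1_pairs=15577283
lane2_pairs=15837401

total_pairs=$(( lane1_pairs + lane2_pairs ))
total_reads=$(( 2 * total_pairs ))

echo "total pairs: ${total_pairs}"
echo "total reads: ${total_reads}"
```

It prints 31414684 pairs and 62829368 reads. The pair total is exactly the "reads processed" figure, which would be consistent with Bowtie counting each mate pair as one "read" in its paired-end summary.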
As for the output BAM file, it contains 46851725 total reads (counted as unique reads, i.e., each read is counted only once even if it maps to multiple locations) and 15977643 mapped reads. The mapped count is the only number I understand.
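The BAM figures can at least be checked against the Bowtie summary. This sketch uses only numbers from the post; the idea that the BAM total counts each unaligned pair once per mate is purely my own hypothesis, not something either output states.

```shell
#!/usr/bin/env bash
# Numbers copied from the Bowtie summary and the BAM counts above.
processed_pairs=31414684  # "# reads processed"
aligned=15977643          # "# reads with at least one reported alignment"
unaligned=15437041        # "# reads that failed to align"
bam_total=46851725        # total reads counted in the BAM file

# Aligned + unaligned should add up to everything Bowtie processed.
echo "aligned + unaligned = $(( aligned + unaligned ))"

# Hypothesis only: the BAM total counts each aligned pair once and each
# unaligned pair twice (once per mate).
candidate=$(( aligned + 2 * unaligned ))
echo "aligned + 2*unaligned = ${candidate}"
if [ "$candidate" -eq "$bam_total" ]; then
  echo "matches the BAM total"
else
  echo "does not match the BAM total"
fi
```

With these numbers both checks hold: 15977643 + 15437041 = 31414684, and 15977643 + 2 * 15437041 = 46851725. This is only an arithmetic consistency check; it does not show how the BAM records were actually counted.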
Could someone give me a clue on this?