I used RSEM to map paired-end RNA-seq data to a transcriptome, and I am confused by the Bowtie output and by the final mapped BAM file.
The command is:
rsem-calculate-expression -p 6 --paired-end $fastqP_1_fa,$fastqP_2_fa $fastqP_3_fa,$fastqP_4_fa $rsemIndex $output
$fastqP_1_fa and $fastqP_3_fa are the two mate files of one pair; $fastqP_2_fa and $fastqP_4_fa are the two mate files of the other pair. They come from the same library sequenced in two lanes.
The Bowtie output statistics are:
# reads processed: 31414684
# reads with at least one reported alignment: 15977643 (50.86%)
# reads that failed to align: 15437041 (49.14%)
Reported 89378248 paired-end alignments to 1 output stream(s)
However, the two lanes together contain 15577283 * 2 + 15837401 * 2 = 62829368 reads in total. So what does "reads processed: 31414684" mean here? Why is only half of the total number of reads processed? And what is 89378248?
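A quick arithmetic check of these numbers, as a sketch only. It assumes the per-lane counts 15577283 and 15837401 are numbers of read pairs (one entry per fragment), which is what the "* 2" above implies; the variable names are made up for illustration.

```python
# Read-pair counts per lane, taken from the numbers quoted above.
# Assumption: each count is the number of pairs, i.e. fragments.
lane1_pairs = 15577283
lane2_pairs = 15837401

total_pairs = lane1_pairs + lane2_pairs  # one unit per fragment
total_mates = 2 * total_pairs            # two mates per fragment

# "# reads processed" as printed in the Bowtie summary.
bowtie_processed = 31414684

print(total_pairs, total_mates)          # 31414684 62829368
print(total_pairs == bowtie_processed)   # True
```

So the "reads processed" figure equals the number of pairs exactly, which would be consistent with Bowtie counting one pair as one "read" in paired-end mode.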
In the output BAM file, there are 209630578 lines and 46851725 total reads (unique reads, i.e., each read is counted only once even if it maps to multiple locations), of which 15977643 are mapped reads. That last number is the only one I understand.
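The line count can also be checked against the Bowtie summary with simple arithmetic. This is a sketch under two assumptions that the output itself does not confirm: every reported paired-end alignment is written as two records (one per mate), and every pair that failed to align is also written as two records. The variable names are illustrative.

```python
# Numbers copied from the Bowtie summary and the BAM file above.
reported_paired_alignments = 89378248  # "Reported ... paired-end alignments"
aligned_pairs = 15977643               # at least one reported alignment
unaligned_pairs = 15437041             # failed to align
bam_lines = 209630578                  # lines counted in the BAM file

# Assumption: two records per paired alignment, two per unaligned pair.
lines_from_alignments = 2 * reported_paired_alignments
lines_from_unaligned = 2 * unaligned_pairs
expected_lines = lines_from_alignments + lines_from_unaligned

print(expected_lines, expected_lines == bam_lines)  # 209630578 True

# Average number of reported alignments per aligned pair (roughly 5.6),
# i.e. many pairs are reported at several transcript locations.
print(reported_paired_alignments / aligned_pairs)
```

Under those assumptions the numbers add up exactly, which suggests that 89378248 counts alignments rather than reads, with multi-mapping pairs contributing more than once.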
Could someone give me a clue on this?