I am quite new to the STAR aligner and am confused about the numbers of mapped/unmapped reads it reports.
I would like to know whether the BAM file that STAR outputs when I do not use the argument --outSAMunmapped within is already filtered for unmapped or duplicate reads, or whether I need to filter it further before variant calling.
The short story is that:
Assuming my file is file.bam: I ran STAR without the argument --outSAMunmapped within and obtained a BAM file. The Log.final.out for that run reports 83.3% uniquely mapped reads. If I count the unmapped reads with the command
samtools view -c -f4 file.bam
it returns 0, so there are no unmapped reads in the file. Running
samtools flagstat file.bam
produced this output (image of the flagstat output),
so there are no duplicates and no unmapped reads, which suggests a clean file. When I rerun the alignment with
--outSAMunmapped within
and rerun
samtools flagstat file.bam
I get this output (image of the flagstat output),
so the mapping rate now appears as a percentage, but the duplicate count is still 0.
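For reference, the counts that samtools view -c -f4 and samtools flagstat report come from bits in the FLAG field of each alignment record. The following is a minimal Python sketch of that bookkeeping, assuming SAM text lines such as those printed by samtools view; the function name count_flags and the example records are hypothetical and only for illustration. As far as I understand, the duplicate bit (0x400) is only set once a duplicate-marking step has been run, so a duplicate count of 0 may mean that nothing was marked rather than that no duplicates exist.

```python
# SAM FLAG bits, as defined in the SAM format specification.
UNMAPPED = 0x4
SECONDARY = 0x100
DUPLICATE = 0x400


def count_flags(sam_lines):
    """Count alignment records by FLAG bit from SAM text lines.

    Note: this counts records, not reads; a multi-mapped read can
    contribute several records (one primary plus secondary ones).
    """
    counts = {"total": 0, "mapped": 0, "unmapped": 0,
              "secondary": 0, "duplicates": 0}
    for line in sam_lines:
        # Skip blank lines and header lines (they start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])  # FLAG is the second column
        counts["total"] += 1
        if flag & UNMAPPED:
            counts["unmapped"] += 1
        else:
            counts["mapped"] += 1
        if flag & SECONDARY:
            counts["secondary"] += 1
        if flag & DUPLICATE:
            counts["duplicates"] += 1
    return counts


# Hypothetical records, made up for illustration (not from file.bam).
example_records = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t100\t255\t50M\t*\t0\t0\t*\t*",
    "r2\t256\tchr1\t200\t3\t50M\t*\t0\t0\t*\t*",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
    "r4\t1024\tchr1\t300\t255\t50M\t*\t0\t0\t*\t*",
]

print(count_flags(example_records))
```

In this sketch, -f4 corresponds to the "unmapped" count, and the flagstat duplicate line corresponds to the "duplicates" count.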
Based on that, I assume that:
- the BAM file produced by STAR without the argument --outSAMunmapped within contains only mapped reads (I am not sure whether these are only uniquely mapped reads or also multi-mapped ones),
- with this argument, the BAM file contains both mapped and unmapped reads, but what about duplicate reads and mismatches?
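One way to check the "uniquely mapped or not" part of the first assumption is to look at the NH tag, which gives the number of reported alignments for a read. The sketch below assumes that the NH:i: tag is present in the records (as far as I know STAR writes it with its default attributes, but that is an assumption here); the function name count_by_nh and the example records are hypothetical.

```python
def count_by_nh(sam_lines):
    """Split primary mapped records into unique (NH == 1) and multi (NH > 1).

    Counts records, not fragments: for paired-end data each mate
    is a separate record.
    """
    counts = {"unique": 0, "multi": 0}
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        # Skip unmapped (0x4), secondary (0x100) and supplementary (0x800).
        if flag & 0x4 or flag & 0x100 or flag & 0x800:
            continue
        # Optional tags start at column 12 (index 11).
        nh = None
        for tag in fields[11:]:
            if tag.startswith("NH:i:"):
                nh = int(tag[5:])
                break
        if nh is None:
            continue  # no NH tag, cannot classify this record
        if nh == 1:
            counts["unique"] += 1
        else:
            counts["multi"] += 1
    return counts


# Hypothetical records, made up for illustration (not from file.bam).
nh_example = [
    "r1\t0\tchr1\t100\t255\t50M\t*\t0\t0\t*\t*\tNH:i:1",
    "r2\t0\tchr1\t200\t1\t50M\t*\t0\t0\t*\t*\tNH:i:3",
    "r2\t256\tchr2\t500\t1\t50M\t*\t0\t0\t*\t*\tNH:i:3",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]

print(count_by_nh(nh_example))
```

If the BAM file contains records with NH greater than 1, it is not restricted to uniquely mapped reads.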
Which statistics from the output BAM file should be reported in a paper or presentation?