Hi,
I am doing a featureCounts on 4 RNA-seq samples, code as found below. I am counting on the "gene" from gencode v33 as seen in the gtf file. Reason why I am doing this is to counter-check the counting process against the aligner's (STAR in this case) summary output.
featureCounts -a gencode.v33.primary_assembly.annotation.gtf -o out.txt -O --fraction -B -t "gene" -s 2 -T 8 --primary -p 504-709_S21Aligned.sortedByCoord.out.bam 504-710_S22Aligned.sortedByCoord.out.bam 505-702_S26Aligned.sortedByCoord.out.bam 505-703_S27Aligned.sortedByCoord.out.bam
Am I right to assume that the above code will correspond to the uniquely mapped reads summary in STAR? The counting above is very close but not exactly same to the reported uniquely mapped reads from STAR, hence my assumption.
If so, what does the --primary tag do in this case, as I assume it will count the multi-mapped reads which is given a primary tag.
Regards Solyris
This
is not entirely true (at least for featureCounts v2.0.0). featureCounts recognizes multimapped reads using the
NH:itag. If you use--primarywithout-MfeatureCounts will count only uniquely mapped reads (NH:i:1) and completely ignore multimapped reads regardless on the SAM primary tag. Using--primaryin this scenario doesn't have any effect on counting whatsoever.In case the aligner doesn't output the
NH:iflag, featureCounts doesn't recognize unique/multimapped and will count all the alignments. In this case, using--primarywill help you to count each read only once.If you want to count both uniquely and multimapped reads (and your aligner outputs the
NH:iflag) but each of the reads only once you can use--primary -M.It appears that
--primaryusage with-Msets off the-Mparameter off as per the description here:So would you recommend using
--fraction -Mrather than--primary -M. I confirmed this by checking the counts which were not fraction at all when using--fraction --primary -Mtogether.PS: I am new to featureCounts so please help me here.