3.7 years ago by
I'm also doing per exon counts using featureCounts. This is the command I ran
featureCounts -T 14 -a my_gtf_file.gft -o my_output.txt -p -s 2 -g exon_id my_bam_file.bam
And it worked for me.. (at least I think it did) I then manually collapsed all exons into genes and compared the counts with featureCounts per gene (default) and (looking by eye) gene counts from collapsed exons counts had slightly fewer reads compare to default featureCounts counts, but we reasoned that if the read is ambiguous (for example spans exon-intron junction) featureCounts discards it, therefore less reads when counting per exon.
From my understanding:
-t option is for counting against your particular feature of interest e.g three_prime_utr. That option really sets the coordinates rage. "Is your read in that coordinates range..?" Most often we are interested in "Is my read in the exon coordinates range..?" hence the default
-g option is the crucial one, it sets your bins. "How do you want to bin your reads..?" In exons please.., but you are saying bin them into genes.
You can grab any line from GTF file (I'm only going to speak for GTF file) and in every line you'll have gene_id attribute, in most you'll have transcript_id and in some you'll have exon_id. Obviously exons go into transcript and transcripts go into gene and so you can only group reads into the same level or higher, but not lower. In other words if you specify `-t transcript` you can only group by `-g transcript_id` or `-g gene_id` (default) but you cannot group by `-g exon_id`. On the other hand if you specify `-t exon` (default) you can group by `-g exon_id` or `-g transcript_id` or `-g gene_id`. And with three_prime_utr they only have transcript_id and gene_id attributes (make sense, because utr belongs to the transcript).
-f option (I think, haven't tested it out yet) is for some internal featureCounts use. The meta-feature term is something featureCounts people came up themselves. From the docs We define a meta-feature to be a set of features representing a biological construct of interest. For example, features often correspond to exons and meta-features to genes.
I didn't use it when I was counting reads per exon, but I'm pretty sure now that I should have. In my case I left `-f` off so featureCounts must have counted reads per meta-feature i.e gene and then some how broken them into exon_id groups.. I guess if I'd have specified `-f` if would have counted right from the star.. Although I'm a little confused about `-f` option, because given that you want to group by exons it wouldn't make sense to summarise against genes.. would it..? I'd like to more about -f option myself.