I have a question regarding featureCounts behavior.
Assume my GTF file has 1 gene, comprising 2 transcripts (G1T1, G1T2), with a total of 3 exons (E1,E2,E3):
G1T1 consists of exons E1 and E2, while G1T2 consists of exons E3 and E2.
Assume my SAM file has 4 reads, as shown below.
R1 overlaps E1;
R2 is spliced between E1 and E3;
R3 is spliced between E1 and E2;
R4 is spliced between E3 and E2.
|____E1____|                    |____E2____|   Transcript G1T1
                |____E3____|    |____E2____|   Transcript G1T2
R1------>
 R2---------....--->
 R3---------....................---->
                    R4-----.....---->
When using featureCounts with option "-g transcript_id", only R1 is assigned and the count for G1T1 is 1.
When using featureCounts with options "-g transcript_id -O", all reads are assigned and the count for G1T1 is 4 (R1,R2,R3,R4), whereas the count for G1T2 is 3 (R2,R3,R4).
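To make the two results concrete, here is a toy model in Python that reproduces the counts I see. It is only my reading of the behavior, not featureCounts' actual code: I assume a read "hits" a transcript as soon as any of its aligned blocks overlaps any exon of that transcript. The names `count_by_overlap` and `allow_multi_overlap` are made up for this sketch.

```python
# Toy model of overlap-based assignment (an assumption, not featureCounts' code).
# Each transcript is the set of exons it contains.
TRANSCRIPTS = {"G1T1": {"E1", "E2"}, "G1T2": {"E3", "E2"}}

# Each read is the set of exons its aligned blocks overlap.
READS = {
    "R1": {"E1"},
    "R2": {"E1", "E3"},
    "R3": {"E1", "E2"},
    "R4": {"E3", "E2"},
}


def count_by_overlap(reads, transcripts, allow_multi_overlap):
    """Count reads per transcript; a read hits a transcript if they share any exon."""
    counts = {name: 0 for name in transcripts}
    for exons_hit in reads.values():
        hits = [name for name, exons in transcripts.items() if exons & exons_hit]
        # Without multi-overlap, a read hitting more than one transcript is dropped.
        if len(hits) == 1 or (allow_multi_overlap and hits):
            for name in hits:
                counts[name] += 1
    return counts


print(count_by_overlap(READS, TRANSCRIPTS, allow_multi_overlap=False))
# {'G1T1': 1, 'G1T2': 0}
print(count_by_overlap(READS, TRANSCRIPTS, allow_multi_overlap=True))
# {'G1T1': 4, 'G1T2': 3}
```

In this model E2 is shared by both transcripts, so every read touching E2 hits both of them, which is why R3 and R4 are never uniquely assigned.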
I understand the logic behind counting reads that overlap multiple features ("-O"); however, I would expect that at the transcript level R3 is uniquely assigned to G1T1 and R4 is uniquely assigned to G1T2. In other words, I would expect that even though the parts of a spliced read map to multiple features, when the reads are counted at a level where the meta-features can be told apart, this distinction is used to assign the reads uniquely.
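The rule I have in mind could be sketched like this in Python. It is purely hypothetical and not something I know featureCounts to offer: a read is compatible with a transcript only if every exon the read touches belongs to that transcript, and it is counted only when exactly one transcript is compatible. The name `count_by_compatibility` is made up for illustration.

```python
# Hypothetical "compatibility" rule (my expectation, not a featureCounts feature).
TRANSCRIPTS = {"G1T1": {"E1", "E2"}, "G1T2": {"E3", "E2"}}

# Each read is the set of exons its aligned blocks overlap.
READS = {
    "R1": {"E1"},
    "R2": {"E1", "E3"},
    "R3": {"E1", "E2"},
    "R4": {"E3", "E2"},
}


def count_by_compatibility(reads, transcripts):
    """Count a read only if exactly one transcript contains all exons it touches."""
    counts = {name: 0 for name in transcripts}
    for exons_hit in reads.values():
        compatible = [
            name for name, exons in transcripts.items() if exons_hit <= exons
        ]
        if len(compatible) == 1:
            counts[compatible[0]] += 1
    return counts


print(count_by_compatibility(READS, TRANSCRIPTS))
# {'G1T1': 2, 'G1T2': 1}
```

Under this rule R1 and R3 go to G1T1 and R4 goes to G1T2. R2 stays unassigned because no annotated transcript contains both E1 and E3; that outcome is a consequence of how I wrote the sketch, and I am not sure what the right treatment of such a read would be.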
I feel that without "-O", the counts are unnecessarily low and with the option "-O" the counts are unnecessarily inflated. What am I missing here?
Thanks in advance!