Version: 2.1.1
Command:
featureCounts -s 2 -p --countReadPairs -O -f -T 8 -a gtf -o output_file BAM_file
The GTF annotation file (gencode.v25.annotation.gtf) has 2,579,817 entries. Scanning through the output, I noticed that all feature types appear to be included (exons, UTRs, stop codons, etc.). However, only 1,182,765 lines are present in output_file.
Initially, I thought featureCounts was collapsing annotations that are exactly the same. For instance, both HAVANA and ENSEMBL annotations are reported, and I suspect there is a lot of overlap between them. However, when I examine output_file more closely, I find multiple lines that are exactly the same.
**ENSG00000238009.6 chr1 112700 112804 - 105 0**
ENSG00000238009.6 chr1 92091 92240 - 150 0
ENSG00000238009.6 chr1 89295 91629 - 2335 7
ENSG00000238009.6 chr1 129055 129217 - 163 0
ENSG00000238009.6 chr1 120721 120932 - 212 0
**ENSG00000238009.6 chr1 112700 112804 - 105 0**
ENSG00000238009.6 chr1 92230 92240 - 11 0
ENSG00000238009.6 chr1 129055 129173 - 119 0
**ENSG00000238009.6 chr1 112700 112804 - 105 0**
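To see how widespread the repeated rows are, one option is to count how often each (Geneid, Chr, Start, End, Strand) combination occurs in the output. The sketch below assumes the usual featureCounts layout: a leading `#` comment line, then a tab-separated header, then one row per feature. The function name `duplicated_features` is made up for illustration. A likely explanation for the repeats, though not confirmed here, is that the same exon is listed under several transcripts of one gene in the GTF.

```python
import csv
from collections import Counter


def duplicated_features(lines):
    """Return {(geneid, chr, start, end, strand): n} for rows that occur more than once.

    `lines` is any iterable of text lines from a featureCounts output file
    (assumed layout: '#' comment line, header line, tab-separated rows).
    """
    # Drop the '#' comment line(s) that precede the header.
    rows = csv.reader(
        (line for line in lines if not line.startswith("#")), delimiter="\t"
    )
    next(rows)  # skip header: Geneid, Chr, Start, End, Strand, Length, <BAM column>
    # Key on the first five columns, which identify a feature's coordinates.
    counts = Counter(tuple(row[:5]) for row in rows if len(row) >= 5)
    return {key: n for key, n in counts.items() if n > 1}
```

Calling it as `duplicated_features(open("output_file"))` would return a dictionary keyed by feature coordinates, with the number of times each repeated row appears.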
If featureCounts produced output with the same number of lines as the GTF annotation file, I wouldn't be so confused. But there appears to be some sort of filtering and/or collapsing, which makes it difficult to know what each line represents with respect to the GTF annotation file.
Could someone provide some feedback about what is going on?
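One check that may narrow this down: featureCounts only counts features whose type matches `-t`, which defaults to `exon`, and the command above does not set it. With `-f`, each matching feature gets its own output row. Tallying the feature-type column (column 3) of the GTF and comparing the `exon` tally with the 1,182,765 output rows would show whether that default accounts for the difference. This is a minimal sketch, and the function name `feature_type_counts` is hypothetical.

```python
from collections import Counter


def feature_type_counts(lines):
    """Count GTF records per feature type (column 3), skipping '#' header lines."""
    counts = Counter()
    for line in lines:
        # GENCODE GTF files start with '##' metadata lines; ignore those and blanks.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3:
            counts[fields[2]] += 1
    return counts
```

For example, `feature_type_counts(open("gencode.v25.annotation.gtf"))["exon"]` gives the number of exon records to compare against the row count of output_file.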
That release is from back in 2016. Is there a specific reason you are using something that old?
Yes, there is. I am trying to replicate previous results before updating it.