Featurecounts: Unexpected number of features
1
0
Entering edit mode
5 months ago

Version: Version 2.1.1
Command:

featurecounts -s 2 -p --countReadPairs -O -f -T 8 -a gtf -o output_file BAM_file

GTF annotation file (gencode.v25.annotation.gtf) has 2,579,817 entries. Scanning through the output, I noticed that all features are including (exons, UTRs, stop-codons, etc). However, only 1,182,765 lines are present in the output_file.

Initially, I thought featurecounts was selecting annotations that are exactly the same. For instance, HAVANA and ENSEMBL are reported and I suspect there is a lot of overlap. However, when I examine the output_file closer, I find multiple lines that are exactly the same.

**ENSG00000238009.6 chr1    112700  112804  -   105 0**  
ENSG00000238009.6   chr1    92091   92240   -   150 0  
ENSG00000238009.6   chr1    89295   91629   -   2335    7  
ENSG00000238009.6   chr1    129055  129217  -   163 0  
ENSG00000238009.6   chr1    120721  120932  -   212 0  
**ENSG00000238009.6 chr1    112700  112804  -   105 0**  
ENSG00000238009.6   chr1    92230   92240   -   11  0  
ENSG00000238009.6   chr1    129055  129173  -   119 0  
**ENSG00000238009.6 chr1    112700  112804  -   105 0**  

If featurecounts produce output with the same number of annotations as the gtf annotation file, I wouldn't be so confused. But it appears there is some sort of filtering and/or collapsing. This makes it difficult for me to know what each line represents with regards to the gtf annotation file.

Could someone provide some feedback about what is going on?

featurecounts • 966 views
ADD COMMENT
0
Entering edit mode

gencode.v25.annotation.gtf

That release is from back in 2016. Is there a specific reason you are using something that old.

ADD REPLY
0
Entering edit mode

Yes there is. I am trying to replicate previous results before updating it.

ADD REPLY
1
Entering edit mode
5 months ago
Mensur Dlakic ★ 30k

I think the problem most likely is that you are using the a combination of -O and -f switches. This is what they do:

  -O                  Assign reads to all their overlapping meta-features (or
                      features if -f is specified).

  -f                  Perform read counting at feature level (eg. counting
                      reads for exons rather than genes).

When you remove the switches from your command, all the exons will be counted as one gene and I suspect the duplicates will go away.

ADD COMMENT
0
Entering edit mode

I appreciate the suggestion. However, when running with only the -f flag, I still have less features than expected. Below is the stout of the program.

I assumed that each line in the gtf annotation would be a feature. But this doesn't appear correct.

Load annotation file gencode.v25.annotation.gtf ...           
    Features : 1182765                                                      
    Meta-features : 58037                                                   
    Chromosomes/contigs : 25

Process BAM file R14123_accepted_hits.sorted.bam...                        
    Strand specific : reversely stranded                                    
    Paired-end reads are included.                                          
    Total alignments : 74858410                                             
    Successfully assigned alignments : 7897561 (10.5%)                      
    Running time : 2.00 minutes
ADD REPLY
0
Entering edit mode

However, when running with only the -f flag

@Mensur suggested that you remove the -f flag.

Generally people want the summarization at gene level unless you are interested in doing transcript level analysis.

ADD REPLY
0
Entering edit mode

Transcriptional level is something I am interested in and why I am looking at this at a finer detail.

ADD REPLY
0
Entering edit mode

Have you considered using salmon or kallisto?

ADD REPLY
0
Entering edit mode

I have played around with salmon. However, featurecounts appears more transparant in where the information, from the gtf file, is coming from. I am simply struggling to understand why so many entries are being ignored by the software.

ADD REPLY

Login before adding your answer.

Traffic: 3286 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6