Question

Counting gene vs exons in featureCounts

2

Entering edit mode

18 months ago

Meisam ▴ 230

I know that for RNASeq the default option is to count reads based on exons (-t 'exon' in case of counting with featureCounts), and based on my understanding from previous questions like here and here and -poorly described- subread documentation I have realised that by counting the reads based on genes (-t 'gene') we would end up including transcripts aligned to intron regions to our final count table for each gene as well. If this is correct I would assume that by changing 'exon' to 'gene' one should end up with more counts for most genes (both cases we summarise at gene level or -g 'gene_id') and overall more read counts, however it seems that by setting feature type to 'gene' we actually end up with less reads assigned:

analysing a test BAM file with:

featureCounts -p --countReadPairs -M -T 8 \
        -t 'gene' \
        -g 'gene_name' \
        -a gencode.v38.annotation.gtf \
        -o test.txt \
        test.bam

would result in:


//================================= Running ==================================\\
||                                                                            ||
|| Load annotation file gencode.v38.annotation.gtf ...                        ||
||    Features : 60649                                                        ||
||    Meta-features : 59385                                                   ||
||    Chromosomes/contigs : 25                                                ||
||                                                                            ||
|| Process BAM file test.bam...                                               ||
||    Paired-end reads are included.                                          ||
||    Total alignments : 30471464                                             ||
||    Successfully assigned alignments : 24302041 (79.8%)                     ||
||    Running time : 0.29 minutes                                             ||

while analysing the same BAM file with same arguments except for default exon feature type:

featureCounts -p --countReadPairs -M -T 8 \
        -t 'exon' \
        -g 'gene_name' \
        -a gencode.v38.annotation.gtf \
        -o test.txt \
        test.bam

results in 6% much more assigned reads:

//================================= Running ==================================\\
||                                                                            ||
|| Load annotation file gencode.v38.annotation.gtf ...                        ||
||    Features : 1499012                                                      ||
||    Meta-features : 59385                                                   ||
||    Chromosomes/contigs : 25                                                ||
||                                                                            ||
|| Process BAM file test.bam...                                               ||
||    Paired-end reads are included.                                          ||
||    Total alignments : 30471464                                             ||
||    Successfully assigned alignments : 26053717 (85.5%)                     ||
||    Running time : 0.25 minutes                                             ||

Could someone please explain to me what I'm missing here?

RNA-Seq subread annotation featureCounts • 1.6k views

ADD COMMENT • link updated 17 months ago by PK ▴ 130 • written 18 months ago by Meisam ▴ 230

0

Entering edit mode

Yeah , it is expected right. when you summarise against gene exon-exon spanning reads are counted once but in the exon level you count as twice. Since the split reads originated from two exons. Check the junction reads please.

ADD REPLY • link 17 months ago by PK ▴ 130