I just finished a pipeline for differential expression analysis of some mouse RNA-Seq files. Using the latest RefSeq assembly, I aligned the reads and used featureCounts to generate counts (with the plan of using this output in DESeq2).
Anyway, I've been checking the featureCounts output file, and when I looked at the number of rows, there were close to 40,000, which is more genes than I expected.
Here's my featureCounts command for starters:
featureCounts -t exon -g gene -a GCF_000001635.26_GRCm38.p6_genomic.gtf -o counts.txt -M bam1 bam2 bam3 bam4
So I'm curious: are these extra genes redundant? Are there pseudogenes in the list? What accounts for the large number of genes?
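One way I could check this myself is to tally the gene types in the annotation. Below is a rough Python sketch, not a tested part of my pipeline. It assumes the `gene` rows of the GTF carry a `gene_biotype "..."` attribute (I believe NCBI RefSeq GTFs do, but that should be verified against the file), and the function name `count_gene_biotypes` is just something I made up for illustration.

```python
import re
import sys
from collections import Counter


def count_gene_biotypes(gtf_lines):
    """Tally gene_biotype values across the 'gene' feature rows of a GTF.

    Assumption: each gene row has a gene_biotype "..." attribute in column 9.
    Rows without one are counted under "unknown".
    """
    counts = Counter()
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # only look at gene-level rows
        match = re.search(r'gene_biotype "([^"]+)"', fields[8])
        counts[match.group(1) if match else "unknown"] += 1
    return counts


# Usage: python this_script.py GCF_000001635.26_GRCm38.p6_genomic.gtf
if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1]) as handle:
        for biotype, n in count_gene_biotypes(handle).most_common():
            print(biotype, n, sep="\t")
```

If pseudogenes and non-coding RNA genes make up a large share of the tally, that would probably explain most of the extra rows. Note that this counts gene records in the GTF, while featureCounts groups exons by the `gene` attribute, so the two totals may not match exactly.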