featureCounts: Too Many Genes?
4.7 years ago
gtasource

Howdy folks,

I just finished a pipeline for differential expression analysis of some mouse RNA-Seq files. Using the latest RefSeq assembly, I aligned the files and used featureCounts to generate counts (with the plan of using this output in DESeq2).

Anyway, I've been checking the featureCounts output file, and when I looked at the number of rows, there were close to 40,000.

Here's my featureCounts command for starters:

featureCounts -t exon -g gene -a GCF_000001635.26_GRCm38.p6_genomic.gtf -o counts.txt -M bam1 bam2 bam3 bam4
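featureCounts writes one row per meta-feature (here, per value of the `gene` attribute given to `-g`) found in the annotation, whether or not any reads were assigned to it, so the row count reflects the size of the GTF rather than the number of expressed genes. The file also starts with a `#` comment line recording the command and a header line (`Geneid`, `Chr`, `Start`, `End`, `Strand`, `Length`, then one column per BAM). A minimal Python sketch, assuming that default tab-separated layout, that separates the two numbers; `summarize_counts` is a hypothetical helper name, not part of featureCounts:

```python
import csv
from typing import Iterable, Tuple

# featureCounts puts six annotation columns before the per-BAM counts:
# Geneid, Chr, Start, End, Strand, Length.
N_ANNOTATION_COLS = 6


def summarize_counts(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (gene rows listed, gene rows with at least one assigned read)."""
    # Drop the "# Program:featureCounts ..." comment line before parsing.
    rows = csv.reader(
        (ln for ln in lines if not ln.startswith("#")),
        delimiter="\t",
        quoting=csv.QUOTE_NONE,
    )
    next(rows)  # skip the header line: Geneid, Chr, ..., then one column per BAM
    listed = with_reads = 0
    for row in rows:
        if not row:
            continue  # ignore blank lines
        listed += 1
        # float() also copes with fractional counts (e.g. if --fraction is added to -M)
        if any(float(c) > 0 for c in row[N_ANNOTATION_COLS:]):
            with_reads += 1
    return listed, with_reads
```

Calling `summarize_counts(open("counts.txt"))` should give the ~40,000 annotated genes as the first number; the second is usually noticeably smaller, since many genes get no reads in a given tissue.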

So I'm curious: are these extra genes redundant? Are there pseudogenes in the list? What accounts for the large number of genes?

Thanks!

4.7 years ago

The latest Ensembl mouse release lists about 22K coding genes and 16K non-coding genes, which together make close to 40,000 genes.

http://uswest.ensembl.org/Mus_musculus/Info/Annotation


Thanks! After sorting and counting the unique gene IDs, I found there were no duplicates. I should've checked the annotation info first, though. :P
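The duplicate check described above (sorting the gene column and counting unique values) can be sketched in Python roughly as follows, assuming the default featureCounts layout with a `#` comment line and a header row; `find_duplicate_ids` is a hypothetical name. An empty result, as found here, means every row is a distinct gene ID.

```python
import csv
from collections import Counter
from typing import Dict, Iterable


def find_duplicate_ids(lines: Iterable[str]) -> Dict[str, int]:
    """Return {gene_id: occurrences} for IDs listed more than once in featureCounts output."""
    rows = csv.reader(
        (ln for ln in lines if not ln.startswith("#")),  # drop the comment line
        delimiter="\t",
        quoting=csv.QUOTE_NONE,
    )
    next(rows)  # skip the "Geneid  Chr  Start ..." header line
    ids = Counter(row[0] for row in rows if row)  # first column holds the gene ID
    return {gene: n for gene, n in ids.items() if n > 1}
```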

4.7 years ago

40,000 genes is not wrong; it's perfectly normal. The count also includes non-coding genes such as lncRNAs.
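To see what the ~40,000 rows are made of (protein-coding genes, lncRNAs, pseudogenes and so on), one option is to tally a biotype attribute on the `gene` records of the same GTF. A minimal sketch, assuming the annotation tags gene records with a `gene_biotype` attribute (RefSeq and Ensembl GTFs generally do, but check your file); `tally_biotypes` and its parameters are hypothetical:

```python
import re
from collections import Counter
from typing import Iterable

# Matches one GTF attribute, e.g.  gene_biotype "protein_coding";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def tally_biotypes(
    gtf_lines: Iterable[str],
    biotype_key: str = "gene_biotype",  # assumed attribute name; may differ per GTF
    id_key: str = "gene_id",
) -> Counter:
    """Count distinct genes per biotype, using only feature type 'gene' records."""
    seen = set()
    tally: Counter = Counter()
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # header/comment lines such as "#gtf-version 2.2"
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # exon/transcript/CDS lines would count the same gene many times
        attrs = dict(ATTR_RE.findall(fields[8]))
        gene = attrs.get(id_key)
        if gene is None or gene in seen:
            continue  # skip records without an ID and repeats of the same gene
        seen.add(gene)
        tally[attrs.get(biotype_key, "unknown")] += 1
    return tally
```

Passing the open GTF file and printing `.most_common()` gives a per-biotype breakdown; use `id_key="gene"` to group the same way as `-g gene`. The totals may not match the featureCounts row count exactly, since `-t exon` only keeps genes that have exon features.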
