Question

featureCounts: Too Many Genes?

0

5.3 years ago

gtasource ▴ 60

Howdy folks,

I just finished a pipeline for differential expression analysis of some mouse RNA-Seq files. Using the latest RefSeq assembly, I aligned the reads and used featureCounts to generate counts (with the plan of using this output in DESeq2).

Anyway, I've been checking the featureCounts output file, and when I looked at the number of rows, there were close to 40,000.

Here's my featureCounts command, for starters:

featureCounts -t exon -g gene -a GCF_000001635.26_GRCm38.p6_genomic.gtf -o counts.txt -M bam1 bam2 bam3 bam4

So I'm curious. Are these extra genes redundant? Are there pseudogenes in the list? What accounts for the large number of genes?

Thanks!

featureCount differential expression • 1.8k views

ADD COMMENT • link updated 5.3 years ago by WouterDeCoster 47k • written 5.3 years ago by gtasource ▴ 60

score 2 · Accepted Answer · 2019-07-31

2

5.3 years ago

swbarnes2 14k

The latest Ensembl mouse annotation lists about 22K coding genes and 16K non-coding genes, which adds up to close to 40,000 genes.

http://uswest.ensembl.org/Mus_musculus/Info/Annotation
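
To see the same coding/non-coding split in your own annotation file, something like the following could work. It is only a sketch: `biotype_counts` is a made-up helper name, and it assumes the GTF has `gene` records carrying a `gene_biotype "..."` attribute in column 9, which is worth confirming by eye first. Genes without that attribute are tallied as `unknown`.

```shell
# Tally genes per biotype in a GTF file (hypothetical helper).
# Assumption: "gene" records carry a gene_biotype "..." attribute.
biotype_counts() {
    awk -F'\t' '
        /^#/ { next }                      # skip GTF header/comment lines
        $3 == "gene" {
            biotype = "unknown"
            if (match($9, /gene_biotype "[^"]+"/)) {
                # drop the 14-character prefix and the closing quote
                biotype = substr($9, RSTART + 14, RLENGTH - 15)
            }
            tally[biotype]++
        }
        END { for (b in tally) printf "%s\t%d\n", b, tally[b] }
    ' "$1" | sort -t "$(printf '\t')" -k2,2nr -k1,1   # largest class first
}

# Usage, with the annotation file from the question:
#   biotype_counts GCF_000001635.26_GRCm38.p6_genomic.gtf
```

Note that featureCounts only counts genes that have features matching `-t exon`, so the row count in counts.txt may not match the total of this tally exactly.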

0

Thanks. After I sorted the gene IDs and counted the unique ones, I realized there were no duplicates. Should've checked the annotation info first though. :P
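
For anyone wanting to run the same duplicate check, here is a rough sketch. `count_genes` is a made-up helper name, and it assumes the usual featureCounts layout: one leading `#` comment line, then a tab-separated header row starting with `Geneid`, then one row per gene.

```shell
# Report row count, unique gene IDs, and repeated gene IDs in a
# featureCounts table (hypothetical helper).
count_genes() {
    awk -F'\t' '
        /^#/ { next }                             # skip the program/command comment line
        !seen_header { seen_header = 1; next }    # skip the column header row
        {
            rows++
            n = ++count[$1]                       # times this gene ID has been seen so far
            if (n == 1) uniq++
            else if (n == 2) dups++               # count each repeated ID once
        }
        END { printf "rows=%d unique=%d duplicated_ids=%d\n", rows, uniq, dups }
    ' "$1"
}

# Usage, with the output file from the question:
#   count_genes counts.txt
```

If `duplicated_ids` is 0 and `rows` equals `unique`, every row is a distinct gene, which is what I found.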

ADD REPLY • link 5.3 years ago by gtasource ▴ 60

score 2 · Accepted Answer · 2019-07-31

2

5.3 years ago

WouterDeCoster 47k

40,000 genes is not wrong; it's perfectly normal. The annotation includes more than protein-coding genes: lncRNAs, for example.
