Question: featureCounts Too Many Genes
gtasource wrote (4 months ago):

Howdy folks,

I just finished a pipeline for differential expression analysis of some mouse RNA-Seq files. Using the latest RefSeq assembly, I aligned the files and used featureCounts to generate counts (with the plan of using this output in DESeq2).

Anyway, I've been checking the featureCounts output file, and when I looked at the number of rows, there were close to 40,000.

Here's my featureCounts command for starters:

featureCounts -t exon -g gene -a GCF_000001635.26_GRCm38.p6_genomic.gtf -o counts.txt -M bam1 bam2 bam3 bam4

So I'm curious. Are these extra genes redundant? Are there some pseudogenes in the list? What accounts for the large number of genes?

Thanks!
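
For readers who want to sanity-check the row count: featureCounts writes one row per distinct value of the `-g` attribute among features of the `-t` type, so the expected number of rows can be estimated straight from the GTF. Below is a minimal Python sketch of that idea. The function name `expected_rows` is hypothetical, and it assumes the annotation really carries a `gene "..."` attribute on its exon lines, as the `-g gene` option in the command above implies.

```python
import re

# Matches a standalone `gene "..."` attribute in GTF column 9.
# The prefix keeps it from matching `gene_id "..."` or names that end in "gene".
# Assumption: the GTF has a `gene` attribute, as implied by `-g gene`.
GENE_ATTR = re.compile(r'(?:^|;\s*)gene "([^"]*)"')


def expected_rows(gtf_lines, feature_type="exon"):
    """Count distinct `gene` values among lines of the given feature type."""
    genes = set()
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip GTF header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != feature_type:
            continue  # malformed line, or not the feature type being counted
        match = GENE_ATTR.search(fields[8])
        if match:
            genes.add(match.group(1))
    return len(genes)


# Example use (file name taken from the command above):
# with open("GCF_000001635.26_GRCm38.p6_genomic.gtf") as fh:
#     print(expected_rows(fh))
```

If this number is close to the row count in counts.txt, the output simply mirrors the annotation.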

swbarnes2 wrote (4 months ago):

The latest Ensembl mouse release shows 22K coding genes and 16K non-coding genes, which makes close to 40,000 genes.

http://uswest.ensembl.org/Mus_musculus/Info/Annotation
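
The coding versus non-coding split can also be checked against whichever GTF was actually used for counting. The following is a rough Python sketch, not a definitive recipe: the function name `tally_biotypes` is hypothetical, and it assumes the GTF contains `gene` feature lines carrying a `gene_biotype` attribute. Genes without that attribute are reported as "unknown".

```python
import re
from collections import Counter

# Assumption: gene records carry a `gene_biotype "..."` attribute.
BIOTYPE = re.compile(r'gene_biotype "([^"]*)"')


def tally_biotypes(gtf_lines):
    """Return a Counter mapping biotype -> number of `gene` records."""
    counts = Counter()
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip GTF header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # only tally gene-level records
        match = BIOTYPE.search(fields[8])
        counts[match.group(1) if match else "unknown"] += 1
    return counts


# Example use (the path is a placeholder):
# with open("annotation.gtf") as fh:
#     for biotype, n in tally_biotypes(fh).most_common():
#         print(biotype, n)
```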


gtasource wrote (in reply, 4 months ago):

Thanks. After I sorted the gene IDs and counted the unique ones, I realized there were no duplicates. Should've checked the annotation info first though. :P
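
The duplicate check described in this reply could look roughly like the Python sketch below. It relies on the standard featureCounts output layout: a leading `#` comment line, then a tab-separated header row starting with `Geneid`, then one row per gene. The function name `gene_id_summary` is hypothetical.

```python
from collections import Counter


def gene_id_summary(count_lines):
    """Return (total rows, unique gene IDs, sorted list of duplicated IDs)."""
    ids = []
    header_seen = False
    for line in count_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip the program/command comment line and blank lines
        if not header_seen:
            header_seen = True  # skip the "Geneid Chr Start ..." header row
            continue
        ids.append(line.split("\t", 1)[0])  # first column is the gene ID
    duplicates = sorted(g for g, n in Counter(ids).items() if n > 1)
    return len(ids), len(set(ids)), duplicates


# Example use (output file name taken from the command in the question):
# with open("counts.txt") as fh:
#     total, unique, dups = gene_id_summary(fh)
#     print(total, unique, dups)
```

If the first two numbers are equal and the list is empty, every row is a distinct gene.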
WouterDeCoster wrote (4 months ago):

40,000 genes is not wrong; it's perfectly normal. The annotation also includes lncRNAs, for example.
