Question: featureCounts Reports Too Many Genes
gtasource wrote (20 days ago):

Howdy folks,

I just finished a pipeline for differential expression analysis of some mouse RNA-Seq files. Using the latest RefSeq assembly, I aligned the reads and used featureCounts to generate counts (with the plan of feeding the output to DESeq2).

Anyway, I've been checking the featureCounts output file, and when I looked at the number of rows, there were close to 40,000.

Here's my featureCounts command for starters:

featureCounts -t exon -g gene -a GCF_000001635.26_GRCm38.p6_genomic.gtf -o counts.txt -M bam1 bam2 bam3 bam4

So I'm curious: are these extra genes redundant? Are there pseudogenes in the list? What accounts for the large number of genes?

Thanks!

swbarnes2 (United States) wrote (20 days ago):

The latest Ensembl mouse release shows 22K coding genes and 16K non-coding genes, which adds up to close to 40,000 genes.

http://uswest.ensembl.org/Mus_musculus/Info/Annotation


Reply from gtasource (20 days ago):

Thanks. After I sorted the gene IDs and counted the unique ones, I realized there were no duplicates. I should've checked the annotation info first, though. :P
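
For anyone who wants to repeat that duplicate check, here is a minimal Python sketch. It is not the poster's actual commands, which were not shown. It assumes the standard featureCounts table layout: one leading `#` comment line, then a tab-separated header row starting with `Geneid`, then one row per gene. The function name `summarize_gene_ids` is made up for illustration.

```python
import csv


def summarize_gene_ids(counts_path):
    """Return (n_rows, n_unique_ids) for a featureCounts count table.

    Assumes the usual layout: a '#' comment line, a header row,
    then one tab-separated row per gene with the ID in column 1.
    """
    gene_ids = []
    with open(counts_path, newline="") as handle:
        # Drop the '#' comment line that records the program and command.
        rows = (line for line in handle if not line.startswith("#"))
        reader = csv.reader(rows, delimiter="\t")
        next(reader, None)  # skip the header row (Geneid, Chr, Start, ...)
        for fields in reader:
            if fields:  # ignore blank lines
                gene_ids.append(fields[0])
    return len(gene_ids), len(set(gene_ids))
```

Calling `summarize_gene_ids("counts.txt")` returns the number of gene rows and the number of distinct IDs. If the two numbers match, no gene ID appears twice in the table.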
WouterDeCoster (Belgium) wrote (20 days ago):

40,000 genes is not wrong; it's perfectly normal. The count also includes lncRNAs, for example.
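
To see how the total splits into coding genes, lncRNAs, pseudogenes and so on, one option is to tally the biotype attribute on the `gene` rows of the GTF. The sketch below is a rough illustration under some assumptions: that the annotation has `gene` feature rows, and that they carry a `gene_biotype` attribute (some annotations name it differently, e.g. `gene_type`, so the key is a parameter). The function name `count_gene_biotypes` is hypothetical. The totals may also differ slightly from the featureCounts row count, since the command above groups `exon` rows by the `gene` attribute rather than counting `gene` rows.

```python
import re
from collections import Counter

# Matches GTF attribute pairs of the form: key "value"
ATTRIBUTE_PATTERN = re.compile(r'(\S+) "([^"]*)"')


def count_gene_biotypes(gtf_path, biotype_key="gene_biotype"):
    """Count gene rows in a GTF file per biotype.

    biotype_key is an assumption about the annotation; genes
    lacking that attribute are tallied under "unknown".
    """
    counts = Counter()
    with open(gtf_path) as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # header/comment lines
            fields = line.rstrip("\n").split("\t")
            # GTF has 9 columns; column 3 is the feature type.
            if len(fields) < 9 or fields[2] != "gene":
                continue
            attributes = dict(ATTRIBUTE_PATTERN.findall(fields[8]))
            counts[attributes.get(biotype_key, "unknown")] += 1
    return counts
```

For example, `count_gene_biotypes("GCF_000001635.26_GRCm38.p6_genomic.gtf").most_common()` would list the biotypes from most to least frequent, assuming the file follows the layout described above.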
