How is it possible to have more than 25,000 genes in a counts file?
5.5 years ago
Batu ▴ 250

Hello everyone,

I have a question about RNA-seq for Homo sapiens. Humans have about 25000 protein-coding genes. But in my recent analysis, my counts file contains more than 58000 genes, listed by their Ensembl gene IDs. How is this possible? Did I miss something while processing the FASTQ files that caused an error? Thanks...

RNA-Seq rna-seq sequencing • 1.0k views
5.5 years ago

Not all genes are protein-coding.

And 58,000 is about right for the full gene set, including non-coding genes.
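
To see how a gene set splits into coding and non-coding entries, you could tally the biotype of each gene record in the annotation. The following is only a minimal sketch: it assumes a GTF whose `gene` lines carry a `gene_biotype` attribute (as in Ensembl GTFs) or a `gene_type` attribute (as in GENCODE GTFs). The function name `count_biotypes` and the example records are made up for illustration.

```python
import re
from collections import Counter

# Ensembl GTFs label the biotype "gene_biotype"; GENCODE GTFs use "gene_type".
BIOTYPE_RE = re.compile(r'gene_(?:biotype|type) "([^"]+)"')


def count_biotypes(gtf_lines):
    """Tally the biotype of every 'gene' feature in an iterable of GTF lines."""
    counts = Counter()
    for line in gtf_lines:
        if line.startswith("#"):  # skip header/comment lines
            continue
        fields = line.rstrip("\n").split("\t")
        # A GTF record has 9 tab-separated columns; column 3 is the feature type.
        if len(fields) < 9 or fields[2] != "gene":
            continue
        match = BIOTYPE_RE.search(fields[8])
        counts[match.group(1) if match else "unknown"] += 1
    return counts


# Hypothetical records, not real annotation entries.
example_gtf = [
    "#!made-up header line\n",
    '1\tsrc\tgene\t100\t900\t.\t+\t.\tgene_id "GENE_A"; gene_biotype "protein_coding";\n',
    '1\tsrc\ttranscript\t100\t900\t.\t+\t.\tgene_id "GENE_A"; gene_biotype "protein_coding";\n',
    '1\tsrc\tgene\t2000\t2500\t.\t-\t.\tgene_id "GENE_B"; gene_biotype "lncRNA";\n',
    '2\tsrc\tgene\t50\t400\t.\t+\t.\tgene_id "GENE_C"; gene_type "protein_coding";\n',
]

biotype_counts = count_biotypes(example_gtf)
for biotype, n in biotype_counts.most_common():
    print(biotype, n)
```

For a real file, something like `with open("annotation.gtf") as fh: count_biotypes(fh)` should work, where the file name is a placeholder for whatever annotation was used to generate the counts.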

5.5 years ago

"How is it possible to have more than 25000 genes in counts file?"

It is possible if the annotation file (most likely a GTF file?) that you used for generating the counts contained more than 25K gene IDs.
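
One way to check this is to compare the gene IDs in the annotation with the IDs in the counts file. The sketch below is hedged and rests on several assumptions: the counts file is tab-separated with the gene ID in the first column and no header row, lines starting with `#` are comments, and rows starting with `__` are summary rows of the kind htseq-count writes. The function names and the example data are hypothetical.

```python
import re

GENE_ID_RE = re.compile(r'gene_id "([^"]+)"')


def gene_ids_in_gtf(gtf_lines):
    """Collect the unique gene_id values from an iterable of GTF lines."""
    ids = set()
    for line in gtf_lines:
        if line.startswith("#"):  # skip header/comment lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:  # not a complete GTF record
            continue
        match = GENE_ID_RE.search(fields[8])
        if match:
            ids.add(match.group(1))
    return ids


def gene_ids_in_counts(count_lines):
    """Collect gene IDs from the first column of a tab-separated counts file.

    Assumption: no header row; '#' comment lines and '__' summary rows are skipped.
    """
    ids = set()
    for line in count_lines:
        if not line.strip() or line.startswith("#") or line.startswith("__"):
            continue
        ids.add(line.split("\t", 1)[0].strip())
    return ids


# Hypothetical data for illustration only.
example_gtf = [
    '1\tsrc\tgene\t100\t900\t.\t+\t.\tgene_id "GENE_A"; gene_biotype "protein_coding";\n',
    '1\tsrc\texon\t100\t300\t.\t+\t.\tgene_id "GENE_A"; gene_biotype "protein_coding";\n',
    '1\tsrc\tgene\t2000\t2500\t.\t-\t.\tgene_id "GENE_B"; gene_biotype "lncRNA";\n',
]
example_counts = ["GENE_A\t10\n", "GENE_B\t0\n", "__no_feature\t5\n"]

gtf_ids = gene_ids_in_gtf(example_gtf)
count_ids = gene_ids_in_counts(example_counts)
print(len(gtf_ids), "gene IDs in annotation;", len(count_ids), "in counts file")
print("IDs only in counts file:", sorted(count_ids - gtf_ids))
```

If the two sets match, the number of rows in the counts file simply reflects the annotation, not a processing error.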

For more details than you might ever want to know about annotation caveats, I recommend reading "A comprehensive evaluation of ensembl, RefSeq, and UCSC annotations in the context of RNA-seq read mapping and gene quantification."
