Total number of genes in hg38
3.3 years ago

I am studying RNA-Seq data and need to know how many genes are annotated in the hg38 reference genome build (UCSC).

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4339237/

See Figure 4 there. Three databases had their own opinions.

The count for UCSC at that time was 28442.

Hi Natasha, thank you for sharing the paper. I now have the answer to my question.

Hi Glory Basumata,

Cheers,
Wouter

Thanks for the update, Wouter. I am a new user in this community, so I didn't know about upvoting :) Cheers!

No problem and welcome to biostars. Interesting guidelines for posting can be found in the following posts:

What have you tried?

3.3 years ago

Locate the refGene file, uncompress it, take the 13th column (which contains the gene names), remove duplicates, and count what is left:

curl http://hgdownload.cse.ucsc.edu/goldenPath/hg38/database/refGene.txt.gz \
| zcat | cut -f13 | sort -u | wc -l
28054
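
If you want to rerun the count on a local copy, or split it by transcript type, the one-liner above could be wrapped roughly like this. This is only a sketch: the function names `count_genes` and `count_genes_by_prefix` are made up for illustration, and it assumes an uncompressed, tab-separated refGene-style table with the transcript accession in column 2 and the gene name in column 13.

```shell
# Hypothetical helpers (names are illustrative, not part of any UCSC tool).
# Assumes a tab-separated refGene-style table: accession in column 2,
# gene name in column 13.

# Count distinct gene names in the file given as $1.
count_genes() {
    cut -f13 "$1" | sort -u | wc -l | tr -d ' '
}

# Count distinct gene names among transcripts whose accession starts with
# the prefix given as $2, e.g. NM_ (protein-coding mRNA) or NR_ (non-coding RNA).
count_genes_by_prefix() {
    awk -F'\t' -v p="$2" 'index($2, p) == 1 { print $13 }' "$1" \
        | sort -u | wc -l | tr -d ' '
}

# Example usage, after saving the table locally as refGene.txt:
#   count_genes refGene.txt
#   count_genes_by_prefix refGene.txt NM_
```

Note that a gene with both NM_ and NR_ transcripts is counted under both prefixes, so the two numbers need not add up to the total. The total will also drift as UCSC updates the table.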

Thank you Jorge. This helped me.

3.3 years ago

These metrics are summarized by Ensembl (for their annotation) at https://www.ensembl.org/Homo_sapiens/Info/Annotation

Thank you for your help WouterDeCoster.

3.3 years ago
GenoMax

GENCODE has a statistics page for this information.

Hi genomax, thank you for sharing the weblink. I now have a rough estimate of the number of genes to work with.