6.9 years ago
Glory Basumata
I am studying RNA-Seq data and need to know how many genes are included in the hg38 reference genome build (UCSC).
Locate the refGene file, uncompress it, extract the 13th column (which contains the gene names), remove duplicates, and count what is left:
curl http://hgdownload.cse.ucsc.edu/goldenPath/hg38/database/refGene.txt.gz \
| zcat | cut -f13 | sort -u | wc -l
28054
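The same counting step can be wrapped in a small function so it can be checked on a tiny sample before pointing it at the full table. This is only a sketch: `count_unique_col` is a made-up helper name, not part of any UCSC tooling, and it uses awk in place of `cut | sort -u | wc -l`. The count you get from the live file will probably differ from the number above, since the table is updated over time.

```shell
# Hypothetical helper (not from the original answer): count distinct values
# in one tab-separated column read from stdin.
# $1 = 1-based column number (13 is the gene-name column used above)
count_unique_col() {
  awk -F'\t' -v col="$1" '!seen[$col]++ { n++ } END { print n + 0 }'
}

# Usage against the same UCSC file as above (needs network access):
# curl -s http://hgdownload.cse.ucsc.edu/goldenPath/hg38/database/refGene.txt.gz \
#   | zcat | count_unique_col 13
```

Passing 2 instead of 13 would count distinct values in the second column instead, assuming that column holds the transcript accessions as in the current refGene layout.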
These metrics are summarized by Ensembl (for their annotation) at https://www.ensembl.org/Homo_sapiens/Info/Annotation
GENCODE has a statistics page for this information.
There was this article in 2015:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4339237/
See Figure 4 there. Three databases had their own opinions.
28442 for UCSC at that time
Hi Natasha, thank you for sharing the paper. I now have the answer to my question.
Hi Glory Basumata,
If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted.

Cheers,
Wouter
Thanks for the update Wouter. I am a new user here in this community, so I didn't know about upvote :) Cheers!
No problem and welcome to biostars. Interesting guidelines for posting can be found in the following posts:
What have you tried?