I am studying RNA-Seq data and need to know how many genes are included in the UCSC hg38 reference genome build.
There was an article on this in 2015.
See Figure 4 there: three databases reported different gene counts.
28,442 genes for UCSC at that time.
Hi Natasha, thank you for sharing the paper. I now have the answer to my question.
Hi Glory Basumata,
If an answer was helpful, you should upvote it; if an answer resolved your question, you should mark it as accepted.
Thanks for the update, Wouter. I am new to this community, so I didn't know about upvoting :) Cheers!
No problem, and welcome to Biostars. Useful posting guidelines can be found in the following posts:
What have you tried?
Locate the refGene file, uncompress it, take the 13th column (name2, which holds the gene symbol), remove repeated symbols, and count them:
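```shell
# -sL: quiet, follow redirects; gunzip -c is used instead of zcat, which fails on .gz files on macOS
curl -sL http://hgdownload.cse.ucsc.edu/goldenPath/hg38/database/refGene.txt.gz \
  | gunzip -c | cut -f13 | sort -u | wc -l
```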
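The one-liner above counts distinct gene symbols across all refGene transcripts. For RNA-Seq it can help to split that number by transcript type: in RefSeq, accessions starting with NM_ are protein-coding mRNAs and NR_ accessions are non-coding RNAs. Below is a minimal sketch, assuming the standard refGene column layout (column 2 = transcript accession, column 13 = gene symbol); the function name `count_refgene_symbols` is hypothetical, not a UCSC tool.

```shell
# count_refgene_symbols (hypothetical helper): reads a UCSC refGene-format table
# on stdin and prints, per RefSeq accession prefix (NM, NR, ...), how many
# distinct gene symbols have at least one transcript with that prefix,
# followed by the overall number of distinct symbols.
count_refgene_symbols() {
  awk -F'\t' '
    {
      prefix = substr($2, 1, 2)        # accession prefix, e.g. "NM" from "NM_000014"
      key = prefix SUBSEP $13          # column 13 (name2) holds the gene symbol
      if (!(key in seen)) { seen[key] = 1; n[prefix]++ }
      all[$13] = 1                     # remember every symbol once for the total
    }
    END {
      cmd = "LC_ALL=C sort"            # sort the per-prefix rows for stable output
      for (p in n) printf("%s\t%d\n", p, n[p]) | cmd
      close(cmd)                       # wait for sort to finish before printing the total
      total = 0
      for (s in all) total++
      printf "total\t%d\n", total
    }'
}
```

Usage: `curl -sL http://hgdownload.cse.ucsc.edu/goldenPath/hg38/database/refGene.txt.gz | gunzip -c | count_refgene_symbols`. A symbol that has both NM_ and NR_ transcripts is counted in both rows, so the per-prefix rows need not add up to the total.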
Thank you, Jorge. This helped me.
Ensembl summarizes these metrics for its own annotation at https://www.ensembl.org/Homo_sapiens/Info/Annotation
Thank you for your help, WouterDeCoster.
GENCODE has a statistics page for this information.
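To cross-check a statistics page against the annotation file you actually align to, the gene records in a GTF can be counted by gene type. Below is a minimal sketch, assuming a GENCODE-style GTF where column 3 holds the feature type and the attributes column carries `gene_type` (Ensembl GTFs call this attribute `gene_biotype`, so the pattern would need adjusting). `count_gtf_genes` is a hypothetical helper, and counts from it may differ slightly from the published page depending on release and how special cases are handled.

```shell
# count_gtf_genes (hypothetical helper): reads a GENCODE-style GTF on stdin and
# prints the number of "gene" records per gene_type, followed by the total.
count_gtf_genes() {
  awk -F'\t' '
    /^#/ { next }                      # skip header lines such as "##description"
    $3 == "gene" {                     # column 3 is the feature type
      type = "unknown"
      # Extract the value from  gene_type "protein_coding";  in column 9:
      # skip the 11 characters of the prefix and drop the closing quote
      if (match($9, /gene_type "[^"]+"/)) {
        type = substr($9, RSTART + 11, RLENGTH - 12)
      }
      n[type]++
      total++
    }
    END {
      cmd = "LC_ALL=C sort"            # sort the per-type rows for stable output
      for (t in n) printf("%s\t%d\n", t, n[t]) | cmd
      close(cmd)                       # wait for sort to finish before printing the total
      printf "total\t%d\n", total + 0
    }'
}
```

Usage: `gunzip -c gencode.vNN.annotation.gtf.gz | count_gtf_genes`, where `vNN` is a placeholder for whichever GENCODE release you downloaded.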
Hi genomax, thank you for sharing the link. I now have a rough estimate of the number of genes to work with.