[hg19] Reading genes with CCDS
3.8 years ago
mollitz ▴ 80


I got a bit lost while trying to access gene sequences in GRCh37. I downloaded CCDS coordinates (I tried datasets from Ensembl and NCBI: ftp://ftp.ensembl.org/pub/release-75/fasta/homo_sapiens/cds/ and http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/ccdsGene.txt.gz), along with the FASTA files for each chromosome provided by both sources. However, the CCDS coordinates for the genes never match:

I "flatten" the FASTA file (e.g. for chromosome 1) so that it contains no newlines or FASTA headers. I then look up the sequence for a given gene using the coordinates in the CCDS file and compare it to an online viewer (for example Ensembl). The sequences never match. When I search my chromosome file for the sequence copied from Ensembl, I find it at a noticeable offset.
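A few things commonly make this lookup go wrong: UCSC tables such as ccdsGene.txt use 0-based, half-open coordinates; a CCDS is a set of exons rather than one contiguous span; and minus-strand genes must be reverse-complemented. The sketch below handles all three. It assumes ccdsGene.txt follows UCSC's genePred layout with a leading `bin` column (so `cdsStart`/`cdsEnd` are columns 7 and 8 and the comma-terminated `exonStarts`/`exonEnds` are columns 10 and 11). The function names are illustrative, not from any library.

```python
# Sketch: extract a CDS from a flattened chromosome using a UCSC ccdsGene.txt row.
# Assumption: genePred-style columns with a leading "bin" column,
# 0-based half-open coordinates, comma-terminated exon lists.

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_flat_fasta(path):
    """Concatenate all sequence lines of a single-record FASTA file (headers dropped)."""
    with open(path) as handle:
        return "".join(line.strip() for line in handle if not line.startswith(">"))


def parse_ccds_row(line):
    """Parse one tab-separated ccdsGene.txt row into the fields needed here."""
    f = line.rstrip("\n").split("\t")
    return {
        "name": f[1],
        "chrom": f[2],
        "strand": f[3],
        "cds_start": int(f[6]),
        "cds_end": int(f[7]),
        "exon_starts": [int(x) for x in f[9].split(",") if x],
        "exon_ends": [int(x) for x in f[10].split(",") if x],
    }


def extract_cds(chrom_seq, rec):
    """Join the coding part of every exon; reverse-complement on the minus strand."""
    pieces = []
    for start, end in zip(rec["exon_starts"], rec["exon_ends"]):
        # Clip each exon to the coding region (drops UTR bases).
        s, e = max(start, rec["cds_start"]), min(end, rec["cds_end"])
        if s < e:
            # 0-based half-open coordinates map directly onto Python slicing.
            pieces.append(chrom_seq[s:e])
    cds = "".join(pieces).upper()
    if rec["strand"] == "-":
        cds = cds.translate(COMPLEMENT)[::-1]
    return cds
```

Usage: `extract_cds(read_flat_fasta("chr1.fa"), parse_ccds_row(row))`, where `row` is a line from ccdsGene.txt whose chromosome is chr1. Slicing the flattened string from `cdsStart` to `cdsEnd` in one piece would pull in the introns as well, which could explain why a copied exon sequence turns up at an offset from where you expected it.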

Failing at such a simple task shows that I lack experience, so I would kindly ask for advice on two points:

  • How can I get/download the genome (GRCh37) together with matching gene annotations, so that I can search for gene sequences locally on my machine?
  • I couldn't find a good free course (edX, Coursera, ...) or any other good tutorial. I have a CS background, so I'm fine with algorithms, Python, etc. Is there a good online resource that gives an overview of the available datasets and how to work with them?

Best wishes, and thanks for any replies


Can you please give an example of a CCDS where it doesn't seem to match up? We can use it to try to work out what's going wrong.

3.8 years ago
Satyajeet Khare ★ 1.6k

You can get them from RefSeq, UCSC, Ensembl, etc. Here is the GENCODE link for GRCh37 (hg19). There you can download the genome sequence and gene GTF files whose annotations match it. BTW, hg38 (GRCh38) is also available now; you can find its link on the same portal.
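Once the genome FASTA and GTF come from the same release, CDS sequences can be assembled directly from the GTF. Note that GTF coordinates are 1-based and inclusive, unlike the 0-based UCSC tables. The sketch below assumes a GENCODE-style GTF where the ninth column contains `transcript_id "..."`. The function names are hypothetical, and the chromosome sequence is expected to be a flattened string as described in the question.

```python
import re
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")
# Assumption: attributes look like: gene_id "..."; transcript_id "...";
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def cds_intervals_from_gtf(lines):
    """Collect (chrom, start, end, strand) CDS intervals per transcript; 1-based inclusive."""
    by_tx = defaultdict(list)
    for line in lines:
        if line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        if len(f) < 9 or f[2] != "CDS":
            continue
        m = TRANSCRIPT_ID.search(f[8])
        if m:
            by_tx[m.group(1)].append((f[0], int(f[3]), int(f[4]), f[6]))
    return by_tx


def cds_sequence(chrom_seq, intervals):
    """Join CDS pieces in genomic order, converting 1-based inclusive to Python slices."""
    strand = intervals[0][3]
    ordered = sorted(intervals, key=lambda iv: iv[1])
    # A 1-based inclusive [start, end] corresponds to the slice [start - 1:end].
    cds = "".join(chrom_seq[start - 1:end] for _, start, end, _ in ordered).upper()
    if strand == "-":
        cds = cds.translate(COMPLEMENT)[::-1]
    return cds
```

In GTF files following the GTF2.2 convention, the stop codon is usually a separate `stop_codon` feature that is not part of `CDS`. You may therefore need to append it before comparing against a full CCDS sequence.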


