Question: [hg19] Reading genes with CCDS
mollitz wrote (3.0 years ago):


I got a bit lost while trying to access gene sequences in GRCh37. I downloaded CCDS coordinates (I tried the datasets from both Ensembl and NCBI, together with the per-chromosome FASTA files each database provides), but the CCDS coordinates for the genes never match the sequence:

I "flatten" the FASTA file (e.g. for chromosome 1) so that it contains no newlines or FASTA headers, look up the coordinates of a given gene in the CCDS file, extract that stretch of sequence and compare it with an online viewer (for example Ensembl). The sequences never match. When I search my chromosome file for the sequence I copied from Ensembl, I do find it, but at a noticeable offset.
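Several things can make this comparison fail even when the files themselves are fine: a CCDS covers only the coding exons, while the genomic sequence a viewer shows for a gene also spans UTRs and introns; FASTA files may use lowercase for soft-masked repeats; the coordinate base (0- vs 1-based) is easy to misread; and minus-strand genes must be reverse-complemented. A minimal sketch of the whole lookup follows. It assumes the CCDS file has a `cds_locations` column formatted like `[start-end, start-end, ...]`, that the chromosome FASTA holds a single record, and that exon ends are inclusive; whether the CCDS numbers are 0- or 1-based should be confirmed in the README of your CCDS release. All function names are hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# Matches "start-end" pairs inside an assumed field like "[100-120, 200-230]".
_RANGE = re.compile(r"(\d+)-(\d+)")
# Complement table for upper-case DNA; N (unknown/hard-masked) maps to itself.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def parse_cds_locations(field: str) -> List[Tuple[int, int]]:
    """Return the (start, end) pairs of a cds_locations field, sorted by start."""
    return sorted((int(a), int(b)) for a, b in _RANGE.findall(field))


def load_flat_chromosome(path: str) -> str:
    """Return the sequence of a single-record FASTA file as one string."""
    parts = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()  # drops "\n" and a Windows "\r" if present
            if not line or line.startswith(">"):
                continue  # skip blank lines and the header
            parts.append(line.upper())  # soft-masked repeats are lower-case
    return "".join(parts)


def extract_cds(chrom: str, exons: Iterable[Tuple[int, int]],
                strand: str, one_based: bool = True) -> str:
    """Concatenate exon slices with INCLUSIVE ends; reverse-complement on '-'.

    one_based=True  : (start, end) are 1-based inclusive (GTF style).
    one_based=False : (start, end) are 0-based inclusive (an assumption
                      about how the CCDS file stores them; verify it).
    """
    shift = 1 if one_based else 0
    pieces = [chrom[start - shift:end - shift + 1]
              for start, end in sorted(exons)]
    seq = "".join(pieces)
    if strand == "-":
        seq = seq.translate(_COMPLEMENT)[::-1]
    return seq
```

The result should then be compared with the CDS (or translated protein) shown in the viewer rather than with the gene's full genomic sequence, which avoids the UTR/intron mismatch.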

Failing at such a simple task shows my lack of experience, so I'd like to ask for advice on two points:

  • How can I get/download the genome (GRCh37) together with matching gene annotations, so that I can search for gene sequences on my own machine?
  • I couldn't find a free, good edX/Coursera/... course or any other good tutorial. I have a CS background, so I'm fine with algorithms, Python, etc. Is there a good online resource that gives an overview of the available datasets and how to work with them?
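Once a genome FASTA and a GTF annotation from the same release are on disk, a gene's coding sequence can be assembled from the GTF's `CDS` records. The sketch below assumes the standard 9-column, tab-separated GTF layout (1-based, inclusive coordinates) with `gene_name` and `transcript_id` attributes, as found in GENCODE/Ensembl GTFs; the function names are hypothetical. In Ensembl/GENCODE GTFs the stop codon is usually a separate `stop_codon` feature, so these intervals may end three bases short of the corresponding CCDS.

```python
from typing import Dict, Iterable, List, Tuple


def parse_attributes(field: str) -> Dict[str, str]:
    """Parse a GTF attribute column like 'gene_id "X"; gene_name "Y";'."""
    attrs: Dict[str, str] = {}
    for item in field.strip().split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        attrs[key] = value.strip().strip('"')  # repeated keys: last one wins
    return attrs


def cds_by_transcript(gtf_lines: Iterable[str], gene_name: str
                      ) -> Dict[str, Tuple[str, str, List[Tuple[int, int]]]]:
    """Map transcript_id -> (seqname, strand, sorted 1-based CDS intervals)."""
    result: Dict[str, Tuple[str, str, List[Tuple[int, int]]]] = {}
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # header/comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "CDS":
            continue  # keep only coding-sequence features
        attrs = parse_attributes(cols[8])
        if attrs.get("gene_name") != gene_name:
            continue
        entry = result.setdefault(attrs["transcript_id"], (cols[0], cols[6], []))
        entry[2].append((int(cols[3]), int(cols[4])))
    for _, _, intervals in result.values():
        intervals.sort()
    return result
```

Each interval can then be sliced from the flattened chromosome string with `seq[start - 1:end]` (converting 1-based inclusive to Python's 0-based half-open slices), concatenated in order, and reverse-complemented when the strand is `-`.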

Best wishes and thanks for any replies

hg19 ccds coordinates grch37

Can you please give an example of a CCDS entry where it doesn't seem to match up? We can use it to try to work out what's going wrong.

(written 3.0 years ago by Emily_Ensembl)
Satyajeet Khare (Pune, India) wrote (3.0 years ago):

You can get them from RefSeq, UCSC, Ensembl, etc. Here is the GENCODE link for GRCh37 (hg19); from there you can download the genome sequence and GTF gene annotation files that match it. By the way, hg38 (GRCh38) is also available now; you can find the link to it on the same portal.
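One practical reason to take the sequence and the annotation from the same source is that naming conventions differ: UCSC hg19 and GENCODE name chromosomes `chr1`, while Ensembl's GRCh37 files use `1`. A quick consistency check is sketched below; the helper names are hypothetical, and the files are assumed to be read line by line.

```python
from typing import Iterable, Set


def fasta_names(lines: Iterable[str]) -> Set[str]:
    """First whitespace-delimited token of each FASTA header line."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                names.add(tokens[0])
    return names


def gtf_names(lines: Iterable[str]) -> Set[str]:
    """Distinct seqname values (column 1) of a GTF file."""
    names = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        names.add(line.split("\t", 1)[0])
    return names


def unmatched(fasta: Set[str], gtf: Set[str]) -> Set[str]:
    """Annotation sequence names that have no sequence in the FASTA."""
    return gtf - fasta
```

For example, `unmatched(fasta_names(open("genome.fa")), gtf_names(open("genes.gtf")))` (file names hypothetical) should return an empty set when the two files use the same naming.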
