Question: How does RefSeq get their transcript sequences?
pwg46 (United States) wrote 4.6 years ago:

I have been looking at the rna.fa.gz file in RefSeq's database. For the majority of the NMs, the sequences don't start with ATG. So I thought perhaps the rna file contained the entire mRNA sequence and not just the coding portion. I took an NM, its chromosomal CDS start position, and the chromosomal start position of its first exon (all of which I got from another data file provided by the RefSeq DB) to work out where in the NM's sequence the coding region should begin. But even then, still no 'ATG'. Also, when there is a perfect map between an NM and an ENST, the NM's sequence given in the rna.fa file is completely different from the ENST's sequence given in Ensembl's own data file. The chromosomal positions of the ENST and NM match perfectly on the same chromosome (and on the same GRCh38 build), yet the sequences they each give in their own data files are different. Could someone please clarify how RefSeq comes up with its transcript sequences?

identifier refseq sequence nm atg • 2.1k views

Here is a link with the detailed process of curating RefSeq transcripts:

written 4.6 years ago by roy.granit

There's no reason to expect a transcript sequence to start with ATG; in fact, it usually won't, because an mRNA record includes the 5' UTR upstream of the start codon. Unless you're looking at non-coding sequences, they should typically contain an ATG somewhere, though. Can you give an example of a mismatch between the RefSeq and corresponding Ensembl sequence for the exact same transcript?

written 4.6 years ago by Devon Ryan
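
The coordinate check described in the question (finding where the coding region should begin inside the NM sequence) can be sketched roughly as below. This is only an illustration under stated assumptions: the function names are hypothetical, the exon and CDS coordinates are assumed to be 0-based, half-open genomic intervals, and the toy numbers are made up. The thread does not say which RefSeq data file or coordinate convention was actually used.

```python
def transcript_offset(exons, genomic_pos, strand):
    """Return the 0-based offset of a genomic base within the spliced transcript.

    exons: list of (start, end) genomic intervals, assumed 0-based, half-open.
    genomic_pos: 0-based genomic coordinate of the base of interest.
    strand: "+" or "-".
    """
    # Walk exons in transcript order: ascending on "+", descending on "-".
    ordered = sorted(exons, reverse=(strand == "-"))
    offset = 0
    for start, end in ordered:
        if start <= genomic_pos < end:
            if strand == "+":
                return offset + (genomic_pos - start)
            # On "-", the transcript reads from the exon's high end downwards.
            return offset + (end - 1 - genomic_pos)
        # Only exonic bases count; introns are skipped entirely.
        offset += end - start
    raise ValueError("position is not inside any exon")


def cds_start_offset(exons, cds_start, cds_end, strand):
    """Offset of the first coding base in the transcript (hypothetical helper)."""
    # On "-", the first coding base is the last genomic base of the CDS interval.
    first_coding_base = cds_start if strand == "+" else cds_end - 1
    return transcript_offset(exons, first_coding_base, strand)


# Toy example with made-up coordinates: two exons separated by an intron.
toy_exons = [(100, 110), (200, 220)]
plus_offset = cds_start_offset(toy_exons, 205, 215, "+")
minus_offset = cds_start_offset(toy_exons, 205, 215, "-")
print(plus_offset, minus_offset)
```

With an offset in hand, `mrna[offset:offset + 3]` is where one would look for the start codon in the rna.fa record. Two things this sketch handles that a simple "CDS start minus first exon start" subtraction does not are introns lying between the first exon and the CDS start, and transcripts on the minus strand, where the mRNA is the reverse complement of the genomic sequence and counting starts from the highest coordinate. Whether either of these accounts for the mismatch in the question cannot be told from the thread.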