I have been looking at the rna.fa.gz file in RefSeq's database. For the majority of the NM transcripts, the sequences don't start with ATG. So I thought perhaps the rna file contains the entire mRNA sequence and not just the coding portion. To check, I took an NM, its chromosomal CDS start position, and the chromosomal start position of its first exon (all of which I got from another data file provided by RefSeq) to work out where in the NM's sequence the coding region should begin. But even then, there is still no 'ATG' at that position. Also, when there is a perfect mapping between an NM and an ENST, the NM's sequence in the rna.fa file is completely different from the ENST's sequence in Ensembl's own data file. The chromosomal positions of the ENST and the NM match exactly on the same chromosome (and on the same GRCh38 build), yet the sequences each database gives in its own data files are different. Could someone please clarify how RefSeq comes up with its transcript sequences?
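The coordinate mapping described above can be sketched roughly as follows. This is only a minimal sketch under stated assumptions: the function names `cds_start_offset` and `has_atg_at` are hypothetical, the coordinates are made-up toy values, and exon and CDS positions are assumed to be 0-based, half-open genomic intervals (as in UCSC-style annotation tables). Note that the 5' UTR can span more than one exon, and on the minus strand the transcript begins at the highest genomic coordinate, so subtracting the first exon start from the CDS start is not enough in general.

```python
def cds_start_offset(exons, cds_start, cds_end, strand):
    """Return the 0-based offset of the first coding base in the spliced transcript.

    exons: list of (start, end) genomic intervals, 0-based half-open (assumed)
    cds_start, cds_end: genomic CDS bounds, 0-based half-open (assumed)
    strand: '+' or '-'
    """
    exons = sorted(exons)
    offset = 0
    if strand == '+':
        target = cds_start
        for start, end in exons:
            if start <= target < end:
                return offset + (target - start)
            offset += end - start  # whole exon lies upstream of the CDS start
    else:
        # On the minus strand the first coding base is the highest CDS coordinate,
        # and the transcript is read from the last exon backwards.
        target = cds_end - 1
        for start, end in reversed(exons):
            if start <= target < end:
                return offset + (end - 1 - target)
            offset += end - start
    raise ValueError("CDS start does not fall inside any exon")


def has_atg_at(transcript_seq, offset):
    """Check whether the transcript sequence has ATG at the given offset."""
    return transcript_seq[offset:offset + 3].upper() == "ATG"


# Toy example with invented coordinates: three exons, CDS starting in the second exon.
toy_exons = [(100, 200), (300, 400), (500, 600)]
plus_offset = cds_start_offset(toy_exons, 350, 550, '+')
minus_offset = cds_start_offset(toy_exons, 350, 550, '-')
```

With these toy values the plus-strand offset counts all of the first exon plus the part of the second exon before the CDS start, while the minus-strand offset is measured from the end of the last exon. The resulting offset can then be passed to `has_atg_at` together with the sequence taken from the FASTA file.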
Question: How does RefSeq get their transcript sequences?
4.6 years ago by
pwg46 • 370