Question: RefSeq Version Numbers/Mapping File
pwg46 (United States) wrote, 6.0 years ago:

Hello,

I notice that the RefSeq database's data files contain RefSeq transcripts, proteins, etc. with version numbers. Approximately how often do these version numbers change? Also, is it likely that two versions of the same RefSeq transcript would have different sequences if both are GRCh38 annotations?

Also, I am looking for a data file that maps RefSeq transcripts to proteins and takes version numbers into account. I know BioMart maps RefSeq transcripts to proteins, but it doesn't append version numbers to the transcripts/proteins (it only uses the latest versions).

Thanks

Prakki Rama (Singapore) wrote, 6.0 years ago:

how often these version numbers change?

To see how a record's version has changed over time, open the record and select Revision History in Display Settings, under the search box.

Is it likely that two Refseq transcripts, which are the same transcript (but different versions), would have different sequences?

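Possibly. For example, XM_003440720 and NM_001279661 describe the same transcript. XM_003440720, a predicted model, is now obsolete and has been replaced by NM_001279661. Strictly speaking, these are two accessions rather than two versions of one accession (the version is the numeric suffix after the dot, as in NP_001266590.1), but the case shows how a record gets revised. The sequences are not completely different: the newer one appears to be an improved version, with additional bases compared to the previous one.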
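To make the accession/version distinction concrete, here is a minimal Python sketch that splits an identifier such as NP_001266590.1 into its accession and its version number. The helper names `split_accession` and `same_record` are made up for this illustration and are not part of any NCBI tool.

```python
from typing import Optional, Tuple


def split_accession(identifier: str) -> Tuple[str, Optional[int]]:
    """Split 'NP_001266590.1' into ('NP_001266590', 1).

    The version is None when the identifier carries no '.N' suffix.
    """
    accession, _, version = identifier.strip().partition(".")
    return accession, int(version) if version else None


def same_record(a: str, b: str) -> bool:
    """True when two identifiers share an accession, ignoring the version."""
    return split_accession(a)[0] == split_accession(b)[0]


if __name__ == "__main__":
    print(split_accession("NP_001266590.1"))
    print(same_record("NP_001266590.1", "NP_001266590"))
    print(same_record("XP_003440768.1", "NP_001266590.1"))
```

Comparing on the accession alone is how you would join against a source that drops versions, while keeping the integer version lets you detect when two files refer to different revisions of the same record.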

looking for a data file which maps refSeq transcripts to proteins, but also takes into account version numbers.

One way to do this is with the NCBI E-utilities. In a terminal:

# Fetch the records as text, pull out the accession fields, and keep only protein accessions (NP/XP)
curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&id=XM_003440720,NM_001279661&retmode=text" |
  grep 'accession "' | sed 's/ *accession "//; s/" ,//' | grep -E "NP|XP" |
  while read -r IDS; do
    # Fetch each protein as FASTA
    curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&id=${IDS}&retmode=text&rettype=fasta"
  done

The output includes both the older and the newer protein sequences for the IDs mentioned above, XM_003440720 and NM_001279661.

>gi|348506442|ref|XP_003440768.1| PREDICTED: 40S ribosomal protein S12 [Oreochromis niloticus]
MAEEGRQAHLCVLAANCDEPMYVKLVEALCAEHQINLIKVDDNKKLGEWVGLCKIDREGKPRKVVGCSCV
VVKDYGKESQAKDVIEEYFKSKK

>gi|525343327|ref|NP_001266590.1| 40S ribosomal protein S12 [Oreochromis niloticus]
MAEEGSPAGGVMDVNTALPEVLKTALIHDGLAPGIREAAKALDKRQAHLCVLAANCDEPMYVKLVEALCA
EHQINLIKVDDNKKLGEWVGLCKIDREGKPRKVVGCSCVVVKDYGKESQAKDVIEEYFKSKK
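To get versioned protein IDs out of FASTA text like the above, a short parser is enough. The following Python sketch assumes the header layout shown in this output (`gi|...|ref|ACCESSION.VERSION|`); headers returned by NCBI today may be formatted differently, in which case the regular expression would need adjusting. The function names are hypothetical.

```python
import re
from typing import Iterator, List, Tuple

# The two records from the output above, copied verbatim.
FASTA_TEXT = """\
>gi|348506442|ref|XP_003440768.1| PREDICTED: 40S ribosomal protein S12 [Oreochromis niloticus]
MAEEGRQAHLCVLAANCDEPMYVKLVEALCAEHQINLIKVDDNKKLGEWVGLCKIDREGKPRKVVGCSCV
VVKDYGKESQAKDVIEEYFKSKK

>gi|525343327|ref|NP_001266590.1| 40S ribosomal protein S12 [Oreochromis niloticus]
MAEEGSPAGGVMDVNTALPEVLKTALIHDGLAPGIREAAKALDKRQAHLCVLAANCDEPMYVKLVEALCA
EHQINLIKVDDNKKLGEWVGLCKIDREGKPRKVVGCSCVVVKDYGKESQAKDVIEEYFKSKK
"""

# Assumption: the accession.version sits between 'ref|' and the next '|'.
HEADER_RE = re.compile(r"\|ref\|([A-Z]{2}_\d+)\.(\d+)\|")


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def protein_versions(text: str) -> List[Tuple[str, int, int]]:
    """Return (accession, version, sequence length) for each record."""
    rows = []
    for header, seq in parse_fasta(text):
        match = HEADER_RE.search(header)
        if match:
            rows.append((match.group(1), int(match.group(2)), len(seq)))
    return rows


if __name__ == "__main__":
    for accession, version, length in protein_versions(FASTA_TEXT):
        print(f"{accession}\t{version}\t{length}")
```

Printing the sequence length next to each ID makes it easy to see that the older and newer proteins differ in length.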

 

 
