6.0 years ago
seqall
A very basic question:
What's the difference between the "*.rna.fna.gz" files at: ftp://ftp.ncbi.nlm.nih.gov/refseq/H_sapiens/mRNA_Prot/
and "GRCh38_latest_rna.fna.gz" at: ftp://ftp.ncbi.nlm.nih.gov/refseq/H_sapiens/annotation/GRCh38_latest/refseq_identifiers/?
Are the RNA sequences in "GRCh38_latest_rna.fna.gz" exactly the same as the ones in the combination of all "*.rna.fna.gz"?
Thanks a lot!
From what I can tell, the files in the directory ftp://ftp.ncbi.nlm.nih.gov/refseq/H_sapiens/annotation/GRCh38_latest/refseq_identifiers/ are just symbolic links to data elsewhere on the NCBI servers. The links are updated so that they always point to the most recent files.
GRCh38_latest_rna.fna.gz contains the mRNA sequence of all RefSeq transcripts, specifically:
The files in the other directory are split into several chunks, for whatever reason, but the sequences in them are exactly the same as those in GRCh38_latest_rna.fna.gz. According to the README files, these 'chunks' also contain all RefSeq transcripts.
Thanks a lot for your info, Kevin! After investigating and asking about these two sets, it turns out that the "*.rna.fna.gz" files at: ftp://ftp.ncbi.nlm.nih.gov/refseq/H_sapiens/mRNA_Prot/ are the latest and contain newer versions of transcripts. They have mRNA sequences:
After comparing all the transcript IDs
Some of the differences are as follows:
Interesting. Perhaps feed this back to the NCBI (if not already done).
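For anyone who wants to repeat the transcript ID comparison discussed above, here is a minimal Python sketch. It is not the code used in this thread. It assumes both sets of files have already been downloaded locally, and the function names `fasta_ids` and `compare_id_sets` are made up for illustration. It takes the first whitespace-delimited token of each FASTA header as the transcript ID, so an ID that includes a version suffix will count as different whenever the version differs between the two sets.

```python
import gzip


def fasta_ids(path):
    """Collect sequence IDs (first token of each '>' header) from a FASTA file.

    Handles both gzipped (.gz) and plain-text files.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    ids = set()
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:  # skip malformed headers with no ID
                    ids.add(fields[0])
    return ids


def compare_id_sets(chunk_paths, combined_path):
    """Compare IDs pooled from several chunk files against one combined file."""
    chunk_ids = set()
    for path in chunk_paths:
        chunk_ids |= fasta_ids(path)
    combined_ids = fasta_ids(combined_path)
    return {
        "only_in_chunks": chunk_ids - combined_ids,
        "only_in_combined": combined_ids - chunk_ids,
        "shared": chunk_ids & combined_ids,
    }


# Example usage (the local directory name "mRNA_Prot" is an assumption):
# from pathlib import Path
# chunks = sorted(Path("mRNA_Prot").glob("*.rna.fna.gz"))
# result = compare_id_sets(chunks, "GRCh38_latest_rna.fna.gz")
# for key, ids in result.items():
#     print(key, len(ids))
```

This only compares IDs. To check whether the sequences themselves are identical, you would also need to compare the sequence text for the shared IDs.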