Question: getting feature annotations for all NCBI Refseq sequences
gravatar for bitpir
2.4 years ago by
bitpir190 wrote:

Hi, I was wondering if there's a way to download all feature annotations (Gene; CDS; rRNA; tRNA; ncRNA; repeat_region) of all the Refseq sequences ( from NCBI? I can't seem to find it anywhere on the web or NCBI. Something like GFF for Genbank sequences would be great. Thanks!

cds refseq annotation ncbi • 1.2k views
ADD COMMENTlink modified 2.4 years ago by genomax89k • written 2.4 years ago by bitpir190
gravatar for genomax
2.4 years ago by
United States
genomax89k wrote:

Get summary file for NCBI RefSeq genomes.


Grab the ftp path (and or any other fields you need) from this file for each genome.

awk -F '\t' '{print $20}' assembly_summary_refseq.txt > ftp_paths

You should get something like this:

From each of the ftp path directory you should be able to get the *genomic.gff.gz file for that genome.

ADD COMMENTlink written 2.4 years ago by genomax89k

Awesome! Thank you so much for your answer, this is super helpful! Are the sequences from the Refseq/release represented in the assembly_summary_refseq.txt? And in case of viruses e.g., is there a similar assembly_summary file where I can get all the annotations? Is there also a NC_ to assembly type of file somewhere? I can see it when I search for the NC number but can never find the file list easily... Thanks a lot for your help!

ADD REPLYlink written 2.4 years ago by bitpir190

There is a similar summary file for virii. You can find that here.

What exactly do you mean by NC_ to assembly type? Can you give an example?

ADD REPLYlink written 2.4 years ago by genomax89k

great, thank you! My virus db includes View all RefSeq and Neighbor nucleotide records, will all of the viruses be captured in the summary file? I was thinking of finding all the NC_ accession number to a particular assembly accession number (e.g. NC_011750.1 --> GCF_000026345.1). I have downloaded the refseq genomic sequences, and I'm kind of working backwards to get the see which NC_ is associated with which assembly number and getting their respective annotation.

ADD REPLYlink written 2.4 years ago by bitpir190

Ah! Just found the answer to my question, for the second part at least :)

ADD REPLYlink written 2.4 years ago by bitpir190
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1338 users visited in the last hour