Question: Download gene sequences from a large number of whole genomes in NCBI
18 months ago by
marcoooo0 wrote:


I have a large list of NCBI accession codes for complete genome sequences of one species, and I need to download a few genes from each of them (the same genes for every sequence, with slightly different positions in different sequences). Because the list is large, I cannot manually check the gene positions in each sequence's annotation and download the regions of interest, so I was wondering whether there is a way to use the NCBI tools to do this in a more automated fashion. I tried playing with efetch and the E-utilities, but with no success so far.

Does anybody have any idea how to do this?

I know that the "download all the sequences and align them to find the genes" approach should work, but a few of these sequences have stretches of Ns that make the alignment problematic.

Thanks in advance for the help.


ncbi sequence genome gene • 414 views
18 months ago by
Istvan Albert ♦♦ 81k
University Park, USA
Istvan Albert ♦♦ 81k wrote:

Download the BLAST database of RefSeq, then use blastdbcmd, the BLAST database client, to query and extract sequences. You can extract by name, by coordinates, by strand, etc.

This is probably the fastest and most efficient way to query the whole of RefSeq.
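
As a rough illustration of the coordinate-based extraction, here is a minimal Python sketch that wraps blastdbcmd. It assumes BLAST+ is installed and on your PATH, that a local database has already been downloaded, and that you already have the start/stop coordinates of each gene (for example from the annotation). The function names, the database name "my_refseq_db" and the coordinates in the usage comment are hypothetical placeholders, not something from the original question.

```python
import subprocess


def build_blastdbcmd(db, accession, start, stop, strand="plus"):
    """Build the blastdbcmd argument list for one region of one sequence.

    Coordinates are 1-based and inclusive, which is what -range expects.
    """
    if strand not in ("plus", "minus"):
        raise ValueError("strand must be 'plus' or 'minus'")
    if start < 1 or stop < start:
        raise ValueError("coordinates are 1-based and start must be <= stop")
    return [
        "blastdbcmd",
        "-db", db,                      # name/path of the local BLAST database
        "-entry", accession,            # sequence identifier to look up
        "-range", f"{start}-{stop}",    # region to extract
        "-strand", strand,              # 'minus' returns the reverse complement
    ]


def extract_region(db, accession, start, stop, strand="plus"):
    """Run blastdbcmd and return the extracted region as FASTA text."""
    cmd = build_blastdbcmd(db, accession, start, stop, strand)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


# Hypothetical usage (database name and coordinates are placeholders):
# fasta = extract_region("my_refseq_db", "NC_000913.3", 1000, 2500, "minus")
# print(fasta)
```

To process a long list, you could loop over (accession, start, stop, strand) tuples and append each result to one FASTA file. Keeping the command builder separate from the call makes it easy to print and inspect the commands before running them.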
