Question: Downloading Fasta Files
8.4 years ago by
Mcdenzlix50 wrote:

I need to download about 40 complete genomes from NCBI and then, for each genome separately, filter out the sequences between specified bp values (e.g. between 1000 bp and 3000 bp). I need help on how to do that. I would also like to BLAST some sequences against each of the downloaded genomes to check for the presence or absence of the queries. Please assist or give the best guidelines.

modified 5.1 years ago by Biostar ♦♦ 20 • written 8.4 years ago by Mcdenzlix50
8.4 years ago by
Maximilian Haeussler1.3k wrote:

Not tested, as you didn't post an example; use this only as a starting point:

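# URL and INSEQFILE are placeholders: set them to the download location and your query FASTA file
GENOMELIST="E_coli.fa.gz E_coli_strain2.fa.gz"
mkdir -p download filtered blast

for i in ${GENOMELIST}; do
  fa=${i%.gz}   # file name once gunzip has removed the .gz suffix
  wget ${URL}/$i -O download/$i
  gunzip download/$i
  # keep only sequences between 1000 and 3000 bp long
  faFilter -minSize=1000 -maxSize=3000 download/$fa filtered/$fa
  # build a nucleotide BLAST database from the filtered file
  formatdb -i filtered/$fa -p F
  # search the query sequences against that database
  blastall -p blastn -d filtered/$fa -i ${INSEQFILE} -o blast/$fa.blast -e 0.000001
done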

faFilter is from the UCSC source code collection, see or also
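
If faFilter is not installed, the length filter can be approximated with plain awk. This is only a sketch, not a replacement for faFilter: the function name `filter_fasta_by_length` is made up for illustration, and it assumes ordinary FASTA input where each record starts with a `>` header line.

```shell
# filter_fasta_by_length MIN MAX FILE   (hypothetical helper, not part of any toolkit)
# Prints only the records whose total sequence length is between MIN and MAX, inclusive.
filter_fasta_by_length() {
  awk -v min="$1" -v max="$2" '
    function emit() {
      # print the buffered record if its length is within range
      if (header != "" && len >= min && len <= max) printf "%s\n%s", header, body
    }
    /^>/ { emit(); header = $0; body = ""; len = 0; next }   # new record starts
    { body = body $0 "\n"; len += length($0) }               # accumulate sequence lines
    END { emit() }                                           # do not forget the last record
  ' "$3"
}
```

Usage would be something like `filter_fasta_by_length 1000 3000 download/E_coli.fa > filtered/E_coli.fa`. Sequence lines are kept with their original wrapping.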

modified 8.4 years ago by Neilfws48k • written 8.4 years ago by Maximilian Haeussler1.3k
8.4 years ago by
Atlanta, GA
Lee Katz2.9k wrote:

Per usual, BioPerl has the answer.

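# you could make an array of IDs you need to fetch
use strict;
use warnings;
use Bio::DB::GenBank;
my $gb  = Bio::DB::GenBank->new();
my $seq = $gb->get_Seq_by_id('MUSIGHBA1'); # Unique ID
# extract a coordinate range, here assumed to be the first 100 bases (subseq is 1-based)
my $sub = $seq->subseq(1, 100);
# then, look at the blast modules and SearchIO to see how to start blasting and parsing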
8.4 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum118k wrote:

You can download your genomes, build a BLAST database with formatdb, and then extract a second set of sequences using fastacmd:

ncbi/build/fastacmd has an option -L:

  -L  Range of sequence to extract (Format: start,stop)
      0 in 'start' refers to the beginning of the sequence
      0 in 'stop' refers to the end of the sequence [String]  Optional
    default = 0,0
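
To make the start,stop convention concrete, here is a small awk sketch that mimics it on a plain FASTA file. It is not fastacmd itself: the name `extract_range` is hypothetical, and it assumes 1-based inclusive coordinates, with 0 meaning "beginning" for start and "end" for stop as in the help text above.

```shell
# extract_range START STOP FILE   (hypothetical helper that imitates the -L convention)
# For every record in FILE, prints the header and bases START..STOP
# (assumed 1-based, inclusive). START=0 means the beginning, STOP=0 means the end.
extract_range() {
  awk -v start="$1" -v stop="$2" '
    function emit(   s, e) {
      if (header == "") return
      s = (start == 0) ? 1 : start
      e = (stop == 0 || stop > length(seq)) ? length(seq) : stop
      print header
      if (e >= s) print substr(seq, s, e - s + 1)
    }
    /^>/ { emit(); header = $0; seq = ""; next }   # new record: flush the previous one
    { seq = seq $0 }                               # join wrapped sequence lines
    END { emit() }
  ' "$3"
}
```

For example, `extract_range 1000 3000 download/E_coli.fa` would print bases 1000 to 3000 of each record. With the legacy NCBI toolkit installed, the corresponding call would look something like `fastacmd -d genome.fa -s SEQ_ID -L 1000,3000`, where the database name and sequence ID are placeholders; ID lookups typically need the database to have been built with `formatdb -o T`.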

Then run your blastall query against the second database.
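
For the presence/absence part of the question, one option is to ask blastall for tabular output (`-m 8`), where the first column is the query ID, and then check which queries produced at least one hit. The sketch below assumes that format; the function name `presence_absence` and the idea of keeping the query IDs in a separate one-per-line file are illustrative choices, not something the tools require.

```shell
# presence_absence QUERY_IDS HITS   (hypothetical helper)
# QUERY_IDS: text file with one query ID per line
# HITS:      tabular BLAST output (tab-separated, column 1 = query ID)
# Prints "<query><TAB>present" or "<query><TAB>absent" for every query ID.
presence_absence() {
  awk -F '\t' -v hits="$2" '
    FILENAME == hits { seen[$1] = 1; next }    # first file: record queries with a hit
    { print $1 "\t" (($1 in seen) ? "present" : "absent") }
  ' "$2" "$1"
}
```

Running `presence_absence queries.txt blast/E_coli.fa.blast` once per genome gives one column of a presence/absence table. Any reported hit counts as "present" here, so the e-value cutoff passed to blastall decides how strict the call is.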

8.4 years ago by
Casbon3.2k wrote:

Might help:
