Question: Best method to retrieve genome sequences
4.8 years ago by
European Union
kay.hellboy0 wrote:

What is the best method to get the sequences of all eukaryotic genomes (just the DNA sequence of the chromosomes)?

I wrote a Perl script to retrieve the sequences by accession, but I don't have all the accessions, and Ensembl will probably block my IP if I try to download them all programmatically.

sequence genome • 1.5k views
modified 4.8 years ago by MAPK1.4k • written 4.8 years ago by kay.hellboy0

I don't think Ensembl / Ensembl Genomes will block your IP if you retrieve genome sequences from their respective FTP sites. IP blocks are generally only given to people who scrape the website and thereby bring down the production servers.

written 4.8 years ago by Bert Overduin3.6k

I read on their website that repeated requests can be considered abusive :S so I'm not sure about that.

written 4.8 years ago by kay.hellboy0

I am sure downloading many files from the FTP site is not considered abusive. Cheers, Bert (Ensembl team member April 2005 - March 2014 :) )
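
For anyone who wants to script the FTP route, here is a minimal Python sketch using the standard `ftplib` module. The host name, the `.dna.chromosome.` file naming pattern, and the one-second pause are assumptions made for illustration (check the actual directory listing and README on the FTP site first), and the function names are hypothetical.

```python
import time
from ftplib import FTP

# Assumed host; confirm against the FTP site you actually use.
HOST = "ftp.ensembl.org"


def is_chromosome_fasta(name):
    """Guess whether a file name is an unmasked per-chromosome FASTA.

    Assumes names contain '.dna.chromosome.' and end in '.fa.gz'.
    """
    return ".dna.chromosome." in name and name.endswith(".fa.gz")


def select_chromosome_files(names):
    """Return the sorted subset of names that look like chromosome FASTA files."""
    return sorted(n for n in names if is_chromosome_fasta(n))


def download_chromosomes(directory, pause_seconds=1.0):
    """Download every chromosome FASTA in one FTP directory, pausing between files."""
    with FTP(HOST) as ftp:
        ftp.login()  # anonymous login
        ftp.cwd(directory)
        for name in select_chromosome_files(ftp.nlst()):
            with open(name, "wb") as out:
                ftp.retrbinary(f"RETR {name}", out.write)
            time.sleep(pause_seconds)  # small courtesy delay between downloads
```

Calling `download_chromosomes(...)` with the `dna` directory of one species would fetch its chromosome files one at a time over a single connection.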

 

written 4.8 years ago by Bert Overduin3.6k
4.8 years ago by
MAPK1.4k
United States
MAPK1.4k wrote:

You can look up the taxids and extract all the GI/accession numbers for those species from NCBI. Once you have the GIs, you can download the sequences from NCBI's nr/nt database using blastdbcmd with the -entry_batch option in standalone BLAST (I don't remember the command exactly).
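
A rough sketch of that step in Python, assuming BLAST+ is installed, `blastdbcmd` is on the PATH, and a local copy of the `nt` database is available. The file names and helper functions are hypothetical; `-db`, `-entry_batch` and `-out` are the `blastdbcmd` options used.

```python
import subprocess
from pathlib import Path


def write_id_list(ids, path):
    """Write one identifier per line, the format -entry_batch expects."""
    Path(path).write_text("\n".join(ids) + "\n")
    return path


def build_blastdbcmd(db, id_file, out_fasta):
    """Build the blastdbcmd argument list for a batch retrieval."""
    return [
        "blastdbcmd",
        "-db", db,
        "-entry_batch", str(id_file),
        "-out", str(out_fasta),
    ]


def fetch_sequences(ids, db="nt", id_file="ids.txt", out_fasta="sequences.fasta"):
    """Write the ID list and run blastdbcmd; raises if the command fails."""
    write_id_list(ids, id_file)
    subprocess.run(build_blastdbcmd(db, id_file, out_fasta), check=True)
```

Nothing runs until `fetch_sequences(...)` is called with your own list of identifiers.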

modified 4.8 years ago • written 4.8 years ago by MAPK1.4k

I'll give it a try, thank you.

written 4.8 years ago by kay.hellboy0

But I wouldn't know which GI belongs to a chromosome, because the list holds all the GIs for that taxid without any indication of sequence type.

written 4.8 years ago by kay.hellboy0

I think that is correct; you will have to figure out a way to filter out the mitochondrial sequences.
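
One possible way to do that filtering, sketched in Python: parse the FASTA and drop records whose header mentions an organelle. This assumes the headers carry a description containing words such as "mitochondrion"; the keyword list and function names are made up for illustration and would need adjusting to the real headers.

```python
# Assumed header keywords that mark organelle sequences.
ORGANELLE_KEYWORDS = ("mitochondri", "chloroplast", "plastid")


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def is_organelle(header):
    """True if the header text contains any organelle keyword."""
    lowered = header.lower()
    return any(keyword in lowered for keyword in ORGANELLE_KEYWORDS)


def nuclear_records(lines):
    """Return the records whose headers do not look like organelle sequences."""
    return [(h, s) for h, s in read_fasta(lines) if not is_organelle(h)]
```

Passing an open FASTA file handle to `nuclear_records` returns only the records that pass the header check.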

written 4.8 years ago by MAPK1.4k

I found a file on the NCBI genomes FTP page that has all the IDs for each kingdom, which made it a lot easier. Thank you for the hint.

written 4.8 years ago by kay.hellboy0