I am currently developing a website-database pipeline. Unfortunately, I am working on a Windows Core 2 Duo PC with 2 GB of RAM. The NCBI utilities can be used, but only through Cygwin; I tried that and the PC lagged, and there is not enough memory to run Linux. My workstation will not be delivered for another 2-3 months.

But this is a fairly urgent need.

I have a couple of questions:

  1. I have a list of accession numbers and GI IDs, and I want to fetch all the corresponding sequences from the NCBI FTP site. I am finding the FTP site difficult to understand and therefore hard to navigate. My website back end runs BLAST on the user's query, after which I want to download the sequences for the hits. Is there any way to fetch the sequences for the hits through a script?

  2. To get the organism name for each hit, I am using the Perl Mechanize module to access the hit page and extract the organism name. But the NCBI pages are dynamically generated, so the only way I have found to get the organism name is pattern matching on the page title, which is not dynamically generated (assuming that the first two words of the title are the organism name). Is there a more efficient way?
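
To make both questions concrete, here is a rough sketch of the kind of script I have in mind. It is only a sketch under assumptions: it uses the NCBI E-utilities `efetch` endpoint instead of the FTP site, it assumes the hits are nucleotide records (`db="nuccore"`; protein hits would need `db="protein"`), and the function names `efetch_url`, `fetch` and `organisms_from_genbank_xml` are made up for illustration. The organism name is read from the `GBSeq_organism` field of the GenBank XML record rather than guessed from the page title. It uses only the Python standard library, so it should not need Cygwin.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Base URL of the NCBI E-utilities (assumed here instead of the FTP site).
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def efetch_url(ids, db="nuccore", rettype="fasta", retmode="text"):
    """Build an efetch URL for a list of accession numbers or GI IDs."""
    params = {
        "db": db,              # assumption: nucleotide hits
        "id": ",".join(ids),   # several IDs can go into one request
        "rettype": rettype,
        "retmode": retmode,
    }
    return EUTILS + "/efetch.fcgi?" + urllib.parse.urlencode(params)


def fetch(ids, **kwargs):
    """Download the records for the given IDs and return them as text."""
    with urllib.request.urlopen(efetch_url(ids, **kwargs)) as resp:
        return resp.read().decode("utf-8")


def organisms_from_genbank_xml(xml_text):
    """Map accession -> organism name from efetch GenBank XML output."""
    root = ET.fromstring(xml_text)
    return {
        seq.findtext("GBSeq_primary-accession"): seq.findtext("GBSeq_organism")
        for seq in root.iter("GBSeq")
    }


if __name__ == "__main__":
    hits = ["NM_000546"]  # example accession; replace with the BLAST hit IDs
    print(fetch(hits))    # FASTA sequences for the hits
    xml_text = fetch(hits, rettype="gb", retmode="xml")
    print(organisms_from_genbank_xml(xml_text))
```

For a long hit list the IDs would presumably have to be sent in batches with a pause between requests, since NCBI limits how often the service may be called.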

In desperate need of suggestions...

Thanks in advance.

Have a good day!

