Question: Getting plant full length cDNA sequences from NCBI
2.3 years ago, abhijit.synl (United States) wrote:

Hello, I would like to know whether there is a way to download plant full-length cDNA sequences from NCBI on a periodic basis, let's say once a month. At the moment I type FLI_CDNA OR full-length OR "full length" into the search box, go to the Nucleotide section, and select plants and mRNA as filters. After that I do some post-processing to select unique sequences. This works well, but I want an automated Unix method that does it on a schedule. I do not know whether such lists can be found on the NCBI FTP site.

Looking for a solution.

Thanks, Abhijit

2.3 years ago, apa@stowers (Kansas City) wrote:

I do this regularly, using the NCBI E-utilities (eutils). I call them from Perl, but there is also a command-line toolset; NCBI publishes a manual for each.

This is a minimal example that downloads complete nucleotide fastas for Mnemiopsis; you will have to modify the $esearch URL for your needs:

#!/usr/bin/env perl
use strict;
use warnings;
use LWP::Simple;  # fetching https:// URLs also requires LWP::Protocol::https

my $retmax = 500;  # records per batch
## Entrez query string: Mnemiopsis[ORGN] AND complete[Title] NOT partial[Title] NOT genome[Title]
my $esearch = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?"
    . "db=nuccore&usehistory=y&term=Mnemiopsis%5BORGN%5D+AND+"
    . "complete%5BTitle%5D+NOT+partial%5BTitle%5D+NOT+genome%5BTitle%5D";
my $eresult = get($esearch);
die "esearch request failed\n" unless defined $eresult;
# Extract the hit count plus the history-server key and WebEnv from the XML
my ($N, $key, $web) = ($eresult =~ m|<Count>(\d+)</Count>.*<QueryKey>(\d+)</QueryKey>.*<WebEnv>(\S+)</WebEnv>|s);
die "Could not parse esearch result\n" unless defined $web;
my $efetch = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?"
    . "db=nuccore&WebEnv=$web&query_key=$key&rettype=fasta&retmode=text";
my $nbatch = int(($N + $retmax - 1) / $retmax);  # ceiling of N/retmax
foreach my $i (1..$nbatch) {
    sleep 1;  # slow down server hit rate
    my $retstart = ($i - 1) * $retmax;  # zero-based offset of this batch
    print STDERR "Batch $i/$nbatch\n";
    my $efetch1 = "$efetch&retstart=$retstart&retmax=$retmax";
    my $efetch1_result = get($efetch1);
    if (defined $efetch1_result) { print $efetch1_result }
    else { warn "Batch $i failed; rerun or retry this offset\n" }
}
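
To adapt this to the question, two pieces are needed: an esearch URL built from the asker's own search string, and the "select unique sequences" post-processing step. Below is a rough sketch of both, not a tested pipeline. The helper names build_esearch_url and unique_fasta are hypothetical, and the plants[filter] and biomol_mrna[PROP] terms are my assumption about how the web interface's plant and mRNA filters are spelled in a query; check them against the search details shown on the NCBI results page before relying on them. "Unique" is assumed here to mean "identical sequence, ignoring case", which may differ from the asker's own definition.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Percent-encode an Entrez query and append it to the esearch URL.
# Everything except RFC 3986 unreserved characters is escaped,
# so spaces become %20 and brackets become %5B / %5D.
sub build_esearch_url {
    my ($term) = @_;
    (my $enc = $term) =~ s/([^A-Za-z0-9\-._~])/sprintf("%%%02X", ord($1))/ge;
    return "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?"
         . "db=nuccore&usehistory=y&term=$enc";
}

# Keep only the first FASTA record for each distinct sequence
# (case-insensitive), preserving input order.
sub unique_fasta {
    my ($fasta) = @_;
    my (%seen, @keep);
    for my $rec (split /^(?=>)/m, $fasta) {   # split before each '>' header
        next unless $rec =~ /^>/;
        my ($header, @lines) = split /\n/, $rec;
        my $seq = uc join '', @lines;
        $seq =~ s/\s+//g;                     # drop stray whitespace
        push @keep, $rec unless $seen{$seq}++;
    }
    return join '', @keep;
}

# Offline demonstration: no network access is made here.
# The filter terms are assumptions (see the note above).
my $term = '(FLI_CDNA OR full-length OR "full length") '
         . 'AND plants[filter] AND biomol_mrna[PROP]';
print build_esearch_url($term), "\n";

my $demo = ">a first\nACGT\nAC\n>b duplicate of a\nacgtac\n>c\nGGGG\n";
print unique_fasta($demo);   # records a and c survive; b is dropped
```

The URL returned by build_esearch_url can stand in for $esearch in the script above, and unique_fasta can be applied to the concatenated efetch output. For the once-a-month part, running the finished script from cron is probably the simplest Unix route.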