Question: Retrieving CDS sequences from NCBI
vigneshprbh37 (India) wrote, 3.7 years ago:

I have a list of GI numbers for nucleotide sequences in NCBI, and I need to compile a file of the CDS sequences for those GIs. I tried Batch Entrez, but it returned the whole genome sequences.

Can anyone suggest a way to retrieve CDS sequences from NCBI, given that I have a list of the corresponding GIs?

You can try the E-utilities, for example EFetch.
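EFetch can also return only the coding sequences: for `db=nucleotide`, `rettype=fasta_cds_na` gives the CDS features as nucleotide FASTA (`fasta_cds_aa` gives the translations). Below is a minimal core-Perl sketch, assuming HTTP::Tiny has HTTPS support installed (IO::Socket::SSL and Net::SSLeay). The helper names `cds_efetch_url` / `fetch_cds_fasta` and the batch size of 200 are my own choices, not part of the E-utilities API.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use HTTP::Tiny;

my $EFETCH = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi';

# Build an EFetch URL asking for the CDS features of the given
# nucleotide records as nucleotide FASTA (rettype=fasta_cds_na).
sub cds_efetch_url {
    my @ids = @_;
    return "$EFETCH?db=nucleotide&id=" . join(',', @ids)
         . '&rettype=fasta_cds_na&retmode=text';
}

# Download CDS FASTA for a list of ids, one batch per request.
# The default batch size (200) is an arbitrary choice, not an NCBI limit.
sub fetch_cds_fasta {
    my ($ids, $batch_size) = @_;
    $batch_size ||= 200;
    my $http  = HTTP::Tiny->new;
    my $fasta = '';
    for (my $i = 0; $i < @$ids; $i += $batch_size) {
        my $end = $i + $batch_size - 1;
        $end = $#$ids if $end > $#$ids;
        my $res = $http->get(cds_efetch_url(@{$ids}[$i .. $end]));
        die "EFetch failed: $res->{status} $res->{reason}\n" unless $res->{success};
        $fasta .= $res->{content};
        sleep 1;    # stay well under NCBI's 3 requests/second without an API key
    }
    return $fasta;
}

# Usage: perl fetch_cds.pl 123456 234567 ... > cds.fasta
if (@ARGV) {
    print fetch_cds_fasta([@ARGV]);
}
```

Batching several GIs per request keeps the number of calls (and the rate-limit pauses) low compared with fetching one record at a time.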

A brute-force solution is to download the GenBank record for each gi with EFetch, for example using LWP::Simple's get in Perl:

use LWP::Simple;
my $gbk = get("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=$id&retmode=text&rettype=gb");
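To run that fetch over a whole list of GIs rather than a single `$id`, something along these lines could work. It is a sketch under my own assumptions: the GI list is a text file with one GI per line (bare numbers or FASTA-style `gi|12345|...`), one `<gi>.gb` file is written per record, and `read_gi_list` / `fetch_genbank` are hypothetical helper names. It uses core HTTP::Tiny instead of LWP::Simple so it runs on a stock Perl with SSL support.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use HTTP::Tiny;

# Read GIs from a filehandle, one per line. Accepts bare numbers or
# "gi|12345|..." strings, skips blank lines and drops duplicates.
sub read_gi_list {
    my ($fh) = @_;
    my (@gis, %seen);
    while (my $line = <$fh>) {
        next unless $line =~ /^\s*(?:gi\|)?(\d+)/;
        push @gis, $1 unless $seen{$1}++;
    }
    return @gis;
}

# Fetch the GenBank flat file for one GI via EFetch.
sub fetch_genbank {
    my ($gi) = @_;
    my $url = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi'
            . "?db=nucleotide&id=$gi&retmode=text&rettype=gb";
    my $res = HTTP::Tiny->new->get($url);
    die "EFetch failed for $gi: $res->{status}\n" unless $res->{success};
    return $res->{content};
}

# Usage: perl fetch_gb.pl gi_list.txt   (writes one <gi>.gb file per GI)
if (@ARGV) {
    open my $in, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!\n";
    for my $gi (read_gi_list($in)) {
        my $gbk = fetch_genbank($gi);
        open my $out, '>', "$gi.gb" or die "Cannot write $gi.gb: $!\n";
        print {$out} $gbk;
        close $out;
        sleep 1;    # pause between requests to respect NCBI's rate limit
    }
}
```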

Then parse the GenBank record with BioPerl and pull out the CDS features:

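use Bio::SeqIO;

# Bio::SeqIO wants a file or a filehandle, so read the downloaded
# text ($gbk) through an in-memory filehandle.
open my $fh, '<', \$gbk or die "Cannot read GenBank text: $!";
my $gbk_stream = Bio::SeqIO->new(-fh     => $fh,
                                 -format => 'GenBank');

while (my $seq_obj = $gbk_stream->next_seq) {
  for my $feat_object ($seq_obj->get_SeqFeatures) {
    # 'CDS' is the feature's primary tag, not a qualifier (tag)
    next unless $feat_object->primary_tag eq 'CDS';
    my ($pid) = $feat_object->has_tag('protein_id')
              ? $feat_object->get_tag_values('protein_id') : ('no_protein_id');
    # spliced_seq() joins join(...) segments and handles complement()
    my $cds = $feat_object->spliced_seq->seq;
    print ">", $seq_obj->display_id, "|$pid\n$cds\n";
  }
}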
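The loop above prints each CDS on a single line. To compile a conventionally wrapped FASTA file, a small core-Perl helper could replace the `print`, e.g. `print to_fasta($seq_obj->display_id, $cds);`. The name `to_fasta` and the 60-column default are hypothetical choices, not BioPerl API; writing through a `Bio::SeqIO` object with `-format => 'fasta'` would be the BioPerl-native alternative.

```perl
use strict;
use warnings;

# Format one record as FASTA text, wrapping the sequence at $width columns.
sub to_fasta {
    my ($id, $seq, $width) = @_;
    $width ||= 60;
    my $out = ">$id\n";
    for (my $i = 0; $i < length $seq; $i += $width) {
        $out .= substr($seq, $i, $width) . "\n";
    }
    return $out;
}
```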

If you only need the CDS this may be overkill, but it is useful if you also need other information from the GenBank file.

There is probably a simpler and more elegant solution...

Reply written 3.7 years ago by emmanuel.bouilhol

roy.granit (Israel/LabWorm) wrote, 3.7 years ago:

I believe this can be done with the UCSC Table Browser: select the identifier type, paste in the list of IDs, and select CDS as the output.

biocyberman (Denmark) wrote, 3.7 years ago:

I haven't tried the UCSC Table Browser for this, but my favorite tool is Ensembl's BioMart: http://www.ensembl.org/biomart

It's very useful for this kind of task and for many other queries, for example finding homologous genes across species or converting one type of ID to another. It's worth familiarizing yourself with it.

When it comes to coordinates, be sure to choose the correct assembly version. For example, if you are working with GRCh37/hg19, use this site instead: http://grch37.ensembl.org/biomart/
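The same BioMart query can also be scripted through its martservice endpoint, which takes an XML query (the web interface can export the XML for a query you build there, which is the safest way to get the names right). The sketch below is built on assumptions I have not checked against the current Ensembl schema: the dataset `hsapiens_gene_ensembl`, the filter `ensembl_gene_id`, and the sequence attribute `coding`. Note that BioMart filters on Ensembl (and some external) IDs, so GIs may need mapping first. Swapping the host for `https://grch37.ensembl.org` targets the GRCh37 archive mentioned above.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use HTTP::Tiny;

# Build a BioMart XML query for the coding sequences (CDS) of the given
# Ensembl gene IDs. Dataset, filter and attribute names are ASSUMPTIONS.
sub biomart_cds_query {
    my ($dataset, @gene_ids) = @_;
    my $ids = join ',', @gene_ids;
    return <<"XML";
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Query>
<Query virtualSchemaName="default" formatter="FASTA" header="0" uniqueRows="0" count="" datasetConfigVersion="0.6">
  <Dataset name="$dataset" interface="default">
    <Filter name="ensembl_gene_id" value="$ids"/>
    <Attribute name="ensembl_gene_id"/>
    <Attribute name="ensembl_transcript_id"/>
    <Attribute name="coding"/>
  </Dataset>
</Query>
XML
}

# POST the query to the martservice endpoint of the chosen Ensembl host.
sub run_biomart {
    my ($host, $xml) = @_;
    my $res = HTTP::Tiny->new->post_form("$host/biomart/martservice", { query => $xml });
    die "BioMart request failed: $res->{status} $res->{reason}\n" unless $res->{success};
    return $res->{content};
}

# Usage: perl biomart_cds.pl ENSG00000139618 ... > cds.fasta
if (@ARGV) {
    print run_biomart('https://www.ensembl.org',
                      biomart_cds_query('hsapiens_gene_ensembl', @ARGV));
}
```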
