Question: Retrieving CDS sequences from NCBI
3.8 years ago by
vigneshprbh3720 wrote:

I have a list of GI numbers corresponding to nucleotide sequences in NCBI. I need to compile a file of the CDS sequences for those GIs. I tried Batch Entrez, but it returned whole-genome sequences.

Can anyone suggest a method to retrieve CDS sequences from NCBI, given that I have a list of the corresponding GIs?

fasta batch entrez cds ncbi • 2.7k views
modified 21 months ago by Biostar ♦♦ 20 • written 3.8 years ago by vigneshprbh3720

You can try the NCBI E-utilities, for example EFetch.

A brute-force solution is to fetch the GenBank record corresponding to each GI with EFetch, using get from LWP::Simple. Something like this in Perl:

use LWP::Simple;
my $gbk = get("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=$id&retmode=text&rettype=gb");

Then parse the GenBank record with BioPerl:

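use Bio::SeqIO;
open(my $gbk_fh, '<', \$gbk) or die "Cannot read GenBank text: $!";  # $gbk is the record text, not a file name
my $gbk_stream = Bio::SeqIO->new(-fh     => $gbk_fh,
                                 -format => 'GenBank');
my $seq_obj = $gbk_stream->next_seq();

if (defined $seq_obj) {
  for my $feat_object ($seq_obj->get_SeqFeatures) {
    if ($feat_object->primary_tag eq 'CDS') {      # CDS is the primary tag, not a qualifier
      my $cds = $feat_object->spliced_seq->seq;    # joins exons, handles the minus strand
      print ">", $seq_obj->display_id, "\n$cds\n";
    }
  }
}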

If you only need the CDS, this may be a bit too much... But if you are going to need other information from the GenBank file, it could be useful.

There is probably a simpler and more elegant solution...
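
One possibly simpler route, sketched here rather than tested against your data: EFetch accepts rettype=fasta_cds_na on the nuccore database, which returns the nucleotide sequence of each annotated CDS as FASTA, so no GenBank parsing is needed. In the sketch below, the helper names cds_url and chunks, the script name fetch_cds.pl, the one-GI-per-line input file, and the batch size of 100 are all assumptions made for illustration. It uses the core module HTTP::Tiny (HTTPS additionally needs IO::Socket::SSL installed); get from LWP::Simple should work just as well.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Tiny;

# Base URL of the NCBI EFetch service.
my $EFETCH = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi';

# Build one EFetch URL for a batch of GIs. rettype=fasta_cds_na asks for
# the nucleotide sequence of every annotated CDS, in FASTA format.
sub cds_url {
    my @ids = @_;
    return "$EFETCH?db=nuccore&id=" . join(',', @ids)
         . '&rettype=fasta_cds_na&retmode=text';
}

# Split a list into chunks of at most $size items (hypothetical helper).
sub chunks {
    my ($size, @items) = @_;
    my @out;
    push @out, [ splice(@items, 0, $size) ] while @items;
    return @out;
}

# Usage: perl fetch_cds.pl gi_list.txt > cds.fasta
if (my $list_file = shift @ARGV) {
    open(my $in, '<', $list_file) or die "Cannot open $list_file: $!";
    # Take the first whitespace-free token on each non-empty line as a GI.
    my @gis = map { /(\S+)/ ? $1 : () } <$in>;
    close $in;

    my $http = HTTP::Tiny->new;
    for my $batch (chunks(100, @gis)) {    # batch size is an arbitrary choice
        my $res = $http->get(cds_url(@$batch));
        die "EFetch failed: $res->{status} $res->{reason}\n"
            unless $res->{success};
        print $res->{content};
        sleep 1;    # pause between requests to stay under NCBI's rate limit
    }
}
```

Run with no arguments the script does nothing; given a file of GIs it writes the CDS FASTA records to standard output. Each record's header names the parent sequence and the CDS it came from, so the output can be filtered afterwards if only some genes are wanted.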

written 3.8 years ago by emmanuel.bouilhol20

3.8 years ago by
roy.granit790 wrote:

I believe this can be done with the UCSC Table Browser: select the identifier type, paste in the list of IDs, and choose CDS output.


3.8 years ago by
biocyberman760 wrote:

I haven't tried UCSC's Table Browser for this, but my favorite tool is Ensembl's BioMart:

It's a very useful tool for this kind of thing and for many other kinds of queries, for example finding homologous genes across species or converting one type of ID to another. It's worth familiarizing yourself with it.

When it comes to coordinates, be sure to choose the correct assembly version. For example, if you are working with GRCh37/hg19, it is a good idea to go to this site instead:

modified 3.8 years ago • written 3.8 years ago by biocyberman760