Question: Nucleotide CDS from RefSeq?
5.0 years ago by
Prohan350
United States
Prohan350 wrote:

Hi All,

I'm trying to retrieve the nucleotide sequences of the complete RefSeq protein CDSs. I've looked at the files at ftp://ftp.ncbi.nih.gov/refseq/release/complete/ but I can't find a file that has both the CDS annotations and the original nucleotide (whole-genome) sequence that each CDS came from.

I don't have a problem parsing genbank files - just seems odd that there isn't one genbank file that has the information I need.

I could add the whole genome sequences to the genbank files that have the CDS info. Just seems like I'm missing something obvious here.

Here's the general problem I'm trying to solve:

I have a protein with accession "CAA23625" from RefSeq and I'd like the nucleotide sequence of its CDS. Ideally I'd like to do the parsing locally without having to rely on hitting NCBI's server with an Entrez query. Thanks,

Rohan

genbank biopython ncbi • 1.8k views

What is your question? Perhaps a specific example might help...

written 5.0 years ago by Peter
5.0 years ago by
eddie.im130
Brazil
eddie.im130 wrote:

Get a list of protein accession numbers (in UniProt you can download them easily), then convert those to "EMBL CDS" identifiers through UniProt's "ID mapping" tab. Then use bp_fetch (BioPerl) in a loop. That's how I did it.
 

# Fetch each EMBL CDS record listed in accessions.list (one accession per line)
while read -r accession_number; do
    bp_fetch "net::embl:${accession_number}"
done < accessions.list > results.txt

Thanks for the info. I'm trying to do it locally by parsing the GenBank files rather than hitting the NCBI/EMBL servers a ton.

written 5.0 years ago by Prohan
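
For the local route discussed above, here is a minimal sketch of pulling a CDS out of a GenBank flat-file record that carries both the feature table and the ORIGIN sequence. It uses only the Python standard library; the function names (parse_record, extract, cds_for_protein) are hypothetical helpers written for this illustration, not part of any package. It assumes a single record as text, handles plain ranges plus join(...) and complement(...), and does not handle locations that point into another accession. Biopython's SeqIO.parse(handle, "genbank") together with SeqFeature.extract() covers the same ground more robustly.

```python
import re

# Translation table used for reverse-complementing minus-strand features.
COMPLEMENT = str.maketrans("acgtnACGTN", "tgcanTGCAN")


def parse_record(text):
    """Return (features, sequence) for ONE GenBank record given as text."""
    features, seq_parts, section = [], [], None
    for line in text.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():          # top-level keyword (LOCUS, FEATURES, ORIGIN, //)
            section = line.split()[0]
            continue
        if section == "FEATURES":
            body = line.strip()
            if not line[5:6].isspace():    # feature keys start in column 6
                key, _, location = body.partition(" ")
                features.append({"key": key, "location": location.strip(),
                                 "qualifiers": ""})
            elif body.startswith("/") or features[-1]["qualifiers"]:
                features[-1]["qualifiers"] += body   # qualifier line
            else:
                features[-1]["location"] += body     # wrapped location
        elif section == "ORIGIN":
            seq_parts.append(re.sub(r"[^A-Za-z]", "", line))
    return features, "".join(seq_parts)


def split_top_level(text):
    """Split on commas that are not nested inside parentheses."""
    parts, depth, current = [], 0, ""
    for ch in text:
        if ch == "," and depth == 0:
            parts.append(current)
            current = ""
        else:
            depth += (ch == "(") - (ch == ")")
            current += ch
    parts.append(current)
    return parts


def extract(location, seq):
    """Slice seq by a GenBank location (ranges, join, complement only)."""
    location = location.replace("<", "").replace(">", "")
    if location.startswith("complement(") and location.endswith(")"):
        inner = extract(location[len("complement("):-1], seq)
        return inner.translate(COMPLEMENT)[::-1]
    if location.startswith("join(") and location.endswith(")"):
        parts = split_top_level(location[len("join("):-1])
        return "".join(extract(part, seq) for part in parts)
    start, _, end = location.partition("..")
    return seq[int(start) - 1:int(end or start)]   # GenBank is 1-based, inclusive


def cds_for_protein(record_text, protein_id):
    """Return the CDS nucleotides whose /protein_id matches, else None."""
    features, seq = parse_record(record_text)
    pattern = r'/protein_id="' + re.escape(protein_id) + r'(\.\d+)?"'
    for feature in features:
        if feature["key"] == "CDS" and re.search(pattern, feature["qualifiers"]):
            return extract(feature["location"], seq)
    return None
```

Calling cds_for_protein(record_text, "CAA23625") on the text of a record that contains that protein_id would return the corresponding nucleotide string, or None if no CDS carries it. A multi-record file would first need to be split into records at the lines consisting of "//".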