Question: Construction of download links for genbank files from NCBI gene database

Hello,

To download the RefSeq GenBank file for a given gene, I usually go to the NCBI Gene database (e.g. for the GH1 gene: https://www.ncbi.nlm.nih.gov/gene/2688) and click the "GenBank" link at the top right of the graphical view, which leads me here: https://www.ncbi.nlm.nih.gov/nuccore/NC_000017.11?report=genbank&from=63917193&to=63918852&strand=true. To save the GenBank file on my computer, I open the "Send to" dropdown list, select "File", and then click "Create File". The URL of the downloaded file is: https://www.ncbi.nlm.nih.gov/sviewer/viewer.cgi?tool=portal&save=file&log$=seqview&db=nuccore&report=genbank&id=568815581&from=63917193&to=63918852&strand=on&conwithfeat=on&basic_feat=on&withparts=on

I think it should be easy to construct this link and automate the download for genes with known genomic coordinates (from=63917193&to=63918852). Does anyone know what "id=568815581" in this link refers to? Does it specify a chromosome, or a piece of one? Where can I find the list of these IDs for the human genome?
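
For illustration, the link above could be assembled from its parts roughly as follows. This is only a sketch: the parameter names and values are copied from the URL I observed, `build_sviewer_url` is a made-up helper name, and I am assuming that `strand=on` means the reverse strand and can simply be left out for forward-strand genes. The viewer URL is not a documented interface, so it may change.

```python
from urllib.parse import urlencode

# Base address taken from the download link observed in the browser.
SVIEWER_BASE = "https://www.ncbi.nlm.nih.gov/sviewer/viewer.cgi"


def build_sviewer_url(seq_id, start, stop, reverse_strand=False):
    """Rebuild the observed download link for a region of one sequence.

    seq_id is the numeric "id" value from the link (e.g. "568815581"),
    start/stop are the genomic coordinates on that sequence.
    """
    params = {
        "tool": "portal",
        "save": "file",
        "log$": "seqview",
        "db": "nuccore",
        "report": "genbank",
        "id": seq_id,
        "from": start,
        "to": stop,
    }
    # Assumption: strand=on requests the reverse strand; omit it otherwise.
    if reverse_strand:
        params["strand"] = "on"
    params.update({"conwithfeat": "on", "basic_feat": "on", "withparts": "on"})
    # safe="$" keeps the "log$" key unescaped, as in the observed link.
    return SVIEWER_BASE + "?" + urlencode(params, safe="$")


# GH1 example from the question.
gh1_url = build_sviewer_url("568815581", 63917193, 63918852, reverse_strand=True)
print(gh1_url)
```

With the GH1 values this reproduces the link shown above, so the only unknown per gene is the `id` of the chromosome it lies on.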

Thanks in advance for your help. Best. Amir

assembly gene • 546 views
ADD COMMENTlink modified 17 months ago • written 17 months ago by amir.taheri.ghahfarokhi0

As far as I know, this is an internal ID (it might be the old, phased-out GI number); see

https://www.ncbi.nlm.nih.gov/nuccore/568815581

Maybe a batch search can help you https://www.ncbi.nlm.nih.gov/sites/batchentrez

There are more ways to have this automated, but none of those are trivial. Do you require this only occasionally?

ADD REPLYlink modified 17 months ago • written 17 months ago by Carambakaracho1.9k

Thanks, Carambakaracho, for the quick reply. I am writing an Excel macro that, at one point, needs to download the GenBank file for a given gene so that I can parse it and annotate a list of features. I have already downloaded the genomic coordinates for the genes; I only need a way to construct the download link. I think you already helped a lot. Best, Amir

ADD REPLYlink written 17 months ago by amir.taheri.ghahfarokhi0

It is possible to download the GenBank file of interest using NCBI's E-utilities command-line tools (Entrez Direct):

elink -target nuccore -db gene -id "2688" | efilter -query "genomic and assembly" -source refseq | efetch -format gb

It is not advisable to use Excel for parsing GenBank files; I'd suggest that you look at BioPython instead.

ADD REPLYlink modified 17 months ago • written 17 months ago by Sej Modha4.6k

While @Sej referred to NCBI's Unix command-line utilities, if you strictly want to create web links then you should use NCBI's E-utilities directly. There is a quick start book available here. You will want to replace GI numbers with accession numbers, since NCBI has mostly deprecated GIs for external use.
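
As a rough sketch of what such a link could look like, the snippet below builds an E-utilities `efetch` URL for the GH1 region from the question, using the accession NC_000017.11 instead of the GI. The helper names `build_efetch_url` and `download_genbank` are made up for this example; `seq_start`, `seq_stop` and `strand` (1 for plus, 2 for minus) are the efetch parameters for fetching a sub-region. Treating GH1 as a minus-strand gene is an assumption based on the `strand=true` in the original link.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accession, start, stop, minus_strand=False):
    """Return an efetch link for a GenBank flat file of one sequence region."""
    params = {
        "db": "nuccore",
        "id": accession,        # accession.version instead of a GI number
        "rettype": "gb",        # GenBank flat file
        "retmode": "text",
        "seq_start": start,     # 1-based coordinates on the sequence
        "seq_stop": stop,
        "strand": 2 if minus_strand else 1,
    }
    return EFETCH_BASE + "?" + urlencode(params)


def download_genbank(url, path):
    """Save the response of an efetch link to a local file (needs network)."""
    with urlopen(url) as response, open(path, "wb") as out:
        out.write(response.read())


gh1_efetch_url = build_efetch_url("NC_000017.11", 63917193, 63918852,
                                  minus_strand=True)
print(gh1_efetch_url)
# download_genbank(gh1_efetch_url, "GH1.gb")  # uncomment to fetch the file
```

The same link can be pasted into a browser or requested from an Excel macro, since it is a plain HTTP GET.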

ADD COMMENTlink written 17 months ago by genomax76k

Thank you both @Sej and @genomax. I needed something to use in Excel.

ADD REPLYlink written 17 months ago by amir.taheri.ghahfarokhi0

Here are the ids for human and mouse chromosomes:

Human

Chr1    568815597
Chr2    568815596
Chr3    568815595
Chr4    568815594
Chr5    568815593
Chr6    568815592
Chr7    568815591
Chr8    568815590
Chr9    568815589
Chr10   568815588
Chr11   568815587
Chr12   568815586
Chr13   568815585
Chr14   568815584
Chr15   568815583
Chr16   568815582
Chr17   568815581
Chr18   568815580
Chr19   568815579
Chr20   568815578
Chr21   568815577
Chr22   568815576
ChrX    568815575
ChrY    568815574

Mouse

Chr1    372099109
Chr2    372099108
Chr3    372099107
Chr4    372099106
Chr5    372099105
Chr6    372099104
Chr7    372099103
Chr8    372099102
Chr9    372099101
Chr10   372099100
Chr11   372099099
Chr12   372099098
Chr13   372099097
Chr14   372099096
Chr15   372099095
Chr16   372099094
Chr17   372099093
Chr18   372099092
Chr19   372099091
ChrX    372099090
ChrY    372099089
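
The IDs in both lists simply count down by one from Chr1 to ChrY, so a lookup table can be generated instead of typed in. This is just a sketch that reproduces the numbers listed above; `chromosome_ids` is a made-up helper, and the countdown pattern is an observation about these two lists only, so it should not be assumed for other assemblies or species.

```python
# Chr1 IDs copied from the lists above.
HUMAN_CHR1_ID = 568815597
MOUSE_CHR1_ID = 372099109


def chromosome_ids(chr1_id, autosome_count):
    """Map chromosome names to IDs, assuming IDs decrease by one per chromosome.

    The order is Chr1..Chr<autosome_count>, then ChrX, then ChrY.
    """
    names = [str(n) for n in range(1, autosome_count + 1)] + ["X", "Y"]
    return {f"Chr{name}": chr1_id - offset for offset, name in enumerate(names)}


HUMAN_IDS = chromosome_ids(HUMAN_CHR1_ID, 22)
MOUSE_IDS = chromosome_ids(MOUSE_CHR1_ID, 19)

print(HUMAN_IDS["Chr17"])  # ID used in the GH1 download link
```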

And here is the link to the Excel file that downloads the GenBank file for a given gene symbol: https://github.com/Ghahfarokhi/sgRNA_Annotator

My goal was to develop an Excel file that could download GenBank files and annotate CRISPR sgRNAs.

ADD COMMENTlink modified 17 months ago by finswimmer13k • written 17 months ago by amir.taheri.ghahfarokhi0