Downloading specific gene sequences within a genome from GenBank using Biopython
3.2 years ago
Wilber0x ▴ 50

Is there a way to download specific gene sequences from GenBank using Biopython's SeqIO and the gene IDs? I know how to download a genome sequence with its GenBank ID, but I would like to use the gene IDs listed in a GenBank file to download the sequences of those genes directly from that genome.

gene sequence python • 955 views

What do you mean by a gene ID? Can you give us a few examples?


So the GenBank ID for Actinidia chinensis is NC_026690.1, whereas if I want to look at trnR (UCU) within that genome, it has its own gene ID, which is 23857713.


NC_xxxx is an entire contig, and in this case an entire genome. It contains multiple genes and therefore multiple Entrez Gene IDs. You should be able to extract all gene features from the GenBank file, read the db_xref qualifier of each one, and then use the Entrez Gene IDs directly. Each step will need some digging on Google and some experimentation, but it should not be too challenging to work out from the outline I've given you.
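The outline above could be sketched in Biopython roughly as follows. This is only a sketch, not a tested pipeline: the function names are made up for illustration, the email address is a placeholder, and it assumes each gene feature stores its Entrez Gene ID as a `GeneID:...` entry in `db_xref` (as the headers for NC_026690.1 suggest).

```python
def gene_id_from_qualifiers(qualifiers):
    """Return the Entrez Gene ID found in a feature's db_xref list, or None."""
    for xref in qualifiers.get("db_xref", []):
        if xref.startswith("GeneID:"):
            return xref.split(":", 1)[1]
    return None


def fetch_record(accession, email):
    """Download one GenBank record from NCBI (needs Biopython and network access)."""
    # Imported here so the helpers in this sketch can be used without Biopython.
    from Bio import Entrez, SeqIO

    Entrez.email = email  # NCBI asks for a contact address
    handle = Entrez.efetch(db="nuccore", id=accession,
                           rettype="gbwithparts", retmode="text")
    try:
        return SeqIO.read(handle, "genbank")
    finally:
        handle.close()


def gene_sequences(record, wanted_ids):
    """Map each wanted Gene ID to the sequence of its gene feature."""
    wanted = {str(i) for i in wanted_ids}
    found = {}
    for feature in record.features:
        if feature.type != "gene":
            continue
        gene_id = gene_id_from_qualifiers(feature.qualifiers)
        if gene_id in wanted:
            # extract() applies the feature location, including strand and joins
            found[gene_id] = feature.extract(record.seq)
    return found


# Example use (placeholder email, performs a network request):
# record = fetch_record("NC_026690.1", email="you@example.com")
# for gene_id, seq in gene_sequences(record, ["23857713"]).items():
#     print(f">{gene_id}\n{seq}")
```

Fetching the whole record once and filtering locally avoids one request per gene, which seems reasonable for a chloroplast genome of this size.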


Using Entrez Direct you can get the FASTA sequences for all genes in this accession. The returned data should be parseable in Python, so you can keep the ones you want. (Output truncated to headers only, showing a few examples, to save space.)

$ esearch -db gene -query "23857713" | elink -target nuccore | efetch -format gene_fasta | grep ">"
>lcl|NC_026690.1_gene_1 [gene=psbA] [locus_tag=VU32_p001] [db_xref=GeneID:23764072] [location=complement(join(156109..156346,1..824))] [gbkey=Gene]
>lcl|NC_026690.1_gene_2 [gene=trnK (UUU)] [locus_tag=VU32_t001] [db_xref=GeneID:23857666] [location=complement(1045..3609)] [gbkey=Gene]
>lcl|NC_026690.1_gene_3 [gene=matK] [locus_tag=VU32_p083] [db_xref=GeneID:23764034] [location=complement(1344..2858)] [gbkey=Gene]
>lcl|NC_026690.1_gene_4 [gene=rps16] [locus_tag=VU32_p082] [db_xref=GeneID:23763962] [location=complement(4345..5483)] [gbkey=Gene]
>lcl|NC_026690.1_gene_5 [gene=trnQ (UUG)] [locus_tag=VU32_t002] [db_xref=GeneID:23857667] [location=complement(6927..6999)] [gbkey=Gene]
>lcl|NC_026690.1_gene_6 [gene=psbK] [locus_tag=VU32_p081] [db_xref=GeneID:23764035] [location=7341..7532] [gbkey=Gene]
>lcl|NC_026690.1_gene_7 [gene=psbI] [locus_tag=VU32_p080] [db_xref=GeneID:23763964] [location=7917..8027] [gbkey=Gene]
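To keep only the wanted genes from that output, the bracketed fields in each header can be parsed in plain Python. The sketch below assumes the full `efetch -format gene_fasta` output (without the `grep`) has been saved to a file; the file name `genes.fasta` and the function names are hypothetical, and it assumes each header carries a single `db_xref=GeneID:...` field as in the headers shown above.

```python
import re

# Matches fields such as [gene=psbK] or [location=complement(1045..3609)]
FIELD = re.compile(r"\[(\w+)=([^\]]*)\]")


def parse_header(header):
    """Turn the bracketed key=value fields of a gene_fasta header into a dict."""
    return dict(FIELD.findall(header))


def select_genes(fasta_text, wanted_ids):
    """Yield (header, sequence) for records whose GeneID is in wanted_ids."""
    wanted = {f"GeneID:{i}" for i in wanted_ids}

    def is_wanted(header):
        return parse_header(header).get("db_xref") in wanted

    header, chunks = None, []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # A new record starts: emit the previous one if it was requested
            if header is not None and is_wanted(header):
                yield header, "".join(chunks)
            header, chunks = line, []
        elif header is not None:
            chunks.append(line.strip())
    # Emit the final record, which has no following header to trigger it
    if header is not None and is_wanted(header):
        yield header, "".join(chunks)


# Example use (hypothetical file name):
# with open("genes.fasta") as fh:
#     for header, seq in select_genes(fh.read(), ["23857713"]):
#         print(header)
#         print(seq)
```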