How to download all available sequences of a gene from all bacteria using R
5.3 years ago
mschmidt ▴ 80

I need to download all (or at least many) sequences of a specific bacterial gene from the GenBank nuccore database, restricted to entries that are complete genome sequences, and I would prefer to do this in R. Querying 'Bacteria[ORGN] AND gyrB[GENE] AND complete genome[TI]' in the web interface returns >10k hits. I do not want to download whole genome sequences, only the extracted gyrB sequences, to build a local database. I tried:

library(rentrez)
db = "nuccore"
query = "Bacteria[ORGN] AND gyrB[GENE] AND complete[TI]"
# search nuccore and return up to 90000 matching record IDs
found = entrez_search(db, query, config = NULL, retmode = "xml", use_history = FALSE, retmax = 90000)

but this fetches IDs for whole genome sequences. Is it possible to get FASTA sequences for the gyrB genes, or at least the gyrB coordinates? I do not want to download the whole genome sequences of thousands of genomes.

R sequence gene genbank • 1.5k views

You can get this data from Ensembl Bacteria using the Ensembl Genomes Perl API, or maybe using the R package biomartr.


That would be a great option, but I found that BioMart is not currently available for Ensembl Bacteria: https://support.bioconductor.org/p/82585/
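One route that stays within rentrez, sketched here under assumptions and not tested against the full >10k hit list: fetch each record's feature table (E-utilities rettype "ft"), read the gyrB coordinates from it, then request only that slice using the efetch parameters seq_start, seq_stop and strand, passed through the ... argument of entrez_fetch. The helper names find_gene_coords and fetch_gene_fasta are made up for this example. The parser uses only the first interval of a gene feature and relies on the gene being annotated with the exact name gyrB.

```r
# Parse NCBI feature-table text (rettype = "ft") and return the coordinates
# of every "gene" feature whose gene qualifier equals `gene_name`.
# Hypothetical helper; only the first interval of each feature is used.
find_gene_coords <- function(ft_text, gene_name = "gyrB") {
  lines <- strsplit(ft_text, "\r?\n")[[1]]
  hits <- list()
  current <- NULL  # start/stop of the gene feature currently being read
  for (ln in lines) {
    f <- strsplit(ln, "\t", fixed = TRUE)[[1]]
    if (length(f) >= 3 && nzchar(f[1])) {
      # Feature line: start <TAB> stop <TAB> feature key
      current <- if (f[3] == "gene") as.integer(gsub("[<>]", "", f[1:2])) else NULL
    } else if (!is.null(current) && length(f) >= 5 &&
               f[4] == "gene" && f[5] == gene_name) {
      # Qualifier line: three empty fields, then qualifier name and value
      hits[[length(hits) + 1]] <- data.frame(
        start  = min(current),
        stop   = max(current),
        # start > stop in the table means the minus strand (efetch strand = 2)
        strand = if (current[1] <= current[2]) 1L else 2L
      )
      current <- NULL
    }
  }
  if (length(hits) == 0) {
    return(data.frame(start = integer(0), stop = integer(0), strand = integer(0)))
  }
  do.call(rbind, hits)
}

# Fetch only the gene slice of one nuccore record as FASTA text.
# Hypothetical helper; needs the rentrez package and network access.
fetch_gene_fasta <- function(id, gene_name = "gyrB") {
  ft <- rentrez::entrez_fetch(db = "nuccore", id = id,
                              rettype = "ft", retmode = "text")
  coords <- find_gene_coords(ft, gene_name)
  vapply(seq_len(nrow(coords)), function(i) {
    rentrez::entrez_fetch(db = "nuccore", id = id,
                          rettype = "fasta", retmode = "text",
                          seq_start = as.character(coords$start[i]),
                          seq_stop  = as.character(coords$stop[i]),
                          strand    = as.character(coords$strand[i]))
  }, character(1))
}

# Offline check of the parser on an invented feature table (not real data).
demo_ft <- paste(
  ">Feature ref|EXAMPLE_1|",
  "100\t400\tgene",
  "\t\t\tgene\tdnaA",
  "2500\t500\tgene",
  "\t\t\tgene\tgyrB",
  "2500\t500\tCDS",
  "\t\t\tproduct\tDNA gyrase subunit B",
  "\t\t\tgene\tgyrB",
  sep = "\n"
)
demo_coords <- find_gene_coords(demo_ft)
print(demo_coords)
```

With the IDs from the search in the question, something like unlist(lapply(found$ids, fetch_gene_fasta)) followed by writeLines() would collect the records into one FASTA file. This still costs two requests per genome, so a pause between calls (or an NCBI API key) is probably needed to stay within the E-utilities rate limits.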
