Hey, I have a list of NCBI IDs and gi numbers, and they are the IDs/numbers for whole bacterial genomes. I want to build a FASTA file containing only the 16S rRNA genes from the bacteria associated with these IDs and gi numbers. Does anyone know a quick way of doing this? Is there a search where you can enter the whole-genome ID and then specify the 16S rRNA gene? Alternatively, does anyone know primer sequences that can extract the 16S gene? If so, I guess I can run a PCR simulation to pull the 16S region out of each whole genome.
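For the PCR-simulation idea, here is a minimal in silico PCR sketch. It assumes the commonly cited universal bacterial primers 27F (AGAGTTTGATCMTGGCTCAG) and 1492R (GGTTACCTTGTTACGACTT); check them against the literature for your taxa before relying on them. It looks for exact matches only (degenerate IUPAC bases are expanded, but no mismatches are tolerated), searches both strands, and returns one product per primer pair, so a genome with several 16S copies gives several products. The function names are hypothetical.

```python
import re

# Commonly cited universal 16S primers (assumption: verify before use).
PRIMER_27F = "AGAGTTTGATCMTGGCTCAG"
PRIMER_1492R = "GGTTACCTTGTTACGACTT"

# IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement table that also handles the degenerate codes.
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def revcomp(seq: str) -> str:
    """Reverse complement of a (possibly degenerate) DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_sites(seq: str, primer: str) -> list[int]:
    """Start positions of exact (IUPAC-aware) primer matches, overlaps included."""
    pattern = re.compile("(?=" + "".join(IUPAC[b] for b in primer.upper()) + ")")
    return [m.start() for m in pattern.finditer(seq)]


def in_silico_pcr(genome: str, fwd: str, rev: str, max_len: int = 2000) -> list[str]:
    """Return amplicons bounded by fwd and the reverse complement of rev."""
    products = []
    rev_rc = revcomp(rev)  # the reverse primer's binding site on the searched strand
    for strand in (genome.upper(), revcomp(genome)):
        r_sites = find_sites(strand, rev_rc)  # ascending positions
        for f in find_sites(strand, fwd):
            for r in r_sites:
                if r < f + len(fwd):
                    continue  # reverse site must lie downstream of the forward primer
                end = r + len(rev_rc)
                if end - f <= max_len:
                    products.append(strand[f:end])
                break  # only the nearest downstream site is considered
    return products
```

Usage would be something like `in_silico_pcr(genome_seq, PRIMER_27F, PRIMER_1492R)`, where `genome_seq` is the plain sequence string from the genome FASTA. Note that 27F/1492R amplify most, not all, of the 16S gene, so the product is not the full annotated rRNA.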
You have a list of IDs and gi numbers, am I right?
Save the list to a text file, 16s-sRNAs.txt, with one ID per line.
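If the IDs are in a script or spreadsheet export, a small helper can write that one-ID-per-line file. This is a minimal sketch; the function name is hypothetical, and the example IDs (NC_000913.3 and a made-up gi 12345) are placeholders only. It strips whitespace and drops blank lines and duplicates while keeping the original order.

```python
from pathlib import Path


def write_id_file(ids, path="16s-sRNAs.txt"):
    """Write one ID per line, dropping blanks and duplicates, keeping order."""
    seen = set()
    cleaned = []
    for raw in ids:
        item = raw.strip()
        if item and item not in seen:
            seen.add(item)
            cleaned.append(item)
    Path(path).write_text("\n".join(cleaned) + "\n")
    return cleaned


# Example with placeholder IDs (hypothetical):
# write_id_file(["NC_000913.3", "12345"])
```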
Go to Batch Entrez:
Read the text on the page; it’s important.
1) Select the correct nucleotide database in the upper-left corner of the page. The text on the page explains the difference.
2) Under File, in the middle of the menu, choose 16s-sRNAs.txt from your computer.
3) Click Retrieve, on the right side of the menu.
After a while you will see which nucleotide sequences NCBI has for this list of IDs.
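If you prefer a script to the Batch Entrez web form, NCBI's E-utilities efetch endpoint can return FASTA for a list of IDs. This is a minimal sketch using only the Python standard library; the batch size and the pause between requests are my own choices (NCBI asks for no more than 3 requests per second without an API key). Like Batch Entrez, it returns the full records for the IDs, not just the 16S genes.

```python
import time
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def read_ids(path):
    """Read one ID per line, skipping blank lines."""
    with open(path) as fh:
        return [line.strip() for line in fh if line.strip()]


def efetch_url(ids, db="nuccore"):
    """Build an efetch URL that asks for plain-text FASTA."""
    params = {"db": db, "id": ",".join(ids), "rettype": "fasta", "retmode": "text"}
    return EFETCH + "?" + urlencode(params)


def fetch_fasta(ids, batch_size=100):
    """Download FASTA for all IDs in batches, pausing to respect NCBI rate limits."""
    chunks = []
    for i in range(0, len(ids), batch_size):
        with urlopen(efetch_url(ids[i:i + batch_size])) as resp:
            chunks.append(resp.read().decode())
        time.sleep(0.4)  # stay under 3 requests/second
    return "".join(chunks)


# Example (needs network access):
# fasta = fetch_fasta(read_ids("16s-sRNAs.txt"))
```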
You can then make a BLAST database from the sequences with makeblastdb.
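A minimal sketch of calling makeblastdb from Python, assuming BLAST+ is installed and on your PATH. The file and database names are hypothetical; -parse_seqids keeps the sequence IDs so they can be looked up later (for example with blastdbcmd).

```python
import shutil
import subprocess


def makeblastdb_cmd(fasta, out_name, parse_seqids=True):
    """Build the makeblastdb command line for a nucleotide database."""
    cmd = ["makeblastdb", "-in", fasta, "-dbtype", "nucl", "-out", out_name]
    if parse_seqids:
        cmd.append("-parse_seqids")  # keep IDs so entries can be retrieved by ID
    return cmd


def build_db(fasta, out_name):
    """Run makeblastdb, failing early if BLAST+ is not installed."""
    if shutil.which("makeblastdb") is None:
        raise RuntimeError("makeblastdb not found on PATH; install BLAST+ first")
    subprocess.run(makeblastdb_cmd(fasta, out_name), check=True)


# Example (hypothetical file names):
# build_db("seqs.fasta", "my16s")
```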
See the following papers if you need additional information:
The variability of the 16S rRNA gene in bacterial genomes and its consequences for bacterial community analyses.
Then and now: use of 16S rDNA gene sequencing for bacterial identification and discovery of novel bacteria in clinical microbiology laboratories.
16S rRNA Gene Sequencing for Bacterial Identification in the Diagnostic Laboratory: Pluses, Perils, and Pitfalls.
NCBI has a 16S microbial BLAST database. It is available here. Download that file, then use blastdbcmd from the BLAST+ package to retrieve the sequences you need. Look into the -entry_batch option, where you can provide the gi numbers (this works for now, but gi numbers are going away in September 2016).
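A minimal sketch of the blastdbcmd step, assuming BLAST+ is on your PATH and the downloaded database is named 16SMicrobial (check the actual name of the files you downloaded). The helper names and file names are hypothetical. Note that -entry_batch only finds identifiers that are actually present in that database, so the IDs in the batch file must be ones the 16S database contains.

```python
import subprocess
from pathlib import Path


def write_batch_file(ids, batch_file="entries.txt"):
    """Write the IDs for -entry_batch, one per line."""
    Path(batch_file).write_text("\n".join(ids) + "\n")
    return batch_file


def blastdbcmd_cmd(db, batch_file, out_fasta):
    """Build a blastdbcmd call that dumps the listed entries as FASTA (%f)."""
    return ["blastdbcmd", "-db", db, "-entry_batch", batch_file,
            "-outfmt", "%f", "-out", out_fasta]


def extract_16s(ids, db="16SMicrobial", out_fasta="16s.fasta"):
    """Retrieve the listed entries from a local BLAST database into a FASTA file."""
    batch = write_batch_file(ids)
    subprocess.run(blastdbcmd_cmd(db, batch, out_fasta), check=True)
```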