Question

Get the 16S rRNA gene from NCBI

0

7.9 years ago

kelvinfrog75 ▴ 10

Hey, I have a list of NCBI IDs and GI numbers, each of which identifies a whole bacterial genome. I want to build a FASTA file containing only the 16S rRNA gene from the bacteria associated with these IDs and GI numbers. Does anyone know a quick way of doing this? Is there a search where you can enter the whole-genome ID and then specify the 16S rRNA gene? Alternatively, does anyone know primer sequences that can extract the 16S gene? If so, I guess I could run a PCR simulation to extract the 16S region from the whole genome.

gene • 9.4k views

ADD COMMENT • link updated 7.9 years ago by natasha.sernova ★ 4.0k • written 7.9 years ago by kelvinfrog75 ▴ 10

score 0 · Answer 1 · 2016-06-02

0

7.9 years ago

natasha.sernova ★ 4.0k

You have a list of IDs and GIs, am I right?

Save the list to a text file, e.g. 16s-sRNAs.txt, with each ID on its own line.

Go to Batch Entrez:

Read the text on the page; it's important.

http://www.ncbi.nlm.nih.gov/sites/batchentrez

Then:

1) Select the correct nucleotide database in the upper left corner of the page. The text on the page explains the difference.

2) Select your file (16s-sRNAs.txt, from your computer) in the middle of the menu.

3) Click Retrieve on the right side of the menu.

After some time you will see which nucleotide sequences NCBI has for this list of IDs.
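The ID file used in the steps above (one identifier per line) could be prepared with a few lines of Python. This is only a minimal sketch: the function name `write_id_list` is made up for illustration, and it assumes the IDs are already available as a list of strings.

```python
from pathlib import Path


def write_id_list(ids, path):
    """Write one identifier per line, dropping blanks and duplicates.

    Hypothetical helper for illustration; not part of any NCBI tool.
    """
    seen = set()
    cleaned = []
    for raw in ids:
        ident = raw.strip()  # remove stray spaces and newlines
        if ident and ident not in seen:
            seen.add(ident)
            cleaned.append(ident)
    # Plain text with a trailing newline, one ID per line.
    Path(path).write_text("\n".join(cleaned) + "\n")
    return cleaned
```

Calling `write_id_list(my_ids, "16s-sRNAs.txt")` would then produce the file to upload, where `my_ids` stands for your own list.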

You can then build a BLAST database from the retrieved sequences with makeblastdb.
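As a rough sketch of the makeblastdb step, the command can be assembled and run from Python. The file and database names passed in are placeholders, the helper names are invented for this example, and it assumes the BLAST+ tools are installed and on your PATH.

```python
import shutil
import subprocess


def makeblastdb_command(fasta_path, db_name):
    """Return the makeblastdb argument list for a nucleotide database."""
    return [
        "makeblastdb",
        "-in", fasta_path,    # FASTA file saved from Batch Entrez
        "-dbtype", "nucl",    # nucleotide sequences
        "-parse_seqids",      # keep sequence IDs so entries can be looked up later
        "-out", db_name,      # base name of the database files
    ]


def build_database(fasta_path, db_name):
    """Run makeblastdb if it is installed; raise a clear error otherwise."""
    if shutil.which("makeblastdb") is None:
        raise FileNotFoundError("makeblastdb not found on PATH; install BLAST+")
    subprocess.run(makeblastdb_command(fasta_path, db_name), check=True)
```

For example, `build_database("sequences.fasta", "my_16s_db")` would build a database from a hypothetical `sequences.fasta`.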

See the following papers, if you need additional information:

The variability of the 16S rRNA gene in bacterial genomes and its consequences for bacterial community analyses.

http://www.ncbi.nlm.nih.gov/pubmed/23460914 (2013)

Then and now: use of 16S rDNA gene sequencing for bacterial identification and discovery of novel bacteria in clinical microbiology laboratories.

http://www.ncbi.nlm.nih.gov/pubmed/18828852 (2008)

16S rRNA Gene Sequencing for Bacterial Identification in the Diagnostic Laboratory: Pluses, Perils, and Pitfalls

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2045242/ (2007)

ADD COMMENT • link 7.9 years ago by natasha.sernova ★ 4.0k

0

I have the GI numbers for the whole genomes, not for the 16S genes. So I am not sure the method you describe would work, unless there is a way to tell it to pull out only the 16S gene.

ADD REPLY • link 7.9 years ago by kelvinfrog75 ▴ 10

0

That would be one reason to use the method I outlined below.

ADD REPLY • link 7.9 years ago by GenoMax 141k

0

Hey, I tried your method but I keep getting "Segmentation fault: 11" when I run blastdbcmd. Do you know what can cause that? thanks.

ADD REPLY • link 7.9 years ago by kelvinfrog75 ▴ 10

0

Did you get the correct executable for your OS? How much memory do you have?

ADD REPLY • link 7.9 years ago by GenoMax 141k

score 0 · Answer 2 · 2016-06-02

0

7.9 years ago

GenoMax 141k

NCBI has a 16S microbial BLAST database. It is available here. Get that file. Then use blastdbcmd from the BLAST+ package to retrieve the sequences you need. Look into the -entry_batch option, which lets you provide the GI numbers in a file (this works for now, but GIs are going away in September 2016).
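A batch retrieval with blastdbcmd might look roughly like the sketch below. It assumes BLAST+ is installed, that the downloaded database has been unpacked locally, and that the identifiers in the batch file actually exist in that database. The database name and file names are placeholders, and the helper names are made up for this example.

```python
import shutil
import subprocess


def blastdbcmd_batch_command(db_name, id_file, out_fasta):
    """Return the blastdbcmd argument list for a batch lookup."""
    return [
        "blastdbcmd",
        "-db", db_name,            # local BLAST database (placeholder name)
        "-entry_batch", id_file,   # text file with one identifier per line
        "-outfmt", "%f",           # write the hits as FASTA
        "-out", out_fasta,         # output file
    ]


def fetch_sequences(db_name, id_file, out_fasta):
    """Run blastdbcmd if it is installed; raise a clear error otherwise."""
    if shutil.which("blastdbcmd") is None:
        raise FileNotFoundError("blastdbcmd not found on PATH; install BLAST+")
    subprocess.run(
        blastdbcmd_batch_command(db_name, id_file, out_fasta), check=True
    )
```

Identifiers that are missing from the database are reported by blastdbcmd as errors, so it is worth checking the output file against the input list.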

ADD COMMENT • link 7.9 years ago by GenoMax 141k