Question

Species identification using 16S rRNA

1

3.2 years ago

A_heath ▴ 160

Hi all,

I have a FASTA file of an assembled genome of a bacterial strain. I would like to identify the species of this strain (or the closest one possible) using the 16S rRNA sequence.

I tried downloading multiple 16S rRNA sequences from the NCBI database and running blastn against them, but I don't think this is the most efficient method.

Do you have any recommendations on what tool to use to identify the species of a particular strain using 16S rRNA?

Thank you in advance for your help!

Audrey

16srRNA fasta • 2.3k views

ADD COMMENT • link updated 7 months ago by Ram 43k • written 3.2 years ago by A_heath ▴ 160

1

Try the Type Strain Genome Server (TYGS): https://tygs.dsmz.de/

ADD REPLY • link 3.2 years ago by hafiz.talhamalik ▴ 350

1

3.2 years ago

harishk0201 ▴ 130

Other than what both 5heikki and Michael have said, I think I might also have a solution?

You can also use barrnap to extract the rRNA sequences, then run blast/cmsearch against either the Rfam or NCBI databases.

ADD COMMENT • link 3.2 years ago by harishk0201 ▴ 130

1

Another option is the RNAmmer web server/command-line tool, for which barrnap is meant to be a replacement. One might argue that one or the other is more accurate, but in this case they will all get the job done.

ADD REPLY • link 3.2 years ago by Michael 54k

0

Absolutely, I was just offering an alternative since RNAmmer never really worked for me!

Have a great day!

ADD REPLY • link 3.2 years ago by harishk0201 ▴ 130

1

Also, RNAmmer comes with a terrible bogus license that makes it almost impossible to use.

ADD REPLY • link 3.2 years ago by Michael 54k

0

3.2 years ago

the_dummy ▴ 30

My first thought was the NCBI 16S RefSeq database. Is it OK to blast a whole genome sequence against this type of database?

If it is ok, you could try it.

ADD COMMENT • link 3.2 years ago by the_dummy ▴ 30

1

Well, you will definitely save some CPU time if you identify the rDNA sequence first. If your genome is too large, you might not even get the job started. Running cmsearch likely takes the same amount of time as uploading the genome, or less.

ADD REPLY • link 3.2 years ago by Michael 54k

0

That DB has only like 20k sequences or so..

ADD REPLY • link 3.2 years ago by 5heikki 11k

score 4 · Accepted Answer · 2021-02-16

4

3.2 years ago

Michael 54k

Try this:

Download the bacterial SSU model from Rfam: https://rfam.org/family/RF00177#tabview=tab9 Then use the .cm file with cmsearch from the Infernal suite to locate the SSU in the genome, extract the genome sequence at the best-scoring coordinates, and submit it to NCBI BLAST or blast it against SILVA. You can repeat this with the LSU as well if you like. If you want to blast the rest of the genome to gain further confidence, you can use the taxon obtained from the SSU search to restrict the BLAST taxon range and thereby speed up the search.
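The extraction step can be sketched in Python. This is only a minimal illustration and not part of the original answer: the function names `read_fasta` and `extract_region` are hypothetical, and it assumes 1-based inclusive coordinates where a minus-strand hit is reported with a start larger than its end, as cmsearch does.

```python
from typing import Dict, List

# Complement table for DNA/RNA letters; N stays N.
_COMPLEMENT = str.maketrans("ACGTUNacgtun", "TGCAANtgcaan")


def read_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {sequence_id: sequence}.

    The sequence id is the first whitespace-delimited token of the header.
    """
    records: Dict[str, List[str]] = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            current = header[0] if header else ""
            records[current] = []
        elif current is not None:
            records[current].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


def extract_region(seq: str, seq_from: int, seq_to: int) -> str:
    """Return the subsequence at 1-based inclusive coordinates.

    Assumption: seq_from > seq_to means the hit lies on the minus strand,
    so the reverse complement of that stretch is returned.
    """
    if seq_from <= seq_to:
        return seq[seq_from - 1:seq_to]
    return seq[seq_to - 1:seq_from].translate(_COMPLEMENT)[::-1]
```

Typical use would be to read the assembly with `read_fasta(open("genome.fasta").read())`, look up the contig named in the best hit, and write the result of `extract_region` to a new FASTA file for the BLAST or SILVA search.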

Hope this works out for you.

ADD COMMENT • link 3.2 years ago by Michael 54k

0

Thank you Michael for your reply.

I tried as you explained using Rfam bacterial LSU (RF02541) and SSU (RF00177). I typed the following command line:

cmsearch -o output LSU.cm genome.fasta

Do you think I need to calibrate first?

And I have an output file with genomic coordinates of the regions of interest. I tried blasting them on the NCBI database and it works perfectly.
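For anyone who wants to pick the best hit programmatically, here is a hedged Python sketch. It assumes the search was rerun with cmsearch's `--tblout` option, which writes a whitespace-delimited table (the `-o` file is the human-readable report), and that the table uses the default column layout with sequence start, sequence end, strand, bit score and E-value in columns 8, 9, 10, 15 and 16. The names `Hit` and `best_hit` are made up for this example.

```python
from typing import NamedTuple, Optional


class Hit(NamedTuple):
    target: str     # contig/sequence name in the genome FASTA
    seq_from: int   # 1-based start of the hit on the target
    seq_to: int     # 1-based end (smaller than seq_from on the minus strand)
    strand: str     # "+" or "-"
    score: float    # bit score
    evalue: float


def best_hit(tblout_text: str) -> Optional[Hit]:
    """Return the highest-scoring hit from cmsearch --tblout text.

    Assumes the default tabular layout; comment lines start with '#'.
    Returns None when the table contains no hits.
    """
    best: Optional[Hit] = None
    for line in tblout_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 17:
            continue  # not a complete hit line
        hit = Hit(
            target=fields[0],
            seq_from=int(fields[7]),
            seq_to=int(fields[8]),
            strand=fields[9],
            score=float(fields[14]),
            evalue=float(fields[15]),
        )
        if best is None or hit.score > best.score:
            best = hit
    return best
```

For example, after running `cmsearch --tblout hits.tbl SSU.cm genome.fasta` (file names here are placeholders), `best_hit(open("hits.tbl").read())` gives the contig name and coordinates to extract.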

However, I am having trouble using SILVA. Do you know where I can blast my sequence on this website?

Thank you again for your precious help!

ADD REPLY • link 3.2 years ago by A_heath ▴ 160

1

Hi Audrey,

Good to see this works. Note also 5heikki's caveat about strain identification: the rDNA gene might not be informative for differentiating strains, but it should work fine at least at the genus level. You might have to look at whole-genome alignments to find the key variants that identify the closest strain. As far as I know, the calibration only affects the E-value and corrects for sequence composition. Since we are just interested in the best hit(s), it is not really necessary here, but it should not pose a problem to run the calibration first (again).

You can search SILVA here: https://www.arb-silva.de/aligner/ (it's not blast but works in a similar way)

ADD REPLY • link 3.2 years ago by Michael 54k

0

Thank you for your help, it is so useful. Have a nice day

ADD REPLY • link 3.2 years ago by A_heath ▴ 160

score 4 · Accepted Answer · 2021-02-16

4

3.2 years ago

5heikki 11k

This is a good 16S reference file: https://www.arb-silva.de/fileadmin/silva_databases/current/Exports/SILVA_138.1_SSURef_tax_silva.fasta.gz

And the map that goes with it: https://www.arb-silva.de/fileadmin/silva_databases/current/Exports/taxonomy/taxmap_slv_ssu_ref_138.1.txt.gz

Although you may want to read the SILVA site first to decide whether you want these files or perhaps, e.g., a non-redundant, higher-quality version:

https://www.arb-silva.de/

Blasting against the FASTA file is fine as long as you set proper thresholds for hit reporting. Also, it will obviously be way faster if you blast just the 16S sequence and not the entire genome assembly (for ideas on how to extract it, see Michael's reply).

That being said, I don't think 16S is the way to go for species identification, especially if you want strain/serotype-level resolution.

ADD COMMENT • link 3.2 years ago by 5heikki 11k

0

Thank you 5heikki for your reply.

I downloaded the 16S reference file that you suggested and made a database from it to blast against my genome file. Is that what I should do to identify the species?

ADD REPLY • link 3.2 years ago by A_heath ▴ 160

1

If you're fine with species-level resolution and see a full-length hit with ~99.9% similarity, then that's your species (with perhaps some exceptions, e.g. E. coli/Shigella). For finer resolution (strain/serotype/subspecies/etc.), 16S isn't enough.
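As a rough illustration of that rule of thumb, the Python sketch below filters BLAST tabular output (`-outfmt 6` with its default 12 columns) for near-identical, near-full-length hits. The 99.9% identity default comes from the figure above. The 0.99 query-coverage cutoff is an assumption standing in for "full length", and the names `BlastHit` and `confident_hits` are hypothetical.

```python
from typing import List, NamedTuple


class BlastHit(NamedTuple):
    subject: str
    identity: float        # percent identity (column 3)
    query_coverage: float  # fraction of the query covered by the alignment
    bitscore: float


def confident_hits(
    blast_tsv: str,
    query_length: int,
    min_identity: float = 99.9,
    min_coverage: float = 0.99,  # assumed cutoff for a "full-length" hit
) -> List[BlastHit]:
    """Keep hits passing both thresholds, sorted by bit score (best first).

    Expects default outfmt 6 columns: qseqid sseqid pident length mismatch
    gapopen qstart qend sstart send evalue bitscore.
    """
    hits: List[BlastHit] = []
    for line in blast_tsv.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # skip malformed rows
        pident = float(fields[2])
        qstart, qend = int(fields[6]), int(fields[7])
        coverage = (abs(qend - qstart) + 1) / query_length
        if pident >= min_identity and coverage >= min_coverage:
            hits.append(BlastHit(fields[1], pident, coverage, float(fields[11])))
    return sorted(hits, key=lambda h: h.bitscore, reverse=True)
```

An empty result means no reference passed both cutoffs, in which case a species call from 16S alone is probably not justified. Both thresholds are parameters so they can be loosened for genus-level checks.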