Question: 16S rRNA extraction of assembled genome bins
9 months ago by
luyang1005 wrote:

Hi, community. I am new to the analysis of (draft) genome bins. I have multiple assembled genome bins from shotgun metagenomic data, and I want to extract the 16S rRNA sequences from them. I used CheckM and barrnap to find the 16S rRNA sequences in the bins; both tools rely on HMMER searches.

For some genome bins I find two 16S rRNA hits. When I run the search in both archaea and bacteria mode, some fragments are reported as archaeal hits and also as bacterial hits within the same bin. For example, one bin produced two 16S hits in archaea mode and two hits in bacteria mode.

The headers of the hits in the bacteria output are:

>16S_rRNA::NODE_2_length_100533_cov_5.789665:250-1687(-)
>16S_rRNA::NODE_8_length_10807_cov_5.393508:10362-10807(-)

The headers of the hits in the archaea output are:

>16S_rRNA::NODE_2_length_100533_cov_5.789665:251-1678(-)
>16S_rRNA::NODE_8_length_10807_cov_5.393508:10363-10803(-)

I then submitted both sets of FASTA hits to the RDP classifier. The output for the archaea hits is:

16S_rRNA::NODE_2_length_100533_cov_5.789665:251-1678(-);+;Bacteria;100%;"Bacteroidetes";98%;"Bacteroidia";96%;"Bacteroidales";96%;"Rikenellaceae";38%;Mucinivorans;33%
16S_rRNA::NODE_8_length_10807_cov_5.393508:10363-10803(-);+;Bacteria;99%;Firmicutes;70%;Clostridia;61%;Clostridiales;61%;Ruminococcaceae;43%;Hydrogenoanaerobacterium;14%

The output for the bacteria hits is:

16S_rRNA::NODE_2_length_100533_cov_5.789665:250-1687(-);+;Bacteria;100%;"Bacteroidetes";98%;"Bacteroidia";94%;"Bacteroidales";94%;"Rikenellaceae";34%;Mucinivorans;24%
16S_rRNA::NODE_8_length_10807_cov_5.393508:10362-10807(-);+;Bacteria;99%;Firmicutes;78%;Clostridia;53%;Clostridiales;53%;Ruminococcaceae;40%;Hydrogenoanaerobacterium;14%

So my questions are:

(1) The results for the bacteria and archaea hits are the same: both are classified as Bacteria. Why are the hits reported in two groups, bacteria and archaea?
(2) The two hits come from one genome bin. Why were two 16S sequences predicted, and why do they have different taxonomic classifications?

Can anyone help me with this? I appreciate it!


They're not classified as archaea; the genes just match the archaeal 16S HMM well enough to produce a hit. You would probably get hits against the eukaryotic 18S and mitochondrial 16S HMMs as well. Why? Because it is the same gene in all of these cases, and it matches each model well enough.
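
One way to confirm that the bacterial and archaeal runs are reporting the same locus is to compare the coordinates in the FASTA headers. The following is a minimal Python sketch, assuming headers of the form shown in the question (`gene::contig:start-end(strand)`). The names `parse_header`, `overlap_fraction` and `same_locus_pairs`, and the 0.9 threshold, are illustrative choices and not part of barrnap or CheckM.

```python
import re
from typing import List, NamedTuple, Tuple

# Header layout assumed from the question, e.g.
#   >16S_rRNA::NODE_2_length_100533_cov_5.789665:250-1687(-)
HEADER_RE = re.compile(
    r"^>?(?P<gene>[^:]+)::(?P<contig>.+):(?P<start>\d+)-(?P<end>\d+)\((?P<strand>[+-])\)$"
)


class Hit(NamedTuple):
    gene: str
    contig: str
    start: int
    end: int
    strand: str


def parse_header(header: str) -> Hit:
    """Split one FASTA header into gene, contig, coordinates and strand."""
    m = HEADER_RE.match(header.strip())
    if m is None:
        raise ValueError(f"unrecognised header: {header!r}")
    return Hit(m["gene"], m["contig"], int(m["start"]), int(m["end"]), m["strand"])


def overlap_fraction(a: Hit, b: Hit) -> float:
    """Overlap length divided by the shorter hit's span; 0.0 if contig or strand differ."""
    if a.contig != b.contig or a.strand != b.strand:
        return 0.0
    shared = min(a.end, b.end) - max(a.start, b.start)
    if shared <= 0:
        return 0.0
    return shared / min(a.end - a.start, b.end - b.start)


def same_locus_pairs(
    bac: List[Hit], arc: List[Hit], min_fraction: float = 0.9
) -> List[Tuple[Hit, Hit]]:
    """Pair up hits from the two runs that cover essentially the same region."""
    return [(b, a) for b in bac for a in arc if overlap_fraction(b, a) >= min_fraction]


# Headers copied from the question.
BACTERIA_HEADERS = [
    ">16S_rRNA::NODE_2_length_100533_cov_5.789665:250-1687(-)",
    ">16S_rRNA::NODE_8_length_10807_cov_5.393508:10362-10807(-)",
]
ARCHAEA_HEADERS = [
    ">16S_rRNA::NODE_2_length_100533_cov_5.789665:251-1678(-)",
    ">16S_rRNA::NODE_8_length_10807_cov_5.393508:10363-10803(-)",
]

bac_hits = [parse_header(h) for h in BACTERIA_HEADERS]
arc_hits = [parse_header(h) for h in ARCHAEA_HEADERS]
for b, a in same_locus_pairs(bac_hits, arc_hits):
    print(f"{b.contig}: bacteria {b.start}-{b.end} / archaea {a.start}-{a.end}")
```

Hits that pair up this way are the same gene found by two models, so only one copy per locus needs to be kept for classification.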

written 9 months ago by 5heikki

Thank you so much! That makes sense! Could I get some suggestions on how to retrieve the 16S sequences from the assembled genome bins? I appreciate your help!

written 9 months ago by luyang1005
7 months ago by
Asaf wrote:

Usually the 16S assembled from short reads will be a mess. The assembler collapses similar sequences together, and since the 16S gene is long and contains many highly conserved regions, it will either fail to assemble or the resulting sequence cannot be trusted. Try cutting the 16S sequence in two and running each part through RDP; you might get different answers.
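
The splitting step can be sketched in plain Python as follows. This is only a rough illustration: it cuts every sequence at its midpoint, and the function names and the `|first_half` / `|second_half` header suffixes are made up for this example. The resulting FASTA file can then be submitted to RDP like any other.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_in_two(header: str, seq: str) -> List[Tuple[str, str]]:
    """Cut a sequence at its midpoint and label the two halves."""
    mid = len(seq) // 2
    return [
        (f"{header}|first_half", seq[:mid]),
        (f"{header}|second_half", seq[mid:]),
    ]


def write_halves(in_path: str, out_path: str) -> None:
    """Read a FASTA file of 16S hits and write both halves of each hit."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for header, seq in read_fasta(src):
            for half_header, half_seq in split_in_two(header, seq):
                dst.write(f">{half_header}\n{half_seq}\n")
```

For example, `write_halves("bin_16S.fasta", "bin_16S_halves.fasta")` (both file names are placeholders) would produce two records per original hit. If the two halves of one hit are assigned to different taxa, the hit is likely a chimeric or collapsed assembly.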
