6.8 years ago by
University of Nebraska
I'm assuming, since you're using rpoB and you mention QIIME, that you're interested in identifying amplicons from environmental samples. Yes, you can create your own BLAST database and identify your samples that way. If you're interested in clustering your rpoB sequences and identifying OTUs for these sequences, you can use QIIME.
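As a rough sketch of the BLAST route, assuming BLAST+ is installed and on your PATH: the file names, the database name `rpoB_db`, and the e-value and hit-count cutoffs below are placeholders I made up for illustration, so adjust them to your data.

```python
import subprocess


def build_blast_commands(reference_fasta, query_fasta,
                         db_name="rpoB_db", out_tsv="hits.tsv"):
    """Return the makeblastdb and blastn argument lists without running them.

    db_name, out_tsv and the cutoffs are illustrative placeholders.
    """
    # Build a nucleotide BLAST database from the reference rpoB sequences.
    make_db = ["makeblastdb", "-in", reference_fasta,
               "-dbtype", "nucl", "-out", db_name]
    # Search the query sequences against it, writing tabular output (format 6).
    search = ["blastn", "-query", query_fasta, "-db", db_name,
              "-outfmt", "6", "-max_target_seqs", "5",
              "-evalue", "1e-10", "-out", out_tsv]
    return make_db, search


def run_blast(reference_fasta, query_fasta):
    """Run both steps in order; raises CalledProcessError if either fails."""
    for cmd in build_blast_commands(reference_fasta, query_fasta):
        subprocess.run(cmd, check=True)
```

Calling `run_blast("rpoB_refs.fasta", "my_otus.fasta")` would build the database and then write the hits to `hits.tsv`; both FASTA names here are hypothetical.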
There are a few ways to go about doing this. You can first use a sequence similarity clustering method that is not based on a database, such as CD-HIT or USEARCH, to cluster your sequences at a self-determined level of similarity (e.g. 95%, 97%, 99%). Since these methods don't typically identify sequences, you'll then have to BLAST your OTU sequences.
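To make the idea of clustering at a chosen similarity level concrete, here is a toy greedy clustering sketch. It is not how CD-HIT or USEARCH are implemented: I'm assuming the sequences are already trimmed to equal length so that identity is just the fraction of matching positions, whereas the real tools compare sequences of different lengths and are far faster. The function names are my own.

```python
def identity(a, b):
    """Fraction of identical positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length")
    if not a:
        raise ValueError("sequences must not be empty")
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


def greedy_cluster(seqs, threshold=0.97):
    """Greedy clustering of {seq_id: sequence} at a similarity threshold.

    Each sequence joins the first existing centroid it matches at or
    above the threshold; otherwise it becomes a new centroid.
    Returns {centroid_id: [member_ids]}.
    """
    clusters = {}
    for seq_id, seq in seqs.items():
        for centroid_id in clusters:
            if identity(seq, seqs[centroid_id]) >= threshold:
                clusters[centroid_id].append(seq_id)
                break
        else:
            # No centroid was similar enough, so start a new cluster.
            clusters[seq_id] = [seq_id]
    return clusters
```

The centroid of each cluster is the sequence you would then BLAST to put a name on that OTU. Raising the threshold from 0.95 to 0.99 splits the data into more, tighter OTUs.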
If you're familiar with QIIME you can create your own database. I've done this for numerous markers and it's not particularly difficult. You'll need two files: one with your sequence database and one with the corresponding taxonomy. The setup is similar to MEGAN, if you've used that. You need to use the sequence database to assign your sequences to OTUs and then you need to name your OTUs. If you refer to the QIIME tutorial, see the section "Step 3: Assign Taxonomy"; instead of using the RDP 16S database, you can supply your own from the command line.
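Before pointing QIIME at a custom database, it's worth checking that the two files actually line up. The sketch below assumes the layout I've used, a FASTA file plus a tab-separated taxonomy file with one `sequence_id<TAB>lineage` line per sequence; the function names and the example IDs are hypothetical.

```python
def read_fasta_ids(lines):
    """Collect sequence IDs (first word after '>') from FASTA lines."""
    return [line[1:].split()[0]
            for line in lines
            if line.startswith(">") and line[1:].strip()]


def read_taxonomy(lines):
    """Parse 'seq_id<TAB>lineage' lines into a {seq_id: lineage} dict."""
    taxonomy = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        seq_id, _, lineage = line.partition("\t")
        taxonomy[seq_id.strip()] = lineage.strip()
    return taxonomy


def missing_taxonomy(fasta_lines, taxonomy_lines):
    """Return FASTA IDs that have no entry in the taxonomy file."""
    taxonomy = read_taxonomy(taxonomy_lines)
    return [seq_id for seq_id in read_fasta_ids(fasta_lines)
            if seq_id not in taxonomy]
```

Both readers take any iterable of lines, so you can pass open file handles directly. An empty list back from `missing_taxonomy` means every reference sequence has a lineage to report.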
I'm not aware of any public rpoB sequence database that is already formatted for QIIME. I am aware of the paper "Complete rpoB gene sequencing as a suitable supplement to DNA–DNA hybridization for bacterial species and genus delineation", which has an rpoB database in the supplementary materials that you could use in lieu of creating your own from NCBI, EMBL, etc.
Best of luck.
modified 6.8 years ago
6.8 years ago by
Josh Herr ♦ 5.7k