Question: BWA or STAR for microbiome 16S multiple sequence alignment?
9 weeks ago, ariel.balter wrote:

I'm working with MiSeq-processed sequences of microbiome 16S samples. Is there any reason why I should use R tools like msa (Bioconductor) or DECIPHER over NGS tools like bwa or STAR?

Specifically, I'm using a dada2 workflow put out by the authors. My goal is to produce the phyloseq object for downstream analysis.

In this section, they construct a phylogenetic tree using phangorn. But first, they use DECIPHER for the multiple sequence alignment.

For certain reasons I'd like to do this without DECIPHER. Also, although this sounds lame, I don't want to become an expert in MSA as this is a very tiny part of the workflow, but is a bottleneck for generating the tree. My background is more in whole-genome sequencing like ChIP- and RNA-Seq.

So, can I use bwa-mem or STAR for MSA?


No, there is no reason at all to use msa for read alignment. And decipher is not really related ("DECIPHER (DatabasE of genomiC varIation and Phenotype in Humans using Ensembl Resources) is an interactive web-based database"). Do you have any idea what you are doing? I'd suggest you read the methods sections of some of the literature.

Also, without having any context about what you aim to obtain it's quite hard to help you.

modified 9 weeks ago • written 9 weeks ago by WouterDeCoster

I think OP is referring to this DECIPHER on Bioconductor. It's an MSA tool.

written 8 weeks ago by RamRS

Well, that would also make sense.

written 8 weeks ago by WouterDeCoster

I added a bunch of detail to the question. Maybe that will help.

written 8 weeks ago by ariel.balter

All of that should have been in your question to begin with. It changes the answer quite a lot, so I'll move my reaction to a comment.

written 8 weeks ago by WouterDeCoster

I hear ya. I thought it would be a more straightforward question.

written 8 weeks ago by ariel.balter

In principle yes, provided you choose a clever reference. If you plan to use more than one reference sequence, you need to make sure you know the corresponding coordinates, which you get from an MSA of the original sequences (OK, there's ready-aligned data out there, but you won't be able to use an MSA as a reference directly).

Quite likely the results will be different, and a phylogenetic specialist might pick up on the method when you aim to publish, unless you provide some comparison to the classic MSA technique. That will make you the MSA specialist you don't want to be.

Frankly, I don't know the output format of Bioconductor's DECIPHER, but it's certainly not BAM. Do you think converting the BAM to a phyloseq object is faster than using the existing workflow?
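To make the "corresponding coordinates" point concrete, here is a rough sketch of what turning reference-based alignments into MSA-like rows could look like. It assumes you already have, for each read, the sequence, CIGAR string and leftmost position from a SAM record against a single reference. The function name project_to_reference and the toy sequences are made up for illustration; nothing here comes from the dada2 workflow.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM spec).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def project_to_reference(seq, cigar, pos, ref_len):
    """Place an aligned read into reference columns (hypothetical helper).

    pos is 0-based here; SAM POS is 1-based, so subtract 1 first.
    Insertions relative to the reference are dropped and deletions become
    gaps, so every returned row has exactly ref_len columns.
    """
    row = ["-"] * ref_len
    ref_i, read_i = pos, 0
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op in "M=X":      # aligned bases: copy into reference columns
            for k in range(length):
                row[ref_i + k] = seq[read_i + k]
            ref_i += length
            read_i += length
        elif op in "IS":     # insertion or soft clip: consumes the read only
            read_i += length
        elif op in "DN":     # deletion or skip: consumes the reference only
            ref_i += length
        # H and P consume neither the read nor the reference
    return "".join(row)

# Toy reads against a 10-base reference "ACGTACGTAC".
rows = [
    project_to_reference("ACGTACG", "7M", 0, 10),       # ACGTACG---
    project_to_reference("GTTTACGT", "2M2I4M", 2, 10),  # --GTACGT--
    project_to_reference("ACGCGTAC", "3M2D5M", 0, 10),  # ACG--CGTAC
]
for row in rows:
    print(row)
```

Note that the two inserted bases in the second read simply disappear from its row. That loss of information is one reason the result will likely differ from a classic MSA.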

modified 8 weeks ago • written 8 weeks ago by Carambakaracho

"So, can I use bwa-mem or STAR for MSA?"

No, you can't. They are not multiple sequence alignment programs. They align one sequence at a time to a reference. If you don't like DECIPHER, there are other MSA programs you could use.
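A toy illustration of the difference, as a sketch only: the function below is a textbook Needleman-Wunsch global aligner with made-up scoring values, not what bwa or STAR do internally. It aligns two reads to the same reference one at a time. Each pairwise result is internally consistent, but the two results have different numbers of columns, so they cannot be stacked into one alignment without further work.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Textbook Needleman-Wunsch; returns the two gapped strings.

    The scoring values are illustrative assumptions, not tuned defaults.
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))

ref = "ACGTACGT"
read1 = "ACGTACGT"      # identical to the reference
read2 = "ACGTTTACGT"    # two extra bases relative to the reference

ref_vs_1, aln_1 = global_align(ref, read1)
ref_vs_2, aln_2 = global_align(ref, read2)

# The reference needs gaps in the second alignment but not in the first,
# so the two pairwise alignments do not share a common set of columns.
print(len(aln_1), len(aln_2))
```

An MSA program resolves exactly this: it places all sequences into one shared column system, which is what a tree-building step expects as input.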

modified 8 weeks ago • written 8 weeks ago by genomax
8 weeks ago, Charles Warden (Duarte, CA) wrote:

While I'm not sure how much is in your control, it is important to have an appropriate workload. Part of determining that is defining a set of projects where you can gradually expand your skillset, and make sure you have time to critically evaluate your results. So, it can unfortunately be bad if you try to compartmentalize a project too much (skipping learning about certain steps / applications). In other words, if you are responsible for a 16S project, it is important to be able to take the time to understand as much as possible :)

While I would probably use something like mothur and/or RDP Classifier first, it wouldn't be completely unreasonable to also test BWA on something like the RDP FASTA training set reference (or, with a subset of reads, it might not be out of the question to check assignments with BLAST). However, the comments are correct that BWA is not appropriate for an MSA.

This is different than creating an MSA, but it seems relevant to your project if you have MiSeq 16S data. So, I hope you don't mind these suggestions.
