Question

BWA or STAR for microbiome 16S multiple sequence alignment?

0

5.5 years ago

ariel.balter ▴ 260

I'm working with MiSeq-processed sequences from microbiome 16S samples. Is there any reason why I should use R tools like msa (Bioconductor) or DECIPHER rather than NGS tools like bwa or STAR?

Specifically, I'm using a dada2 workflow put out by the authors. My goal is to produce the phyloseq object for downstream analysis.

In this section

http://web.stanford.edu/class/bios221/MicrobiomeWorkflowII.html#construct_phylogenetic_tree

they construct a phylogenetic tree using phangorn. But first, they use DECIPHER for the multiple sequence alignment.

For certain reasons I'd like to do this without DECIPHER. Also, although this sounds lame, I don't want to become an expert in MSA as this is a very tiny part of the workflow, but is a bottleneck for generating the tree. My background is more in whole-genome sequencing like ChIP- and RNA-Seq.

So, can I use bwa-mem or STAR for MSA?

RNA-Seq alignment microbiome 16s • 2.5k views

ADD COMMENT • link updated 5.5 years ago by Charles Warden 8.2k • written 5.5 years ago by ariel.balter ▴ 260

2

No, there is no reason at all to use msa for read alignment. And decipher is not really related ("DECIPHER (DatabasE of genomiC varIation and Phenotype in Humans using Ensembl Resources) is an interactive web-based database"). Do you have any idea what you are doing? I'd suggest you read the methods sections of some of the literature.

Also, without having any context about what you aim to obtain it's quite hard to help you.

ADD REPLY • link 5.5 years ago by WouterDeCoster 47k

0

I think OP is referring to the DECIPHER package on Bioconductor. It's an MSA tool.

ADD REPLY • link 5.5 years ago by Ram 43k

0

Well, that would also make sense.

ADD REPLY • link 5.5 years ago by WouterDeCoster 47k

0

I added a bunch of detail to the question. Maybe that will help.

ADD REPLY • link 5.5 years ago by ariel.balter ▴ 260

0

All of that should have been in your question to begin with. It changes the answer quite a lot, so I'll move my reaction to a comment.

ADD REPLY • link 5.5 years ago by WouterDeCoster 47k

0

I hear ya. I thought it would be a more straightforward question.

ADD REPLY • link 5.5 years ago by ariel.balter ▴ 260

0

In principle yes, provided you choose a clever reference. If you plan to use more than one reference sequence, you need to know how their coordinates correspond, which you get from an MSA of the original sequences (OK, there's ready-aligned data out there, but you won't be able to use an MSA as a reference directly).

Quite likely the results will be different, and a phylogenetics specialist might pick up on the method when you aim to publish, unless you provide some comparison to the classic MSA technique. That will make you the MSA specialist you don't want to be.

Frankly, I don't know the output format of Bioconductor's DECIPHER, but it's certainly not BAM. Do you think converting the BAM to a phyloseq object is faster than using the existing workflow?
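
To make the "corresponding coordinates" point concrete, here is a minimal Python sketch (an illustration, not something from the dada2 workflow or any of the tools named here). It takes the POS and CIGAR fields that a read aligner writes in its SAM output and places each read into the coordinate system of a single reference. The function name project_to_reference and the toy reads are hypothetical. It assumes every read was aligned to the same reference sequence.

```python
import re

# One CIGAR operation: a length followed by an operation letter (SAM spec).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def project_to_reference(pos, cigar, seq, ref_len):
    """Place one aligned read into reference coordinates.

    pos is the 1-based leftmost reference position (SAM POS field).
    Returns a string of length ref_len in which '-' marks reference
    positions that the read does not cover or that it deletes.
    Bases inserted relative to the reference are dropped, because the
    reference has no column to put them in.
    """
    row = ["-"] * ref_len
    ref_i = pos - 1  # 0-based index into the reference
    read_i = 0       # 0-based index into the read sequence
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op in "M=X":      # consumes both read and reference
            for k in range(n):
                row[ref_i + k] = seq[read_i + k]
            ref_i += n
            read_i += n
        elif op in "IS":     # consumes the read only (insertion, soft clip)
            read_i += n
        elif op in "DN":     # consumes the reference only (deletion, skip)
            ref_i += n
        # H and P consume neither the read nor the reference
    return "".join(row)


# Toy records: (name, POS, CIGAR, SEQ) against a hypothetical 12 bp reference.
reads = [
    ("read1", 1, "12M", "ACGTACGTACGT"),
    ("read2", 3, "4M2D4M", "GTACACGT"),
    ("read3", 1, "4M3I4M", "ACGTTTTACGT"),
]
rows = {name: project_to_reference(pos, cigar, seq, 12)
        for name, pos, cigar, seq in reads}
for name, row in rows.items():
    print(name, row)
```

All rows come out the same length, so they look like an alignment, but read3's three inserted bases are silently discarded. Two reads with different insertions at the same site would never be compared with each other, which is one reason the result can differ from a true MSA.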

ADD REPLY • link 5.5 years ago by Carambakaracho ★ 3.2k

0

So, can I use bwa-mem or STAR for MSA?

No, you can't. They are not multiple sequence alignment programs. They align one read at a time to a reference. If you don't like DECIPHER, there are other MSA programs you could use.

ADD REPLY • link 5.5 years ago by GenoMax 141k

score 1 · Answer 1 · 2018-10-11

While I'm not sure how much is in your control, it is important to have an appropriate workload. Part of determining that is defining a set of projects where you can gradually expand your skillset, and make sure you have time to critically evaluate your results. So, it can unfortunately be bad if you try to compartmentalize a project too much (skipping learning about certain steps / applications). In other words, if you are responsible for a 16S project, it is important to be able to take the time to understand as much as possible :)

While I would probably use something like mothur and/or the RDP Classifier first, it wouldn't be completely unreasonable to also test BWA on something like the RDP FASTA training set reference (or, with a subset of reads, it might not be out of the question to check assignments with BLAST). However, the comments are correct that BWA is not appropriate for an MSA.

This is different than creating an MSA, but it seems relevant to your project if you have MiSeq 16S data. So, I hope you don't mind these suggestions.