Question: Should I merge, then trim PE reads before multiple sequence alignment?
21 days ago, hannahfreund3 wrote:

Hi everyone!

I have a set of demultiplexed fungal ITS1 reads, and I'd like to run a multiple sequence alignment with these R1 and R2 reads using MUSCLE.

I've used MUSCLE with metagenomic shotgun data, not amplicon data, so I am not sure about how to preprocess my sequences before the MSA.

Should I merge my R1 + R2 reads, trim them, and then run them through an MSA? Or can I run the R1s and R2s through separate MSAs (as raw reads) and see what the results are?

I would love your feedback, thank you!!

#msa #amplicon #its1 alignment
21 days ago, genomax (United States) wrote:

You can adapter-trim the reads, then merge them (it sounds like your amplicon design will support merging), and then run the MSA. bbduk.sh followed by bbmerge.sh, both from the BBMap suite, would be good options.
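
As a rough sketch of that trim-then-merge order, the Python snippet below only assembles the two command lines and prints them; it does not run anything. The file names, the `adapters.fa` path and the `trim_and_merge_commands` helper are hypothetical placeholders, and the BBDuk settings (`ktrim=r k=23 mink=11 hdist=1 tpe tbo`) are the commonly quoted adapter-trimming values, so treat them as a starting point rather than a recommendation for this dataset.

```python
import shlex

def trim_and_merge_commands(r1, r2, adapters, prefix):
    """Return (bbduk_cmd, bbmerge_cmd) as argument lists; nothing is executed."""
    trimmed1 = f"{prefix}_trimmed_R1.fastq.gz"
    trimmed2 = f"{prefix}_trimmed_R2.fastq.gz"
    # Step 1: adapter trimming on the 3' end of both reads.
    bbduk = [
        "bbduk.sh",
        f"in1={r1}", f"in2={r2}",
        f"out1={trimmed1}", f"out2={trimmed2}",
        f"ref={adapters}",  # hypothetical path to an adapter FASTA
        "ktrim=r", "k=23", "mink=11", "hdist=1", "tpe", "tbo",
    ]
    # Step 2: merge the trimmed pairs; pairs that fail to merge are kept separately.
    bbmerge = [
        "bbmerge.sh",
        f"in1={trimmed1}", f"in2={trimmed2}",
        f"out={prefix}_merged.fastq.gz",
        f"outu1={prefix}_unmerged_R1.fastq.gz",
        f"outu2={prefix}_unmerged_R2.fastq.gz",
    ]
    return bbduk, bbmerge

# Placeholder file names, for illustration only.
bbduk_cmd, bbmerge_cmd = trim_and_merge_commands(
    "sample_R1.fastq.gz", "sample_R2.fastq.gz", "adapters.fa", "sample")
print(shlex.join(bbduk_cmd))
print(shlex.join(bbmerge_cmd))
```

To actually run the steps, each list could be passed to `subprocess.run(cmd, check=True)` once the BBMap scripts are on your PATH.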


Other "classic" approaches include FLASH, PEAR (academic-only license), or PANDAseq (allegedly containing FLASH and PEAR). There's also an older Biostars post here, with a good summary.

I would trim, if at all, only with very conservative settings before merging; ideally the merging algorithm itself handles the base quality at each position.
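
To illustrate what "handling the quality at a position" can mean, here is a toy Python sketch. It is not how FLASH, PEAR, PANDAseq or bbmerge work internally. It assumes the amplicon is longer than each read (no read-through into adapters), takes Phred scores as plain integer lists, and simply keeps the higher-quality base wherever the two reads disagree in the overlap. The `merge_pair` function and its thresholds are made up for this example.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def revcomp(seq):
    """Reverse-complement a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def merge_pair(r1, q1, r2, q2, min_overlap=5, max_mismatch_frac=0.2):
    """Merge R1 with reverse-complemented R2; return the merged sequence or None.

    q1 and q2 are lists of Phred scores, one per base, in read orientation.
    """
    r2rc, q2rc = revcomp(r2), q2[::-1]
    # Try the longest overlap first: suffix of R1 against prefix of R2rc.
    for olen in range(min(len(r1), len(r2rc)), max(min_overlap, 1) - 1, -1):
        a, b = r1[-olen:], r2rc[:olen]
        mismatches = sum(x != y for x, y in zip(a, b))
        if mismatches <= max_mismatch_frac * olen:
            qa, qb = q1[-olen:], q2rc[:olen]
            # At each overlapping position keep the base with the higher quality.
            overlap = "".join(x if p >= q else y
                              for x, y, p, q in zip(a, b, qa, qb))
            return r1[:-olen] + overlap + r2rc[olen:]
    return None

# Toy 24 bp amplicon read from both ends with 16 bp reads (8 bp overlap).
amplicon = "ACGTTGCAGGCTTAACCGGTATCG"
r1 = "ACGTTGCAGGCTGAAC"        # sequencing error at index 12 (T read as G)
q1 = [38] * 16
q1[12] = 5                     # the erroneous base has a low quality score
r2 = revcomp(amplicon[8:])
q2 = [38] * 16

merged = merge_pair(r1, q1, r2, q2)
print(merged)
```

In this example the low-quality error in R1 sits inside the overlap, so the merged read takes the base from R2 instead. Aggressive quality trimming before merging would have removed part of that overlap and left less evidence for the merger to work with.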

Reply written 21 days ago by Carambakaracho (modified 21 days ago)
20 days ago, h.mon (Brazil) wrote:

To answer your explicit question: you can follow genomax's answer and add a duplicate-removal step, since there is no sense in having thousands of identical sequences in a multiple sequence alignment. dedupe.sh from the BBTools / BBMap suite is a good option, as is VSEARCH.
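
To show what duplicate removal amounts to, here is a minimal Python sketch that collapses exact, full-length duplicates and keeps a count for each unique sequence. It stands in for the idea only; dedupe.sh and VSEARCH do considerably more and scale far better. The `dereplicate` and `to_fasta` helpers and the toy reads are hypothetical, and the header format is loosely modelled on the `;size=N` abundance annotation that VSEARCH can write.

```python
from collections import Counter

def dereplicate(seqs):
    """Collapse identical sequences; return (sequence, count) sorted by abundance."""
    counts = Counter(s.upper() for s in seqs)
    # Sort by decreasing count, then alphabetically for a stable order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

def to_fasta(uniques, prefix="uniq"):
    """Format unique sequences as FASTA, recording abundance in the header."""
    lines = []
    for i, (seq, n) in enumerate(uniques, start=1):
        lines.append(f">{prefix}{i};size={n}")
        lines.append(seq)
    return "\n".join(lines) + "\n"

# Toy merged reads, for illustration only.
merged_reads = ["ACGTAC", "acgtac", "ACGTAC", "TTGCAA", "TTGCAA", "GGGCCC"]
uniques = dereplicate(merged_reads)
print(to_fasta(uniques), end="")
```

Only the unique sequences would then go into MUSCLE, while the counts stay available if abundance matters later.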

However, as you are sequencing ITS, I believe your goal is taxonomic classification and quantification. If that is the case, I would advise you to follow one of the many established pipelines, such as DADA2 or QIIME2.
