Should I merge, then trim PE reads before multiple sequence alignment?
4.2 years ago
lintonf • 0

Hi everyone!

I have a set of demultiplexed fungal ITS1 reads, and I'd like to run a multiple sequence alignment with these R1 and R2 reads using MUSCLE.

I've used MUSCLE with metagenomic shotgun data, but not with amplicon data, so I'm not sure how to preprocess my sequences before the MSA.

Should I merge my R1 + R2 reads, trim them, and then run them through an MSA? Or can I run the R1s and R2s through separate MSAs (as raw reads) and see what my results are?

I would love your feedback, thank you!

amplicon alignment MSA ITS1
4.2 years ago
GenoMax 141k

You can adapter-trim the reads and then merge them (it sounds like your amplicon design will support merging) before doing an MSA. bbduk.sh followed by bbmerge.sh from the BBMap suite would be good options.
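To illustrate what the adapter-trimming step does, here is a minimal Python sketch of 3' adapter clipping. It is not how bbduk.sh works (bbduk.sh is k-mer based and tolerates mismatches); this version only does exact matching. The adapter string `AGATCGGAAGAGC` (the common prefix of Illumina TruSeq adapters) and the function name `trim_3prime_adapter` are assumptions for illustration.

```python
def trim_3prime_adapter(seq: str, qual: str, adapter: str, min_overlap: int = 3) -> tuple[str, str]:
    """Clip an adapter (or a partial adapter running off the 3' end) from a read.

    Hypothetical helper: exact matching only, no mismatches allowed.
    """
    # Case 1: the full adapter occurs inside the read -> drop it and everything after it
    idx = seq.find(adapter)
    if idx != -1:
        return seq[:idx], qual[:idx]
    # Case 2: the read ends with a prefix of the adapter (read-through into the adapter);
    # test the longest possible prefix first
    for k in range(min(len(adapter) - 1, len(seq)), min_overlap - 1, -1):
        if k > 0 and seq.endswith(adapter[:k]):
            cut = len(seq) - k
            return seq[:cut], qual[:cut]
    # No adapter found: return the read unchanged
    return seq, qual


ADAPTER = "AGATCGGAAGAGC"  # example adapter prefix, assumed for illustration
read = "ACGTACGTTTAGATCGGAAG"  # last 10 bases are a partial adapter
trimmed_seq, trimmed_qual = trim_3prime_adapter(read, "I" * len(read), ADAPTER)
print(trimmed_seq)
```

The `min_overlap` threshold keeps very short suffixes (one or two bases) from being clipped just because they happen to match the start of the adapter by chance.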


Other "classic" approaches include FLASH, PEAR (academic-only license), or PANDAseq (allegedly containing FLASH and PEAR). There's also an older Biostars post here with a good summary.

If you trim at all, I would use only very conservative settings before merging; ideally, the merging algorithm handles the base quality at each position itself.
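To show what "handling the quality at a position" can mean, here is a deliberately simplified Python sketch of quality-aware pair merging: R2 is reverse-complemented, the longest overlap with at most 10% mismatches is accepted, and inside the overlap the base with the higher Phred score wins. This is not the algorithm used by bbmerge.sh, FLASH, or PEAR (real mergers also recompute quality scores and handle reads longer than the amplicon); the function names and thresholds are assumptions for illustration.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (uppercase ACGTN only)."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(r1, q1, r2, q2, min_overlap=10, max_mismatch_frac=0.1):
    """Hypothetical merger: return (merged_seq, merged_qual), or None if no overlap passes."""
    # Put R2 on the same strand as R1; its qualities are simply reversed
    r2rc, q2rc = revcomp(r2), q2[::-1]
    # Try the longest overlap first: R1's 3' end against R2rc's 5' end
    for olen in range(min(len(r1), len(r2rc)), min_overlap - 1, -1):
        start = len(r1) - olen
        mism = sum(a != b for a, b in zip(r1[start:], r2rc[:olen]))
        if mism <= max_mismatch_frac * olen:
            break
    else:
        return None  # no acceptable overlap found
    seq, qual = [r1[:start]], [q1[:start]]
    for i in range(olen):
        b1, s1 = r1[start + i], q1[start + i]
        b2, s2 = r2rc[i], q2rc[i]
        # Same Phred offset for both reads, so comparing ASCII characters compares scores
        if s1 >= s2:
            seq.append(b1)
            qual.append(s1)
        else:
            seq.append(b2)
            qual.append(s2)
    # Append the part of R2rc that extends past the end of R1
    seq.append(r2rc[olen:])
    qual.append(q2rc[olen:])
    return "".join(seq), "".join(qual)


amplicon = "ACGTTGCAAGGCTTACCGAT"
r1 = amplicon[:15]
r2 = revcomp(amplicon[5:])
merged, merged_qual = merge_pair(r1, "I" * 15, r2, "I" * 15)
print(merged)
```

Because disagreements in the overlap are resolved by quality, aggressive quality trimming beforehand mostly throws away overlap that the merger could have used, which is the reason for keeping any pre-merge trimming conservative.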

4.2 years ago
h.mon 35k

To answer your explicit question, you can follow GenoMax's answer and add a duplicate-removal step: there is no point in having thousands of identical sequences in a multiple sequence alignment. dedupe.sh from the BBTools / BBMap suite is a good option, as is VSEARCH.
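The idea behind the duplicate-removal step can be sketched in a few lines of Python: exact, full-length dereplication that collapses identical merged reads and keeps one copy per unique sequence with its count, most abundant first. It is much simpler than dedupe.sh or VSEARCH. The `;size=` header label and the function names are assumptions for illustration.

```python
from collections import Counter


def dereplicate(seqs, min_size=1):
    """Collapse identical sequences into (sequence, count) pairs, most abundant first."""
    counts = Counter(s.upper() for s in seqs)
    # Sort by decreasing abundance, then alphabetically so the output order is reproducible
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    # Optionally drop rare sequences (e.g. singletons) before building the MSA
    return [(s, n) for s, n in ranked if n >= min_size]


def to_fasta(pairs):
    """Write unique sequences as FASTA, recording the abundance in the header (assumed format)."""
    lines = []
    for i, (seq, n) in enumerate(pairs, start=1):
        lines.append(f">uniq{i};size={n}")
        lines.append(seq)
    return "\n".join(lines) + "\n"


reads = ["ACGT", "acgt", "TTGA", "ACGT", "GGCC", "TTGA"]
print(to_fasta(dereplicate(reads, min_size=2)), end="")
```

Only the unique sequences then go into MUSCLE, so the alignment size scales with sequence diversity rather than with sequencing depth.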

However, as you are sequencing ITS, I believe your goal is taxonomic classification and quantification. If that is the case, I would advise you to follow one of the many established pipelines, such as DADA2 or QIIME2.
