Align and trim thousands of genes
3.3 years ago
sankadinesh ▴ 20

Dear all, I have 20,000 ASVs obtained from multiple studies, with gene sizes ranging from 200 to 320. I would like to build a multiple alignment and trim the unaligned portions, as MEGA does. Could you suggest suitable software and a protocol for doing this? Thanks.

Regards, Dinesh

alignment gene sequencing next-gen • 860 views

Two trimming programs are covered in this answer: A: How to clean multiple protein sequences alignement in order to make a phylogenic

3.3 years ago
Mensur Dlakic ★ 27k

Generally speaking, this is a straightforward task, but you have not given us many details. My suggestions are therefore general, though they should be a good enough starting point for you to adapt to your specific needs.

Here is a simple C-shell script that will do this (bash script would be fairly similar):

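foreach i ( *.fasta )
    # Align with mafft; $i:r is the file name without its extension
    mafft --maxiterate 1000 --localpair --thread 8 --nomemsave $i > $i:r.afa
    # Remove columns in which more than half of the positions are gaps
    trimal -in $i:r.afa -out $i:r.trimmed.afa -gt 0.5
end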

This assumes that all your starting files are in the same directory and have a .fasta extension. Alignments are done with mafft in its most comprehensive (and slowest) mode, but you may prefer a different program (clustalw, clustalo, muscle, etc.). Each alignment (ending in .afa) is then trimmed with trimal so that every column in which more than half of the positions are gaps is removed, producing the .trimmed.afa files. This may or may not be what you want, so you should look up the other available trimming options.
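Since a bash script would be fairly similar, here is a minimal sketch of what that equivalent might look like, written in plain sh so that it also runs under bash. The function name `align_and_trim` is made up for illustration; the mafft and trimal commands are copied unchanged from the C-shell loop above.

```shell
#!/bin/sh
# Align one FASTA file with mafft, then trim gappy columns with trimal.
# align_and_trim is a hypothetical helper name, not part of either tool.
align_and_trim() {
    input=$1
    base=${input%.fasta}    # strip the extension, like $i:r in csh
    mafft --maxiterate 1000 --localpair --thread 8 --nomemsave "$input" > "$base.afa" &&
    trimal -in "$base.afa" -out "$base.trimmed.afa" -gt 0.5
}

for f in *.fasta; do
    [ -e "$f" ] || continue    # skip the literal pattern when nothing matches
    align_and_trim "$f"
done
```

Quoting the file names keeps the loop working if a name contains spaces. Switching to a different trimming mode (for example trimal's `-gappyout` in place of `-gt 0.5`) should only require changing the trimal line.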

The whole script probably need not be longer than the 3-4 lines above, though you will likely want to adjust the exact commands. Lastly, I suggest you consider how to speed up the whole thing by using most or all of your CPUs; after that it becomes a waiting game.
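One possible way to use more of your CPUs is to process several files at once rather than giving every thread to a single mafft run. The sketch below assumes a `find` and `xargs` that support `-print0`, `-0` and `-P` (the GNU and BSD versions do). `JOBS` and `align_trim_parallel` are hypothetical names, and `--thread 1` is my assumption so that the parallelism comes from xargs alone; whether this beats one multi-threaded run per file depends on your data and machine.

```shell
#!/bin/sh
# Hypothetical knob: number of files to process at the same time.
JOBS=${JOBS:-4}

# Align and trim every *.fasta file in directory $1 (default: current
# directory), running up to $JOBS files concurrently.
align_trim_parallel() {
    find "${1:-.}" -maxdepth 1 -type f -name '*.fasta' -print0 |
    xargs -0 -n 1 -P "$JOBS" sh -c '
        [ -n "${1:-}" ] || exit 0    # nothing to do if no file was passed
        base=${1%.fasta}
        # One thread per mafft process; xargs supplies the parallelism.
        mafft --maxiterate 1000 --localpair --thread 1 --nomemsave "$1" > "$base.afa" &&
        trimal -in "$base.afa" -out "$base.trimmed.afa" -gt 0.5
    ' sh
}

# Run only when a directory is given on the command line.
if [ "$#" -gt 0 ]; then
    align_trim_parallel "$1"
fi
```

Saved under a name of your choosing, it could be run as `JOBS=8 sh yourscript.sh .` to work on the current directory with eight files in flight.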
