Remove Gaps from Multiple sequence alignment
11 months ago
ramsha • 0

I want to remove the columns that contain gaps in an MSA file. Is there any Python code that can help me?

Actually, I want to apply complete deletion to the MSA file. Complete deletion means that sites containing missing data or alignment gaps are removed before the analysis begins.
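For the complete-deletion case, a minimal standard-library Python sketch could look like the following. It assumes an aligned FASTA file in which all sequences have the same length, and it treats '-', '.' and '?' as gap/missing-data symbols (an assumption; add 'N' or 'X' to MISSING if your data encode missing residues that way). read_fasta and complete_deletion are hypothetical helper names, not part of any library.

```python
"""Complete deletion for an aligned FASTA file (minimal sketch, stdlib only)."""

# Assumption: '-' and '.' are gaps and '?' is missing data.
# Add 'N' (nucleotides) or 'X' (proteins) if your data use them for missing data.
MISSING = set("-.?")


def read_fasta(lines):
    """Parse FASTA lines into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def complete_deletion(records, missing=MISSING):
    """Keep only the columns in which no sequence has a gap/missing symbol."""
    seqs = [seq for _, seq in records]
    if len({len(s) for s in seqs}) > 1:
        raise ValueError("sequences differ in length; input is not aligned")
    # zip(*seqs) yields one tuple per alignment column
    keep = [i for i, column in enumerate(zip(*seqs))
            if not any(c in missing for c in column)]
    return [(h, "".join(s[i] for i in keep)) for h, s in records]


if __name__ == "__main__":
    example = [">seq1", "AC-GT?A", ">seq2", "ACTG-TA", ">seq3", "GCTGTTC"]
    for header, seq in complete_deletion(read_fasta(example)):
        print(f">{header}\n{seq}")
```

In the demo, columns 3, 5 and 6 each contain a '-' or '?' in at least one sequence, so they are dropped and each sequence keeps four columns. To process a real file, pass an open file handle, e.g. read_fasta(open("input.fasta")) (the filename is a placeholder).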

11 months ago

I'm not sure about Python code, but I know that trimAl can be used for this.

11 months ago

Why Python code specifically? Unless you want to practice your programming skills, there are good tools for this out there already. Also, do you want to remove all gaps (un-align), remove only columns above a certain gap fraction (e.g. columns with > 50% gaps), or remove uninformative columns? Still, it is nice to have all the options.
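If you only want to drop columns above a certain gap fraction rather than every gappy column, the idea generalizes with a threshold. This is a sketch under the assumption that '-' and '.' are the only gap symbols; drop_gappy_columns and max_gap_fraction are hypothetical names, and the function works on (header, sequence) pairs produced by any FASTA reader.

```python
# Assumption: '-' and '.' are the only gap symbols in the alignment.
GAPS = set("-.")


def drop_gappy_columns(records, max_gap_fraction=0.5, gaps=GAPS):
    """Remove alignment columns whose fraction of gap symbols exceeds
    max_gap_fraction. `records` is a list of (header, sequence) tuples.

    max_gap_fraction=0.5 drops columns with more than 50% gaps;
    max_gap_fraction=0.0 drops every column that contains any gap.
    """
    if not records:
        return []
    seqs = [seq for _, seq in records]
    if len({len(s) for s in seqs}) > 1:
        raise ValueError("sequences differ in length; input is not aligned")
    n = len(seqs)
    # Keep a column if its gap fraction is at most the threshold
    keep = [i for i, column in enumerate(zip(*seqs))
            if sum(c in gaps for c in column) / n <= max_gap_fraction]
    return [(h, "".join(s[i] for i in keep)) for h, s in records]


if __name__ == "__main__":
    aln = [("a", "A-C-"), ("b", "A-CT"), ("c", "AGCT")]
    print(drop_gappy_columns(aln))       # drops the 2nd column (2 of 3 gaps)
    print(drop_gappy_columns(aln, 0.0))  # drops every column with any gap
```

With max_gap_fraction=0.0 this reduces to complete deletion for gaps only, while 0.5 matches the "> 50% gaps" example; the threshold is inclusive, so a column with exactly 50% gaps is kept.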

• Jalview (graphical interface: Edit -> Remove All Gaps)
• trimAl: trimal -nogaps (removes every column that contains a gap, i.e. complete deletion) or trimal -noallgaps (removes only columns made up entirely of gaps), depending on what you need; it can be installed via conda. It can also clip your sequence identifiers into a shorter, compatible format. Some older phylogenetic software is darn picky about identifiers (PHYLIP, and thereby ProtTest3, allows at most 10 characters per sequence ID; MrBayes has no length restriction, but the first 15 characters must uniquely identify each sequence), and it looks like you might run into problems with yours. I do have a Perl script that also attempts to keep the identifiers unique and readable; let me know if you need that too.
• sed '/^[^>]/s/-//g' input_file should also work as a quick command-line hack without any installation. Note that it strips every '-' from the sequence lines, i.e. it un-aligns the sequences rather than removing gap columns. It will also leave FASTA lines of unequal length, which most tools are completely fine with; otherwise, pipe the output through EMBOSS seqret to tidy up the formatting.
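The sed one-liner has a rough Python counterpart that also re-wraps each sequence, so the FASTA lines come out at a uniform width. Treat it as a sketch: unalign_fasta is a hypothetical helper, the 60-character line width is an arbitrary default, and unlike the sed command it also removes '.' as a gap symbol (an assumption about your alignment's gap characters).

```python
# Assumption: '-' and '.' are gap symbols (the sed one-liner removes only '-').
_STRIP_GAPS = str.maketrans("", "", "-.")


def unalign_fasta(text, width=60):
    """Remove gap symbols from every sequence in FASTA `text`, leave header
    lines untouched, and re-wrap each sequence to at most `width` characters."""
    out, chunks = [], []

    def flush():
        # Join the current record's sequence lines, strip gaps, and re-wrap.
        seq = "".join(chunks).translate(_STRIP_GAPS)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
        chunks.clear()

    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            flush()            # finish the previous record first
            out.append(line)   # headers are kept verbatim, like /^[^>]/ in sed
        elif line:
            chunks.append(line)
    flush()
    return "\n".join(out) + "\n"
```

For example, print(unalign_fasta(open("aligned.fasta").read()), end="") writes the un-aligned sequences to stdout (the filename is a placeholder). Like the sed command, this un-aligns the sequences; it does not perform complete deletion.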
11 months ago
Sej Modha 5.2k

Another option is to use Gblocks to remove gaps and/or poorly aligned regions from an MSA.