Remove Gaps from Multiple sequence alignment
5 weeks ago
ramsha • 0

I want to remove the columns that contain gaps from an MSA file. Is there any Python code that could help me?

col Remove in MSA • 300 views

Actually, I want to apply complete deletion to the MSA file. Complete deletion means that sites (columns) containing missing data or alignment gaps are removed before the analysis begins.
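Complete deletion as described above can be sketched in plain Python without any dependencies. This is a minimal sketch, not a tested pipeline: it assumes the alignment is in FASTA format, that all sequences have the same length, and that "-" is the only gap/missing-data symbol. The function names (parse_fasta, complete_deletion, to_fasta) and the `missing` parameter are made up for illustration.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def complete_deletion(records, missing="-"):
    """Drop every column in which at least one sequence has a gap/missing symbol.

    `missing` is an assumption: extend it (e.g. "-?") if your file uses
    other characters for missing data.
    """
    seqs = [seq for _, seq in records]
    if len({len(s) for s in seqs}) > 1:
        raise ValueError("sequences differ in length; input is not an alignment")
    # zip(*seqs) iterates over the alignment column by column
    keep = [i for i, col in enumerate(zip(*seqs))
            if not any(c in missing for c in col)]
    return [(h, "".join(s[i] for i in keep)) for h, s in records]


def to_fasta(records):
    """Serialise (header, sequence) tuples back to FASTA text, one line per sequence."""
    return "".join(f">{h}\n{s}\n" for h, s in records)
```

To use it on a file, read the file into a string, pass it through parse_fasta and complete_deletion, and write the result of to_fasta to a new file.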

5 weeks ago

Not sure if it's Python code, but I know that trimAl can be used for this.

5 weeks ago

Why Python code, specifically? Unless you want to practice your programming skills, there are good tools out there for this. Also, do you want to remove all gaps (un-align), remove columns with a certain proportion of gaps (e.g. columns with > 50% gaps), or remove uninformative columns? Still, it is nice to have all the options:

  • Jalview (graphical interface, Edit -> remove all gaps)
  • trimAl (can be installed via conda): trimal -nogaps removes every column that contains a gap, while trimal -noallgaps removes only columns that consist entirely of gaps. It can also clip your sequence identifiers into a shorter, compatible format. Some older phylogenetic software is darn picky about these (PHYLIP, and thereby ProtTest3, allows at most 10 characters per sequence ID; MrBayes has no length restriction, but the substring 1:15 must uniquely identify each sequence), and it looks like you might run into problems with your identifiers. I have a Perl script that also attempts to keep the identifiers unique and readable; let me know if you need that too.
  • sed '/^[^>]/s/-//g' input_file should also do as a quick command-line hack without any installation. Note that it strips every gap character from the sequence lines (un-aligning the sequences) and leaves you with FASTA lines of unequal length, which most tools are completely fine with; otherwise, pipe the output through EMBOSS seqret to tidy it up.
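
If you do want Python after all, the two other options mentioned above (dropping columns with more than a given proportion of gaps, and un-aligning like the sed one-liner) could look roughly like this. It is only a sketch under assumptions: records are (identifier, sequence) tuples of equal length, "-" is the gap character, and the names gap_fraction, drop_gappy_columns and ungap are hypothetical helpers, not part of any library.

```python
def gap_fraction(column, gap="-"):
    """Fraction of characters in one alignment column that are gaps."""
    return sum(c == gap for c in column) / len(column)


def drop_gappy_columns(records, max_gap_fraction=0.5, gap="-"):
    """Keep only columns whose gap fraction is <= max_gap_fraction.

    With the default of 0.5, columns with > 50% gaps are removed.
    With max_gap_fraction=0.0, every column containing a gap is removed.
    """
    seqs = [seq for _, seq in records]
    keep = [i for i, col in enumerate(zip(*seqs))
            if gap_fraction(col, gap) <= max_gap_fraction]
    return [(h, "".join(s[i] for i in keep)) for h, s in records]


def ungap(records, gap="-"):
    """Remove all gap characters from every sequence (un-align),
    the same effect as the sed command on the sequence lines."""
    return [(h, s.replace(gap, "")) for h, s in records]


if __name__ == "__main__":
    aln = [("a", "AC-G"), ("b", "AC-G"), ("c", "A-TG")]
    print(drop_gappy_columns(aln))       # third column (2 of 3 gaps) is dropped
    print(drop_gappy_columns(aln, 0.0))  # any column with a gap is dropped
    print(ungap(aln))                    # sequences are no longer aligned
```

The threshold is compared with <= so that a column sitting exactly at the cutoff is kept; flip the comparison if you prefer the stricter reading.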
5 weeks ago
Sej Modha 5.0k

Another option is to use Gblocks to remove gaps and/or poorly aligned regions from the MSA.
