Assessing MSA Quality
9.5 years ago
maxjohn • 0

Hi, this is my first post.

I am doing a Master's research project that revolves around assessing data quality in phylogenomic analyses. The first thing I need to do is assemble a set of scripts or open-source command-line programs that can perform any kind of assessment of MSA quality. The more refined these metrics are, the better; Gblocks, for example, is a bit too crude. If possible, I'd particularly like to hear about Python solutions. All I've turned up from Google searching are papers that don't link to any code, just very complicated mathematics that I can neither follow nor implement! Otherwise I find papers comparing MSA programs to work out which one produces the best alignments. That is not what I need, so please don't suggest comparing MSA software on idealised datasets; that's not what my project is about.

Any help would be much appreciated.

Max

9.5 years ago
scapella ▴ 390

Hi there,

If you want to consider different metrics of alignment quality, you can start with the TCS paper already mentioned in this thread. I guess you would like to identify the residue pairs that are least sensitive to the alignment algorithm and to sequence orientation, or in other words, the residue pairs that are most consistent across different methodologies. I would say it is a really difficult task, and because of that several alignment post-processing programs have been published over the years, starting with Gblocks in 2000 and moving on to newer programs such as our trimAl tool, BMGE, Zorro, AliScore, Guidance, etc., each aiming to tackle different sources of misaligned residues.
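To make the idea of residue-pair consistency concrete, here is a minimal Python sketch. It is a plain pairwise-agreement score in the spirit of the sum-of-pairs measure, not the actual algorithm used by trimAl, TCS or Guidance. It reports what fraction of the residue pairs aligned in one alignment are also aligned in a second alignment of the same sequences, both overall and per column. It assumes each alignment is a dict mapping sequence name to a gapped string with '-' as the gap character; `column_pairs` and `pair_consistency` are hypothetical names. For a Heads-or-Tails style check, the second alignment could be an alignment of the reversed sequences with every row reversed back.

```python
from itertools import combinations
from typing import Dict, List, Optional, Set, Tuple

# Assumed input format: sequence name -> gapped sequence, '-' marks a gap.
Alignment = Dict[str, str]
# (seq_a, residue_index_a, seq_b, residue_index_b), with seq_a < seq_b.
Pair = Tuple[str, int, str, int]


def column_pairs(aln: Alignment) -> List[Set[Pair]]:
    """Return, for each column, the set of residue pairs that column aligns.

    Residues are numbered by position in the ungapped sequence, so pairs from
    two alignments of the same sequences can be compared directly.
    """
    names = sorted(aln)
    next_index = {name: 0 for name in names}
    columns: List[Set[Pair]] = []
    for col in range(len(aln[names[0]])):
        present = []  # (name, residue_index) for non-gap cells in this column
        for name in names:
            if aln[name][col] != "-":
                present.append((name, next_index[name]))
                next_index[name] += 1
        columns.append({(a, i, b, j) for (a, i), (b, j) in combinations(present, 2)})
    return columns


def _ungapped(aln: Alignment) -> Dict[str, str]:
    """Strip gaps so we can check both alignments hold the same sequences."""
    return {name: row.replace("-", "") for name, row in aln.items()}


def pair_consistency(test: Alignment, reference: Alignment) -> Tuple[float, List[Optional[float]]]:
    """Fraction of residue pairs aligned in `test` that `reference` also aligns.

    Returns (overall score, per-column scores); columns of `test` holding fewer
    than two residues have no pairs and get None.
    """
    if _ungapped(test) != _ungapped(reference):
        raise ValueError("alignments must contain the same ungapped sequences")
    reference_pairs: Set[Pair] = set().union(*column_pairs(reference))
    per_column: List[Optional[float]] = []
    shared = total = 0
    for pairs in column_pairs(test):
        if not pairs:
            per_column.append(None)
            continue
        hits = len(pairs & reference_pairs)
        per_column.append(hits / len(pairs))
        shared += hits
        total += len(pairs)
    return (shared / total if total else 0.0), per_column


if __name__ == "__main__":
    a = {"s1": "AC-GT", "s2": "ACTGT", "s3": "A--GT"}
    b = {"s1": "ACG-T", "s2": "ACTGT", "s3": "A-G-T"}
    overall, per_column = pair_consistency(a, b)
    print(overall)     # 0.8
    print(per_column)  # [1.0, 1.0, None, 0.3333333333333333, 1.0]
```

In the toy example the two alignments disagree only on where the G of s2 goes, so only column 4 of `a` scores below 1. Low-scoring columns are the candidates to inspect or trim.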

If you want to understand the basis of this, I'd advise you to read about the Heads-or-Tails (HoT) approach and the Guidance paper. You could also read this paper: Edgar, R.C. (2010) Quality measures for protein alignment benchmarks. Nucleic Acids Res., 1-9.

Anyway, trimAl has a bunch of options for computing gap scores, similarity scores and identity scores (per column or cumulatively) and, if more than one alignment is provided, consistency scores.
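As a rough illustration of the kind of per-column scores described above, here is a minimal Python sketch. For each column it computes the fraction of sequences with a gap and a simple identity score (the share of non-gap residues equal to the column's most common residue). It also gives a cumulative-style summary of how many columns pass a gap threshold. These formulas are assumptions chosen for simplicity and are not trimAl's exact definitions (the trimAl manual documents those). The function names and the `GAP_CHARS` set are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

GAP_CHARS = frozenset("-.")  # assumed gap symbols


def column_scores(aln: Dict[str, str]) -> Tuple[List[float], List[Optional[float]]]:
    """Per-column gap fraction and identity for an alignment (name -> gapped row).

    Identity = count of the most common residue / number of non-gap residues;
    all-gap columns get None.
    """
    rows = list(aln.values())
    length = len(rows[0])
    if any(len(row) != length for row in rows):
        raise ValueError("all rows must have the same length")
    gap_fraction: List[float] = []
    identity: List[Optional[float]] = []
    for col in range(length):
        cells = [row[col].upper() for row in rows]
        residues = [c for c in cells if c not in GAP_CHARS]
        gap_fraction.append((len(cells) - len(residues)) / len(cells))
        if residues:
            top_count = Counter(residues).most_common(1)[0][1]
            identity.append(top_count / len(residues))
        else:
            identity.append(None)
    return gap_fraction, identity


def fraction_columns_within(gap_fraction: List[float], max_gap: float) -> float:
    """Cumulative-style summary: share of columns whose gap fraction is <= max_gap."""
    return sum(g <= max_gap for g in gap_fraction) / len(gap_fraction)


if __name__ == "__main__":
    aln = {"s1": "ACGT-", "s2": "ACCTA", "s3": "GC-TA", "s4": "ACGTA"}
    gaps, ident = column_scores(aln)
    print(gaps)   # [0.0, 0.0, 0.25, 0.0, 0.25]
    print(ident)  # [0.75, 1.0, 0.6666666666666666, 1.0, 1.0]
    print(fraction_columns_within(gaps, 0.0))  # 0.6
```

Sweeping `max_gap` from 0 to 1 and plotting `fraction_columns_within` gives a simple cumulative gap profile of the alignment.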

Hope it helps.

S

9.5 years ago
Siva ★ 1.9k

I have used trimAl, which is similar to Gblocks but provides more options for assessing MSA quality.

9.5 years ago
onuralp ▴ 190

Take a look at these two papers: Alignathon and TCS.

9.5 years ago
maxjohn • 0

@Siva yes, I know trimAl; that's my first port of call. I should have mentioned that in my original post. @onuralp thank you very much :) I'm aware that there are unlikely to be any complete packages that can do everything required. The plan is to assemble enough individual quantitative metrics that, combined, they allow a qualitative gauge of overall alignment quality and help identify specific things to change to improve the alignment. Thanks again, and any other recommendations will be much appreciated.
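One way to combine several per-column metrics, sketched minimally below under stated assumptions, is to check each column against a per-metric threshold and report which metrics it fails. This points at specific regions to inspect or realign. The metric names, the threshold format and the `flag_columns` function are all hypothetical, and the threshold values in the example are arbitrary, not recommendations.

```python
from typing import Dict, List, Optional, Tuple

# metric name -> per-column scores (None = metric undefined for that column)
Scores = Dict[str, List[Optional[float]]]
# metric name -> ("min", value) or ("max", value); assumed threshold format
Thresholds = Dict[str, Tuple[str, float]]


def flag_columns(scores: Scores, thresholds: Thresholds) -> Dict[int, List[str]]:
    """Return {column index: [metrics that column fails]}; passing columns are omitted."""
    lengths = {len(values) for values in scores.values()}
    if len(lengths) != 1:
        raise ValueError("every metric needs one score per column")
    flagged: Dict[int, List[str]] = {}
    for metric, (direction, limit) in thresholds.items():
        if direction not in ("min", "max"):
            raise ValueError(f"unknown direction {direction!r} for {metric}")
        for col, value in enumerate(scores[metric]):
            if value is None:
                continue  # metric not defined for this column, so it cannot fail
            failed = value < limit if direction == "min" else value > limit
            if failed:
                flagged.setdefault(col, []).append(metric)
    return dict(sorted(flagged.items()))  # report columns in alignment order


if __name__ == "__main__":
    scores = {"gap_fraction": [0.0, 0.5, 0.9], "identity": [1.0, 0.4, None]}
    thresholds = {"gap_fraction": ("max", 0.5), "identity": ("min", 0.5)}
    print(flag_columns(scores, thresholds))  # {1: ['identity'], 2: ['gap_fraction']}
```

Keeping the failing metric names, rather than a single combined number, is what makes the output actionable: a gappy column and a poorly conserved column usually call for different fixes.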
