Programmatically annotating mutations/substitutions across many sequence alignments
2.3 years ago
Agenor Neto

Hi Biostars community, I am trying to design a workflow and am not sure how to implement it (at least in the way I have in mind). Let me explain.

I have a large dataset made up of groups of homologous proteins, e.g., a protein X and its homologs from several other species (a group could equally be the same protein from different individuals). For each group, I know that a certain sequence pattern exists in protein X, and I want to know whether it also exists in the homologs. The most reasonable approach is to align the sequences first. After aligning, I could inspect the alignment visually in a graphical viewer to assess whether the pattern is conserved. Ideally, I would also record annotations from this analysis, for instance: "at position i of the pattern there is a substitution that deserves attention because it probably affects function".

However, I want to automate this, because doing it by hand would already be very time-consuming with, say, 10 groups of 10 sequences each and a pattern of length L = 4; the more sequences, or the longer the pattern, the stronger the case for automation. I am aware that any automated solution will have limitations, and that at some point I will still have to review the results critically and annotate some of them myself. I would really appreciate any suggestions, tools, papers, etc.
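A minimal Python sketch of the per-group check described above, assuming each group has already been aligned into a FASTA file that uses "-" for gaps, and that the pattern is given as an uppercase regular expression (a plain string such as LVGA also works). The helper names read_fasta, pattern_columns and annotate_pattern are hypothetical, not part of any library. The key step is mapping the pattern's location in the ungapped protein X onto alignment columns; after that, each homolog's residues in those columns can be compared directly. Only the first match in protein X is used.

```python
import re


def read_fasta(text):
    """Parse (aligned) FASTA text into {name: sequence}; sequences may span several lines."""
    seqs, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # keep only the first word of the header
            seqs[name] = []
        elif name is not None:
            seqs[name].append(line.upper())
    return {n: "".join(parts) for n, parts in seqs.items()}


def pattern_columns(aligned_ref, pattern, gap="-"):
    """Return the 0-based alignment columns covering the first match of the regex
    `pattern` in the ungapped reference, or None if there is no match."""
    match = re.search(pattern, aligned_ref.replace(gap, ""))
    if match is None:
        return None
    # residue_to_col[k] is the alignment column of the k-th (0-based) reference residue
    residue_to_col = [i for i, c in enumerate(aligned_ref) if c != gap]
    return residue_to_col[match.start():match.end()]


def annotate_pattern(alignment, ref_name, pattern, gap="-"):
    """For every other sequence, list (pattern position, reference residue, homolog residue)
    wherever the homolog differs from the reference; a gap character means a deletion."""
    ref = alignment[ref_name]
    cols = pattern_columns(ref, pattern, gap)
    if cols is None:
        raise ValueError(f"pattern {pattern!r} not found in {ref_name}")
    return {
        name: [(i + 1, ref[c], seq[c]) for i, c in enumerate(cols) if seq[c] != ref[c]]
        for name, seq in alignment.items()
        if name != ref_name
    }


if __name__ == "__main__":
    aln = read_fasta(">protX\nMK-LVGAT\n>homA\nMKALIGAT\n>homB\nMR-LVGST\n")
    print(annotate_pattern(aln, "protX", "LVGA"))
    # {'homA': [(2, 'V', 'I')], 'homB': [(4, 'A', 'S')]}
```

For each homolog, the result gives the 1-based position within the pattern plus the reference/homolog residue pair, i.e. the "position i of the pattern has a substitution" kind of annotation; deciding whether a given substitution matters functionally still requires manual review.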

Tags: mutation, substitution, sequence, annotation, alignment
2.3 years ago
Mensur Dlakic

What you want to do is fairly trivial, provided you have enough knowledge and the right tools available. In fact, people around the world do it all the time.

Out of literally hundreds of options, let's go with two tools: Clustal Omega for sequence alignment and Jalview for visualization.

I will leave it to you to read up on what the programs are doing and how to use them.

Assuming your sequences are saved in FASTA format in the file myseq.fas, only two commands are needed to align them and to show the conservation patterns in the alignment:

clustalo -i myseq.fas -o myseq.a2m
java -Xmx2G -jar jalview-all-2.11.1.4-j1.8.jar -open myseq.a2m -png myseq.png -nodisplay -noquestionnaire -nonews

This will produce a myseq.png file similar to the image below. The image is downsized, so to see any detail you may need to open it in a new tab and view it at full size. Jalview colors identical (or, with some color schemes, similar) residues so that it is easy to see which positions are not conserved. Also look at the annotation rows at the bottom labeled Conservation and Quality, where higher bars mean more conservation.

Running the same two commands on any number of sequence files is simply a matter of writing a short loop that processes every FASTA file in a given directory.
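One way to write that loop, sketched in Python under a few assumptions: clustalo and java are on the PATH, the Jalview jar from the command above sits in the working directory, and every input file ends in .fas. The helper names build_commands and process_directory are hypothetical. The only change to the original commands is clustalo's --force option, which lets it overwrite an existing output file when the loop is re-run.

```python
import subprocess
import sys
from pathlib import Path

# Jar name copied from the command above; adjust it to your local Jalview installation.
JALVIEW_JAR = "jalview-all-2.11.1.4-j1.8.jar"


def build_commands(fasta_path):
    """Return the Clustal Omega and Jalview argument lists for one FASTA file."""
    fasta = Path(fasta_path)
    aln = fasta.with_suffix(".a2m")
    png = fasta.with_suffix(".png")
    # --force lets clustalo overwrite an existing output file when the loop is re-run
    align = ["clustalo", "-i", str(fasta), "-o", str(aln), "--force"]
    render = ["java", "-Xmx2G", "-jar", JALVIEW_JAR, "-open", str(aln),
              "-png", str(png), "-nodisplay", "-noquestionnaire", "-nonews"]
    return align, render


def process_directory(directory, glob_pattern="*.fas"):
    """Align and render every matching FASTA file in `directory`, one after another."""
    for fasta in sorted(Path(directory).glob(glob_pattern)):
        for cmd in build_commands(fasta):
            subprocess.run(cmd, check=True)  # raise CalledProcessError if a tool fails


if __name__ == "__main__":
    process_directory(sys.argv[1] if len(sys.argv) > 1 else ".")
```

Saved as, say, run_all.py, it would be called as "python run_all.py path/to/groups" and would write group.a2m and group.png next to each group.fas. Building the argument lists as Python lists avoids shell quoting problems with unusual file names.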

[Image: Jalview rendering of the example alignment (myseq.png)]


Hello! First of all, thank you for your answer! Indeed, Jalview is a good tool for assessing conservation, and I have used it before, but rereading my question I don't think I fully expressed what I meant. Suppose I want to automate this further (beyond the loops you suggested): for each alignment, I want to look at specific regions and extract conservation measures for them, for instance from position 5 to 10. I don't know whether Jalview provides this kind of output (e.g., a CSV or text file from which I could retrieve the values with code). That would be great, since Jalview also offers many other tools, such as secondary structure prediction. In any case, is there a way to retrieve these numbers so that I can avoid (at least initially) looking through all the PNG files? (And I apologize if this is possible and explained in the Jalview documentation; I have not read it yet.)
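If the goal is just numbers per region, a small script can compute them straight from the alignment file without Jalview. The Python sketch below assumes aligned FASTA with "-" for gaps and reports, for alignment columns start to end, the most common residue, a simple identity fraction (count of the most common residue divided by the number of sequences, so gaps count against conservation), and the gap count, written as CSV. This is a plain percent-identity measure swapped in for illustration; it is not Jalview's Conservation or Quality score, which are computed differently. The function names are hypothetical, and the columns are alignment columns: if "positions 5 to 10" refers to residues of the ungapped protein X, those positions would have to be mapped to columns first.

```python
import csv
import io
from collections import Counter


def read_fasta(text):
    """Parse (aligned) FASTA text into {name: sequence}; sequences may span several lines."""
    seqs, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            seqs[name] = []
        elif name is not None:
            seqs[name].append(line.upper())
    return {n: "".join(parts) for n, parts in seqs.items()}


def column_identity(alignment, start, end, gap="-"):
    """Per-column statistics for alignment columns start..end (1-based, inclusive).
    identity = count of the most common residue / number of sequences.
    Assumes all sequences are aligned, i.e. have the same length."""
    seqs = list(alignment.values())
    rows = []
    for col in range(start, end + 1):
        residues = [s[col - 1] for s in seqs]
        counts = Counter(r for r in residues if r != gap)
        consensus, top = counts.most_common(1)[0] if counts else (gap, 0)
        rows.append({"column": col, "consensus": consensus,
                     "identity": round(top / len(seqs), 3),
                     "gaps": residues.count(gap)})
    return rows


def write_csv(rows, handle):
    """Write the rows as CSV, with a header line, to an open text handle."""
    writer = csv.DictWriter(handle, fieldnames=["column", "consensus", "identity", "gaps"])
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    aln = read_fasta(">s1\nMKTLLVGAST\n>s2\nMKTLIVGAST\n>s3\nMRSLLV-SST\n")
    out = io.StringIO()
    write_csv(column_identity(aln, 5, 10), out)
    print(out.getvalue())
```

For real files, read the alignment with open("myseq.a2m").read() and write to open("myseq_5-10.csv", "w", newline="") instead of the in-memory buffer; combined with a directory loop, this gives one CSV per alignment that can be filtered (e.g. identity below some threshold) before looking at any PNG.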
