Hi Biostars community, I am trying to design a workflow and do not know how to implement it (at least not in the way I would like). Let me explain.
I have a large dataset that includes groups of homologous proteins, e.g., a certain protein X and its homologs in other species (you could also think of a group as the same protein from different individuals). For each group, I know that a sequence pattern exists in my protein X, and I want to know whether this pattern also exists in the homologs. The most reasonable approach is to align the sequences first. After the alignment, I could use a graphical interface to inspect it by eye and assess whether the pattern is conserved. It would be very nice if I could also annotate the findings from this analysis, for instance, "at the i-th position of the pattern there is a substitution that deserves attention because it probably affects function".
Now, I want to do this in an automated way, because the process would be very time-consuming if I had, say, 10 groups of 10 sequences each and a pattern of length L = 4 characters. The greater the number of sequences or L, the more I think this should be automated. I am aware that this solution would have limitations and that at some point I would have to do a critical analysis myself and add some annotations manually. I would really appreciate any suggestions: tools that could help, papers, etc.
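A minimal sketch of what such an automated check could look like, assuming the sequences of a group are already aligned (gaps written as `-`) and the pattern is a plain string that occurs in protein X. The function names `pattern_columns` and `compare_pattern`, the sequence IDs, and the toy sequences are all made up for illustration; this is not taken from any existing tool.

```python
def pattern_columns(aligned_ref, pattern):
    """Return the alignment columns (0-based) covered by the first
    occurrence of `pattern` in the ungapped reference sequence."""
    # Column index of every non-gap residue of the reference.
    col_of_residue = [i for i, ch in enumerate(aligned_ref) if ch != "-"]
    ungapped = aligned_ref.replace("-", "")
    start = ungapped.find(pattern)
    if start == -1:
        raise ValueError("pattern not found in reference sequence")
    return col_of_residue[start:start + len(pattern)]


def compare_pattern(alignment, ref_id, pattern):
    """For each homolog, report what it has at the pattern columns and
    which pattern positions (1-based) differ from the reference."""
    cols = pattern_columns(alignment[ref_id], pattern)
    report = {}
    for seq_id, seq in alignment.items():
        if seq_id == ref_id:
            continue
        observed = "".join(seq[c] for c in cols)
        substitutions = [
            (i + 1, pattern[i], observed[i])
            for i in range(len(pattern))
            if observed[i] != pattern[i]
        ]
        report[seq_id] = {
            "observed": observed,
            "conserved": not substitutions,
            "substitutions": substitutions,
        }
    return report


# Toy, hand-made alignment (hypothetical data).
alignment = {
    "X":     "MK-TAYIAKQR",
    "homo1": "MKLTAYIAKQR",
    "homo2": "MK-TAFIAKQR",
    "homo3": "MK-T--IAKQR",
}
report = compare_pattern(alignment, "X", "AYIA")
for seq_id, result in report.items():
    print(seq_id, result)
```

Each entry in `substitutions` is a tuple of (position in the pattern, residue in X, residue in the homolog), which could be the starting point for the manual annotation step. The sketch only handles an exact pattern found once in X; a regular expression or a motif with ambiguous positions would need a different matching step.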
Hello! First of all, thank you for your comment! Indeed, Jalview is a good tool for understanding conservation. I have used it before, but reading my question again, I think I did not fully express what I meant. Suppose I want to automate this further (beyond the loops you suggested): for each of these alignments, I want to look at certain regions (for instance, positions 5 to 10) and retrieve the conservation measures for them. I don't know whether Jalview can export this information (e.g., as a CSV or TXT file that I could then parse with code). That would be great, since Jalview provides many other tools, such as secondary structure prediction. In any case, is there a way to retrieve these numbers and avoid, at least at first, looking through all the PNG files? (I apologize if this is possible and already explained in the Jalview documentation; I have not read it yet.)
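To illustrate the kind of output I mean, here is a rough sketch that computes a very simple per-column score for a region of an alignment and writes it as CSV text. Note the assumptions: the score is just the fraction of sequences carrying the most common residue in each column, which is not the conservation score Jalview displays; positions are 1-based alignment columns; and `column_conservation` and the toy sequences are hypothetical names and data.

```python
import csv
import io
from collections import Counter


def column_conservation(aligned_seqs, start, end):
    """Return (position, most common residue, fraction of sequences)
    for the 1-based, inclusive alignment columns start..end.
    Gaps are never the 'most common residue' but still count in the
    denominator, so gappy columns get a lower score."""
    n = len(aligned_seqs)
    rows = []
    for pos in range(start, end + 1):
        column = [seq[pos - 1] for seq in aligned_seqs]
        counts = Counter(ch for ch in column if ch != "-")
        if counts:
            residue, count = counts.most_common(1)[0]
        else:
            residue, count = "-", 0  # column made only of gaps
        rows.append((pos, residue, count / n))
    return rows


# Toy, hand-made aligned sequences (hypothetical data).
aligned_seqs = [
    "MK-TAYIAKQR",
    "MKLTAYIAKQR",
    "MK-TAFIAKQR",
    "MK-T--IAKQR",
]
rows = column_conservation(aligned_seqs, 5, 10)

# Write the region scores as CSV text; an open file could replace StringIO.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["position", "top_residue", "fraction"])
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Running something like this in a loop over all alignments would give one small table per group, so only the regions with low scores would need to be inspected visually.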