I am a student in a neuroscience lab that is interested in doing a project comparing differences in genes our lab has found to be important for behaviors involving song. As an assignment, I was given the task of aligning a specific gene, slit1, across the genomes of multiple species using MAFFT. Having made my MAFFT alignment, I have imported it into Canopy using AlignIO.read:
from Bio import AlignIO
align = AlignIO.read("beluga slit1 alignment cluster.txt", "clustal")
print(align)
Now I would like to compare nucleotide differences between the species in my alignment, but I am unsure what my next steps should be. I am assuming multiple pairwise alignments? Beyond this small project, I am interested in any resources people might recommend on comparative genomics and bioinformatics, as projects like the one described above are going to feature heavily in my future work, and I am a total novice when it comes to computer programming and bioinformatics. I will likely be using Python and R as my primary languages in this work. Thanks so much.
As an unrelated comment: it may work, but you should really avoid spaces in file names. It all works fine until it doesn't.
The Biopython tutorial and cookbook is a pretty decent resource.
What “comes next” depends on what you want to know.
What, specifically, is your question?
I am hoping to do multiple pairwise alignments of the sequences in my alignment to determine the nucleotide differences in this gene between any two species from my original alignment. To do this, how do I iterate through the multiple sequences such that each one is aligned with every other exactly once?
I'm not really sure I follow. MAFFT has already given you a multiple sequence alignment, right?
You just want to know all the pairwise alignment scores?
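If the per-pair nucleotide differences are all that is needed, they can in principle be read straight off the MSA without realigning anything. Here is a minimal sketch, assuming the gapped sequences have been pulled out of the alignment as plain strings, for example with {rec.id: str(rec.seq) for rec in align}. The function name pairwise_differences, the toy sequences, and the choice to skip any column where either sequence has a gap are illustrative assumptions, not something taken from the original alignment.

```python
from itertools import combinations


def pairwise_differences(aligned, gap="-"):
    """Count mismatched columns for every unordered pair of aligned sequences.

    `aligned` maps a sequence name to its gapped sequence string. All strings
    must have the same length because they come from a single alignment.
    Columns where either sequence has a gap are skipped (an assumption).
    Returns {(name_a, name_b): (mismatches, columns_compared)}.
    """
    results = {}
    # combinations() yields each unordered pair exactly once
    for (name_a, seq_a), (name_b, seq_b) in combinations(aligned.items(), 2):
        if len(seq_a) != len(seq_b):
            raise ValueError("aligned sequences must have equal length")
        compared = 0
        mismatches = 0
        for a, b in zip(seq_a.upper(), seq_b.upper()):
            if a == gap or b == gap:
                continue  # ignore gapped columns for this pair
            compared += 1
            if a != b:
                mismatches += 1
        results[(name_a, name_b)] = (mismatches, compared)
    return results


# Toy data standing in for {rec.id: str(rec.seq) for rec in align}
toy = {
    "species_A": "ATGC-TA",
    "species_B": "ATGCCTA",
    "species_C": "ATTC-TT",
}

for pair, (mismatches, compared) in pairwise_differences(toy).items():
    print(pair, mismatches, "differences over", compared, "columns")
```

Whether gapped columns should count as differences depends on the question being asked, so that part is worth adjusting before trusting the numbers.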
It might be easier to recreate the alignment in a multiple pairwise fashion than to try to deconstruct the MSA. For example, you could adapt the code in this thread and replace the pairwise2 alignments with system calls to MAFFT if you so desired.
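As a rough sketch of the system-call route, assuming the mafft executable is on your PATH and that the sequences are available as plain strings keyed by name: the helper names write_pair_fastas and run_mafft, the output directory, and the file naming scheme are all hypothetical, and the sequence names are assumed to be safe to use in file names.

```python
import subprocess
from itertools import combinations
from pathlib import Path


def write_pair_fastas(seqs, outdir):
    """Write one two-record FASTA file per unordered pair; return the paths."""
    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    paths = []
    # combinations() visits every pair of sequences exactly once
    for (name_a, seq_a), (name_b, seq_b) in combinations(seqs.items(), 2):
        path = outdir / f"{name_a}__{name_b}.fasta"
        # Strip gap characters so MAFFT realigns the raw sequences
        raw_a = seq_a.replace("-", "")
        raw_b = seq_b.replace("-", "")
        path.write_text(f">{name_a}\n{raw_a}\n>{name_b}\n{raw_b}\n")
        paths.append(path)
    return paths


def run_mafft(fasta_path):
    """Align one FASTA file with MAFFT and return the aligned FASTA text.

    MAFFT writes the alignment to standard output, so it is captured here.
    """
    done = subprocess.run(
        ["mafft", "--auto", str(fasta_path)],
        capture_output=True,
        text=True,
        check=True,
    )
    return done.stdout
```

Usage would then be something like looping over write_pair_fastas(seqs, "pairs") and calling run_mafft(path) on each file, saving or parsing the returned text as needed.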