Creating your own substitution matrix from an alignment
3.4 years ago
mdsiddra

Hi there! I am writing some code by following the Biopython cookbook. My question concerns creating a substitution matrix using my own match and mismatch values for the amino acid letters. The Biopython code lets me create the matrix, but I can't find how to use my own values in it. I would appreciate any help with this problem.

substitution matrix • 2.3k views

Hello mdsiddra,

Please use appropriate tags. Your question is about substitution matrices, so "substitution matrix" should have been a tag when you created the question. When you add appropriate tags, users who follow the tag (usually experts interested in helping others in that subject matter) get notified of your question, which means you stand a better chance of getting a relevant, useful response faster.

It will help if you can elaborate on the question and provide links to the code you wrote or are referring to. You can also add links to external resources.

In particular, please elaborate on this part:

I don't find how to use my own values in the matrix

Thanks!


Well, thank you for your answer. I will try to elaborate on my question.

I'm working with Python 3 and have installed Biopython (https://biopython.org/) to use its libraries. The tutorial says one can supply one's own matrix externally, but I can't work out how to plug my matrix in.

I am using the SubsMat module to create a substitution matrix. Chapter 20 of this book explains how to create your own substitution matrix. I have three issues:

1) The given example generates a small matrix, but I want to generate a matrix for all 20 amino acids.
2) I can't find how to insert my own values for the substitution of amino acids.
3) In the code below, if any of the amino acids in the following line is removed,

replace_info = summary_align.replacement_dictionary(["A", "V", "L", "F", "S", "Q"])

it raises this error: "ValueError: Residues A, X not found in alphabet Gapped(IUPACProtein(), '-')"
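For issues 1) and 2), one way to get a full 20 x 20 matrix with your own values is to build it yourself as a dictionary keyed by residue pairs, the same key shape as the replacement dictionary in the code below. This is a minimal plain-Python sketch that needs no Biopython; `build_matrix`, `print_matrix` and the example scores (+5 match, -4 mismatch, +2 for I/L and D/E) are hypothetical illustrations, not part of Biopython.

```python
# Minimal sketch: a 20 x 20 substitution matrix filled with user-chosen
# match/mismatch scores, stored as a dict keyed by (residue, residue) pairs.
# build_matrix/print_matrix and the example scores are hypothetical.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def build_matrix(match, mismatch, overrides=None):
    """Return {(a, b): score} for every ordered pair of amino acids.

    Identical residues get `match`, different residues get `mismatch`.
    `overrides` maps a pair (a, b) to a custom score; it is applied to both
    (a, b) and (b, a) so the matrix stays symmetric.
    """
    matrix = {}
    for a in AMINO_ACIDS:
        for b in AMINO_ACIDS:
            matrix[(a, b)] = match if a == b else mismatch
    for (a, b), score in (overrides or {}).items():
        if a not in AMINO_ACIDS or b not in AMINO_ACIDS:
            raise ValueError(f"unknown residue in pair {(a, b)}")
        matrix[(a, b)] = score
        matrix[(b, a)] = score
    return matrix


def print_matrix(matrix):
    """Print the lower triangle with a column header, PAM/BLOSUM style."""
    for i, a in enumerate(AMINO_ACIDS):
        row = [f"{matrix[(a, b)]:3d}" for b in AMINO_ACIDS[: i + 1]]
        print(a, *row)
    print(" ", *(f"{b:>3}" for b in AMINO_ACIDS))


if __name__ == "__main__":
    # Hypothetical scores: +5 for a match, -4 for a mismatch, and a milder
    # score for two chemically similar pairs.
    m = build_matrix(5, -4, overrides={("I", "L"): 2, ("D", "E"): 2})
    print(len(m))         # 400 entries = 20 x 20
    print(m[("L", "I")])  # 2, because the override is applied symmetrically
    print_matrix(m)
```

Newer Biopython releases replace `Bio.SubsMat` with `Bio.Align.substitution_matrices`, whose `Array` class can hold a matrix like this; how best to convert the dictionary depends on your installed version, so check its documentation.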

Here's the full code I'm using to generate the matrix for a protein alignment (it needs Biopython installed, because it imports Biopython modules):

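# Requires an older Biopython release (1.77 or earlier): Bio.Alphabet was
# removed in 1.78, and Bio.SubsMat has been deprecated in favour of
# Bio.Align.substitution_matrices.
from Bio import AlignIO
from Bio import Alphabet
from Bio.Alphabet import IUPAC
from Bio.Align import AlignInfo
from Bio import SubsMat

alpha = Alphabet.Gapped(IUPAC.protein)

# Read the Clustal alignment (c_align was used but never defined before)
filename = "seq_file2.aln"
c_align = AlignIO.read(filename, "clustal", alphabet=alpha)

summary_align = AlignInfo.SummaryInfo(c_align)
# The list holds residues to skip: they are left out of the counts,
# so passing [] counts all 20 amino acids.
replace_info = summary_align.replacement_dictionary(["A", "V", "L",
                                                     "F", "S", "Q"])

# Accepted replacement matrix -> log-odds substitution matrix
my_arm = SubsMat.SeqMat(replace_info)
my_lom = SubsMat.make_log_odds_matrix(my_arm)
my_lom.print_mat()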
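`SubsMat.make_log_odds_matrix` turns the counted replacements into log-odds scores. The plain-Python sketch below shows the general recipe, as a simplified stand-in and not Biopython's exact implementation: observed pair frequencies are compared with the frequencies expected from background residue frequencies, and the log2 ratio is scaled and rounded. The function name `log_odds_matrix`, the toy counts, and the scale factor of 2 (half-bit units, as in BLOSUM) are assumptions for illustration.

```python
import math


def log_odds_matrix(counts, scale=2.0):
    """Turn pair counts {(a, b): n} into rounded log-odds scores.

    Simplified sketch of the usual recipe (not Biopython's exact code):
      q(a,b) = observed frequency of the unordered pair {a, b}
      p(a)   = background frequency of residue a, derived from q
      e(a,b) = p(a)^2 if a == b else 2 * p(a) * p(b)
      score  = round(scale * log2(q / e))
    """
    # Collapse (a, b) and (b, a) into one unordered pair.
    pair_counts = {}
    for (a, b), n in counts.items():
        key = tuple(sorted((a, b)))
        pair_counts[key] = pair_counts.get(key, 0) + n
    total = sum(pair_counts.values())
    q = {k: n / total for k, n in pair_counts.items()}

    # Background frequency: p(a) = q(a,a) + half of each q(a,b) with b != a.
    p = {}
    for (a, b), freq in q.items():
        if a == b:
            p[a] = p.get(a, 0.0) + freq
        else:
            p[a] = p.get(a, 0.0) + freq / 2
            p[b] = p.get(b, 0.0) + freq / 2

    scores = {}
    for (a, b), freq in q.items():
        if freq == 0:
            # Never-observed pairs have no finite score; they are skipped
            # here (real tools typically add pseudocounts instead).
            continue
        expected = p[a] ** 2 if a == b else 2 * p[a] * p[b]
        scores[(a, b)] = round(scale * math.log2(freq / expected))
    return scores


if __name__ == "__main__":
    # Hypothetical toy counts for two residues.
    toy_counts = {("D", "D"): 10, ("E", "E"): 10, ("D", "E"): 5, ("E", "D"): 5}
    print(log_odds_matrix(toy_counts))
    # {('D', 'D'): 1, ('E', 'E'): 1, ('D', 'E'): -1}
```

Positive scores mean a pair is seen more often than chance predicts, negative scores less often.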


I added code markup to your post for increased readability. You can do this by selecting the text and clicking the 101010 button, which is in the toolbar when you compose or edit a post.


Hi mdsiddra, does this code really fail in the last block? The problem is close to impossible to reproduce without your exact input file seq_file2.aln, but you should be able to track it down on your own. My first guess from the error message is that your input data contains illegal characters. I'd recommend adding some print statements to make the program more verbose.
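The illegal-character hypothesis is easy to check directly. `IUPAC.protein` covers only the 20 standard amino acids, so an ambiguity code such as `X` (often used for an unknown residue) falls outside it. This is a plain-Python sketch; the function name `find_illegal_residues` and the toy sequences are hypothetical, and it assumes you already have the aligned sequences as a name-to-string dictionary.

```python
# Sketch: report characters outside the 20 standard amino acids plus the gap "-".
ALLOWED = set("ACDEFGHIKLMNPQRSTVWY-")


def find_illegal_residues(sequences):
    """Map each sequence name to {char: [0-based columns]} for disallowed characters.

    Sequences with no offending characters are left out of the report.
    The check is case-sensitive, so lowercase letters are reported too.
    """
    report = {}
    for name, seq in sequences.items():
        bad = {}
        for pos, char in enumerate(seq):
            if char not in ALLOWED:
                bad.setdefault(char, []).append(pos)
        if bad:
            report[name] = bad
    return report


if __name__ == "__main__":
    # Hypothetical toy alignment; with Biopython you could build this dict
    # from an alignment as {rec.id: str(rec.seq) for rec in alignment}.
    toy = {"seq1": "MKV-LAX", "seq2": "MKVQLAB", "seq3": "MKVQLA-"}
    for name, bad in find_illegal_residues(toy).items():
        print(name, bad)
```

Any character reported here would need to be replaced, or the alignment read with an alphabet that includes it.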