Nucleotide Substitution Matrix With IUPAC Nucleotide Ambiguity Codes
10.9 years ago
tommivat

I'm looking for a substitution matrix for aligning short DNA sequences that contain IUPAC nucleotide ambiguity codes. I would guess there are existing solutions, but I haven't found any despite extensive googling.

Tags: dna, alignment
10.9 years ago

ftp://ftp.ncbi.nih.gov/blast/matrices/NUC.4.4

#
# This matrix was created by Todd Lowe   12/10/92
#
# Uses ambiguous nucleotide codes, probabilities rounded to
#  nearest integer
#
# Lowest score = -4, Highest score = 5
#
    A   T   G   C   S   W   R   Y   K   M   B   V   H   D   N
A   5  -4  -4  -4  -4   1   1  -4  -4   1  -4  -1  -1  -1  -2
T  -4   5  -4  -4  -4   1  -4   1   1  -4  -1  -4  -1  -1  -2
G  -4  -4   5  -4   1  -4   1  -4   1  -4  -1  -1  -4  -1  -2
C  -4  -4  -4   5   1  -4  -4   1  -4   1  -1  -1  -1  -4  -2
S  -4  -4   1   1  -1  -4  -2  -2  -2  -2  -1  -1  -3  -3  -1
W   1   1  -4  -4  -4  -1  -2  -2  -2  -2  -3  -3  -1  -1  -1
R   1  -4   1  -4  -2  -2  -1  -4  -2  -2  -3  -1  -3  -1  -1
Y  -4   1  -4   1  -2  -2  -4  -1  -2  -2  -1  -3  -1  -3  -1
K  -4   1   1  -4  -2  -2  -2  -2  -1  -4  -1  -3  -3  -1  -1
M   1  -4  -4   1  -2  -2  -2  -2  -4  -1  -3  -1  -1  -3  -1
B  -4  -1  -1  -1  -1  -3  -3  -1  -1  -3  -1  -2  -2  -2  -1
V  -1  -4  -1  -1  -1  -3  -1  -3  -3  -1  -2  -1  -2  -2  -1
H  -1  -1  -4  -1  -3  -1  -3  -1  -3  -1  -2  -2  -1  -2  -1  
D  -1  -1  -1  -4  -3  -1  -1  -3  -1  -3  -2  -2  -2  -1  -1
N  -2  -2  -2  -2  -1  -1  -1  -1  -1  -1  -1  -1  -1  -1  -1
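To use this matrix from your own code, it first has to be read into a lookup table. The following is a minimal sketch that assumes the file layout shown above: comment lines starting with "#", one header row of column codes, then one row per code that starts with that code. The names parse_matrix and EXCERPT are illustrative only and are not part of any NCBI or EMBOSS tool.

```python
# Sketch of a parser for NCBI-style matrix files such as NUC.4.4.
# Assumed layout (as in the file above): '#' comment lines, one header
# row of column codes, then one row per code that starts with the code.

def parse_matrix(text):
    """Return a dict mapping (row_code, column_code) -> integer score."""
    columns = None
    scores = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.split()
        if columns is None:
            columns = fields  # first data line is the column header
            continue
        row, values = fields[0], fields[1:]
        if len(values) != len(columns):
            raise ValueError(
                f"row {row!r} has {len(values)} scores, expected {len(columns)}"
            )
        for col, value in zip(columns, values):
            scores[(row, col)] = int(value)
    return scores


# A few rows copied from NUC.4.4; in practice read the whole file instead.
EXCERPT = """
#
# Lowest score = -4, Highest score = 5
#
    A   T   G   C   S   W   R   Y   K   M   B   V   H   D   N
A   5  -4  -4  -4  -4   1   1  -4  -4   1  -4  -1  -1  -1  -2
R   1  -4   1  -4  -2  -2  -1  -4  -2  -2  -3  -1  -3  -1  -1
N  -2  -2  -2  -2  -1  -1  -1  -1  -1  -1  -1  -1  -1  -1  -1
"""

scores = parse_matrix(EXCERPT)
print(scores[("A", "R")], scores[("R", "A")], scores[("N", "A")])  # 1 1 -2
```

With the full file saved locally (assumed here to be named NUC.4.4), the same function applies: open the file, read its text and pass it to parse_matrix.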

Just be aware that this matrix derives from the FASTA aligner and is meant for distant homology searches. For intra-species alignment, the mismatch penalty is usually larger than the match score, whereas NUC.4.4 gives +5 for a match and only -4 for a mismatch.


Is it possible to change the scores so that compatible IUPAC codes count as matches? If so, how?
The matrix scores B (C,G,T), D (A,G,T), H (A,C,T) and V (A,C,G) negatively against every nucleotide and every ambiguity code, including themselves, so in the alignment they show up as mismatches even against their own constituent nucleotides and against themselves. For example, B vs B is a mismatch, and so is B vs C, G or T.
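One way to get that behaviour is to generate your own matrix. The sketch below assumes a hypothetical rule, not the one NUC.4.4 uses: two codes score as a match (+5) whenever the nucleotide sets they stand for overlap, and as a mismatch (-4) otherwise. The names compatible_score and matrix_text are made up for illustration; the output reuses the NUC.4.4 column order and text layout so it could stand in for that file in tools that read matrices in this format.

```python
# Hypothetical "compatible codes count as a match" scoring rule: two IUPAC
# codes score MATCH if the nucleotide sets they stand for overlap, and
# MISMATCH otherwise. This is NOT how NUC.4.4 was derived.

IUPAC = {
    "A": "A", "T": "T", "G": "G", "C": "C",
    "S": "CG", "W": "AT", "R": "AG", "Y": "CT", "K": "GT", "M": "AC",
    "B": "CGT", "V": "ACG", "H": "ACT", "D": "AGT", "N": "ACGT",
}
ORDER = "ATGCSWRYKMBVHDN"  # same column order as NUC.4.4


def compatible_score(a, b, match=5, mismatch=-4):
    """Score two IUPAC codes: match if they can denote a common base."""
    return match if set(IUPAC[a]) & set(IUPAC[b]) else mismatch


def matrix_text(match=5, mismatch=-4):
    """Render the full matrix in the same text layout as NUC.4.4."""
    lines = ["    " + "   ".join(ORDER)]  # header row of column codes
    for a in ORDER:
        row = "".join(f"{compatible_score(a, b, match, mismatch):4d}" for b in ORDER)
        lines.append(a + row)
    return "\n".join(lines)


print(compatible_score("B", "B"), compatible_score("B", "C"), compatible_score("B", "A"))  # 5 5 -4
print(matrix_text())
```

A caveat of this design: N, and the three-base codes, now score as full matches against almost everything, so runs of ambiguous bases can inflate alignment scores. Passing a smaller positive value than the exact-match score for ambiguous pairs would be one compromise.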


Thanks for the spot-on answer! My task is aligning fragments from human normal and tumor samples. What penalties would you suggest for opening and extending a gap?


I just found this answer on the EMBOSS mailing list: "NUC4.2 (EDNAMAT) simply scores 5 for a match, and -4 for a mismatch. NUC4.4 (EDNAFULL) scores 5 for a match, but provides appropriate scores for ambiguity codes so that, for example, R:A scores +1 (rounded up average of -4, -4, 5, 5)". Both matrices can be used with the EMBOSS program water (Smith-Waterman local alignment), which is also available online. For choosing gap penalties, this book should help: Durbin, R., Eddy, S. R., Krogh, A. & Mitchison, G. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids (Cambridge University Press, 1998). URL http://www.worldcat.org/isbn/0521629713.
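The averaging rule in that quote can be checked directly. The sketch below assumes the NUC4.2 base scores (+5 match, -4 mismatch), averages them over every pair of bases the two codes can stand for, and rounds to the nearest integer with halves rounded up; averaged_score is an illustrative name, not an EMBOSS function. It reproduces R:A = +1 and the whole A row of NUC.4.4, but not every entry where both sides are ambiguity codes (S:S comes out as +1 here, while NUC.4.4 lists -1), so plain averaging is evidently not the whole story behind the matrix.

```python
import math
from fractions import Fraction
from itertools import product

# Assumed base scores, as quoted for NUC4.2 (EDNAMAT): +5 match, -4 mismatch.
MATCH, MISMATCH = 5, -4

IUPAC = {
    "A": "A", "T": "T", "G": "G", "C": "C",
    "S": "CG", "W": "AT", "R": "AG", "Y": "CT", "K": "GT", "M": "AC",
    "B": "CGT", "V": "ACG", "H": "ACT", "D": "AGT", "N": "ACGT",
}


def averaged_score(a, b):
    """Average the base-level scores over all expansions of codes a and b,
    then round to the nearest integer with halves rounded up."""
    pairs = list(product(IUPAC[a], IUPAC[b]))
    total = sum(MATCH if x == y else MISMATCH for x, y in pairs)
    mean = Fraction(total, len(pairs))  # exact arithmetic, no float error
    return math.floor(mean + Fraction(1, 2))


print(averaged_score("R", "A"))  # 1, as in the quoted R:A example
print(averaged_score("N", "A"))  # -2, same as NUC.4.4
print(averaged_score("S", "S"))  # 1, but NUC.4.4 lists -1
```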
