2.7 years ago
Emy Alade
Hello! How can I calculate the percent identity between a pair of sequences?
>Tem1_SRR6260399.808938_1_250_-
TTGATTTGGCATATGTGGACAATAAGACGACAAATGATTATCAGATTATTCGGGTTCCAGCTCTTGTTTAATTTTACATGGAGCATTTTCTTCTTTTATCTCCGGAGTCCGTTGCTAGGTTTTGCAAACATTTTGGTGCTGGATGTGCTTGTTGTTTATTATATGATAGAAAGTTATCCGGTGAAGAAATCTTCGGCATACCTTTTTGTTCCTTATCTTTTGTGGTTGATTCTTGCCACTTATCTTAAC
>Tem5_SRR6260399.418888_1_220_+
CTCCTTATGAAAAAAATAATCCCCATACTGATTGCCATACTCATCTGTTTTGGTGTAGGCTGTACTGCTTCTTATTTTCAGTCGGAGGCCATACTCAACTGGTATCCTACATTGGACAAACCTTCTCTTACACCACCTGATATGGCTTTTCCCATTGCTTGGAGCCTTATCTATCTGTGCATGGGAATTTCTCTCGGGTTGATTTGGCATATGTGG
Is this homework? (Do be honest, and the community will help.)
What have you tried already? Have the sequences been aligned? Are these the only sequences you need to deal with, or are there more?
I have 14 sequences that correspond to the TSPO protein, found by HMM in a control condition. I aligned them with Clustal and saw that they align perfectly in the middle. The question is whether they are the same protein, so I want to calculate the percent identity.
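For sequences that are already aligned (equal length, gaps written as "-"), percent identity can be sketched by hand as the number of matching columns divided by the number of columns compared. The snippet below is only a minimal illustration, not what Clustal reports: the function name percent_identity is made up for this example, and skipping every column that contains a gap is an assumption. Other tools use a different denominator (for example the full alignment length or the shorter sequence), so the numbers may not agree.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two rows of an existing alignment.

    Assumption: columns where either sequence has a gap are ignored,
    so the denominator is the number of gap-free columns.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    matches = 0
    compared = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # skip gapped columns (one possible convention)
        compared += 1
        if x == y:
            matches += 1

    if compared == 0:
        return 0.0  # nothing to compare
    return 100.0 * matches / compared


# Toy aligned pair, not taken from the sequences in this thread
print(percent_identity("ACGT-A", "ACCTTA"))
```

With 14 sequences the same function could be called on every pair of rows from the Clustal alignment to build a small identity matrix.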
Are you using Clustal-Omega? If yes, you can supply --distmat-out=<somefilename> and --percent-id to it on the command line (along with --full), and it will output a pairwise distance matrix that reports percent identity. So something like this: