How to get variation data from MSA file generated using ClustalW for protein sequences?
3.3 years ago
#!/usr/bin/env python3
# Report conserved amino acid positions in a ClustalW alignment

from Bio import AlignIO

aln = "/home/tina/cd-hit/core_header_WP_000209090.1/seqret_out_WP_000209090.1.aln"

# Read the alignment first; it must exist before its length can be queried
algnmnt = AlignIO.read(aln, "clustal")
print("Alignment length %i" % algnmnt.get_alignment_length())  # print alignment length

for col in range(algnmnt.get_alignment_length()):
    residues = set(algnmnt[:, col])  # distinct symbols in this column
    if len(residues) == 1:
        print(f"Position {col}: residue {''.join(residues)}")  # print each conserved position

I have written this code, which reads a ClustalW .aln file and prints the alignment length and each conserved amino acid with its position. Can anybody tell me how I can also print the variable sites (positions), together with the number of occurrences of each amino acid at those sites?
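
One possible approach, given as a minimal sketch rather than a definitive answer: count the symbols in each column with `collections.Counter` and report every column that has more than one distinct symbol. The helper name `column_variation` and the toy alignment are hypothetical, introduced only for illustration. The sketch assumes that gap characters (`-`) count as a symbol of their own and that positions are 0-based, as in the script above. It works on plain strings, so with Biopython the input could be built as `[str(rec.seq) for rec in algnmnt]`.

```python
from collections import Counter


def column_variation(seqs):
    """Return {column index: Counter of symbols} for every variable column.

    `seqs` is a list of equal-length aligned sequences given as strings.
    A column is variable when it holds more than one distinct symbol;
    gaps ('-') are treated as a symbol (an assumption of this sketch).
    """
    if not seqs:
        return {}
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")

    variable = {}
    for col in range(length):
        counts = Counter(s[col] for s in seqs)  # symbol -> occurrences in this column
        if len(counts) > 1:
            variable[col] = counts
    return variable


# Hypothetical toy alignment standing in for the rows of the .aln file
toy = ["MKV-L", "MKIAL", "MRVAL"]

for col, counts in column_variation(toy).items():
    summary = ", ".join(f"{aa}={n}" for aa, n in counts.most_common())
    print(f"Position {col}: {len(counts)} distinct symbols ({summary})")
```

If gaps should not count as variation, the `Counter` could be built from `s[col] for s in seqs if s[col] != "-"` instead.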

clustalw MSA variation • 777 views