6.7 years ago
mdsiddra
I am using Python 3 and Biopython 1.72, and I have a protein sequence alignment in PHYLIP format. This is the example file:
6 200
Human ---------- ---------- ---------- ---------- ----------
Chimpanzee ---------- ---------- ---------- ---------- ----------
Dog ---------- ---------- ---------- ---------- ----------
Mouse ---------- ---------- ---------- ---------- ----------
Xenopus ---------- ---------- ---------- ---------- ----------
Amphioxus MQWTGFRVSM TTLMMIMGVV AVLIALLPAK AQQPHDKSLR TTSTLTDTGA
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
---------- ---------- ---------- ---------- ----------
SADEADMGSA HVELLDGDDD VGNGSDQMMV TLHLQSIFQC IRRPCEKVDR
---------- ---------- ---------- --------MR LRVRLLKRTW
---------- ---------- ---------- --------MR LRVRLLKRTW
---------- ---------- ---------- --------MK LRVRLQKRTW
---------- ---------- ---------- --------MK LRVRLQKRTQ
---------- ---------- ---------- --------MK LRVRVRKQTN
AIDPVTQRWR TANTRNDYQK INVCVVPAYD VSLSTGVRMK LRVKISGQKT
PLEVPETEPT LGHLRSHLRQ SLLCTWGYSS NTRFTITLNY KDPLTGDEET
PLEVPETEPT LGHLRSRLRQ SLLCTWGYSS NTRFTITLNY KDPLTGDEET
PLDLPDAEPT LGQLRAHLSQ ALLPSWGFGS DTRFAITLNN KDALTGDEET
PLEVPESEPT LGQLRAHLSQ VLLPTLGFSS DTRFAITLNN KDALTGDEET
RLELEAESPT LGDLRSKLSS VTLPALGYST EANFTITLNG KDALTGDQNT
RVDVGQDCHT LGTLRTLLAP VLGEQYGLGD DMPFEISLNG RDALLGDDKP
I want to go through the file column by column so that:
- If every sequence has the same amino acid in a column, the script saves and prints that amino acid and puts a '#' in the file at the end of that column.
- I can search the columns by amino acid, i.e. check whether a particular set of amino acids exists in a column or not. For example, if 'STA' or 'HY' or 'FVLIM' or 'NDEQHK' exists in any of the columns, then put a '@' in the file at the end of that column.
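To make the two rules concrete, here is a minimal sketch of what I am after, written on plain strings so it runs without Biopython. The names `GROUPS`, `column_marker` and `marker_line` are made up for illustration. It assumes that a set "exists" in a column when every residue in that column belongs to that one set, and that columns containing a gap get no marker; both are assumptions on my part, not settled requirements.

```python
# Amino acid sets taken from the question
GROUPS = ("STA", "HY", "FVLIM", "NDEQHK")


def column_marker(col):
    """Return '#', '@' or ' ' for one alignment column (an iterable of residues)."""
    residues = set(col)
    if "-" in residues:
        return " "  # assumption: columns with gaps are left unmarked
    if len(residues) == 1:
        return "#"  # every sequence has the same amino acid
    # assumption: a group matches when all residues of the column fall inside it
    if any(residues <= set(group) for group in GROUPS):
        return "@"
    return " "


def marker_line(rows):
    """Build one marker character per column from equal-length sequence strings."""
    return "".join(column_marker(col) for col in zip(*rows))


# Toy alignment: 4 sequences, 6 columns
rows = ["MKLSGW", "MKLTGD", "MKVAGL", "-QISGP"]
markers = marker_line(rows)
print(repr(markers))
```

For the toy rows, the first column has a gap and the last is mixed, so both stay blank; the three columns in between each fit one group and get '@', and the all-G column gets '#'. With Biopython, `zip(*alignment)` should yield the same kind of column tuples, so `marker_line(alignment)` would be the equivalent call.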
This is the code I have been using to process the file:
from Bio import AlignIO

# Read the PHYLIP alignment; each record is one sequence (row)
alignment = AlignIO.read("example.phy", "phylip")
v1 = v2 = v3 = 0
# zip(*alignment) yields one tuple of residues per alignment column
for col in zip(*alignment):
    num_unique = len(set(col))
    if num_unique == 1 and col[0] != '-':
        print(num_unique)
        v1 += 1
    elif num_unique > 1 and '-' not in col:
        print(num_unique)
        v2 += 1
    elif '-' in col:  # assumes 1 or more dashes
        v3 += 1
print('Number of columns with the same amino acid: {}\n'
      'Number of columns with at least 2 amino acids (no gaps): {}\n'
      'Number of columns with one/more gaps: {}'
      .format(v1, v2, v3))
These variables count how many columns fall into each category, but I don't understand how to search the columns for specific amino acids. Also, how can I put a character of my choice at a given column of the file?
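For the second part, this is roughly the output I have in mind: a sketch that writes a marker row under each block of the alignment, in the same 10-residue grouping as the example file. The function names and the `width`, `group` and `name_width` parameters are hypothetical, and the result is an annotated text file rather than valid PHYLIP, since the extra row is not a sequence.

```python
def group_residues(text, group=10):
    """Split text into space-separated chunks of `group` characters."""
    return " ".join(text[i:i + group] for i in range(0, len(text), group))


def annotate(names, rows, markers, width=50, group=10, name_width=10):
    """Return interleaved blocks of `width` columns, each followed by a marker row."""
    lines = []
    for start in range(0, len(markers), width):
        for name, row in zip(names, rows):
            label = name if start == 0 else ""  # names only in the first block
            chunk = group_residues(row[start:start + width], group)
            lines.append(label.ljust(name_width) + " " + chunk)
        # marker row, padded so it lines up under the residues
        marker_chunk = group_residues(markers[start:start + width], group)
        lines.append(" " * name_width + " " + marker_chunk)
        lines.append("")
    return "\n".join(lines)


# Toy example with small blocks so the layout is easy to check by eye
names = ["Human", "Mouse"]
rows = ["MKLSGWMKLSGW", "MKLTGDMKLTGD"]
markers = "###@# ###@# "  # one character per column, e.g. '#', '@' or ' '
out = annotate(names, rows, markers, width=10, group=5)
print(out)
```

The text could then be written out with an ordinary `open("annotated.txt", "w")` call; the marker string itself would come from whatever per-column test is used.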