Closed: How to handle the phylip sequence file through amino acid characters?
6.7 years ago
mdsiddra ▴ 30

I am using Python 3 and Biopython 1.72, and I have a protein sequence alignment in PHYLIP format. This is an example file:

     6    200
Human      ---------- ---------- ---------- ---------- ---------- 
Chimpanzee ---------- ---------- ---------- ---------- ---------- 
Dog        ---------- ---------- ---------- ---------- ---------- 
Mouse      ---------- ---------- ---------- ---------- ---------- 
Xenopus    ---------- ---------- ---------- ---------- ---------- 
Amphioxus  MQWTGFRVSM TTLMMIMGVV AVLIALLPAK AQQPHDKSLR TTSTLTDTGA 

           ---------- ---------- ---------- ---------- ---------- 
           ---------- ---------- ---------- ---------- ---------- 
           ---------- ---------- ---------- ---------- ---------- 
           ---------- ---------- ---------- ---------- ---------- 
           ---------- ---------- ---------- ---------- ---------- 
           SADEADMGSA HVELLDGDDD VGNGSDQMMV TLHLQSIFQC IRRPCEKVDR 

           ---------- ---------- ---------- --------MR LRVRLLKRTW 
           ---------- ---------- ---------- --------MR LRVRLLKRTW 
           ---------- ---------- ---------- --------MK LRVRLQKRTW 
           ---------- ---------- ---------- --------MK LRVRLQKRTQ 
           ---------- ---------- ---------- --------MK LRVRVRKQTN 
           AIDPVTQRWR TANTRNDYQK INVCVVPAYD VSLSTGVRMK LRVKISGQKT 

           PLEVPETEPT LGHLRSHLRQ SLLCTWGYSS NTRFTITLNY KDPLTGDEET 
           PLEVPETEPT LGHLRSRLRQ SLLCTWGYSS NTRFTITLNY KDPLTGDEET 
           PLDLPDAEPT LGQLRAHLSQ ALLPSWGFGS DTRFAITLNN KDALTGDEET 
           PLEVPESEPT LGQLRAHLSQ VLLPTLGFSS DTRFAITLNN KDALTGDEET 
           RLELEAESPT LGDLRSKLSS VTLPALGYST EANFTITLNG KDALTGDQNT 
           RVDVGQDCHT LGTLRTLLAP VLGEQYGLGD DMPFEISLNG RDALLGDDKP

I want to go through the file column by column so that:

  1. If every sequence has the same amino acid in a column, the script saves and prints that amino acid and puts a '#' in the file at the end of that column.
  2. I can search the columns by amino acid, that is, check whether a particular set of amino acids occurs in a column. For example, if 'STA', 'HY', 'FVLIM' or 'NDEQHK' occurs in any column, the script puts a '@' in the file at the end of that column.
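One possible reading of the two rules above can be sketched in plain Python. The sketch assumes that a column "matches" a set when every residue in it belongs to the same set, and that columns containing a gap get no mark; both are assumptions, not something the question states. The names `GROUPS`, `column_symbol` and `annotate` are hypothetical. With Biopython, the input list could be built as `[str(rec.seq) for rec in alignment]`.

```python
# Residue sets taken from the question. Treating a match as "all residues
# of the column fall inside one set" is an assumption about the intent.
GROUPS = ("STA", "HY", "FVLIM", "NDEQHK")


def column_symbol(column):
    """Return '#', '@' or ' ' for one alignment column (iterable of characters)."""
    residues = set(column)
    if "-" in residues:
        return " "   # assumption: columns with gaps are left unmarked
    if len(residues) == 1:
        return "#"   # rule 1: the same amino acid in every sequence
    if any(residues <= set(group) for group in GROUPS):
        return "@"   # rule 2: all residues belong to one of the sets
    return " "


def annotate(sequences):
    """Return one marker character per column for equal-length sequences."""
    return "".join(column_symbol(col) for col in zip(*sequences))


if __name__ == "__main__":
    print(repr(annotate(["MRL", "MKL", "MKV"])))
```

If "exists in the column" is instead meant as "at least one residue of the set is present", the subset test `residues <= set(group)` would become an intersection test `residues & set(group)`.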

This is the code I have been using to process the file:

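from Bio import AlignIO

# Read the alignment; AlignIO.read accepts a filename as well as a handle
alignment = AlignIO.read("example.phy", "phylip")
v1 = v2 = v3 = 0
# zip(*alignment) yields one tuple of characters per alignment column
for col in zip(*alignment):
    num_unique = len(set(col))
    if num_unique == 1 and col[0] != '-':
        # every sequence has the same amino acid in this column
        print(num_unique)
        v1 += 1
    elif num_unique > 1 and '-' not in col:
        # at least two different amino acids and no gaps
        print(num_unique)
        v2 += 1
    elif '-' in col:  # assumes 1 or more dashes
        v3 += 1

print('Number of columns with the same amino acid: {}\n'
      'Number of columns with at least 2 amino acids (no gaps): {}\n'
      'Number of columns with one/more gaps: {}'
      .format(v1, v2, v3))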

These counters give the number of columns of each kind, but I don't understand how to search the columns by amino acid. Also, how can I put a character of my choice at a desired column of the file?
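For the last point, one option is to build a marker string with one character per column and print it as an extra row under each block of the alignment. The sketch below assumes the marker string has already been computed, and lays the output out in blocks of 50 columns in groups of 10, as in the example file. The names `chunk` and `format_blocks` are hypothetical. The result is meant for reading; an extra marker row is not a sequence, so a PHYLIP parser will probably not accept the annotated text, and it is safer to write it to a separate file.

```python
def chunk(text, size=10):
    """Split text into space-separated groups of `size` characters."""
    return " ".join(text[i:i + size] for i in range(0, len(text), size))


def format_blocks(names, sequences, marks, width=50):
    """Return interleaved blocks of text with a marker row under each block."""
    lines = []
    for start in range(0, len(marks), width):
        for name, seq in zip(names, sequences):
            # Names are printed in the first block only, as in the example file
            label = name if start == 0 else ""
            lines.append(label.ljust(10) + " " + chunk(seq[start:start + width]))
        # Marker row, aligned with the sequence columns above it
        lines.append((" " * 11 + chunk(marks[start:start + width])).rstrip())
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    names = ["Human", "Mouse"]
    seqs = ["MKLRVRLQKRTW", "MKLRVRLQKRTQ"]
    marks = "###########" + " "   # last column differs, so it is unmarked
    print(format_blocks(names, seqs, marks))
```

To mark a single column by hand, the marker string can be built as a list of spaces, with the chosen character assigned at the column index before joining the list back into a string.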

python biopython • 251 views