slimane.khayi, 8.7 years ago
Dear colleagues, I have a tab-delimited file in this format:
#CHROM          POS   REF  a1   a2   b1   b2   c1   c2
NW_008246507.1  16    T    C/C  C/C  T/T  C/C  C/C  T/C
NW_008246507.1  1624  A    C/C  C/C  C/C  C/C  C/C  C/C
NW_008246507.1  1656  C    T/T  T/T  T/T  T/T  T/T  T/T
NW_008246507.1  1666  C    T/T  T/T  T/T  T/T  T/T  T/T
NW_008246507.1  1679  C    T/T  T/T  T/T  T/T  T/T  T/T
NW_008246507.1  1681  G    A/A  A/A  A/A  A/A  A/A  A/A
NW_008246507.1  1682  T    A/A  A/A  A/A  A/A  A/A  A/A
NW_008246507.1  1695  T    C/C  C/C  C/C  C/C  C/C  C/C
I want to identify the SNPs that are unique to each species (a, b, c), not to each strain (a1, a2, b1, ...). Do you have a Python script, or any idea how to do this? I am not familiar with scripting languages. Thank you in advance for your help. Sincerely.
Could you please clarify what counts as a unique SNP in your example data? I find it difficult to see (and I think you are missing a couple of line breaks, as there are currently two positions per line...). Would T/C for species c at NW_008246507.1:16 be what you are looking for?
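Until that is clarified, here is a minimal Python sketch under one possible reading of "unique": an allele counts as unique to a species if it is seen in at least one strain of that species and in no strain of any other species. That definition is an assumption, not something stated in the question. The sketch also assumes that the species label is the column name with its trailing digits removed (a1, a2 -> a), that the first three columns are #CHROM, POS and REF, and that genotypes are written as allele/allele. The function names are made up for illustration.

```python
import csv
import io
from collections import defaultdict


def species_of(sample):
    # Assumption: the species is the sample name minus trailing digits (a1 -> a).
    return sample.rstrip("0123456789")


def species_private_alleles(handle):
    """Return (chrom, pos, species, alleles) for alleles seen in only one species."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[3:]  # columns after #CHROM, POS, REF
    results = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        chrom, pos = row[0], row[1]
        # Collect the set of alleles observed in each species at this site.
        alleles = defaultdict(set)
        for sample, genotype in zip(samples, row[3:]):
            for allele in genotype.split("/"):
                if allele not in ("", "."):  # ignore missing calls
                    alleles[species_of(sample)].add(allele)
        # An allele is private if no other species carries it.
        for species, own in alleles.items():
            others = set().union(
                *(v for k, v in alleles.items() if k != species)
            )
            private = own - others
            if private:
                results.append((chrom, int(pos), species, sorted(private)))
    return results


# First two rows of the posted table, rebuilt with explicit tabs.
demo = (
    "#CHROM\tPOS\tREF\ta1\ta2\tb1\tb2\tc1\tc2\n"
    "NW_008246507.1\t16\tT\tC/C\tC/C\tT/T\tC/C\tC/C\tT/C\n"
    "NW_008246507.1\t1624\tA\tC/C\tC/C\tC/C\tC/C\tC/C\tC/C\n"
)
print(species_private_alleles(io.StringIO(demo)))  # prints []
```

Under this definition none of the posted rows yields a species-unique allele: at position 16 both C and T occur in species b and in species c, and at every other position all six strains share the same genotype. If "unique" is meant differently (for example, a genotype that is fixed in every strain of one species and absent elsewhere), the set comparison in the inner loop would need to change accordingly. To run it on a real file, pass an open file handle, e.g. `with open("snps.tsv") as fh: species_private_alleles(fh)`, where the filename is a placeholder.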