Tools to enumerate amino acid mutations per site?
5.8 years ago

Hi all,

I could write a Python script to do this, but I'd rather be lazy and use a tool if one already exists.

What I want to do is enumerate the various amino acid mutations at each position of a multiple sequence alignment.

For example, with the first sequence as the reference:

INPUT

>seq1
AABA-DC

>seq2
BABBCDD

>seq3
-ABBADD

OUTPUT

Position 1: B,1; -,1;

Position 4: B,2;

Position 5: C,1; A,1;

Position 7: D,2;

Cheers,

A Soggy Waffle

5.8 years ago

Using bioalcidaejdk: http://lindenb.github.io/jvarkit/BioAlcidaeJdk.html

$ java -jar dist/bioalcidaejdk.jar -e '
List<String> seqs = stream().map(S -> S.toString()).collect(Collectors.toList());
// the first sequence is the reference
final String ref = seqs.get(0);
for (int pos = 0; pos < ref.length(); ++pos) {
    // count each residue that differs from the reference at this column
    Map<Character, Integer> count = new HashMap<>();
    for (int x = 1; x < seqs.size(); ++x) {
        char c1 = ref.charAt(pos);
        char c2 = seqs.get(x).charAt(pos);
        if (c1 == c2) continue;
        count.put(c2, 1 + count.getOrDefault(c2, 0));
    }
    // skip columns where every sequence matches the reference
    if (count.isEmpty()) continue;
    System.out.println("Position " + (pos + 1) + ":" + count);
}
' in.fasta

Position 1:{B=1, -=1}
Position 4:{B=2}
Position 5:{A=1, C=1}
Position 7:{D=2}
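
If you would rather not install jvarkit, the same per-column count is only a few lines of Python. This is a minimal sketch, not an existing tool: `parse_fasta` and `mutations_per_site` are names made up for illustration, and it assumes the sequences are already aligned to the same length, that the first record is the reference, and that a gap (`-`) counts as a difference, as in the example in the question.

```python
from collections import Counter


def parse_fasta(text):
    """Return the sequences from FASTA-formatted text, in file order."""
    records, current = [], None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # header line: start a new record
            current = []
            records.append(current)
        elif current is not None:
            # sequence line (may be wrapped over several lines)
            current.append(line)
    return ["".join(parts) for parts in records]


def mutations_per_site(seqs):
    """Map 1-based position -> Counter of residues differing from the reference.

    Assumes seqs[0] is the reference and all sequences have equal length.
    """
    ref, others = seqs[0], seqs[1:]
    result = {}
    for pos, ref_res in enumerate(ref, start=1):
        counts = Counter(s[pos - 1] for s in others if s[pos - 1] != ref_res)
        if counts:  # skip columns identical to the reference
            result[pos] = counts
    return result


if __name__ == "__main__":
    example = ">seq1\nAABA-DC\n\n>seq2\nBABBCDD\n\n>seq3\n-ABBADD\n"
    for pos, counts in mutations_per_site(parse_fasta(example)).items():
        summary = " ".join(f"{res},{n};" for res, n in counts.items())
        print(f"Position {pos}: {summary}")
```

To run it on a real alignment, read the file contents and pass them to `parse_fasta`, for example `parse_fasta(open("in.fasta").read())`.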