Question

Tools to enumerate amino acid mutations per site?



6.1 years ago

A Soggy Waffle • 0

Hi all,

I could write a Python script to do this myself, but I'd rather be lazy and use a tool if one already exists.

What I want to do is enumerate the various amino acid mutations at each position of a multiple sequence alignment.

For example, where the first sequence is the reference:

INPUT

>seq1
AABA-DC

>seq2
BABBCDD

>seq3
-ABBADD

OUTPUT

Position 1: B,1;  -,1;

Position 4: B,2;

Position 5: C,1;  A,1;

Position 7: D,2;
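The per-column comparison described above can be sketched in a few lines of Java (the language used in the accepted answer). This is a minimal sketch, assuming the sequences are already aligned and held in memory; `MutationTally`, `tally` and `format` are hypothetical names, not an existing tool. Positions are alignment columns (1-based), and a gap in the reference is compared like any other character.

```java
import java.util.*;

// Hypothetical helper (not an existing tool): for each alignment column, tallies
// the residues in the non-reference sequences that differ from the first sequence.
public class MutationTally {

    // Returns 1-based position -> (residue -> count); columns with no differences are skipped.
    static Map<Integer, Map<Character, Integer>> tally(List<String> aligned) {
        String ref = aligned.get(0);
        Map<Integer, Map<Character, Integer>> result = new TreeMap<>();
        for (int pos = 0; pos < ref.length(); pos++) {
            // LinkedHashMap keeps residues in the order they are first seen.
            Map<Character, Integer> counts = new LinkedHashMap<>();
            for (int i = 1; i < aligned.size(); i++) {
                char c = aligned.get(i).charAt(pos);
                if (c != ref.charAt(pos)) {
                    counts.merge(c, 1, Integer::sum);
                }
            }
            if (!counts.isEmpty()) {
                result.put(pos + 1, counts);
            }
        }
        return result;
    }

    // Formats one column in the question's style, e.g. "Position 1: B,1; -,1;"
    static String format(int pos, Map<Character, Integer> counts) {
        StringBuilder sb = new StringBuilder("Position " + pos + ":");
        for (Map.Entry<Character, Integer> e : counts.entrySet()) {
            sb.append(' ').append(e.getKey()).append(',').append(e.getValue()).append(';');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> aligned = List.of("AABA-DC", "BABBCDD", "-ABBADD");
        tally(aligned).forEach((pos, counts) -> System.out.println(format(pos, counts)));
    }
}
```

With Java 9 or later (needed for `List.of`), running it prints `Position 1: B,1; -,1;`, `Position 4: B,2;`, `Position 5: C,1; A,1;` and `Position 7: D,2;`, one per line. Replacing the `LinkedHashMap` with a `TreeMap` would list residues in sorted order instead of first-seen order.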

Cheers,

A Soggy Waffle

sequence • 1.7k views

updated 6.1 years ago by Pierre Lindenbaum 163k • written 6.1 years ago by A Soggy Waffle • 0

score 3 · Accepted Answer · 2018-07-13

Using bioalcidaejdk from jvarkit: http://lindenb.github.io/jvarkit/BioAlcidaeJdk.html

$ java -jar dist/bioalcidaejdk.jar -e '
    List<String> seqs = stream().map(S->S.toString()).collect(Collectors.toList());
    final String ref = seqs.get(0);
    for(int pos=0; pos < ref.length(); ++pos) {
        Map<Character,Integer> count = new HashMap<>();
        for(int x=1; x < seqs.size(); ++x) {
            char c1 = ref.charAt(pos);
            char c2 = seqs.get(x).charAt(pos);
            if(c1==c2) continue;
            count.put(c2, 1+count.getOrDefault(c2,0));
        }
        if(count.isEmpty()) continue;
        System.out.println("Position "+(pos+1)+":"+count);
    }' in.fasta

Position 1:{B=1, -=1}
Position 4:{B=2}
Position 5:{A=1, C=1}
Position 7:{D=2}
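If installing jvarkit is not an option, the same idea fits in a plain Java program. This is a sketch, not part of the original answer: the class `AlignedFastaMutations` and its methods are hypothetical, it assumes the FASTA records are already aligned (equal length), and its small FASTA reader simply joins multi-line records and skips blank lines. It uses a `TreeMap` instead of the answer's `HashMap`, so residues within a position are printed sorted by character rather than in hash order.

```java
import java.io.*;
import java.nio.file.*;
import java.util.*;

// Hypothetical standalone version: reads an aligned FASTA file and prints, per
// column, the residues that differ from the first (reference) record.
public class AlignedFastaMutations {

    // Minimal FASTA reader: each '>' header starts a record, sequence lines are
    // concatenated, blank lines are ignored. Header text is not kept.
    static List<String> readFasta(BufferedReader in) throws IOException {
        List<String> seqs = new ArrayList<>();
        StringBuilder current = null;
        String line;
        while ((line = in.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty()) continue;
            if (line.startsWith(">")) {
                if (current != null) seqs.add(current.toString());
                current = new StringBuilder();
            } else if (current != null) {
                current.append(line);
            }
        }
        if (current != null) seqs.add(current.toString());
        return seqs;
    }

    // One line per differing column, e.g. "Position 1:{-=1, B=1}".
    static List<String> report(List<String> seqs) {
        List<String> lines = new ArrayList<>();
        if (seqs.isEmpty()) return lines;
        String ref = seqs.get(0);
        for (String s : seqs) {
            if (s.length() != ref.length()) {
                throw new IllegalArgumentException("sequences are not aligned (lengths differ)");
            }
        }
        for (int pos = 0; pos < ref.length(); pos++) {
            Map<Character, Integer> counts = new TreeMap<>(); // sorted by character
            for (int i = 1; i < seqs.size(); i++) {
                char c = seqs.get(i).charAt(pos);
                if (c != ref.charAt(pos)) counts.merge(c, 1, Integer::sum);
            }
            if (!counts.isEmpty()) lines.add("Position " + (pos + 1) + ":" + counts);
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java AlignedFastaMutations aligned.fasta");
            return;
        }
        try (BufferedReader in = Files.newBufferedReader(Paths.get(args[0]))) {
            report(readFasta(in)).forEach(System.out::println);
        }
    }
}
```

Run as `javac AlignedFastaMutations.java && java AlignedFastaMutations in.fasta`. On the question's example it reports the same four positions as above; the only visible difference is ordering inside a position, e.g. `Position 1:{-=1, B=1}`, because `-` sorts before `B`.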