Question: Tools to enumerate amino acid mutations per site?
A Soggy Waffle wrote, 22 months ago:

Hi all,

I could write a Python script to do this, but I'd rather be lazy and use a tool if one already exists.

What I want to do is enumerate the various amino acid mutations at each position of a multiple sequence alignment.

For example, with the first sequence as the reference:

INPUT

>seq1
AABA-DC

>seq2
BABBCDD

>seq3
-ABBADD

OUTPUT

Position 1: B,1; -,1;

Position 4: B,2;

Position 5: C,1; A,1;

Position 7: D,2;

Cheers,

A Soggy Waffle
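For reference, the logic described in the question (compare each column with the first sequence and tally the residues that differ) fits in a few lines of Java, the language used in the answer. This is a minimal sketch that assumes the sequences are already aligned, equal in length and held in memory; `MutationsPerSite` and `enumerate` are hypothetical names chosen for illustration. It uses a `LinkedHashMap`, so residues are listed in the order they are first seen, which is the order in the requested output; a plain `HashMap` iterates in an unspecified order.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class for illustration; not part of any existing tool.
public class MutationsPerSite {

    /**
     * For each alignment column, counts the residues (gaps included) that differ
     * from the reference, i.e. the first sequence. Returns one formatted line per
     * column that has at least one difference; unchanged columns are skipped.
     */
    public static List<String> enumerate(List<String> aligned) {
        String ref = aligned.get(0);
        for (String seq : aligned) {
            if (seq.length() != ref.length()) {
                throw new IllegalArgumentException("sequences are not all the same length");
            }
        }
        List<String> lines = new ArrayList<>();
        for (int pos = 0; pos < ref.length(); pos++) {
            // LinkedHashMap keeps residues in first-seen order.
            Map<Character, Integer> counts = new LinkedHashMap<>();
            for (int i = 1; i < aligned.size(); i++) {
                char c = aligned.get(i).charAt(pos);
                if (c != ref.charAt(pos)) {
                    counts.merge(c, 1, Integer::sum);
                }
            }
            if (counts.isEmpty()) {
                continue;
            }
            // Positions are reported 1-based, as in the question.
            StringBuilder sb = new StringBuilder("Position " + (pos + 1) + ":");
            for (Map.Entry<Character, Integer> e : counts.entrySet()) {
                sb.append(' ').append(e.getKey()).append(',').append(e.getValue()).append(';');
            }
            lines.add(sb.toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        // The example alignment from the question; the first sequence is the reference.
        List<String> aligned = List.of("AABA-DC", "BABBCDD", "-ABBADD");
        enumerate(aligned).forEach(System.out::println);
    }
}
```

With Java 11 or later it can be run directly as `java MutationsPerSite.java`; on the example alignment it prints lines for positions 1, 4, 5 and 7. Gap characters are treated like any other residue, so at position 5, where the reference has a gap, both `C` and `A` count as differences.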

Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote, 22 months ago:

Using bioalcidaejdk from jvarkit: http://lindenb.github.io/jvarkit/BioAlcidaeJdk.html

$ java -jar dist/bioalcidaejdk.jar -e 'List<String> seqs=stream().map(S->S.toString()).collect(Collectors.toList()); final String ref=seqs.get(0);for(int pos=0;pos< ref.length();++pos) {Map<Character,Integer> count=new HashMap<>(); for(int x=1;x< seqs.size();++x) {char c1=ref.charAt(pos);char c2=seqs.get(x).charAt(pos); if(c1==c2) continue; count.put(c2,1+count.getOrDefault(c2,0)); } if(count.isEmpty()) continue; System.out.println("Position "+(pos+1)+":"+count); }  ' in.fasta

Position 1:{B=1, -=1}
Position 4:{B=2}
Position 5:{A=1, C=1}
Position 7:{D=2}
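The bioalcidaejdk one-liner depends on jvarkit. If you'd rather not install it, a small FASTA reader can load the alignment into a `List<String>` for the same counting loop. This is a sketch under stated assumptions: plain-text FASTA, sequence lines possibly wrapped, headers discarded; `AlignedFastaReader` is a hypothetical name. It also checks that every record has the same length, because the one-liner calls `charAt(pos)` on every sequence for each position of the reference, so a shorter record would throw `StringIndexOutOfBoundsException` and a longer one would have its extra columns silently ignored.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class for illustration; not part of jvarkit.
public class AlignedFastaReader {

    /**
     * Reads every FASTA record from {@code in} and returns the sequences in file
     * order. Wrapped sequence lines are concatenated; header lines are dropped.
     * Throws IOException if the records do not all have the same length.
     */
    public static List<String> read(Reader in) throws IOException {
        List<String> seqs = new ArrayList<>();
        StringBuilder current = null; // sequence of the record being read
        BufferedReader br = new BufferedReader(in);
        String line;
        while ((line = br.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty()) {
                continue; // blank lines between records are allowed
            }
            if (line.startsWith(">")) {
                if (current != null) {
                    seqs.add(current.toString());
                }
                current = new StringBuilder();
            } else if (current == null) {
                throw new IOException("sequence data found before the first '>' header");
            } else {
                current.append(line);
            }
        }
        if (current != null) {
            seqs.add(current.toString());
        }
        // An alignment needs every row to span the same columns.
        for (int i = 1; i < seqs.size(); i++) {
            if (seqs.get(i).length() != seqs.get(0).length()) {
                throw new IOException("record " + (i + 1) + " has length "
                        + seqs.get(i).length() + ", expected " + seqs.get(0).length());
            }
        }
        return seqs;
    }

    public static void main(String[] args) throws IOException {
        try (Reader r = Files.newBufferedReader(Paths.get(args[0]))) {
            List<String> seqs = read(r);
            if (seqs.isEmpty()) {
                System.out.println("no records found");
            } else {
                System.out.println(seqs.size() + " aligned sequences of length " + seqs.get(0).length());
            }
        }
    }
}
```

With Java 11 or later: `java AlignedFastaReader.java in.fasta`. The returned list has the reference first, which is what the one-liner's `seqs.get(0)` assumes.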
Powered by Biostar version 2.3.0