Question

Program to identify mutations between two sequences.

1

9.9 years ago

ganuongphap ▴ 40

Are there any programs or scripts that can give me a text file of mutations? The input files will be FASTA. The sequences are short (only 100 amino acids) and contain single changes and double mutations.

For example I have two sequences:

AOEUIDHTN and

AOEUIIDHTS

I would like the output to be: I5II, N9S, etc., written to a file. Thanks in advance.

sequence fasta mutation program • 5.1k views

updated 9.6 years ago by Chrispin Chaguza ▴ 280 • written 9.9 years ago by ganuongphap ▴ 40

Ram · Answer 1 · 2014-06-13

4

9.9 years ago

Pierre Lindenbaum 161k

use diff :-D !

$ diff <(echo AOEUIDHTN| sed 's/\(.\)/\1\n/g') <(echo AOEUIIDHTS| sed 's/\(.\)/\1\n/g')
5a6
> I
9c10
< N
---
> S
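The diff output above can be turned into labels in the requested style (I5II, N9S) with a small wrapper. The following is only a sketch: the function name `mutations`, the `del` suffix for deletions, and the choice to number positions along the first sequence are assumptions, not something from this thread. It uses `fold -w 1` and temporary files instead of sed and process substitution so that it also runs in a plain POSIX shell.

```shell
# Sketch: convert diff's normal-format output into labels like I5II or N9S.
# Hypothetical helper; positions refer to the first (reference) sequence.
mutations() {
    # $1 = reference sequence, $2 = variant sequence (plain strings, no header)
    tmpdir=$(mktemp -d) || return 1
    # One residue per line, so diff compares residue by residue.
    printf '%s\n' "$1" | fold -w 1 > "$tmpdir/a"
    printf '%s\n' "$2" | fold -w 1 > "$tmpdir/b"
    diff "$tmpdir/a" "$tmpdir/b" | awk -v ref="$1" '
        function emit() {
            if (op == "a") {            # insertion after position pos
                anchor = (pos > 0) ? substr(ref, pos, 1) : ""
                print anchor pos anchor gained
            } else if (op == "c") {     # substitution
                print lost pos gained
            } else if (op == "d") {     # deletion (the "del" suffix is an assumption)
                print lost pos "del"
            }
            op = ""; lost = ""; gained = ""
        }
        /^[0-9]/ {                      # hunk header such as 5a6 or 9c10
            emit()
            match($0, /[acd]/)
            op = substr($0, RSTART, 1)
            pos = substr($0, 1, RSTART - 1) + 0   # first number = start in reference
            next
        }
        /^< / { lost = lost substr($0, 3) }       # residues only in the reference
        /^> / { gained = gained substr($0, 3) }   # residues only in the variant
        END { emit() }
    '
    rm -rf "$tmpdir"
}
```

Given the diff hunks shown above (5a6 and 9c10), `mutations AOEUIDHTN AOEUIIDHTS > mutations.txt` would write I5II and N9S. A diff implementation that places the insertion on the other side of the repeated I would report a different anchor residue, so treat the labels around repeats with care.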

updated 4.3 years ago by Ram 43k • written 9.9 years ago by Pierre Lindenbaum 161k

2

I would add tr '[:lower:]' '[:upper:]' to make sure all amino acids are in the same case.

updated 4.3 years ago by Ram 43k • written 9.9 years ago by Giovanni M Dall'Olio 28k
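Since the inputs are FASTA files, the case normalisation can be combined with stripping the header line. This is a minimal sketch, assuming one sequence per file; `fasta_seq` is a hypothetical helper name.

```shell
# Sketch: print the sequence of a single-record FASTA file as one
# upper-case string (header line, whitespace and line breaks removed).
fasta_seq() {
    # $1 = FASTA file containing a single record (assumption)
    grep -v '^>' "$1" | tr -d ' \t\r\n' | tr '[:lower:]' '[:upper:]'
}
```

The result can be used in place of the literal sequences in the diff one-liner, e.g. `echo "$(fasta_seq seq1.fasta)"`.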

0

What a clever answer, I laughed when I saw this :D

However, this approach is not so intuitive, because I would have to extract the sequences from the FASTA files with another command and count the mutations myself. A program would be better. Anyway, thanks for your help :D.

updated 4.3 years ago by Ram 43k • written 9.9 years ago by ganuongphap ▴ 40

Ram · Answer 2 · 2014-06-13

If your sequences are short and you only want to find simple differences (e.g. no gaps), you can use the diffseq tool from the EMBOSS package.
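For the gap-free case, a position-by-position comparison is already enough to produce labels such as N9S. The sketch below is not diffseq and does not reproduce its output; it only illustrates the idea, and the function name `substitutions` is hypothetical. It assumes both sequences have the same length.

```shell
# Sketch: list substitutions between two equal-length sequences
# as <old residue><position><new residue>, one per line.
substitutions() {
    # $1 = reference sequence, $2 = variant sequence (same length, no gaps)
    if [ "${#1}" -ne "${#2}" ]; then
        echo "sequences differ in length; align them first" >&2
        return 1
    fi
    awk -v a="$1" -v b="$2" 'BEGIN {
        for (i = 1; i <= length(a); i++) {
            x = substr(a, i, 1); y = substr(b, i, 1)
            if (x != y) print x i y      # e.g. N9S
        }
    }'
}
```

As soon as an insertion or deletion is present, every later position is shifted and this comparison reports spurious substitutions, which is why an alignment is needed for the general case.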

If you need something more accurate, you should use a sequence alignment tool. A tool that works from the command line and is easy to set up is exonerate.

I am not sure about how you can get the output you requested with exonerate, but there is probably a way. Put your sequences into two different files:

printf '>seq1\nACKKAKCAKCAIKCAKCKACNGHSCKAAEUIIDHTN\n' > seq1.fasta
printf '>seq2\nACKKAKCAKCAIIKCAKCKACNGHSKAAEUIIDHTN\n' > seq2.fasta

You can also put more than one sequence in the same file, if it helps you organize the files. Then, run exonerate with the following:

$ exonerate -q seq1.fasta -t seq2.fasta --showsugar --showcigar --showvulgar -n 1 -m affine:global --exhaustive

Command line: [exonerate -q seq1.fasta -t seq2.fasta --showsugar --showcigar -n 1 -m affine:global --exhaustive --showvulgar]
Hostname: [henikoff]
** (process:31120): WARNING **: Exhaustively generating suboptimal alignments will be VERY SLOW
C4 Alignment:
------------
         Query: seq1
        Target: seq2
         Model: affine:global:protein2protein
     Raw score: 177
   Query range: 0 -> 36
  Target range: 0 -> 36

  1 : ACKKAKCAKCA-IKCAKCKACNGHSCKAAEUIIDHTN : 36
      ||||||||||| ||||||||||||| |||||||||||
  1 : ACKKAKCAKCAIIKCAKCKACNGHS-KAAEUIIDHTN : 36

sugar: seq1 0 36 . seq2 0 36 . 177
cigar: seq1 0 36 . seq2 0 36 . 177  M 11 D 1 M 13 I 1 M 11
vulgar: seq1 0 36 . seq2 0 36 . 177 M 11 11 G 0 1 M 13 13 G 1 0 M 11 11

Have a look at the --showcigar, --showvulgar, and --showsugar options, and especially at the --ryo option, for more output formats and their explanation.
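The vulgar line is the easiest of these to post-process, because each label is followed by how far the query and the target advance. The sketch below lists the gaps from such a line. It assumes the field layout seen in the output above (ids, 0-based start and end, strand, score, then label/query-length/target-length triplets) and a forward-strand protein comparison; `vulgar_indels` is a hypothetical name. Substitutions inside M blocks are not visible in this format, so only insertions and deletions are reported.

```shell
# Sketch: report gaps from exonerate "vulgar:" lines read on stdin.
vulgar_indels() {
    awk '$1 == "vulgar:" {
        q = $3; t = $7                 # 0-based start offsets of query and target
        for (i = 11; i + 2 <= NF; i += 3) {
            label = $i; qlen = $(i + 1); tlen = $(i + 2)
            if (label == "G" && qlen == 0)        # residues present only in target
                printf "ins after query %d: target %d-%d\n", q, t + 1, t + tlen
            else if (label == "G" && tlen == 0)   # residues present only in query
                printf "del query %d-%d\n", q + 1, q + qlen
            q += qlen; t += tlen       # advance both coordinates
        }
    }'
}
```

For the vulgar line shown above this reports an insertion after query position 11 (target residue 12) and a deletion of query residue 25, which matches the two gaps in the printed alignment.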

score 1 · Answer 3 · 2014-09-16

1

9.6 years ago

Chrispin Chaguza ▴ 280

Check out this tool developed by folks at the Sanger Institute: https://github.com/sanger-pathogens/snp_sites. This program finds SNP/mutation sites in a multi-FASTA alignment file (at least two aligned sequences).
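To make clear what "finding sites in an alignment" means, here is a rough sketch of the idea in awk. It is not the Sanger tool and does not reproduce its output formats; `variable_sites` is a hypothetical name, and the input is assumed to be a multi-FASTA file in which all sequences are already aligned to the same length.

```shell
# Sketch: print every alignment column in which the sequences differ,
# as <1-based column><TAB><residues of each sequence in file order>.
variable_sites() {
    # $1 = multi-FASTA file whose sequences are already aligned (equal length)
    awk '
        /^>/ { n++; next }                       # header starts a new record
        { gsub(/[ \t\r]/, ""); seq[n] = seq[n] toupper($0) }
        END {
            for (i = 1; i <= length(seq[1]); i++) {
                col = ""; varies = 0
                for (j = 1; j <= n; j++) {
                    c = substr(seq[j], i, 1)
                    if (c != substr(seq[1], i, 1)) varies = 1
                    col = col c
                }
                if (varies) print i "\t" col
            }
        }
    ' "$1"
}
```

Column numbers refer to the alignment, not to the ungapped sequences, so they shift after any gap character.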
