Program to identify mutations between two sequences
7.5 years ago
ganuongphap ▴ 40

Are there any programs or scripts that can give me a text file of mutations? The input files will be FASTA. The sequences are short, containing only 100 amino acids, and have only substitutions ("change") and double mutations.

For example I have two sequences:

AOEUIDHTN and

AOEUIIDHTS

I would like the output, written to a file, to be: I5II, N9S, etc. Thanks in advance.

sequence fasta mutation program • 4.3k views
7.5 years ago

use diff :-D !

$ diff <(echo AOEUIDHTN| sed 's/\(.\)/\1\n/g') <(echo AOEUIIDHTS| sed 's/\(.\)/\1\n/g')
5a6
> I
9c10
< N
---
> S
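
If counting the positions by hand is the sticking point, the same residue-by-residue comparison can be sketched in Python. This is only a minimal sketch, not an established tool: `parse_fasta` and `list_mutations` are hypothetical helper names, it relies on the standard-library `difflib.SequenceMatcher` rather than a real alignment algorithm, and the `I5II` style for insertions (and the `-` placeholder for anything else) is just one reading of the notation asked for in the question.

```python
from difflib import SequenceMatcher


def parse_fasta(text):
    """Return {header: sequence} for FASTA-formatted text (upper-cased)."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line.upper()
    return records


def list_mutations(ref, alt):
    """Describe differences between two sequences, e.g. N9S or I5II.

    Positions are 1-based and refer to the reference sequence.
    """
    ref, alt = ref.upper(), alt.upper()
    mutations = []
    matcher = SequenceMatcher(None, ref, alt, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        if tag == "replace" and (i2 - i1) == (j2 - j1):
            # Same-length block: report one substitution per residue.
            for k in range(i2 - i1):
                mutations.append(f"{ref[i1 + k]}{i1 + k + 1}{alt[j1 + k]}")
        elif tag == "insert" and i1 > 0:
            # Anchor the insertion on the preceding reference residue.
            anchor = ref[i1 - 1]
            mutations.append(f"{anchor}{i1}{anchor}{alt[j1:j2]}")
        else:
            # Deletions and uneven replacements: report the raw block
            # (assumed notation, '-' marks an empty side).
            mutations.append(f"{ref[i1:i2] or '-'}{i1 + 1}{alt[j1:j2] or '-'}")
    return mutations


if __name__ == "__main__":
    print(", ".join(list_mutations("AOEUIDHTN", "AOEUIIDHTS")))
```

For the two example sequences from the question this prints `I5II, N9S`. For anything with more complicated gaps, a proper aligner is the safer choice.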

I would add tr '[:lower:]' '[:upper:]' to make sure all amino acids are in the same case.


What a clever answer, I laughed when I saw this :D

However, this approach is not very intuitive, because I have to extract the sequences from the FASTA files with another command and count the mutation positions myself. A program would be better. Anyway, thanks for your help :D.

7.5 years ago

If your sequences are short and you only want to find simple differences (e.g. no gaps), you can use the diffseq tool from the EMBOSS package.

If you need something more accurate, you should use a sequence alignment tool. A tool that works from the command line and is easy to set up is exonerate.

I am not sure how you can get the output you requested with exonerate, but there is probably a way. Put your sequences into two different files:

printf '>seq1\nACKKAKCAKCAIKCAKCKACNGHSCKAAEUIIDHTN\n' > seq1.fasta
printf '>seq2\nACKKAKCAKCAIIKCAKCKACNGHSKAAEUIIDHTN\n' > seq2.fasta

You can also put more than one sequence in the same file, if it helps you organize the files. Then, run exonerate with the following:

$ exonerate -q seq1.fasta -t seq2.fasta --showsugar --showcigar -n 1 -m affine:global --exhaustive --showvulgar

Command line: [exonerate -q seq1.fasta -t seq2.fasta --showsugar --showcigar -n 1 -m affine:global --exhaustive --showvulgar]
Hostname: [henikoff]
** (process:31120): WARNING **: Exhaustively generating suboptimal alignments will be VERY SLOW
C4 Alignment:
------------
         Query: seq1
        Target: seq2
         Model: affine:global:protein2protein
     Raw score: 177
   Query range: 0 -> 36
  Target range: 0 -> 36

  1 : ACKKAKCAKCA-IKCAKCKACNGHSCKAAEUIIDHTN : 36
      ||||||||||| ||||||||||||| |||||||||||
  1 : ACKKAKCAKCAIIKCAKCKACNGHS-KAAEUIIDHTN : 36

sugar: seq1 0 36 . seq2 0 36 . 177
cigar: seq1 0 36 . seq2 0 36 . 177  M 11 D 1 M 13 I 1 M 11
vulgar: seq1 0 36 . seq2 0 36 . 177 M 11 11 G 0 1 M 13 13 G 1 0 M 11 11

Have a look at the --showcigar, --showvulgar, and --showsugar options, and especially at the --ryo option, for more output formats and their explanation.
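
One possible way to get from this output to the notation requested in the question is to walk the vulgar line, which lists the alignment as (label, query length, target length) triples. The following is a rough sketch under several assumptions: `mutations_from_vulgar` is a hypothetical helper, only the `M` (match) and `G` (gap) labels seen in the output above are handled, both sequences are on the forward strand, and the `A11AI` / `C25-` notation for insertions and deletions is an illustrative choice rather than anything exonerate produces itself.

```python
def mutations_from_vulgar(vulgar_line, query, target):
    """List differences implied by an exonerate vulgar line.

    Positions are 1-based and refer to the query sequence.
    """
    fields = vulgar_line.split()
    if fields[0] == "vulgar:":
        fields = fields[1:]
    # Layout: q_id q_start q_end q_strand t_id t_start t_end t_strand score ...
    q_pos, t_pos = int(fields[1]), int(fields[5])
    ops = fields[9:]
    mutations = []
    for k in range(0, len(ops), 3):
        label, q_len, t_len = ops[k], int(ops[k + 1]), int(ops[k + 2])
        if label == "M":
            # Aligned block: compare residue by residue.
            for off in range(q_len):
                q_res, t_res = query[q_pos + off], target[t_pos + off]
                if q_res != t_res:
                    mutations.append(f"{q_res}{q_pos + off + 1}{t_res}")
        elif label == "G" and q_len == 0:
            # Extra residues in the target: anchor on the previous query residue.
            anchor = query[q_pos - 1] if q_pos > 0 else "-"
            mutations.append(f"{anchor}{q_pos}{anchor}{target[t_pos:t_pos + t_len]}")
        elif label == "G" and t_len == 0:
            # Residues missing from the target ('-' is an assumed notation).
            mutations.append(f"{query[q_pos:q_pos + q_len]}{q_pos + 1}-")
        else:
            raise ValueError(f"unhandled vulgar label: {label}")
        q_pos += q_len
        t_pos += t_len
    return mutations


if __name__ == "__main__":
    seq1 = "ACKKAKCAKCAIKCAKCKACNGHSCKAAEUIIDHTN"
    seq2 = "ACKKAKCAKCAIIKCAKCKACNGHSKAAEUIIDHTN"
    line = "vulgar: seq1 0 36 . seq2 0 36 . 177 M 11 11 G 0 1 M 13 13 G 1 0 M 11 11"
    print(", ".join(mutations_from_vulgar(line, seq1, seq2)))
```

With the alignment above this prints `A11AI, C25-`. Note that the aligner placed the gap before the first I, so the inserted I is reported against position 11 rather than as a duplication of I12; where a gap lands inside a repeated residue is ambiguous.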


My sequences only have 100 residues; does that count as short? Also, they only have change and double mutations; does that count as simple?

diffseq seems to be the kind of tool I'm looking for, but it doesn't give me the desired result. I would like the answer to look like "N2C" or "I44II".

Thank you so much.


Then go with exonerate directly. I've updated the post with an example of how you can use it.

7.2 years ago

Check out this tool developed by folks at the Sanger Institute: https://github.com/sanger-pathogens/snp_sites
This program finds SNP/mutation sites in a multi-FASTA alignment file (at least two aligned sequences).
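
To illustrate what "finding sites in an alignment" means, here is a small Python sketch that scans already-aligned sequences column by column. It is not how snp_sites is implemented and does not reproduce its output formats; `variable_sites` is a hypothetical helper, and it assumes the sequences are already aligned to equal length with `-` marking gaps.

```python
def variable_sites(aligned):
    """Return (1-based column, residues) for every column that varies.

    `aligned` maps sequence names to aligned sequences of equal length.
    """
    seqs = [seq.upper() for seq in aligned.values()]
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    sites = []
    for col, residues in enumerate(zip(*seqs), start=1):
        # A column is variable if it holds more than one distinct character.
        if len(set(residues)) > 1:
            sites.append((col, "".join(residues)))
    return sites


if __name__ == "__main__":
    alignment = {
        "seq1": "ACKKAKCAKCA-IKCAKCKACNGHSCKAAEUIIDHTN",
        "seq2": "ACKKAKCAKCAIIKCAKCKACNGHS-KAAEUIIDHTN",
    }
    for col, residues in variable_sites(alignment):
        print(col, residues)
```

Run on the two gapped rows from the exonerate alignment earlier in the thread, it reports columns 12 (`-I`) and 26 (`C-`). Column numbers here count alignment columns, gaps included, not residue positions in either sequence.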
