Program to identify mutations between two sequences
7.3 years ago
ganuongphap ▴ 40

Are there any programs or scripts that can write the mutations between two sequences to a text file? The input files will be FASTA. The sequences are short, only 100 amino acids long, and contain only changes (substitutions) and double mutations.

For example I have two sequences:

AOEUIDHTN and

AOEUIIDHTS

I would like the output to be I5II, N9S, etc., written to a file. Thanks in advance.

sequence fasta mutation program • 4.3k views
7.3 years ago

use diff :-D !

    diff <(echo AOEUIDHTN | sed 's/\(.\)/\1\n/g') <(echo AOEUIIDHTS | sed 's/\(.\)/\1\n/g')
    5a6
    > I
    9c10
    < N
    ---
    > S

I would add tr '[:lower:]' '[:upper:]' to make sure all amino acids are in the same case.

What a clever answer, I laughed when I saw this :D However, this way is not so intuitive, because I have to extract the sequences from the FASTA files with another command and count the mutations myself. A program would be better. Anyway, thanks for your help :D

7.3 years ago

If your sequences are short and you only want to find simple differences (e.g. no gaps), you can use diffseq from the EMBOSS package.

If you need something more accurate, you should use a sequence alignment tool. A tool that works from the command line and is easy to set up is exonerate. I am not sure how you can get the output you requested with exonerate, but there is probably a way.

Put your sequences into two different files:

    printf '>seq1\nACKKAKCAKCAIKCAKCKACNGHSCKAAEUIIDHTN\n' > seq1.fasta
    printf '>seq2\nACKKAKCAKCAIIKCAKCKACNGHSKAAEUIIDHTN\n' > seq2.fasta

You can also put more than one sequence in the same file, if that helps you organize the files. Then run exonerate with the following:

    exonerate -q seq1.fasta -t seq2.fasta --showsugar --showcigar -n 1 -m affine:global --exhaustive --showvulgar

Command line: [exonerate -q seq1.fasta -t seq2.fasta --showsugar --showcigar -n 1 -m affine:global --exhaustive --showvulgar]
Hostname: [henikoff]
** (process:31120): WARNING **: Exhaustively generating suboptimal alignments will be VERY SLOW
C4 Alignment:
------------
Query: seq1
Target: seq2
Model: affine:global:protein2protein
Raw score: 177
Query range: 0 -> 36
Target range: 0 -> 36

1 : ACKKAKCAKCA-IKCAKCKACNGHSCKAAEUIIDHTN : 36
||||||||||| ||||||||||||| |||||||||||
1 : ACKKAKCAKCAIIKCAKCKACNGHS-KAAEUIIDHTN : 36

sugar: seq1 0 36 . seq2 0 36 . 177
cigar: seq1 0 36 . seq2 0 36 . 177  M 11 D 1 M 13 I 1 M 11
vulgar: seq1 0 36 . seq2 0 36 . 177 M 11 11 G 0 1 M 13 13 G 1 0 M 11 11


Have a look at the --showcigar, --showvulgar, and --showsugar options, and especially at the --ryo option, for more output formats and their explanation.
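To get from an alignment like the one above to labels such as "N9S", the two gapped alignment rows can be walked column by column. The following Python sketch is not part of exonerate: the function name `mutations_from_alignment`, the "del" suffix for deleted residues, and the choice to attach an insertion to the preceding query residue (so that an extra I after I5 becomes "I5II") are assumptions made for illustration.

```python
def mutations_from_alignment(query_row, target_row, gap="-"):
    """Turn two gapped alignment rows of equal length into mutation labels.

    Positions are 1-based and counted on the query (gaps are not counted).
    """
    if len(query_row) != len(target_row):
        raise ValueError("alignment rows must have the same length")

    labels = []
    pos = 0        # position of the last query residue seen
    prev = ""      # that residue (empty before the first one)
    pending = ""   # target residues inserted after `pos`, not yet reported

    def flush():
        # Report a finished insertion as <residue><pos><residue><inserted>.
        # An insertion before the first query residue is reported as "0<inserted>".
        nonlocal pending
        if pending:
            labels.append(f"{prev}{pos}{prev}{pending}")
            pending = ""

    for q, t in zip(query_row, target_row):
        if q == gap and t == gap:
            continue                 # empty column, nothing to report
        if q == gap:
            pending += t             # residue present only in the target
            continue
        flush()
        pos += 1
        prev = q
        if t == gap:
            labels.append(f"{q}{pos}del")   # residue missing in the target
        elif q != t:
            labels.append(f"{q}{pos}{t}")   # substitution
    flush()
    return labels


# Rows copied from the exonerate alignment shown above.
query = "ACKKAKCAKCA-IKCAKCKACNGHSCKAAEUIIDHTN"
target = "ACKKAKCAKCAIIKCAKCKACNGHS-KAAEUIIDHTN"
print(", ".join(mutations_from_alignment(query, target)))  # A11AI, C25del
```

Where the aligner places a gap inside a run of identical residues is arbitrary, so the same insertion could equally be reported one position later; the sketch simply follows the gap position it is given.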


My sequences have only 100 residues; does that count as short? Also, they only have changes and double mutations; does that count as simple?

diffseq seems to be the kind of tool I'm looking for, but it doesn't give me the desired result. I would like the output to look like "N2C" or "I44II".

Thank you so much.
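A minimal sketch of a standalone script that writes labels in this style, assuming Python's standard `difflib` module is an acceptable stand-in for a real sequence aligner. That is reasonable for short, nearly identical sequences but not for divergent ones. The helper names `read_fasta`, `list_mutations` and `write_mutations`, and the "del" suffix for deleted residues, are hypothetical choices for this example.

```python
from difflib import SequenceMatcher


def read_fasta(path):
    """Return the first sequence in a FASTA file as one upper-case string."""
    chunks = []
    seen_header = False
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if seen_header:
                    break            # stop at the second record
                seen_header = True
            elif line:
                chunks.append(line.upper())
    return "".join(chunks)


def list_mutations(ref, alt):
    """Compare two unaligned sequences and return labels such as 'N9S'."""
    labels = []
    matcher = SequenceMatcher(None, ref, alt, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        old, new = ref[i1:i2], alt[j1:j2]
        paired = min(len(old), len(new))
        for k in range(paired):                 # one-for-one substitutions
            labels.append(f"{old[k]}{i1 + k + 1}{new[k]}")
        for k in range(paired, len(old)):       # residues missing in alt
            labels.append(f"{old[k]}{i1 + k + 1}del")
        extra = new[paired:]                    # residues added in alt
        if extra:
            anchor = i1 + paired                # 1-based position before the insertion
            prev = ref[anchor - 1] if anchor else ""
            labels.append(f"{prev}{anchor}{prev}{extra}")
    return labels


def write_mutations(ref_path, alt_path, out_path):
    """Read two FASTA files and write the comma-separated labels to out_path."""
    labels = list_mutations(read_fasta(ref_path), read_fasta(alt_path))
    with open(out_path, "w") as out:
        out.write(", ".join(labels) + "\n")
    return labels


# The two sequences from the question.
print(list_mutations("AOEUIDHTN", "AOEUIIDHTS"))  # ['I5II', 'N9S']
```

With two FASTA files on disk, a call such as `write_mutations("seq1.fasta", "seq2.fasta", "mutations.txt")` would write the labels to a text file; the file names here are placeholders.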


Then go with exonerate directly. I've updated the post with an example of how you can use it.

7.1 years ago

Check out snp_sites, a tool developed by folks at the Sanger Institute: https://github.com/sanger-pathogens/snp_sites

This program finds SNP/mutation sites in a multi-FASTA alignment file (at least two aligned sequences).