how to find and replace the atypical aas in a protein fasta-file, remembering their protein position
2
0
Entering edit mode
6.3 years ago
natasha.sernova ★ 3.8k

Dear ALL,

I am trying to replace atypical aminoacid  (one letter symbol like U), that is either leucine or isoleucine.

I have a long bacterial protein fasta-file with the single line headers and single line seqs.

I need to produce the modified fasta-file with the replaced atypical aminoacids and an additional file saying where they where situated before the substitution. I have a very strange result - probably I made a mistake in curly brace positions.

I’am looking at this file for too long, I’ve not seen any obvious mistake.

Please, help me to find it! Thank you very much!

Sincerely yours,

Natasha

#!/bin/env perl
use strict;
use warnings;
my $file_name1; my$file_name2;
my $file_name3; my$file_name4;
my $line; my @seq_line; my @string;$file_name1 = $ARGV[0];$file_name2 = $ARGV[1];$file_name3 = $ARGV[2];$file_name4 = $ARGV[3]; open(IN1, "<".$file_name2) or die; # fasta-file with lines as headers and protein seqs
open(OUT1, ">".$file_name3) or die; # my file headers and seqpositions of the wrong aa symb open(OUT2, ">".$file_name4) or die; #modified fasta –file - programs do not like atypical aas.

while(my $line=<IN1>) { next if$line =~ /^\s$/; # skip empty line # next if$line =~ /^>/;

#I have to print this line to my output file

if($line =~ /^>/) { print OUT1$line;

#                                print OUT2 $line."\n"; # next;} # split the whole seq into the aminoacids. No new-line symbol my @seq_line = split(//,$line);

for(my $i=0;$i< scalar(@seq_line); $i++) { if ($seq_line[$i] eq "U"){$line=~s/U/L/;

print OUT1 "position".$i." .atypical.protsymbol"."_".$file_name3."\n";

}

elsif ($seq_line[$i] eq "u"){

$line=~s/u/l/; print OUT1 "position ".$i." .atypical.protsymbol"."_".$file_name3."\n"; } } print OUT2$line."\n";

}

next;

}

close(IN1);

close(OUT1);

close(OUT2);

protein fasta perl • 1.6k views
3
Entering edit mode
6.3 years ago
dylan.storey ▴ 60

The fastest way to do this would be to use the unix tool sed

If you're actually interested in the position you'll need to use regex captures + pos or a special internal variable something along the lines of ...  go look at

http://perldoc.perl.org/functions/pos.html

http://perldoc.perl.org/perlvar.html

#Find all the positions
while ( $sequence =~/(U)/g){ push (@positions , pos($sequence))
}

#replace all U with L
$seqeunces =~ tr/U/L/; OR: while ($sequence =~/U/g){
push (@positions ,$-[0]); save begining of match push (@positions ,$+[0]); # save end of match
)
#replace all U with L
$seqeunces =~ tr/U/L/; ADD COMMENT 0 Entering edit mode Thank you very much, I will try. The position is the main point - I won't be able to find the particular L without it. And it will give me a respective header, am I right? ADD REPLY 2 Entering edit mode 6.3 years ago 5heikki 9.8k This replaces all "U" with "I" outside headers: awk '{if(/^>/)print$0; else {gsub("U","I"); print $0}}' file.fasta Whereas this prints all positions (1-based) of "U" outside headers (I googled this one so I'm not certain, but it looks ok): awk '{if(/^>/) print$0; else {s=\$0; i=1; idx=0; while(i>0){i=match(s, /U/); if(i>0) {idx += i;print idx;s=substr(s, i+1);}}}}' file.fasta

Also, how come there's no gmatch (like sub/gsub)?

0
Entering edit mode

Thank you very much! So, I have to run the second expression first, and then, knowing all existing positions of the atypical leucines in the file, replace them to the normal leucines with the first expression, right? Have I understood the idea?

1
Entering edit mode

exactly. This second expression is required if you want to know the positions without replacing it to normal LEU, otherwise the simple sed or awk expressions are enough to replace U with L.

1
Entering edit mode

Yeah, note that neither edits the files in place but instead you have to direct the output with > newFile

0
Entering edit mode

you can also use the -i flag with sed if you're feeling confident.

0
Entering edit mode

The sed command edits also headers though..

0
Entering edit mode

Thank you, it's important to remember.

0
Entering edit mode

I've run it, it's perfect! Thousand thanks for your help!