Question: replacing words using perl script, excel as reference
peacezah wrote (2.8 years ago):

I have this list. How do I replace the sequence IDs, changing the old name (left column) to the new name (right column)? The condition I have in mind is: if a word from the left column is found, replace it with the word in the right column. Any idea how to do this with a Perl script?

>NC_006351.1_00512 76172 76077  len=96
GTGACGCTGCCCGTCGGCGCCTTGTCGAAAGGCGCGAGCTTCGAAGTCGGCGCGCAGGTC
CAGCGGCCGACCGGCGCGCTGGCGTTGTTCGAGTAA
>NC_006351.1_00969 110672 110583  len=90
GTGTCGGCGAAAAACGACACGTTCTCGCGCCTCGGCAGCCGCGACGCGCACGAAGGCCGA
CAAAACACGCCGGTCGTCTTGACCGCGTAG
>NC_006351.1_01005 116974 117090  len=117
TTGCTCGTCGGGCGGATCATGCCGACGCCCGAAGCCGAATCCGAATCCGAATCCGAATCC
GACGCCGACGCCGAGGCGCAGAAGCGCTTCGCCGGGCTGCGCTACACGGGCACGTAA
 

NC_006351.1_512 BPSS_001
NC_006351.1_969 BPSS_002
NC_006351.1_005 BPSS_003
NC_006351.1_178 BPSS_004
Duplicate of:

replace fasta headers with another name in a text file

Change Fasta File Header

written 2.8 years ago by Pierre Lindenbaum
biolab wrote (2.8 years ago):

Here is an example with a script you can try. It works on my PC. If it does not work for you, please post the error message. Hope it helps.

FASTA File:

>N1   110672 110583  len=90
GTGTCGGCGAAAAACGACACGTTCTCGCGCCTCGGCAGCCGCGACGCGCACGAAGGCCGA
>N2   116974 117090  len=117
TTGCTCGTCGGGCGGATCATGCCGACGCCCGAAGCCGAATCCGAATCCGAATCCGAATCC

LIST File:

N1   BPSS1
N2   BPSS2

Script:

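#!/usr/bin/perl
use strict;
use warnings;
die "Usage: perl $0 listfile fastafile\n" unless @ARGV == 2;

# Build the old ID => new ID lookup from the two-column list file.
open my $list, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!\n";
my %h;
while (<$list>) {
    my ($old, $new) = split ' ';      # split ' ' also drops \r and \n
    $h{$old} = $new if defined $new;  # skip blank or one-column lines
}
close $list;

open my $fasta, '<', $ARGV[1] or die "Cannot open $ARGV[1]: $!\n";
while (<$fasta>) {
    next if /^\s*$/;                  # skip empty lines
    s/\r?\n\z//;                      # strip Unix or Windows line endings
    # Replace only the ID after '>'; the rest of the header is kept as is.
    s/^>(\S+)/'>' . (exists $h{$1} ? $h{$1} : $1)/e;
    print "$_\n";
}
close $fasta;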

I like your approach, but the use of global variables is quite dangerous and is discouraged. You can type perldoc -f open to see recommendations for opening files (or see the web link), and I recommend putting use strict and use warnings in every script (they are switched on implicitly only when a script declares use v5.12 or later for strict, or use v5.36 or later for warnings; with them on, Perl will complain about this code).

written 2.8 years ago by SES

Hi SES, thanks for your suggestions. I have modified the script. I have a question about the disadvantages of bareword filehandles: if I use IN as a filehandle and later accidentally use IN again, this will cause an error. Is that the only disadvantage of bareword filehandles? I suppose there are other dangerous cases; could you briefly give an example? Thanks for your comments.

written 2.8 years ago by biolab

The code is much improved, thanks! The main problem with bare file handles is they have global scope. If you use IN at the top of your script you will be able to use that file handle anywhere, and this can happen by accident. The problem with

open IN, $ARGV[0];

is that you didn't specify the mode, so Perl takes it from the file name string itself: if that string begins with >, the file is opened for writing and truncated, and now your data is gone with no warnings! It is also good practice to write or die ... after the open statement to make sure the file can be opened for reading or writing. A nice convenience to avoid writing this is to just put use autodie; at the top of your script and all these tests will be handled for you (here is a link about autodie).

written 2.8 years ago by SES
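
To make the point about open modes concrete, here is a minimal sketch that compares two-argument and three-argument open on a throwaway file. It assumes a Unix-like system; write_file, the temporary list.txt, and the variable names are made up for this illustration and are not part of any script in this thread.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: write $text to $path with a three-argument open.
sub write_file {
    my ($path, $text) = @_;
    open my $out, '>', $path or die "Cannot write $path: $!\n";
    print {$out} $text;
    close $out or die "Cannot close $path: $!\n";
}

# Work in a throwaway directory so no real data is touched.
my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/list.txt";
my $name = ">$path";    # a "file name" that happens to start with '>'

# Two-argument open reads the mode from the string: this truncates list.txt.
write_file($path, "N1\tBPSS1\n");
open(IN, $name) or die "Cannot open $name: $!\n";   # bareword, global handle
close IN;
my $size_two_arg = (stat $path)[7];

# Three-argument open treats the string as a literal file name, so the
# open fails (no such file) and list.txt is left alone.
write_file($path, "N1\tBPSS1\n");
my $opened = open(my $in, '<', $name) ? 1 : 0;
my $size_three_arg = (stat $path)[7];

print "two-argument open left $size_two_arg bytes\n";
print "three-argument open succeeded: $opened, left $size_three_arg bytes\n";
```

The sketch checks the return value of open by hand instead of using autodie, because with use autodie; the failing three-argument open would stop the script right there, which is the behaviour you normally want.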

Thanks a lot SES. Your comment is really helpful!

written 2.8 years ago by biolab

Hey, thanks! It works, but there is still a problem: the

110672 110583  len=90

after every ID has disappeared.

I just want to replace the ID itself, not the location and length.

By the way, thank you.

written 2.8 years ago by peacezah

Hi peacezah, I have modified the script.

written 2.8 years ago by biolab
PoGibas wrote (2.8 years ago):

See "use python to change the header of a fasta file based on a dictionary in another file" by tangming2005.

You only need the awk example:

awk -f foo.awk dict.dat user.dat

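# foo.awk: the first file (dict.dat) fills the lookup table
NR == FNR {
  rep[$1] = $2
  next
}

# Header lines: replace the ID only when it matches a key exactly
/^>/ {
  id = substr($1, 2)
  if (id in rep) {
    $0 = ">" rep[id] substr($0, length($1) + 1)
  }
}

{
  print
}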

It doesn't help.

written 2.8 years ago by peacezah

Why? Can you tell me what error it gives?

written 2.8 years ago by PoGibas
roy.granit wrote (2.8 years ago):

I would use the left column as hash keys and the right column as the hash values. Then, for each row that starts with '>', split it on spaces and make the replacement.

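
The hash-based approach described in this answer can be sketched roughly as follows. This is only an illustration under a few assumptions: rename_header is a hypothetical helper, the lookup table is hard-coded instead of being read from the list file, and the header lines (taken from the small example earlier in the thread) are assumed to be already chomped.

```perl
use strict;
use warnings;

# Hypothetical helper: swap the ID on one FASTA header line.
# $map is a hash reference of old ID => new ID.
sub rename_header {
    my ($line, $map) = @_;
    return $line unless $line =~ /^>/;    # sequence lines pass through
    # Split the text after '>' into the ID and the rest of the header.
    my ($id, $rest) = split ' ', substr($line, 1), 2;
    # Unknown IDs are left untouched rather than replaced by nothing.
    return $line unless defined $id && exists $map->{$id};
    return defined $rest ? ">$map->{$id} $rest" : ">$map->{$id}";
}

# Left column as keys, right column as values.
my %map = (N1 => 'BPSS1', N2 => 'BPSS2');

my @fasta = (
    '>N1   110672 110583  len=90',
    'GTGTCGGCGAAAAACGACACGTTCTCGCGCCTCGGCAGCCGCGACGCGCACGAAGGCCGA',
    '>N2   116974 117090  len=117',
    'TTGCTCGTCGGGCGGATCATGCCGACGCCCGAAGCCGAATCCGAATCCGAATCCGAATCC',
);

print rename_header($_, \%map), "\n" for @fasta;
```

With this input the two headers come out as >BPSS1 110672 110583  len=90 and >BPSS2 116974 117090  len=117: the run of spaces right after the ID collapses to a single space, and everything else in the header is kept.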
biolab wrote (2.8 years ago):

Here is a Perl solution. I noticed that in the FASTA file one ID is NC_006351.1_00512 76172 76077  len=96, whereas in the list file you changed it to NC_006351.1_512. Is that right?

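#!/usr/bin/perl
use strict;
use warnings;

# Read the two-column list (old ID, new ID) into a hash.
open my $list, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!\n";
my %h = map { split ' ' } <$list>;
close $list;

open my $fasta, '<', $ARGV[1] or die "Cannot open $ARGV[1]: $!\n";
while (<$fasta>) {
    next if /^\s*$/;
    chomp;
    # Drop the two characters after the last underscore, so that
    # NC_006351.1_00512 becomes NC_006351.1_512 as in the list file.
    if (/^>(\S+_)\S{2}(\S{3})(\s.*)?$/ && exists $h{"$1$2"}) {
        # Keep the location and length that follow the ID.
        print ">", $h{"$1$2"}, (defined $3 ? $3 : ''), "\n";
    } else {
        print "$_\n";    # sequence line, or header with no match in the list
    }
}
close $fasta;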

Usage: perl changeID.pl  listfile  fastafile


Sorry, I edited the IDs before posting them here, but what I want to do is still the same.

I did what you suggested, but surprisingly the whole ID was lost.

written 2.8 years ago by peacezah
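
One likely reason for a lost ID is that the key built from the header is not in the hash, so the lookup returns nothing and only a bare > gets printed. A minimal sketch for checking this, assuming that is the cause: missing_ids is a hypothetical helper, and the second header below was altered for the illustration so that it matches the list.

```perl
use strict;
use warnings;

# Hypothetical helper: return the header IDs that have no entry in %$map.
sub missing_ids {
    my ($headers, $map) = @_;
    my @missing;
    for my $line (@$headers) {
        next unless $line =~ /^>(\S+)/;    # only header lines carry an ID
        push @missing, $1 unless exists $map->{$1};
    }
    return @missing;
}

# Two entries from the list in the question.
my %map = (
    'NC_006351.1_512' => 'BPSS_001',
    'NC_006351.1_969' => 'BPSS_002',
);

my @headers = (
    '>NC_006351.1_00512 76172 76077  len=96',    # as in the FASTA file
    '>NC_006351.1_969 110672 110583  len=90',    # altered to match the list
);

print "No entry in list for: $_\n" for missing_ids(\@headers, \%map);
```

Here only NC_006351.1_00512 is reported, because the list spells it NC_006351.1_512. Running a check like this on the real files shows which IDs need to be reconciled before any renaming is attempted.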