replacing words using perl script, excel as reference
4
0
Entering edit mode
9.4 years ago
peacezah ▴ 10

I have this list, how do I replace sequence id from the old name (left column) to new name (right column). I have been thinking about the condition like this; if found word in column left, will then replace with word in column at the right. any idea by using perl script?

>NC_006351.1_00512 76172 76077  len=96
GTGACGCTGCCCGTCGGCGCCTTGTCGAAAGGCGCGAGCTTCGAAGTCGGCGCGCAGGTC
CAGCGGCCGACCGGCGCGCTGGCGTTGTTCGAGTAA
>NC_006351.1_00969 110672 110583  len=90
GTGTCGGCGAAAAACGACACGTTCTCGCGCCTCGGCAGCCGCGACGCGCACGAAGGCCGA
CAAAACACGCCGGTCGTCTTGACCGCGTAG
>NC_006351.1_01005 116974 117090  len=117
TTGCTCGTCGGGCGGATCATGCCGACGCCCGAAGCCGAATCCGAATCCGAATCCGAATCC
GACGCCGACGCCGAGGCGCAGAAGCGCTTCGCCGGGCTGCGCTACACGGGCACGTAA
NC_006351.1_512     BPSS_001
NC_006351.1_969     BPSS_002
NC_006351.1_005     BPSS_003
NC_006351.1_178     BPSS_004
perl fasta • 4.0k views
ADD COMMENT
1
Entering edit mode
ADD REPLY
1
Entering edit mode
9.4 years ago
biolab ★ 1.4k

I give you an example, you can try the script. It works on my PC. If not working, please show error message. Hope it ok.

FASTA File:

>N1   110672 110583  len=90
GTGTCGGCGAAAAACGACACGTTCTCGCGCCTCGGCAGCCGCGACGCGCACGAAGGCCGA
>N2   116974 117090  len=117
TTGCTCGTCGGGCGGATCATGCCGACGCCCGAAGCCGAATCCGAATCCGAATCCGAATCC

LIST File:

N1   BPSS1
N2   BPSS2

Script:

#!/usr/bin/perl
use strict;
use warnings;
print "Usage: perl $0 listfile fastafile\n" and exit unless $ARGV[1];

open my $list, '<', $ARGV[0];
my  %h = map { s/\r//; split /\s+/ } <$list>;
close $list;

open my $fasta, '<', $ARGV[1];
while(<$fasta>) {
    next if /^\s+$/;
    s/\r//;
    chomp;
    /^>(\S+)\s+(.+)/ ? print ">$h{$1} $2\n" : print "$_\n";
}
close $fasta;
ADD COMMENT
1
Entering edit mode

I like your approach, but the use of global variables is quite dangerous and is discouraged. You can type perldoc -f open to see recommendations for opening files (or see the web link), and I recommend putting use strict and use warnings in every script (those are enabled in recent versions of Perl, which will generate warnings from this code).

ADD REPLY
0
Entering edit mode

Hi, SES, thanks for your suggestions. I have modified the script. I have a question about the disadvantage of bareword filehandle: if I use IN as filehandle, later on I accidentally use IN again, this will generate error. Is the case the only disadvantage of bareword filehandle? I suppose there exist other dangerous cases, could you briefly make an example? Thanks for your comments.

ADD REPLY
0
Entering edit mode

The code is much improved, thanks! The main problem with bare file handles is they have global scope. If you use IN at the top of your script you will be able to use that file handle anywhere, and this can happen by accident. The problem with

open IN, $ARGV[0];

is that you didn't specify the mode, so it is possible to say print IN "some text\n";and now your data is gone with no warnings! It is also a good practice to write or die ... after the open statement to make sure the file can be opened for reading or writing. A nice convenience to avoid writing this is to just put use autodie; at the top of your script and all these tests will be handled for you (here is a link about autodie).

ADD REPLY
0
Entering edit mode

Thanks a lot SES. Your comment is really helpful!

ADD REPLY
0
Entering edit mode

Hey! Thanks..it works but there's still problem. The 110672 110583 len=90 behind all ID were disappeared. I just want to replace the id itself, not including the location and length.

By the way, thank you

ADD REPLY
0
Entering edit mode

Hi peacezah, I have modified the script.

ADD REPLY
0
Entering edit mode
9.4 years ago
PoGibas 5.1k

See use python to change the header of a fasta file based on a dictionary in another file by tangming2005.

You only need awk example:

awk -f foo.awk dict.dat user.dat

NR == FNR {
  rep[$1] = $2
  next
} 

{
    for (key in rep) {
      gsub(key, rep[key])
    }
    print
}
ADD COMMENT
0
Entering edit mode

It doesn't help

ADD REPLY
0
Entering edit mode

Why? Can you tell what error it gives?

ADD REPLY
0
Entering edit mode
9.4 years ago
roy.granit ▴ 880

I would use the left column as hash keys and the right as the hash items. Then parse each row with a '>', separate using the spaces and make the replacement.

ADD COMMENT
0
Entering edit mode
9.4 years ago
biolab ★ 1.4k

I give a perl solution. I noticed in fasta file one id is NC_006351.1_00512 76172 76077 len=96, whereas in following list file you change it to NC_006351.1_512. Is it right?

#!/usr/bin/perl
open LIST, $ARGV[0];
%h = map { split /\s+/ } <LIST>;
close LIST;

open FASTA, $ARGV[1];
while(<FASTA>) {
    next if /^\s+$/;
    chomp;
    if (/^>(\S+\_)\S{2}(\S{3})\s+/) {
        $id = $1 . $2;
        print ">$h{$id}\n";
    } else {
        print "$_\n";
    }
}
close FASTA;

Usage:

perl changeID.pl listfile fastafile
ADD COMMENT
0
Entering edit mode

Sorry. I edited the ID before posted it here. But what I wanted to do is still the same.

I did what have you suggest but surprisingly the whole id lost

ADD REPLY

Login before adding your answer.

Traffic: 3173 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6