Question: How to match a FASTA header for extraction using Perl?
8 months ago, Mimmi Ahlmén wrote:

Hi!

I have a FASTA file containing sequences, and I want to replace the old FASTA headers with new ones. The first step is to match the header names, that is, the part after the '>'. How do I do this? All sequences have headers somewhat like this:

>Halobacterium_salinarum

This is the part of the code where I find the headers:

     while (my $line = <$IN>) {
         if ($line =~ /^>/) {
             my $x =   # Here I want to match "Halobacterium_salinarum"
                       # and all the other species names

I have tried for hours to find the right match characters. Is it "any word character", \w? I also want to save the old species name in a hash. Should I capture it with (\w+) and finish with \s, because that's where the name ends?


Try the script from the following article.

https://www.perlmonks.org/?node_id=975419

Reply written 8 months ago by arup

So, people still use Perl for Bioinformatics!

Reply written 7 months ago by Santosh Anand

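Using BioPerl will probably make your life easier:

use strict;
use warnings;
use Bio::SeqIO;

# Path to the FASTA file, taken from the command line
my $file = shift @ARGV or die "Usage: $0 <file.fasta>\n";

my $fasta = Bio::SeqIO->new(-file => $file, -format => 'Fasta');
while ( my $seq = $fasta->next_seq() ) {
  # id() returns the header up to the first whitespace, without the leading '>'
  my $header = $seq->id;
  print "My species name = $header\n";
}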
Reply written 7 months ago by Juke-34
7 months ago, Juke-34 (Sweden) wrote:
while (my $line = <$IN>) {
  # Match against $line (the variable being read) and anchor at line start
  if ($line =~ m/^>(.+)/) {
     print "My species name = $1\n";
  }
}
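
To see how the candidate patterns differ, here is a small sketch that runs (.+), (\w+) and (\S+) against a few header lines. The headers and the capture() helper are made up for illustration and are not from the original file.

```perl
use strict;
use warnings;

# Hypothetical example headers, not taken from the asker's file
my @headers = (
    ">Halobacterium_salinarum",
    ">Halobacterium_salinarum strain NRC-1",
    ">Haloferax_volcanii-DS2",
);

# Return the first capture group of $pattern applied to $header,
# or undef if the pattern does not match.
sub capture {
    my ($header, $pattern) = @_;
    return $header =~ $pattern ? $1 : undef;
}

for my $h (@headers) {
    my $whole = capture($h, qr/^>(.+)/);   # everything after '>'
    my $word  = capture($h, qr/^>(\w+)/);  # letters, digits and underscore only
    my $token = capture($h, qr/^>(\S+)/);  # up to the first whitespace
    print join("\t", $whole, $word, $token), "\n";
}
```

For a plain name like Halobacterium_salinarum all three give the same result. With a description after a space, (.+) keeps the description while (\w+) and (\S+) stop before it; with a hyphen in the name, (\w+) stops at the hyphen while (\S+) keeps going.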
Answer written 7 months ago by Juke-34
7 months ago, JC (Mexico) wrote:

In Perl, \w matches any alphanumeric character plus the underscore, so (\w+) captures the name and stops at the first non-word character (a space or newline, but also a hyphen or a dot). If you want to save this in a hash:

#!/usr/bin/perl

use strict;
use warnings;

my %species = ();
while (<>) {
    # Count each species name found on a header line
    if ( m/^>(\w+)/ ) {
        $species{$1}++;
    }
}

print "Species\tCount\n";
while (my ($sp, $cnt) = each %species) {
    print "$sp\t$cnt\n";
}
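
Since the end goal in the question is to replace old headers with new ones, the same match can drive a lookup in a hash of old to new names. This is only a sketch: the %new_name mapping, the new name 'Hsal_01', the rename_header() helper and the sample lines are all assumptions made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical mapping from old header names to new ones
my %new_name = (
    Halobacterium_salinarum => 'Hsal_01',
);

# Rewrite one line: a header whose name is in the map gets the new name;
# sequence lines and headers with unknown names are returned unchanged.
sub rename_header {
    my ($line, $map) = @_;
    if ($line =~ /^>(\S+)/ && exists $map->{$1}) {
        my $old = $1;
        # \Q...\E quotes the old name so it is matched literally
        $line =~ s/^>\Q$old\E/>$map->{$old}/;
    }
    return $line;
}

# Made-up sample input standing in for lines read from a FASTA file
my @fasta = (
    ">Halobacterium_salinarum\n",
    "ATGCATGC\n",
    ">Unknown_species\n",
    "GGCCGGCC\n",
);

print rename_header($_, \%new_name) for @fasta;
```

With a real file, the same call would go inside the while (my $line = <$IN>) loop from the question, printing each returned line to the output file.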
Answer written 7 months ago by JC