Perl script problem: Remove duplicates from my fasta file with help of perl
0
1
Entering edit mode
5.2 years ago

Hi everyone! I have some problem with my Perl script. So I'm trying to compare two files with each other. An example, in my tablefile.txt I have genes that show me what kind of species it contains:

> EOG090X0039     8      IE_sup1,IM_nor1,OE_aff1,IT_ras1,IT_ine1,OH_azt1,OD_pul1,OD_mag1

In my other file( fast format), my gene file, I have species and its sequences.So I,m trying to see if the species in my gene file(fasta file) exist in my tablefile.txt file. All species that tablefile.txt contain I want to store and remove the rest of the species from my fasta file. I will be really grateful for all help.

#!/usr/bin/perl -w
use warnings;
use strict;
use 5.010;

my $tablefile = shift @ARGV;

if ( ! open ( FILE , "<" , $tablefile ) ) {
        die "Error can't find the file: $tablefile because $!";
}

if ( ! open ( FILE_GENE , "<" , @ARGV ) ) {
        die "Error can't find the file: @ARGV because $!";
}

####TABLEFILE.TXT#####
my $table_gene;
my @table_specie = ();
my %table_taxa;

####GENE_FILES########
my $fasta_taxa;
my @fasta_seq = ();
my %fasta_hash;
####COMPARISON########
my @matches = ();

#######################READ FIRST FILE(TABLE.TXT)########################
 while(my $line = <FILE>){
   chomp $line;
   $line =~ s /\s+/:/ig;

   if ( $line =~ m /^(E\w+)\:/){
   $table_gene = $1;
   }

   if ($line =~ m /\w+\:\d+\:(\w.+)$/){
     @table_specie = $1;
   }

   $table_taxa{ $table_gene } = "@table_specie";
}
########################OPEN GENE FILES##################################
foreach my $genefile (@ARGV){

        if($genefile == $table_gene){

          while(my $gene_line = <FILE_GENE>){
              chomp $gene_line;

            if($gene_line =~ m /^\>(\w+)/){
               $fasta_taxa = $1;
            }else{
            $fasta_hash{$fasta_taxa} .= $gene_line;
            }
        }
         @matches = grep { exists $fasta_hash{$fasta_taxa} } @table_specie;
   }
 }
perl Fasta • 1.3k views
ADD COMMENT
1
Entering edit mode

So I'm trying to compare two files with each other.

you should just use a simple linux pipeline with cut/sort/join/ etc...

ADD REPLY
0
Entering edit mode

please, validate/close your previous questions:

C: Import files in newick format with python script ; C: Detecting error in Ubuntu environment ;

If an answer was helpful you should upvote it, if the answer resolved your question you should mark it as accepted. Upvote|Bookmark|Accept

ADD REPLY

Login before adding your answer.

Traffic: 2808 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6