Question: extract data from txt files
0
gravatar for fufuyou
3.7 years ago by
fufuyou100
United States
fufuyou100 wrote:

I want to extract gene expression values from a txt file based on gene names.

for example:

A 1500

B 2500

C 3500

D 3400

I want  to extract A and C gene expression values using gene name file extract it.

A 1500

C 3500

I have wroten a perl code, but it is not work well. Could you help me modify it or give me a new code?

Thanks,

#!/usr/bin/perl -w
use strict;
open(IN,"file1");
open(NAME,"file2");
open(OUT,">result");
while(<IN>){
  chomp;
  @data=split / /,$_;
  $hash{$data[0]}=$data[1];
}
while(<NAME>){
    chomp;
    print OUT "$_ $hash{$_}\n";
}
close IN;
close NAME;
close OUT;
rna-seq • 1.7k views
ADD COMMENTlink modified 3.7 years ago • written 3.7 years ago by fufuyou100
4

you are reinventing the wheel: the solution is sorting both files and using linux join: http://linux.die.net/man/1/join

ADD REPLYlink written 3.7 years ago by Pierre Lindenbaum120k

Hello fufuyou!

It appears that your post has been cross-posted to another site: http://seqanswers.com/forums/showthread.php?t=62682

This is typically not recommended as it runs the risk of annoying people in both communities.

ADD REPLYlink written 3.7 years ago by Pierre Lindenbaum120k
0
gravatar for Zaag
3.7 years ago by
Zaag720
Amsterdam
Zaag720 wrote:

In the shell:

for i in `cat file2` ;

do awk '$1eq"$i" {print } ' file1 >> result ;

done ;

ADD COMMENTlink modified 3.7 years ago • written 3.7 years ago by Zaag720

using $i like this in awk looks like a bad idea. If both file1 and file2 have too many lines, your solution will take a long time.

ADD REPLYlink written 3.7 years ago by Pierre Lindenbaum120k

True. Didn't know about linux's join command so thanks for that link.

ADD REPLYlink written 3.7 years ago by Zaag720

I use this shell only get the name. I  can not get the gene expression values.

ADD REPLYlink written 3.7 years ago by fufuyou100
0
gravatar for fufuyou
3.7 years ago by
fufuyou100
United States
fufuyou100 wrote:

Thanks  Pierre Lindenbaum and Zaag, 

What is wrong of my perl code?

ADD COMMENTlink written 3.7 years ago by fufuyou100
1

Please note that I vote for Pierre's answer! Nonetheless, I will try to shed some light on your programming experiments.

I am not sure, that

@data=split / /,$_;

works as you expect. I believe `/ /` really tries to match $_ against the regular expression containing a single space. Using some parenthesis for your code, one gets:

@data=split( $_ =~ / /, $_ );

The reg-ex matching returns the result by means of 1 and '' (or something similar) which evaluate to boolean TRUE and FALSE in Perl. Consequently, Perl will evaluate the statement to:

@data=split('', $_);

or

@data=split('1', $_);

depending on the content of $_, i.e., does $_ contain a single space. Finally, the line will be splitted by '1' or '' which is not what you want, right?!

 

Tip 1:

Try to debug your script using Data::Dumper but be aware that you might get a long list of results:

#!/usr/bin/perl -w
use strict;
use Data::Dumper;

open(IN,"file1");
open(NAME,"file2");
open(OUT,">result");
while(<IN>){
  chomp;
  @data=split / /,$_;
  $hash{$data[0]}=$data[1];
}
print Dumper(\%hash);
while(<NAME>){
    chomp;
    print OUT "$_ $hash{$_}\n";
}
close IN;
close NAME;
close OUT;

Tip 2:

If you really want to do it by yourself, think about loading the file into memory which is smaller (I guess these are the names). Applying your script to very large data sets may eventually crash your computer.

Tip 3:

Really try to go for Pierre's solution ;-)

 

ADD REPLYlink modified 3.7 years ago • written 3.7 years ago by Manuel Landesfeind1.2k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 817 users visited in the last hour