Question: A script for extracting information related to a list of gene names from a file
3.2 years ago, ahmad.iut wrote:

Dear Biostars,

I have a text file containing several rows and columns like this:

"Gene Name"  "Gene Id"   "description"     "GO"
A 1 phosphatase GO:001256
B 2 synthesize GO:013154
C 3 methylase GO:000054
D 4 kinase GO:001254
E 5 oxigenase GO:001354
F 6 synthesize GO:001254

In addition, I have another text file just containing one column and several rows like this:

Gene Name
A
D
C
B

  I need to extract the rows of file 1 that contain gene names listed in file 2.

Does anybody have any idea how to do that?

PS: I know how to do this in Excel, but it does not cope with a very large number of rows.

Thank you

rna-seq script data mining gene • 1.5k views
3.2 years ago, Benn (Netherlands) wrote:

You can do it in R; the subset function works pretty intuitively.


In R:

file1<-read.table("file1.txt", sep="\t", header=T)

file2<-read.table("file2.txt", sep="\t", header=T)

Selection<-file1[file1$"Gene name" %in% file2$"Gene Name",]

You don't even have to use the subset function.

(written 3.2 years ago by Benn)

Dear Nota,

Thank you for your answer. These commands in R gave me only the headers:

Gene.Name   Gene.Id     description GO

(written 3.2 years ago by ahmad.iut)

OK, R replaces the spaces in the header names with dots.

So you can use:

Selection<-file1[file1$Gene.Name %in% file2$Gene.Name,]

 

(written 3.2 years ago by Benn)

Thank you so much, Nota. It worked well. The problem was the spaces in the headers (like Gene Name).

(written 3.2 years ago by ahmad.iut)

3.2 years ago, Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote:

Using Linux:

 

join -1 1 -2 1 <(sort -k1,1 file1.txt) <(sort -k1,1 file2.txt)  > joined.txt
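
For anyone adapting the join command above, here is a minimal sketch of one possible variant. It assumes both files are tab-separated with exactly one header line, sets the header aside so it is not sorted in with the data, and tells sort and join to split on tabs so that a header such as "Gene Name" stays in one field. The sample rows are copied from the question; the temporary directory and the intermediate .sorted files are just illustrative choices, not part of the original answer.

```shell
#!/bin/sh
# Sketch: keep the rows of file1 whose first column is listed in file2.
# Assumes tab-separated files with one header line each.
set -eu
LC_ALL=C; export LC_ALL          # same collation for sort and join

TAB=$(printf '\t')
workdir=$(mktemp -d)             # illustrative scratch directory

# Sample data mirroring the question (assumed to be tab-separated).
{
  printf 'Gene Name\tGene Id\tdescription\tGO\n'
  printf 'A\t1\tphosphatase\tGO:001256\n'
  printf 'B\t2\tsynthesize\tGO:013154\n'
  printf 'C\t3\tmethylase\tGO:000054\n'
  printf 'D\t4\tkinase\tGO:001254\n'
  printf 'E\t5\toxigenase\tGO:001354\n'
  printf 'F\t6\tsynthesize\tGO:001254\n'
} > "$workdir/file1.txt"
printf 'Gene Name\nA\nD\nC\nB\n' > "$workdir/file2.txt"

# Drop the header line, then sort the data rows on the key column.
tail -n +2 "$workdir/file1.txt" | sort -t "$TAB" -k1,1 > "$workdir/file1.sorted"
tail -n +2 "$workdir/file2.txt" | sort -t "$TAB" -k1,1 > "$workdir/file2.sorted"

# Write the header of file1 first, then append the matching rows.
head -n 1 "$workdir/file1.txt" > "$workdir/joined.txt"
join -t "$TAB" -1 1 -2 1 "$workdir/file1.sorted" "$workdir/file2.sorted" \
  >> "$workdir/joined.txt"

cat "$workdir/joined.txt"
```

Note that the matching rows come out sorted by gene name rather than in the order of file2.txt, since join needs sorted input.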

 

Using knime.org:

Load both files (Read File) into two tables and join them using a "Join" node: https://www.knime.org/files/nodedetails/_manipulation_column_column_split_combine_Joiner.html

 


Dear Lindenbaum,

The command worked perfectly. Thank you very much.

(written 3.2 years ago by ahmad.iut)