Question: R for loop
4.2 years ago by
Kritika260
India
Kritika260 wrote:

Hello all, I have a simple question about R. I have two files. file1:

V1     V2       V3
exon  Rv0001  CCP42723
exon  Rv0002  CCP42724

total rows in file 1 is 4110

file2 :

id    Gene_name   
1     CCP42723 
2     CCP42724

total rows in file 2 is 4114

I want to match column V3 of file1 against column Gene_name of file2, and my output file should contain only column V2 of file1. That is, if V3 in a row of file1 matches a Gene_name in file2, the V2 value from that row of file1 should go into the new file. The new file will contain only the V2 values of file1 that have a match.

I tried a for loop, but it seems I am going wrong:

file3$Gene_name <- 0
for (i in 1:4110) {
  file_new_id = file2[file2$V3 %in% file3[i,2],]$V2
}

Any help is highly appreciated. Thanks.

for-loop rna-seq R • 1.1k views
modified 4.2 years ago by PoGibas4.8k • written 4.2 years ago by Kritika260

Don't use a for loop. Use merge, then extract the required column:

merge(x = data1, y = data2, by.x = 'V3', by.y = 'Gene_name')[, 'V2']

Or, load dplyr and use filter (select picks columns, not rows, and the column to match on is V3):

library(dplyr); filter(data1, V3 %in% data2$Gene_name)$V2
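
To see how the merge approach compares with plain logical indexing, here is a minimal sketch on toy data. The data frames below are stand-ins, and the IDs Rv0003 and CCP42725 are made up for illustration. One thing worth knowing is that merge sorts the result by the key column by default, so the order of V2 can differ from the order in data1.

```r
# Toy stand-ins for data1 and data2; Rv0003 / CCP42725 are invented for illustration
data1 <- data.frame(
  V1 = c("exon", "exon", "exon"),
  V2 = c("Rv0002", "Rv0003", "Rv0001"),
  V3 = c("CCP42724", "CCP42725", "CCP42723"),
  stringsAsFactors = FALSE
)
data2 <- data.frame(
  id = 1:2,
  Gene_name = c("CCP42723", "CCP42724"),
  stringsAsFactors = FALSE
)

# merge() does an inner join; with the default sort = TRUE, rows are ordered by the key
via_merge <- merge(x = data1, y = data2, by.x = "V3", by.y = "Gene_name")[, "V2"]

# Logical indexing with %in% keeps the original row order of data1
via_in <- data1[data1$V3 %in% data2$Gene_name, "V2"]

print(via_merge)
print(via_in)
```

Both approaches return the same set of V2 values here; only the order differs. If the order of file1 matters, the %in% form (or merge with sort = FALSE) is probably the safer choice.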
written 4.2 years ago by russhh5.3k
4.2 years ago by
Sam2.7k
New York
Sam2.7k wrote:

The easiest way will be

file1[file1$V3%in%file2$Gene_name,]$V2
written 4.2 years ago by Sam2.7k

Yes, I tried this, but I want to store the result in a new column. I tried this command:

file3$new_ID=file2[file2$V3 %in% file3$Gene_name,]$V2

but got an error:

Error in $<-.data.frame(*tmp*, "new_ID", value = c(79L, 80L, 81L, : replacement has 4110 rows, data has 4114

So I think I need to run a for loop 4110 times.

written 4.2 years ago by Kritika260

Well, if all you want is V2 from file1, then there is no need to assign back into the file1 data structure; just extract the vector. The problem with what you did is that if not every item in file1$V3 is found in file2$Gene_name, the resulting vector has a different length. If you then try to assign a vector of N items to a data frame with M rows, R will complain. Also, it is rather confusing that you have file1, file2 and file3 but never mention what format you want file3 to be in.
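
The length mismatch can be reproduced on a small scale. This is only a sketch on toy data: the third Gene_name, CCP99999, is a made-up ID with no partner in file1. It also shows one possible way around the error, using match(), on the assumption that the goal is a new column aligned row by row with file2.

```r
# Toy stand-ins for the two files; CCP99999 is an invented ID with no match in file1
file1 <- data.frame(
  V1 = c("exon", "exon"),
  V2 = c("Rv0001", "Rv0002"),
  V3 = c("CCP42723", "CCP42724"),
  stringsAsFactors = FALSE
)
file2 <- data.frame(
  id = 1:3,
  Gene_name = c("CCP42723", "CCP42724", "CCP99999"),
  stringsAsFactors = FALSE
)

# Subsetting returns only the matching rows, so its length can differ from nrow(file2)
matched <- file1[file1$V3 %in% file2$Gene_name, "V2"]
print(length(matched))
print(nrow(file2))

# match() gives, for each Gene_name, its row position in file1$V3 (NA if absent),
# so the looked-up vector always has exactly nrow(file2) elements
idx <- match(file2$Gene_name, file1$V3)
file2$new_ID <- file1$V2[idx]
print(file2)
```

Rows of file2 without a partner in file1 get NA in new_ID, which avoids the "replacement has N rows, data has M" error without any loop.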

written 4.2 years ago by Sam2.7k

file4$new_id <- 0
for (i in 1:length(file3$Gene_name)) {
  file4$new_id <- file2[file2$V3 %in% file3[i,2],]$V2
}

Output:

> file4$new_id
factor(0)
4109 Levels: EBG00000313313 EBG00000313314 EBG00000313315 EBG00000313316 ... Rv3924c

I need the V2 column of file1 in file4, under the column name new_id, for the rows of file1 that match file2$Gene_name.

written 4.2 years ago by Kritika260

Your file names are getting out of hand... So, if I understand correctly, you have 3 files; let's call them A, B and C.

File A has the following format:

V1   | V2     | V3
exon | Rv0001 | CCP42723

File B:

id | Gene_name
1  | CCP42723

Then what is your file C's format? If you only want to output all V2 from A where the V3 of A is matched with Gene_name in B, you can just directly write it to a new file or data frame, so something like (assuming C is a new data structure, as you have not told us what it looks like)

C=data.frame(new_id=A[A$V3%in%B$Gene_name,]$V2)
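
As a hedged end-to-end sketch of the same idea, the snippet below rebuilds A and B from the rows shown in the question, creates C, and writes it out. The output path is a temporary file chosen for illustration, and the tab-separated format is an assumption, since the desired file format was never specified.

```r
# A and B rebuilt from the example rows in the question
A <- data.frame(
  V1 = c("exon", "exon"),
  V2 = c("Rv0001", "Rv0002"),
  V3 = c("CCP42723", "CCP42724"),
  stringsAsFactors = FALSE
)
B <- data.frame(
  id = 1:2,
  Gene_name = c("CCP42723", "CCP42724"),
  stringsAsFactors = FALSE
)

# Keep V2 from the rows of A whose V3 appears in B$Gene_name
C <- data.frame(new_id = A[A$V3 %in% B$Gene_name, ]$V2, stringsAsFactors = FALSE)

# Write C to a file; tempfile() is used here only so the sketch runs anywhere
out_path <- tempfile(fileext = ".txt")
write.table(C, file = out_path, quote = FALSE, row.names = FALSE, sep = "\t")

print(readLines(out_path))
```

Reading the columns as character (stringsAsFactors = FALSE) also sidesteps output like factor(0) with thousands of unused levels, which shows up when a factor column is subset down to nothing.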
written 4.2 years ago by Sam2.7k

thanks sam it worked !!!!!!!

written 4.2 years ago by Kritika260
4.2 years ago by
venu6.6k
Germany
venu6.6k wrote:

Not exactly a bioinformatics problem. Try this simple UNIX command (assuming whitespace-separated columns, with Gene_name in column 2 of file2 and V3 in column 3 of file1):

awk 'FNR==NR {a[$2]; next} $3 in a {print $2}' file2.txt file1.txt
written 4.2 years ago by venu6.6k