How to filter genes in R
7.7 years ago
nkabo ▴ 80

I have a list of gene names in file a, which has 1 variable and 264 observations. Another list (file b) contains gene names plus information about those genes; it has 10 variables and 16558 observations. Working in R, I want to check file b for the genes in file a and, whenever a gene name in file a matches a gene name in file b, extract the whole row into an initially empty data frame (filec). I wrote something like this:

filea <- read.csv("filea.csv", sep = ";")
fileb <- read.csv("fileb.csv", sep = ";")
filec <- data.frame()

for (i in 1:dim(filea)[1]) {
  for (j in 1:dim(fileb)[1]) {
    if (as.character(filea[i, 1]) == as.character(fileb[j, 1])) {
      filec <- merge(fileb[j, 1], filec)
    }
  }
}

But it gives an error. What can I do? Thanks in advance.


Err, why don't you drop the for loops? You'll also want to directly merge filea and fileb.

BTW, you might want to use the dplyr package, it provides more efficient join methods (e.g., left_join()).
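To make the "merge directly" suggestion concrete, here is a minimal sketch on toy data. The column name `gene` and the values are assumptions for illustration only; with real files whose key columns are named differently, `merge()` takes `by.x` and `by.y` instead of `by`. The dplyr part uses `semi_join()` rather than `left_join()`, since a semi join only filters rows; it is guarded so that it runs only if dplyr is installed.

```r
# Toy stand-ins for the two files; column names and values are assumptions.
filea <- data.frame(gene = c("TP53", "BRCA1", "MYC"), stringsAsFactors = FALSE)
fileb <- data.frame(
  gene  = c("BRCA1", "EGFR", "TP53", "KRAS"),
  chr   = c("17", "7", "17", "12"),
  score = c(0.9, 0.4, 0.7, 0.2),
  stringsAsFactors = FALSE
)

# Base R: merge() does an inner join by default, so only rows of fileb
# whose gene also appears in filea are kept.
filec <- merge(filea, fileb, by = "gene")

# dplyr alternative: semi_join() keeps the rows of fileb that have a match
# in filea, without adding any columns. Runs only if dplyr is available.
if (requireNamespace("dplyr", quietly = TRUE)) {
  filec_dplyr <- dplyr::semi_join(fileb, filea, by = "gene")
}
```

Note that `merge()` sorts the result by the key column by default, so the row order can differ from the order in fileb.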

7.7 years ago
Guangchuang Yu ★ 2.6k

All you want is just:

filec <- fileb[fileb[,1] %in% filea[,1], ]
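
For anyone who wants to see what that one-liner does step by step, here is a small sketch on made-up data (the gene names and the `chr` column are assumptions, not the asker's files). `%in%` returns one TRUE/FALSE per row of fileb, and that logical vector is used to subset the rows, which is what the nested loop was trying to do.

```r
# Toy stand-ins for the two files; values are made up for illustration.
filea <- data.frame(gene = c("TP53", "BRCA1", "MYC"), stringsAsFactors = FALSE)
fileb <- data.frame(
  gene = c("BRCA1", "EGFR", "TP53", "KRAS"),
  chr  = c("17", "7", "17", "12"),
  stringsAsFactors = FALSE
)

# One logical value per row of fileb: is this gene listed in filea?
keep <- fileb[, 1] %in% filea[, 1]

# Keep the whole row wherever the gene matched.
filec <- fileb[keep, ]

# Optional check: genes in filea that were not found in fileb.
missing_genes <- setdiff(filea[, 1], fileb[, 1])
```

If fewer rows come back than expected, `missing_genes` is a quick way to see which names did not match, for example because of stray whitespace or different capitalisation.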

It helped, thank you for your answer :)

7.7 years ago
Biogeek ▴ 470

Seconded. Load the gene set you want as one data frame and the records as another. Make sure the key column is named V1 in both, then use plyr to join by 'V1'. Assign the result to a new variable so you get the output as a data frame, and save it as a table with the write.table function. No need for a loop. Just make sure you untick the NA strings option and don't use quotes when importing into RStudio; these can cause problems with the plyr join.
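
A rough sketch of those steps, assuming both data frames have their key column named `V1`. The data and the output path are hypothetical, and `type = "inner"` is my choice here so that only matching genes are kept. The code falls back to base `merge()` if plyr is not installed.

```r
# Hypothetical inputs: the gene set and the records, both keyed on V1.
genes <- data.frame(V1 = c("TP53", "BRCA1"), stringsAsFactors = FALSE)
records <- data.frame(
  V1 = c("BRCA1", "EGFR", "TP53"),
  V2 = c(0.9, 0.4, 0.7),
  stringsAsFactors = FALSE
)

# Join by V1; an inner join keeps only genes present in both data frames.
if (requireNamespace("plyr", quietly = TRUE)) {
  joined <- plyr::join(genes, records, by = "V1", type = "inner")
} else {
  joined <- merge(genes, records, by = "V1")  # base R fallback
}

# Save the result as a tab-separated table (temporary file used as a placeholder path).
out_path <- tempfile(fileext = ".tsv")
write.table(joined, file = out_path, sep = "\t", quote = FALSE, row.names = FALSE)
```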
