Question: Matching gene IDs in R
Posted 15 days ago by aj123 (United States):

I have two .csv files, and I need to match the IDs in the first row (header) of one file with the IDs in the first column of the other. This is what I'm trying:

If the first file contains a column named "ids", I extract the sample IDs like this:

first_file_samples <- first_data_frame$ids
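When IDs that look identical fail to match, a common cause is invisible formatting: stray leading or trailing spaces, or a factor column (before R 4.0.0, read.csv() converted strings to factors by default). A minimal sketch, assuming a made-up first_data_frame built inline in place of the real CSV, that coerces the ids column to character and trims whitespace before any comparison:

```r
# Hypothetical stand-in for the first CSV (toy data, not from the question)
first_data_frame <- data.frame(
  ids = c("GENE1 ", " GENE2", "GENE3"),
  value = c(1.5, 2.0, 3.2),
  stringsAsFactors = TRUE  # mimics the pre-R-4.0 read.csv() default
)

# Convert factor levels to plain strings, then strip surrounding whitespace
first_file_samples <- trimws(as.character(first_data_frame$ids))
print(first_file_samples)
```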

For the second file, where all of the column names are IDs:

second_file_samples <- colnames(second_data_frame)
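One thing worth checking with colnames(): by default read.csv() passes the header through make.names() (check.names = TRUE), so IDs that start with a digit or contain characters such as "-" are silently renamed and will no longer match the IDs in the other file. A short sketch on an inline, hypothetical CSV header:

```r
# Hypothetical CSV whose header row holds sample IDs
csv_text <- "1001,2-ABC,GENE_3\n5,7,9\n"

# Default: header names are made syntactically valid ("X" prefix, "-" -> ".")
mangled <- read.csv(text = csv_text)
print(colnames(mangled))

# check.names = FALSE keeps the header exactly as written in the file
second_data_frame <- read.csv(text = csv_text, check.names = FALSE)
second_file_samples <- colnames(second_data_frame)
print(second_file_samples)
```

Here the default read turns "1001" into "X1001" and "2-ABC" into "X2.ABC", while "GENE_3" is already a valid name and is left alone.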

This gives the sample IDs for the second file. Then I take the intersection of the two vectors:

intersect_sample_ids <- intersect(first_file_samples, second_file_samples)
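If the intersection comes back smaller than expected, it helps to look at what did not match. A minimal diagnostic sketch, using hypothetical ID vectors in place of the real files, with setdiff() in both directions plus a case-insensitive retry:

```r
# Hypothetical ID vectors standing in for the two files
first_file_samples  <- c("GENE1", "GENE2", "gene3", "GENE4")
second_file_samples <- c("GENE1", "GENE2", "GENE3", "GENE5")

intersect_sample_ids <- intersect(first_file_samples, second_file_samples)

# IDs present in one file but not the other; inspect these for
# case, whitespace or naming differences
only_in_first  <- setdiff(first_file_samples, second_file_samples)
only_in_second <- setdiff(second_file_samples, first_file_samples)

print(intersect_sample_ids)
print(only_in_first)
print(only_in_second)

# If case is the problem, comparing upper-cased IDs recovers the pair
case_insensitive <- intersect(toupper(first_file_samples), toupper(second_file_samples))
print(case_insensitive)
```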

Then I subset both data frames to the shared IDs:

library(dplyr)  # provides %>%, filter(), select() and all_of()

subset_first_file <- first_data_frame %>% filter(ids %in% intersect_sample_ids)

subset_second_file <- second_data_frame %>% select(all_of(intersect_sample_ids))
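For comparison, a base-R sketch of the same two subsetting steps (no dplyr needed), on made-up toy data where sample IDs are rows in one table and column names in the other. Indexing the second table's columns with the filtered IDs from the first table keeps the rows of one and the columns of the other in the same order:

```r
# Hypothetical toy data: IDs are rows in the first table, columns in the second
first_data_frame <- data.frame(
  ids = c("S1", "S2", "S3", "S4"),
  group = c("a", "b", "a", "b"),
  stringsAsFactors = FALSE
)
second_data_frame <- data.frame(
  S3 = c(10, 11), S1 = c(20, 21), S9 = c(30, 31),
  check.names = FALSE
)

intersect_sample_ids <- intersect(first_data_frame$ids, colnames(second_data_frame))

# Keep only rows of the first table whose ID is shared
subset_first_file <- first_data_frame[first_data_frame$ids %in% intersect_sample_ids, , drop = FALSE]

# Keep only the shared columns, ordered like the rows kept above
subset_second_file <- second_data_frame[, subset_first_file$ids, drop = FALSE]

print(subset_first_file)
print(subset_second_file)
```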

But it does not seem to work. What could be going wrong?

Tags: rna-seq, R

Comment by rpolicastro, 15 days ago:

Hi, can you include the first few lines of each file and an example of the desired output? The easiest way to share the data is the output of dput(head(first_file_samples)), for example for the first file.
Answer by bkleiboeker, 15 days ago:

Here's a workaround I sometimes use to copy a column from one data frame to another by matching rows on a shared key, such as gene IDs:

Say column 2 of df1 contains logCPM values. We can collect those values into a new column of df2 (call it df2$logCPM), matched by gene ID, using:

df2$logCPM <- df1[[2]][match(df2$ID, df1$ID)]  # df1[[2]] keeps the column numeric; as.matrix(df1) would turn every value into a string
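A small self-contained sketch of this match() lookup on made-up data (df1, df2 and the gene IDs are hypothetical). match() returns, for each ID in df2, its row position in df1, or NA when the gene is missing. It also shows one pitfall: as.matrix() on a data frame that mixes a character ID column with numeric columns produces a character matrix, so values copied that way come back as strings rather than numbers.

```r
# Hypothetical toy tables: df1 holds logCPM per gene, df2 lists genes in another order
df1 <- data.frame(ID = c("g1", "g2", "g3"), logCPM = c(5.1, 2.3, 7.8),
                  stringsAsFactors = FALSE)
df2 <- data.frame(ID = c("g3", "g1", "g4"), stringsAsFactors = FALSE)

# Row position of each df2$ID within df1 (NA if absent)
idx <- match(df2$ID, df1$ID)
print(idx)

# as.matrix() on a mixed-type data frame gives a character matrix
via_matrix <- as.matrix(df1)[, 2][idx]
print(class(via_matrix))

# Indexing the column directly keeps the values numeric
df2$logCPM <- df1[[2]][idx]
print(df2)
```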

I'd be curious whether there's a better way to do what you're describing, but I would use the code above one column at a time to combine the information from the two data frames into a single data frame with all the relevant info. The worst part of my solution is the magic number (the position of the desired column in df1), so I'm hoping to learn a more dynamic solution to this problem as well!
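One possible way around the magic number, sketched here on hypothetical toy data: select the columns by name and copy several at once with a single match() call, or use base R's merge() on the shared key. Note that merge() sorts the result by the key column by default, whereas the match() approach keeps df2's original row order.

```r
# Hypothetical toy tables (names and values are illustrative only)
df1 <- data.frame(ID = c("g1", "g2", "g3"),
                  logCPM = c(5.1, 2.3, 7.8),
                  logFC = c(1.2, -0.4, 0.9),
                  stringsAsFactors = FALSE)
df2 <- data.frame(ID = c("g3", "g1", "g4"), stringsAsFactors = FALSE)

# Columns to copy, chosen by name rather than by position
cols_to_copy <- c("logCPM", "logFC")

# One match() aligns df1's rows to df2's IDs; all chosen columns are copied together
idx <- match(df2$ID, df1$ID)
df2[cols_to_copy] <- df1[idx, cols_to_copy, drop = FALSE]
print(df2)

# Base R alternative: join on the key column.
# all.x = TRUE keeps genes that are missing from df1 (filled with NA).
merged <- merge(df2["ID"], df1, by = "ID", all.x = TRUE)
print(merged)
```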
