Question: How to find common data in replicates?
2
gravatar for BIOAWY
3 months ago by
BIOAWY70
HUJI
BIOAWY70 wrote:

I have 3 replicates data outputs in separate three .csv files.E

Each files the first column is Gene identification numbers (AGI: my experimental targets). The three files consisted of different experimental outcomes. i.e. first file 250 targets, second ~500, and third ~ 2000.

Each file consisted with other information for the identified targets, e.g.

Capture

I want to identify common targets (AGIs) in these three .csv files and make a new working table combining other details of the common AGIs.

How can I identify the common targets (AGIs) using R programming?

R gene • 248 views
ADD COMMENTlink modified 3 months ago by c.chakraborty80 • written 3 months ago by BIOAWY70
2

Vlookup in excel and join in Linux (twice)

ADD REPLYlink written 3 months ago by cpad01128.8k

Thank you as always cpad0112. Unfortunately, I do not know Linux. I appreciate your help as always.

ADD REPLYlink written 3 months ago by BIOAWY70
1

inner join. inner join.

ADD REPLYlink written 3 months ago by shenwei3564.1k
3
gravatar for zx8754
3 months ago by
zx87544.9k
London
zx87544.9k wrote:

Using tidyverse, try this example

# dummy data
rep1 <- data.frame(AGI = c(1:5,6), value = 1)
rep2 <- data.frame(AGI = c(2:5,6), value = 2)
rep3 <- data.frame(AGI = c(1,5,6), value = 3)

library(tidyverse)

bind_rows(list(rep1, rep2, rep3), .id = "Rep") %>% 
  spread(key = "Rep", value = "value", sep = "") %>%
  filter(complete.cases(.)) %>% 
  mutate(AverageValue = rowMeans(.[, -1]))

#   AGI Rep1 Rep2 Rep3 AverageValue
# 1   5    1    2    3            2
# 2   6    1    2    3            2

Tip: We could create the list of dataframes automatically, something like:

myReps <- lapply(list.files(), read.csv)

Then carry one with code as above:

bind_rows(myReps, .id = "Rep") %>% 
...etc
ADD COMMENTlink modified 3 months ago • written 3 months ago by zx87544.9k
1

You forgot stringsAsFactors=FALSE :-)

ADD REPLYlink written 3 months ago by Ram17k

Yes, good point, in my dummy data gene names are numeric.

ADD REPLYlink written 3 months ago by zx87544.9k

Thank you very much zx8754 :) Appreciate your help with this.

ADD REPLYlink written 3 months ago by BIOAWY70
3
gravatar for b.nota
3 months ago by
b.nota4.9k
Netherlands
b.nota4.9k wrote:

You can use merge with all.x=F and all.y=F arguments.

Something like:

firstMerge <- merge(df1, df2, all.x=F, all.y=F)
secondMerge <- merge(firstMerge, df3, all.x=F, all.y=F)
ADD COMMENTlink written 3 months ago by b.nota4.9k
merge(df1, merge(df2, df3, all=FALSE), all=FALSE)
ADD REPLYlink written 3 months ago by Ram17k
1

Or Reduce, this is best when we have too many dataframes to merge:

Reduce(function(...) merge(..., by = "AGI"), list(df1, df2, df3))
ADD REPLYlink written 3 months ago by zx87544.9k
2
gravatar for cpad0112
3 months ago by
cpad01128.8k
India
cpad01128.8k wrote:

Using sqldf package, SQL like inner joins:

df1 <- data.frame(AGI = sample(1:6), value = rnorm(6,2,1))
df2 <- data.frame(AGI = sample(2:9), value = rnorm(8,0,1.5))
df3 <- data.frame(AGI = c(4,8,5), value = rnorm(3,0,2))

library(sqldf)

sqldf("select df1.AGI,
              df1.value as df1,
              df2.value as df2,
              df3.value as df3
       from df1 inner join df2
                   on df1.AGI=df2.AGI
                inner join df3
                   on df2.AGI=df3.AGI")

#   AGI       df1        df2        df3
# 1   4 0.1842957 -0.9538043 0.07660008
# 2   5 1.2598496  1.1615558 1.03723099

Or using shell:

$ join -1 1 -2 1 <(sort -k1 df1.txt) <(sort -k1 df2.txt) | join -1 1 -2 1 - <(sort -k1 df3.txt) | sort -h

"AGI" "value" "value" "value"
4 0.184295742608555 -0.953804335859278 0.0766000774913751
5 1.25984960266285 1.1615557563389 1.03723099264702
ADD COMMENTlink modified 3 months ago by zx87544.9k • written 3 months ago by cpad01128.8k
0
gravatar for c.chakraborty
3 months ago by
c.chakraborty80 wrote:

We can use Excel, VLOOKUP, for each identifier for each of the arrays, and then VLOOKUP for the values in each array-

ID          Val1     Val2   Val3
Abc    =VLOOKUP(cellA1, from array.csv rep1 col A:B, return value of cell B, False)
Def
Ghi

and so on and you can get the values for each like that. Also for average you can do =AVERAGE(A2, B2, C2) for the cells with values and that will give you mean. You can also get Median. standard deviation and so on Excel.

If you're using windows, separate cells and arrays with comma and for Mac use semi-colon. You can do it all on Excel.

ADD COMMENTlink modified 3 months ago by zx87544.9k • written 3 months ago by c.chakraborty80

You can do it all on Excel

Don't. Use R. Unless you know exactly the kinds of assumptions Excel makes, and the idiosyncrasies of copy-paste in or out of Excel, and the idiosyncrasies of Excel formulas across types of lookups, do not use Excel.

ADD REPLYlink written 3 months ago by Ram17k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1252 users visited in the last hour