Question: How to find common data in replicates?
2
gravatar for Wuschel
8 months ago by
Wuschel110
HUJI
Wuschel110 wrote:

I have 3 replicates data outputs in separate three .csv files.E

Each files the first column is Gene identification numbers (AGI: my experimental targets). The three files consisted of different experimental outcomes. i.e. first file 250 targets, second ~500, and third ~ 2000.

Each file consisted with other information for the identified targets, e.g.

Capture

I want to identify common targets (AGIs) in these three .csv files and make a new working table combining other details of the common AGIs.

How can I identify the common targets (AGIs) using R programming?

R gene • 367 views
ADD COMMENTlink modified 8 months ago by c.chakraborty110 • written 8 months ago by Wuschel110
2

Vlookup in excel and join in Linux (twice)

ADD REPLYlink written 8 months ago by cpad011211k

Thank you as always cpad0112. Unfortunately, I do not know Linux. I appreciate your help as always.

ADD REPLYlink written 8 months ago by Wuschel110
1

inner join. inner join.

ADD REPLYlink written 8 months ago by shenwei3564.5k
3
gravatar for zx8754
8 months ago by
zx87546.6k
London
zx87546.6k wrote:

Using tidyverse, try this example

# dummy data
rep1 <- data.frame(AGI = c(1:5,6), value = 1)
rep2 <- data.frame(AGI = c(2:5,6), value = 2)
rep3 <- data.frame(AGI = c(1,5,6), value = 3)

library(tidyverse)

bind_rows(list(rep1, rep2, rep3), .id = "Rep") %>% 
  spread(key = "Rep", value = "value", sep = "") %>%
  filter(complete.cases(.)) %>% 
  mutate(AverageValue = rowMeans(.[, -1]))

#   AGI Rep1 Rep2 Rep3 AverageValue
# 1   5    1    2    3            2
# 2   6    1    2    3            2

Tip: We could create the list of dataframes automatically, something like:

myReps <- lapply(list.files(), read.csv)

Then carry one with code as above:

bind_rows(myReps, .id = "Rep") %>% 
...etc
ADD COMMENTlink modified 8 months ago • written 8 months ago by zx87546.6k
1

You forgot stringsAsFactors=FALSE :-)

ADD REPLYlink written 8 months ago by RamRS20k

Yes, good point, in my dummy data gene names are numeric.

ADD REPLYlink written 8 months ago by zx87546.6k

Thank you very much zx8754 :) Appreciate your help with this.

ADD REPLYlink written 8 months ago by Wuschel110
3
gravatar for b.nota
8 months ago by
b.nota6.2k
Netherlands
b.nota6.2k wrote:

You can use merge with all.x=F and all.y=F arguments.

Something like:

firstMerge <- merge(df1, df2, all.x=F, all.y=F)
secondMerge <- merge(firstMerge, df3, all.x=F, all.y=F)
ADD COMMENTlink written 8 months ago by b.nota6.2k
merge(df1, merge(df2, df3, all=FALSE), all=FALSE)
ADD REPLYlink written 8 months ago by RamRS20k
1

Or Reduce, this is best when we have too many dataframes to merge:

Reduce(function(...) merge(..., by = "AGI"), list(df1, df2, df3))
ADD REPLYlink written 8 months ago by zx87546.6k
2
gravatar for cpad0112
8 months ago by
cpad011211k
India
cpad011211k wrote:

Using sqldf package, SQL like inner joins:

df1 <- data.frame(AGI = sample(1:6), value = rnorm(6,2,1))
df2 <- data.frame(AGI = sample(2:9), value = rnorm(8,0,1.5))
df3 <- data.frame(AGI = c(4,8,5), value = rnorm(3,0,2))

library(sqldf)

sqldf("select df1.AGI,
              df1.value as df1,
              df2.value as df2,
              df3.value as df3
       from df1 inner join df2
                   on df1.AGI=df2.AGI
                inner join df3
                   on df2.AGI=df3.AGI")

#   AGI       df1        df2        df3
# 1   4 0.1842957 -0.9538043 0.07660008
# 2   5 1.2598496  1.1615558 1.03723099

Or using shell:

$ join -1 1 -2 1 <(sort -k1 df1.txt) <(sort -k1 df2.txt) | join -1 1 -2 1 - <(sort -k1 df3.txt) | sort -h

"AGI" "value" "value" "value"
4 0.184295742608555 -0.953804335859278 0.0766000774913751
5 1.25984960266285 1.1615557563389 1.03723099264702
ADD COMMENTlink modified 8 months ago by zx87546.6k • written 8 months ago by cpad011211k
0
gravatar for c.chakraborty
8 months ago by
c.chakraborty110
c.chakraborty110 wrote:

We can use Excel, VLOOKUP, for each identifier for each of the arrays, and then VLOOKUP for the values in each array-

ID          Val1     Val2   Val3
Abc    =VLOOKUP(cellA1, from array.csv rep1 col A:B, return value of cell B, False)
Def
Ghi

and so on and you can get the values for each like that. Also for average you can do =AVERAGE(A2, B2, C2) for the cells with values and that will give you mean. You can also get Median. standard deviation and so on Excel.

If you're using windows, separate cells and arrays with comma and for Mac use semi-colon. You can do it all on Excel.

ADD COMMENTlink modified 8 months ago by zx87546.6k • written 8 months ago by c.chakraborty110

You can do it all on Excel

Don't. Use R. Unless you know exactly the kinds of assumptions Excel makes, and the idiosyncrasies of copy-paste in or out of Excel, and the idiosyncrasies of Excel formulas across types of lookups, do not use Excel.

ADD REPLYlink written 8 months ago by RamRS20k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2240 users visited in the last hour