Question: How to find common data in replicates?
1
gravatar for BIOAWY
4 weeks ago by
BIOAWY40
HUJI
BIOAWY40 wrote:

I have 3 replicates data outputs in separate three .csv files.E

Each files the first column is Gene identification numbers (AGI: my experimental targets). The three files consisted of different experimental outcomes. i.e. first file 250 targets, second ~500, and third ~ 2000.

Each file consisted with other information for the identified targets, e.g.

Capture

I want to identify common targets (AGIs) in these three .csv files and make a new working table combining other details of the common AGIs.

How can I identify the common targets (AGIs) using R programming?

R gene • 197 views
ADD COMMENTlink modified 4 weeks ago by c.chakraborty70 • written 4 weeks ago by BIOAWY40
2

Vlookup in excel and join in Linux (twice)

ADD REPLYlink written 4 weeks ago by cpad01127.6k

Thank you as always cpad0112. Unfortunately, I do not know Linux. I appreciate your help as always.

ADD REPLYlink written 4 weeks ago by BIOAWY40
1

inner join. inner join.

ADD REPLYlink written 4 weeks ago by shenwei3564.0k
3
gravatar for zx8754
4 weeks ago by
zx87544.7k
London
zx87544.7k wrote:

Using tidyverse, try this example

# dummy data
rep1 <- data.frame(AGI = c(1:5,6), value = 1)
rep2 <- data.frame(AGI = c(2:5,6), value = 2)
rep3 <- data.frame(AGI = c(1,5,6), value = 3)

library(tidyverse)

bind_rows(list(rep1, rep2, rep3), .id = "Rep") %>% 
  spread(key = "Rep", value = "value", sep = "") %>%
  filter(complete.cases(.)) %>% 
  mutate(AverageValue = rowMeans(.[, -1]))

#   AGI Rep1 Rep2 Rep3 AverageValue
# 1   5    1    2    3            2
# 2   6    1    2    3            2

Tip: We could create the list of dataframes automatically, something like:

myReps <- lapply(list.files(), read.csv)

Then carry one with code as above:

bind_rows(myReps, .id = "Rep") %>% 
...etc
ADD COMMENTlink modified 4 weeks ago • written 4 weeks ago by zx87544.7k
1

You forgot stringsAsFactors=FALSE :-)

ADD REPLYlink written 4 weeks ago by Ram16k

Yes, good point, in my dummy data gene names are numeric.

ADD REPLYlink written 4 weeks ago by zx87544.7k

Thank you very much zx8754 :) Appreciate your help with this.

ADD REPLYlink written 4 weeks ago by BIOAWY40
3
gravatar for b.nota
4 weeks ago by
b.nota4.3k
Netherlands
b.nota4.3k wrote:

You can use merge with all.x=F and all.y=F arguments.

Something like:

firstMerge <- merge(df1, df2, all.x=F, all.y=F)
secondMerge <- merge(firstMerge, df3, all.x=F, all.y=F)
ADD COMMENTlink written 4 weeks ago by b.nota4.3k
merge(df1, merge(df2, df3, all=FALSE), all=FALSE)
ADD REPLYlink written 4 weeks ago by Ram16k
1

Or Reduce, this is best when we have too many dataframes to merge:

Reduce(function(...) merge(..., by = "AGI"), list(df1, df2, df3))
ADD REPLYlink written 4 weeks ago by zx87544.7k
2
gravatar for cpad0112
4 weeks ago by
cpad01127.6k
India
cpad01127.6k wrote:

Using sqldf package, SQL like inner joins:

df1 <- data.frame(AGI = sample(1:6), value = rnorm(6,2,1))
df2 <- data.frame(AGI = sample(2:9), value = rnorm(8,0,1.5))
df3 <- data.frame(AGI = c(4,8,5), value = rnorm(3,0,2))

library(sqldf)

sqldf("select df1.AGI,
              df1.value as df1,
              df2.value as df2,
              df3.value as df3
       from df1 inner join df2
                   on df1.AGI=df2.AGI
                inner join df3
                   on df2.AGI=df3.AGI")

#   AGI       df1        df2        df3
# 1   4 0.1842957 -0.9538043 0.07660008
# 2   5 1.2598496  1.1615558 1.03723099

Or using shell:

$ join -1 1 -2 1 <(sort -k1 df1.txt) <(sort -k1 df2.txt) | join -1 1 -2 1 - <(sort -k1 df3.txt) | sort -h

"AGI" "value" "value" "value"
4 0.184295742608555 -0.953804335859278 0.0766000774913751
5 1.25984960266285 1.1615557563389 1.03723099264702
ADD COMMENTlink modified 4 weeks ago by zx87544.7k • written 4 weeks ago by cpad01127.6k
0
gravatar for c.chakraborty
4 weeks ago by
c.chakraborty70 wrote:

We can use Excel, VLOOKUP, for each identifier for each of the arrays, and then VLOOKUP for the values in each array-

ID          Val1     Val2   Val3
Abc    =VLOOKUP(cellA1, from array.csv rep1 col A:B, return value of cell B, False)
Def
Ghi

and so on and you can get the values for each like that. Also for average you can do =AVERAGE(A2, B2, C2) for the cells with values and that will give you mean. You can also get Median. standard deviation and so on Excel.

If you're using windows, separate cells and arrays with comma and for Mac use semi-colon. You can do it all on Excel.

ADD COMMENTlink modified 4 weeks ago by zx87544.7k • written 4 weeks ago by c.chakraborty70

You can do it all on Excel

Don't. Use R. Unless you know exactly the kinds of assumptions Excel makes, and the idiosyncrasies of copy-paste in or out of Excel, and the idiosyncrasies of Excel formulas across types of lookups, do not use Excel.

ADD REPLYlink written 4 weeks ago by Ram16k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1097 users visited in the last hour