Question: How to calculate overlap of peptides between different categories to create a Venn diagram
ishackm wrote:

Hi all,

I have the following dataset:

          TGEClass.known         TGEClass.uknown
1             GVVEVTHDLQK             GVVEVTHDLQK
2           LFYADHPFIFLVR           LFYADHPFIFLVR
3       SALQSINEWAAQTTDGK       SALQSINEWAAQTTDGK
4  AVLSAEQLRDEEVHAGLGELLR  AVLSAEQLRDEEVHAGLGELL

I would like to calculate the number of peptides that are present in both categories and the number that are present in only one.

I have tried the vennCounts function from limma, but it only accepts numerical values:

a <- vennCounts(c3)
a
     hw hm hr Counts
[1,]  0  0  0    113
[2,]  0  0  1     18
[3,]  0  1  0      8
[4,]  0  1  1      8
[5,]  1  0  0     12
[6,]  1  0  1      8
[7,]  1  1  0     11
[8,]  1  1  1     22

How can I convert my peptide dataset into a table like the one above so that I can make a Venn diagram? I have searched everywhere I can but have failed to find a solution.

I would really appreciate it if someone could help me solve this problem.

Many Thanks,

Ishack

modified 7 weeks ago by lieven.sterck • written 7 weeks ago by ishackm
SMK wrote:

Hi Ishack,

Try this:

df <-
  data.frame(
    TGEClass.known = c(
      "GVVEVTHDLQK",
      "LFYADHPFIFLVR",
      "SALQSINEWAAQTTDGK",
      "AVLSAEQLRDEEVHAGLGELLR"
    ),
    TGEClass.uknown = c(
      "GVVEVTHDLQK",
      "LFYADHPFIFLVR",
      "SALQSINEWAAQTTDGK",
      "AVLSAEQLRDEEVHAGLGELL"
    )
  )


# Present in both TGEClass.known and TGEClass.uknown
length(intersect(df$TGEClass.known, df$TGEClass.uknown))

# TGEClass.known only
length(setdiff(df$TGEClass.known, df$TGEClass.uknown))

# TGEClass.uknown only
length(setdiff(df$TGEClass.uknown, df$TGEClass.known))
modified 7 weeks ago • written 7 weeks ago by SMK
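
The three set operations above can be wrapped into one small function. This is a minimal sketch in base R; the name `overlap_counts` and its argument names are made up for illustration and are not part of any package.

```r
# Hypothetical helper (not from any package): overlap counts for two peptide sets.
overlap_counts <- function(a, b) {
  # Work on unique character values so repeated peptides are counted once
  a <- unique(as.character(a))
  b <- unique(as.character(b))
  c(
    both   = length(intersect(a, b)),  # present in both sets
    a_only = length(setdiff(a, b)),    # present in the first set only
    b_only = length(setdiff(b, a))     # present in the second set only
  )
}

known <- c("GVVEVTHDLQK", "LFYADHPFIFLVR",
           "SALQSINEWAAQTTDGK", "AVLSAEQLRDEEVHAGLGELLR")
unknown <- c("GVVEVTHDLQK", "LFYADHPFIFLVR",
             "SALQSINEWAAQTTDGK", "AVLSAEQLRDEEVHAGLGELL")

counts <- overlap_counts(known, unknown)
print(counts)
```

With the example peptides this gives 3 shared peptides and 1 unique to each column, matching the three `length()` calls above.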

Hi SMK, thanks very much for your answer, but how can I get a table like this automatically? It takes quite long to do it manually.

     hw hm hr Counts
[1,]  0  0  0    113
[2,]  0  0  1     18
[3,]  0  1  0      8
[4,]  0  1  1      8
[5,]  1  0  0     12
[6,]  1  0  1      8
[7,]  1  1  0     11
[8,]  1  1  1     22
written 7 weeks ago by ishackm

What are hw, hm, and hr?

written 7 weeks ago by SMK

Sorry, those were meant to say TGEClass.uknown and TGEClass.known. Please ignore hw, hm, and hr; I want a table like that for TGEClass.known and TGEClass.uknown.

written 7 weeks ago by ishackm

Perhaps:

> df.venn <- data.frame(
+   TGEClass.known = c(1, 1, 0),
+   TGEClass.unknown = c(1, 0, 1),
+   Counts = c(length(
+     intersect(df$TGEClass.known, df$TGEClass.uknown)
+   ), length(
+     setdiff(df$TGEClass.known, df$TGEClass.uknown)
+   ), length(
+     setdiff(df$TGEClass.uknown, df$TGEClass.known)
+   ))
+ )
> df.venn
  TGEClass.known TGEClass.unknown Counts
1              1                1      3
2              1                0      1
3              0                1      1
> as.matrix(df.venn)
     TGEClass.known TGEClass.unknown Counts
[1,]              1                1      3
[2,]              1                0      1
[3,]              0                1      1
modified 7 weeks ago • written 7 weeks ago by SMK

Hi SMK, thanks a lot, that's what I was looking for. Just one final question, if you don't mind.

I have a lot of data frames like the one above, but each one has a different number of categories and also different categories. Would it be possible to run intersect and setdiff between all the different columns automatically?

written 7 weeks ago by ishackm

The venn function from gplots gave me an idea. Here it is demonstrated with 2 sets and 3 sets:

> library(gplots)
> # Two sets
> df1 <-
+   data.frame(
+     TGEClass.known = c(
+       "GVVEVTHDLQK",
+       "LFYADHPFIFLVR",
+       "SALQSINEWAAQTTDGK",
+       "AVLSAEQLRDEEVHAGLGELLR"
+     ),
+     TGEClass.uknown = c(
+       "GVVEVTHDLQK",
+       "LFYADHPFIFLVR",
+       "SALQSINEWAAQTTDGK",
+       "AVLSAEQLRDEEVHAGLGELL"
+     )
+   )
> venn.tab1 <- venn(as.list(df1), show.plot = FALSE)
> attr(venn.tab1, "intersections") <- NULL
> attr(venn.tab1, "class") <- NULL
> print(venn.tab1)
   num TGEClass.known TGEClass.uknown
00   0              0               0
01   1              0               1
10   1              1               0
11   3              1               1
> # Three sets
> df2 <-
+   data.frame(
+     TGEClass.set1 = c(
+       "GVVEVTHDLQK",
+       "LFYADHPFIFLVR",
+       "SALQSINEWAAQTTDGK",
+       "AVLSAEQLRDEEVHAGLGELLR"
+     ),
+     TGEClass.set2 = c(
+       "GVVEVTHDLQK",
+       "LFYADHPFIFLVR",
+       "SALQSINEWAAQTTDGK",
+       "AVLSAEQLRDEEVHAGLGELL"
+     ),
+     TGEClass.set3 = c(
+       "GVVEVTHDLQK",
+       "LFYADHPFIFLVR",
+       "SALQSINEWAAQTTDGKK",
+       "AVLSAEQLRDEEVHAGLGELLR"
+     )
+   )
> venn.tab2 <- venn(as.list(df2), show.plot = FALSE)
> attr(venn.tab2, "intersections") <- NULL
> attr(venn.tab2, "class") <- NULL
> print(venn.tab2)
    num TGEClass.set1 TGEClass.set2 TGEClass.set3
000   0             0             0             0
001   1             0             0             1
010   1             0             1             0
011   0             0             1             1
100   0             1             0             0
101   1             1             0             1
110   1             1             1             0
111   2             1             1             1
modified 7 weeks ago • written 7 weeks ago by SMK
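
A similar count table can also be built without gplots, for any number of columns. The following is a minimal base R sketch, not the gplots implementation; `membership_counts` is a hypothetical name. It assumes each column holds one set and only lists membership patterns that actually occur, so all-zero rows like `000` are not shown.

```r
# Hypothetical helper (base R only): count peptides per membership pattern.
membership_counts <- function(df) {
  # One de-duplicated character vector per column
  sets <- lapply(df, function(x) unique(as.character(x)))
  universe <- unique(unlist(sets, use.names = FALSE))
  # Rows = peptides, columns = sets, entry = 1 if the peptide is in the set
  member <- vapply(sets, function(s) as.integer(universe %in% s),
                   integer(length(universe)))
  member <- matrix(member, nrow = length(universe),
                   dimnames = list(universe, names(sets)))
  # Collapse each row into a pattern string such as "110"
  pattern <- apply(member, 1, paste, collapse = "")
  as.data.frame(table(pattern), responseName = "Counts",
                stringsAsFactors = FALSE)
}

df2 <- data.frame(
  TGEClass.set1 = c("GVVEVTHDLQK", "LFYADHPFIFLVR",
                    "SALQSINEWAAQTTDGK", "AVLSAEQLRDEEVHAGLGELLR"),
  TGEClass.set2 = c("GVVEVTHDLQK", "LFYADHPFIFLVR",
                    "SALQSINEWAAQTTDGK", "AVLSAEQLRDEEVHAGLGELL"),
  TGEClass.set3 = c("GVVEVTHDLQK", "LFYADHPFIFLVR",
                    "SALQSINEWAAQTTDGKK", "AVLSAEQLRDEEVHAGLGELLR"),
  stringsAsFactors = FALSE
)

res <- membership_counts(df2)
print(res)
```

Each digit in `pattern` corresponds to one column, in column order, so `111` means the peptide is present in all three sets.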

Hi SMK, unfortunately I just found out that I can't draw a Venn diagram for more than 5 categories.

Can you help me create a data frame that looks like this, please?

TGE-Class     Count
T1              1
T2              1
Both            6

Thanks very much

modified 6 weeks ago • written 6 weeks ago by ishackm
> library(gplots)
> df <-
+   data.frame(
+     T1 = c(
+       "GVVEVTHDLQK",
+       "LFYADHPFIFLVR",
+       "SALQSINEWAAQTTDGK",
+       "SALQSINEWAAQTTDGLL",
+       "SALQSINEWAAQTTDGTT",
+       "SALQSINEWAAQTTDGQQ",
+       "AVLSAEQLRDEEVHAGLGELLR"
+     ),
+     T2 = c(
+       "GVVEVTHDLQK",
+       "LFYADHPFIFLVR",
+       "SALQSINEWAAQTTDGK",
+       "SALQSINEWAAQTTDGLL",
+       "SALQSINEWAAQTTDGTT",
+       "SALQSINEWAAQTTDGQQ",
+       "AVLSAEQLRDEEVHAGLGELL"
+     )
+   )
> venn.tab <- venn(as.list(df), show.plot = FALSE)
> t(t(unlist(lapply(attr(venn.tab, "intersections"), length))))
      [,1]
T1       1
T2       1
T1:T2    6
modified 6 weeks ago • written 6 weeks ago by SMK
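
If the number of categories is too large for the plotting packages, the same per-region counts can be computed directly. This is a minimal base R sketch under the assumption that each column is one category; `region_counts` and the output column names `TGE.Class` and `Count` are made up for illustration. Regions are labelled by joining the set names with `:`, so `T1:T2` plays the role of "Both".

```r
# Hypothetical helper (base R only): count peptides per exact combination of sets.
region_counts <- function(sets) {
  # One de-duplicated character vector per set
  sets <- lapply(sets, function(x) unique(as.character(x)))
  # Long format: one row per (peptide, set) pair
  long <- data.frame(
    peptide = unlist(sets, use.names = FALSE),
    set = rep(names(sets), lengths(sets)),
    stringsAsFactors = FALSE
  )
  # For each peptide, join the names of all sets that contain it
  region <- vapply(split(long$set, long$peptide), paste,
                   character(1), collapse = ":")
  counts <- table(region)
  data.frame(TGE.Class = names(counts), Count = as.integer(counts),
             stringsAsFactors = FALSE)
}

df <- data.frame(
  T1 = c("GVVEVTHDLQK", "LFYADHPFIFLVR", "SALQSINEWAAQTTDGK",
         "SALQSINEWAAQTTDGLL", "SALQSINEWAAQTTDGTT",
         "SALQSINEWAAQTTDGQQ", "AVLSAEQLRDEEVHAGLGELLR"),
  T2 = c("GVVEVTHDLQK", "LFYADHPFIFLVR", "SALQSINEWAAQTTDGK",
         "SALQSINEWAAQTTDGLL", "SALQSINEWAAQTTDGTT",
         "SALQSINEWAAQTTDGQQ", "AVLSAEQLRDEEVHAGLGELL"),
  stringsAsFactors = FALSE
)

res <- region_counts(df)
print(res)
```

For the two example columns this reports 1 peptide for T1 only, 1 for T2 only, and 6 for T1:T2. The function does not depend on the number of columns, so it should also work with more than 5 categories.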

Hi SMK,

Thanks very much for your quick response, I have been trying all day to fix this. You are a life saver!

written 6 weeks ago by ishackm

Hi SMK, sorry for the late reply. Is there a way to see the number of unique peptides from each category when there are blanks in the columns, please?

Unfortunately, the length code counts the blank cells as unique peptides.

written 5 weeks ago by ishackm

Hi ishackm,

You can remove the empty elements from the list before you call venn:

l <- as.list(df)
# Drop empty strings (and NA values) from every column
l <- lapply(l, function(x) { x[!is.na(x) & x != ""] })
venn.tab <- venn(l, show.plot = FALSE)
written 5 weeks ago by SMK
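
Blank cells can also show up as NA or as whitespace-only strings, depending on how the file was read. The sketch below is a slightly more defensive variant of the same idea in base R; `clean_sets` is a hypothetical helper name, and the small data frame is invented only to show the three kinds of blanks.

```r
# Hypothetical helper: drop NA, empty, and whitespace-only entries per column.
clean_sets <- function(df) {
  lapply(df, function(x) {
    x <- trimws(as.character(x))        # strip surrounding whitespace
    unique(x[!is.na(x) & nzchar(x)])    # keep non-missing, non-empty values
  })
}

# Invented example: T1 has an empty string, T2 has a blank string and an NA
df <- data.frame(
  T1 = c("GVVEVTHDLQK", "LFYADHPFIFLVR", ""),
  T2 = c("GVVEVTHDLQK", " ", NA),
  stringsAsFactors = FALSE
)

sets <- clean_sets(df)
print(lengths(sets))
```

The cleaned list can then be used wherever `as.list(df)` was used before.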

Hi SMK, thank you again for your quick response. Much appreciated.

written 5 weeks ago by ishackm

Cool, glad it helps!

modified 5 weeks ago • written 5 weeks ago by SMK
zx8754 wrote:

Convert to TRUE/FALSE, then use limma's vennCounts:

# example data
df <-data.frame(
  TGEClass.known = c(
    "GVVEVTHDLQK",
    "LFYADHPFIFLVR",
    "SALQSINEWAAQTTDGK",
    "AVLSAEQLRDEEVHAGLGELLR"
  ),
  TGEClass.uknown = c(
    "GVVEVTHDLQK",
    "LFYADHPFIFLVR",
    "SALQSINEWAAQTTDGK",
    "AVLSAEQLRDEEVHAGLGELL"
  ), stringsAsFactors = FALSE
)

library(data.table)

x <- dcast(cbind(stack(as.list(df)), x = TRUE), 
           values ~ ind, 
           value.var = "x", 
           fill = FALSE)[, -1]    

limma::vennCounts(x)
#   TGEClass.known TGEClass.uknown Counts
# 1              0               0      0
# 2              0               1      1
# 3              1               0      1
# 4              1               1      3

limma::vennDiagram(x)
written 7 weeks ago by zx8754

Hi, I ran the code you gave me, but it gives me an error:

    df = read.csv("FN1.csv")
    FN1 = as.vector(df)



    library(data.table)

    x <- dcast(cbind(stack(as.list(FN1)), x = TRUE), 
               values ~ ind, 
               value.var = "x", 
               fill = FALSE)[, -1]    
    limma::vennCounts(x)

Error in stack.default(as.list(FN1)) : 
  at least one vector element is required

What am I doing wrong here, please?

written 7 weeks ago by ishackm

You need to share your example CSV: FN1.csv, so that we can reproduce the problem.

written 7 weeks ago by zx8754
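
One possible cause of the "at least one vector element is required" error, stated here as an assumption because FN1.csv was not available: in R versions before 4.0, read.csv converts text columns to factors by default, and stack() on a list skips elements that are not plain vectors, which includes factors. The sketch below reproduces that situation with a small invented data frame (`fn1`) and shows that converting the columns to character avoids it.

```r
# Simulate columns that were read as factors (set explicitly for this example)
fn1 <- data.frame(
  T2 = c("QHDMGHMMR", "DQCIVDDITYNVNDTFHK"),
  T3 = c("QHDMGHMMR", "HRPRPYPPNVGEEIQIGHIPR"),
  stringsAsFactors = TRUE
)

# stack() on a list of factors fails: factors are not plain vectors
msg <- tryCatch(
  {
    stack(as.list(fn1))
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# Converting every column to character lets stack() work
fn1[] <- lapply(fn1, as.character)
stacked <- stack(as.list(fn1))
print(stacked)
```

Reading the file with `stringsAsFactors = FALSE` in the first place has the same effect as the conversion step.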

Sorry for the late reply.

This is the CSV I am using:

T2  T3
QHDMGHMMR   QHDMGHMMR
RPGGEPSPEGTTGQSYNQYSQR  RPGGEPSPEGTTGQSYNQYSQR
KTDELPQLVTLPHPNLHGPEILDVPSTVQK  KTDELPQLVTLPHPNLHGPEILDVPSTVQK
HRPRPYPPNVGEEIQIGHIPR   HRPRPYPPNVGEEIQIGHIPR
QHDMGHMMR   QHDMGHMMR
DQCIVDDITYNVNDTFHK  DQCIVDDITYNVNDTFHK
YYRITYGETGGNSPVQEFTVPGSK    YYRITYGETGGNSPVQEFTVPGSK

The code:

test = read.csv("test.csv", stringsAsFactors = FALSE)


library(gplots)
# example data



library(data.table)

x <- dcast(cbind(stack(as.list(df2)), x = TRUE), 
           values ~ ind, 
           value.var = "x", 
           fill = FALSE)[, -1]    

limma::vennCounts(x)
limma::vennDiagram(x)

The error:

Aggregation function missing: defaulting to length
Error in vapply(indices, fun, .default) : values must be type 'logical',
 but FUN(X[[1]]) result is type 'integer'

How can I fix this please?

modified 7 weeks ago • written 7 weeks ago by ishackm

Yes, because your columns overlap fully, TRUE/FALSE does not work here. Replace TRUE/FALSE with 1/0 in dcast, as in the example below:

# example data
df <-read.table(text = "
T2  T3
QHDMGHMMR   QHDMGHMMR
RPGGEPSPEGTTGQSYNQYSQR  RPGGEPSPEGTTGQSYNQYSQR
KTDELPQLVTLPHPNLHGPEILDVPSTVQK  KTDELPQLVTLPHPNLHGPEILDVPSTVQK
HRPRPYPPNVGEEIQIGHIPR   HRPRPYPPNVGEEIQIGHIPR
QHDMGHMMR   QHDMGHMMR
DQCIVDDITYNVNDTFHK  DQCIVDDITYNVNDTFHK
YYRITYGETGGNSPVQEFTVPGSK    YYRITYGETGGNSPVQEFTVPGSK", stringsAsFactors = FALSE, header = TRUE)

library(data.table)

x <- dcast(cbind(stack(as.list(df)), x = 1), 
           values ~ ind, 
           value.var = "x", 
           fill = 0)[, -1]

limma::vennCounts(x)

#   T2 T3 Counts
# 1  0  0      0
# 2  0  1      0
# 3  1  0      0
# 4  1  1      6
# attr(,"class")
# [1] "VennCounts"
written 7 weeks ago by zx8754
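
The message "Aggregation function missing: defaulting to length" suggests that some peptide occurs more than once within a column; in the posted CSV, QHDMGHMMR appears twice. A likely explanation, offered as an assumption rather than a confirmed diagnosis, is that dcast then counts the duplicates and returns integers, which no longer match the logical fill value. De-duplicating each column first sidesteps this. The sketch below does so in base R and builds a logical membership matrix without dcast; the variable names are chosen for illustration.

```r
test <- data.frame(
  T2 = c("QHDMGHMMR", "RPGGEPSPEGTTGQSYNQYSQR",
         "KTDELPQLVTLPHPNLHGPEILDVPSTVQK", "HRPRPYPPNVGEEIQIGHIPR",
         "QHDMGHMMR", "DQCIVDDITYNVNDTFHK", "YYRITYGETGGNSPVQEFTVPGSK"),
  T3 = c("QHDMGHMMR", "RPGGEPSPEGTTGQSYNQYSQR",
         "KTDELPQLVTLPHPNLHGPEILDVPSTVQK", "HRPRPYPPNVGEEIQIGHIPR",
         "QHDMGHMMR", "DQCIVDDITYNVNDTFHK", "YYRITYGETGGNSPVQEFTVPGSK"),
  stringsAsFactors = FALSE
)

# Remove repeated peptides within each column
sets <- lapply(test, unique)
universe <- unique(unlist(sets, use.names = FALSE))

# Logical matrix: rows = peptides, columns = sets, TRUE if present
x <- vapply(sets, function(s) universe %in% s, logical(length(universe)))
rownames(x) <- universe

print(colSums(x))
```

Seven rows collapse to six unique peptides, all shared by T2 and T3, which agrees with the count of 6 in the vennCounts output above. A matrix like `x` could then be passed to limma::vennCounts in place of the dcast result (not run here).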

Thanks very much for your quick response

written 7 weeks ago by ishackm

lieven.sterck wrote:

If you are looking for exact matches (so no peptide can be a subset of another), you can use your lists as they are as input for DrawVenn. It's an online tool for drawing Venn diagrams.

written 7 weeks ago by lieven.sterck

Thanks to all for your help and support. This is exactly what I was looking for.

written 7 weeks ago by ishackm