Question: How to calculate overlap of peptides between different categories to create a Venn diagram
ishackm wrote (12 months ago):

Hi all,

I have the following dataset:

     TGEClass.known         TGEClass.uknown
1             GVVEVTHDLQK             GVVEVTHDLQK
2           LFYADHPFIFLVR           LFYADHPFIFLVR
3       SALQSINEWAAQTTDGK       SALQSINEWAAQTTDGK
4  AVLSAEQLRDEEVHAGLGELLR  AVLSAEQLRDEEVHAGLGELL

I would like to calculate the number of peptides that are present in both categories and the number that are present in only one of them.

I have tried to use the vennCounts function from limma, but it only accepts numerical values:

a <- vennCounts(c3)
a
     hw hm hr Counts
[1,]  0  0  0    113
[2,]  0  0  1     18
[3,]  0  1  0      8
[4,]  0  1  1      8
[5,]  1  0  0     12
[6,]  1  0  1      8
[7,]  1  1  0     11
[8,]  1  1  1     22

How can I convert my peptide dataset into a table like the one above so that I can make a Venn diagram? I have searched everywhere I can but still haven't found a solution.

I would really appreciate it if someone could help me solve this problem.

Many Thanks,

Ishack

[written 12 months ago by ishackm; modified 12 months ago by lieven.sterck]
SMK wrote (12 months ago):

Hi Ishack,

Try this:

df <-
  data.frame(
    TGEClass.known = c(
      "GVVEVTHDLQK",
      "LFYADHPFIFLVR",
      "SALQSINEWAAQTTDGK",
      "AVLSAEQLRDEEVHAGLGELLR"
    ),
    TGEClass.uknown = c(
      "GVVEVTHDLQK",
      "LFYADHPFIFLVR",
      "SALQSINEWAAQTTDGK",
      "AVLSAEQLRDEEVHAGLGELL"
    ),
    stringsAsFactors = FALSE
  )


# Present in both TGEClass.known and TGEClass.uknown
length(intersect(df$TGEClass.known, df$TGEClass.uknown))

# TGEClass.known only
length(setdiff(df$TGEClass.known, df$TGEClass.uknown))

# TGEClass.uknown only
length(setdiff(df$TGEClass.uknown, df$TGEClass.known))

[written 12 months ago by SMK; modified 12 months ago]
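A quick consistency check on those three numbers, sketched in base R. The example data frame is re-created here (with stringsAsFactors = FALSE, an added assumption) so the block runs on its own. Every unique peptide falls in exactly one region, so the three counts should add up to the size of the union, here 3 + 1 + 1 = 5.

```r
# Same example peptides as above, kept as character vectors
df <- data.frame(
  TGEClass.known = c("GVVEVTHDLQK", "LFYADHPFIFLVR",
                     "SALQSINEWAAQTTDGK", "AVLSAEQLRDEEVHAGLGELLR"),
  TGEClass.uknown = c("GVVEVTHDLQK", "LFYADHPFIFLVR",
                      "SALQSINEWAAQTTDGK", "AVLSAEQLRDEEVHAGLGELL"),
  stringsAsFactors = FALSE
)

both        <- length(intersect(df$TGEClass.known, df$TGEClass.uknown))
known_only  <- length(setdiff(df$TGEClass.known, df$TGEClass.uknown))
uknown_only <- length(setdiff(df$TGEClass.uknown, df$TGEClass.known))

# Each unique peptide lands in exactly one of the three regions
total <- length(union(df$TGEClass.known, df$TGEClass.uknown))
stopifnot(both + known_only + uknown_only == total)

c(both = both, known_only = known_only, uknown_only = uknown_only, total = total)
```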

Hi SMK, thanks very much for your answer, but how can I get a table like this automatically? It takes quite long to do it manually.

     hw hm hr Counts
[1,]  0  0  0    113
[2,]  0  0  1     18
[3,]  0  1  0      8
[4,]  0  1  1      8
[5,]  1  0  0     12
[6,]  1  0  1      8
[7,]  1  1  0     11
[8,]  1  1  1     22
[written 12 months ago by ishackm]

What are hw, hm, and hr?

[written 12 months ago by SMK]

Sorry, those are meant to say TGEClass.uknown and TGEClass.known. Please ignore hw, hm and hr; I want a table like that for TGEClass.known and TGEClass.uknown.

[written 12 months ago by ishackm]

Perhaps:

> df.venn <- data.frame(
+   TGEClass.known = c(1, 1, 0),
+   TGEClass.unknown = c(1, 0, 1),
+   Counts = c(length(
+     intersect(df$TGEClass.known, df$TGEClass.uknown)
+   ), length(
+     setdiff(df$TGEClass.known, df$TGEClass.uknown)
+   ), length(
+     setdiff(df$TGEClass.uknown, df$TGEClass.known)
+   ))
+ )
> df.venn
  TGEClass.known TGEClass.unknown Counts
1              1                1      3
2              1                0      1
3              0                1      1
> as.matrix(df.venn)
     TGEClass.known TGEClass.unknown Counts
[1,]              1                1      3
[2,]              1                0      1
[3,]              0                1      1

[written 12 months ago by SMK; modified 12 months ago]

Hi SMK, thanks a lot, that's what I was looking for. Just one final question, if you don't mind.

I have a lot of data frames like the one above, but each one has a different number of categories and different categories. Would it be possible to compute intersect and setdiff between all the columns automatically?

[written 12 months ago by ishackm]

I got an idea from the venn function in gplots. Here it is demonstrated with 2 sets and 3 sets:

> library(gplots)
> # Two sets
> df1 <-
+   data.frame(
+     TGEClass.known = c(
+       "GVVEVTHDLQK",
+       "LFYADHPFIFLVR",
+       "SALQSINEWAAQTTDGK",
+       "AVLSAEQLRDEEVHAGLGELLR"
+     ),
+     TGEClass.uknown = c(
+       "GVVEVTHDLQK",
+       "LFYADHPFIFLVR",
+       "SALQSINEWAAQTTDGK",
+       "AVLSAEQLRDEEVHAGLGELL"
+     )
+   )
> venn.tab1 <- venn(as.list(df1), show.plot = FALSE)
> attr(venn.tab1, "intersections") <- NULL
> attr(venn.tab1, "class") <- NULL
> print(venn.tab1)
   num TGEClass.known TGEClass.uknown
00   0              0               0
01   1              0               1
10   1              1               0
11   3              1               1
> # Three sets
> df2 <-
+   data.frame(
+     TGEClass.set1 = c(
+       "GVVEVTHDLQK",
+       "LFYADHPFIFLVR",
+       "SALQSINEWAAQTTDGK",
+       "AVLSAEQLRDEEVHAGLGELLR"
+     ),
+     TGEClass.set2 = c(
+       "GVVEVTHDLQK",
+       "LFYADHPFIFLVR",
+       "SALQSINEWAAQTTDGK",
+       "AVLSAEQLRDEEVHAGLGELL"
+     ),
+     TGEClass.set3 = c(
+       "GVVEVTHDLQK",
+       "LFYADHPFIFLVR",
+       "SALQSINEWAAQTTDGKK",
+       "AVLSAEQLRDEEVHAGLGELLR"
+     )
+   )
> venn.tab2 <- venn(as.list(df2), show.plot = FALSE)
> attr(venn.tab2, "intersections") <- NULL
> attr(venn.tab2, "class") <- NULL
> print(venn.tab2)
    num TGEClass.set1 TGEClass.set2 TGEClass.set3
000   0             0             0             0
001   1             0             0             1
010   1             0             1             0
011   0             0             1             1
100   0             1             0             0
101   1             1             0             1
110   1             1             1             0
111   2             1             1             1

[written 12 months ago by SMK; modified 12 months ago]
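If you would rather not depend on gplots, a similar table can be sketched in base R. venn_counts_base() below is a hypothetical helper, not part of gplots or limma: it builds a logical membership matrix (one row per unique peptide, one column per set), enumerates every 0/1 pattern in the same order as the tables above, and counts the peptides matching each pattern. For the three-set example its Counts column should reproduce the num column above (0, 1, 1, 0, 0, 1, 1, 2). Passing as.list(df2) should also work, since columns are converted to character first.

```r
# Hypothetical helper (not from gplots or limma): one row per 0/1 membership
# pattern, ordered like vennCounts(), plus the number of unique items in it.
venn_counts_base <- function(sets) {
  # Treat every column as character and ignore repeats within a column
  sets <- lapply(sets, function(x) unique(as.character(x)))
  items <- unique(unlist(sets, use.names = FALSE))
  k <- length(sets)

  # Logical membership matrix: rows = unique items, columns = sets
  member <- matrix(vapply(sets, function(s) items %in% s, logical(length(items))),
                   nrow = length(items), dimnames = list(items, names(sets)))

  # All 2^k patterns; reversing expand.grid's columns makes the last set vary fastest
  patterns <- as.matrix(expand.grid(rep(list(0L:1L), k)))[, k:1, drop = FALSE]
  colnames(patterns) <- names(sets)

  # Encode memberships and patterns as strings such as "101" and count matches
  item_key <- apply(member * 1L, 1, paste, collapse = "")
  pattern_key <- apply(patterns, 1, paste, collapse = "")
  counts <- as.vector(table(factor(item_key, levels = pattern_key)))

  data.frame(patterns, Counts = counts, check.names = FALSE, row.names = NULL)
}

peptide_sets <- list(
  TGEClass.set1 = c("GVVEVTHDLQK", "LFYADHPFIFLVR",
                    "SALQSINEWAAQTTDGK", "AVLSAEQLRDEEVHAGLGELLR"),
  TGEClass.set2 = c("GVVEVTHDLQK", "LFYADHPFIFLVR",
                    "SALQSINEWAAQTTDGK", "AVLSAEQLRDEEVHAGLGELL"),
  TGEClass.set3 = c("GVVEVTHDLQK", "LFYADHPFIFLVR",
                    "SALQSINEWAAQTTDGKK", "AVLSAEQLRDEEVHAGLGELLR")
)
res <- venn_counts_base(peptide_sets)
print(res)
```

Because it enumerates all 2^k patterns, the table doubles in size with every extra column, so for many categories tabulating only the patterns that actually occur is more practical.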

Hi SMK, unfortunately I have just found that I can't draw a Venn diagram for more than 5 categories.

Could you please help me create a data frame that looks like this?

TGE-Class     Count
T1              1
T2              1
Both            6

Thanks very much

[written 12 months ago by ishackm; modified 12 months ago]
> library(gplots)
> df <-
+   data.frame(
+     T1 = c(
+       "GVVEVTHDLQK",
+       "LFYADHPFIFLVR",
+       "SALQSINEWAAQTTDGK",
+       "SALQSINEWAAQTTDGLL",
+       "SALQSINEWAAQTTDGTT",
+       "SALQSINEWAAQTTDGQQ",
+       "AVLSAEQLRDEEVHAGLGELLR"
+     ),
+     T2 = c(
+       "GVVEVTHDLQK",
+       "LFYADHPFIFLVR",
+       "SALQSINEWAAQTTDGK",
+       "SALQSINEWAAQTTDGLL",
+       "SALQSINEWAAQTTDGTT",
+       "SALQSINEWAAQTTDGQQ",
+       "AVLSAEQLRDEEVHAGLGELL"
+     )
+   )
> venn.tab <- venn(as.list(df), show.plot = FALSE)
> t(t(unlist(lapply(attr(venn.tab, "intersections"), length))))
      [,1]
T1       1
T2       1
T1:T2    6

[written 12 months ago by SMK; modified 12 months ago]
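The output above labels the overlap T1:T2. To get exactly the TGE-Class / Count layout requested, with the overlap called Both, a small base-R sketch for the two-set case, reusing the same example peptides and not needing gplots, could look like this:

```r
# Same two example columns as in the answer above (T2 differs only in its last peptide)
T1 <- c("GVVEVTHDLQK", "LFYADHPFIFLVR", "SALQSINEWAAQTTDGK", "SALQSINEWAAQTTDGLL",
        "SALQSINEWAAQTTDGTT", "SALQSINEWAAQTTDGQQ", "AVLSAEQLRDEEVHAGLGELLR")
T2 <- c("GVVEVTHDLQK", "LFYADHPFIFLVR", "SALQSINEWAAQTTDGK", "SALQSINEWAAQTTDGLL",
        "SALQSINEWAAQTTDGTT", "SALQSINEWAAQTTDGQQ", "AVLSAEQLRDEEVHAGLGELL")

# One row per region: T1 only, T2 only, and peptides present in both
counts <- data.frame(
  `TGE-Class` = c("T1", "T2", "Both"),
  Count = c(length(setdiff(T1, T2)),
            length(setdiff(T2, T1)),
            length(intersect(T1, T2))),
  check.names = FALSE
)
print(counts)
```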

Hi SMK,

Thanks very much for your quick response; I have been trying all day to fix this. You are a lifesaver!

[written 12 months ago by ishackm]

Hi SMK, sorry for the late follow-up. Is there a way to count the unique peptides in each category when some columns contain blank cells?

Unfortunately, the length code counts the blank cells as unique peptides.

[written 11 months ago by ishackm]

Hi ishackm,

You can remove the empty elements from the list before you use venn:

library(gplots)
l <- as.list(df)
# Drop empty strings (blank cells) from each column
l <- lapply(l, function(x) x[x != ""])
venn.tab <- venn(l, show.plot = FALSE)

[written 11 months ago by SMK]
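Cells that look blank are not always empty strings: depending on how the file was read, they may also be NA or contain only spaces. A slightly more defensive sketch, assuming character columns; clean_peptides() is a hypothetical helper name, and the small data frame (padded T2 column) is made up purely for illustration:

```r
# Hypothetical helper: trim spaces, drop NA and empty cells, then de-duplicate
clean_peptides <- function(x) {
  x <- trimws(as.character(x))
  unique(x[!is.na(x) & x != ""])
}

# Illustrative input: T2 is padded with an empty string, a space and an NA
df <- data.frame(
  T1 = c("GVVEVTHDLQK", "LFYADHPFIFLVR", "SALQSINEWAAQTTDGK", "AVLSAEQLRDEEVHAGLGELLR"),
  T2 = c("GVVEVTHDLQK", "", " ", NA),
  stringsAsFactors = FALSE
)

l <- lapply(as.list(df), clean_peptides)
lengths(l)   # unique, non-blank peptides per column
# l can then be passed to venn(l, show.plot = FALSE) as in the answer above
```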

Hi SMK, thank you again for your quick response. Much appreciated.

[written 11 months ago by ishackm]

Cool, glad it helps!

[written 11 months ago by SMK; modified 11 months ago]
zx8754 (London) wrote (12 months ago):

Convert the peptides to a TRUE/FALSE membership table, then use limma's vennCounts:

# example data
df <-data.frame(
  TGEClass.known = c(
    "GVVEVTHDLQK",
    "LFYADHPFIFLVR",
    "SALQSINEWAAQTTDGK",
    "AVLSAEQLRDEEVHAGLGELLR"
  ),
  TGEClass.uknown = c(
    "GVVEVTHDLQK",
    "LFYADHPFIFLVR",
    "SALQSINEWAAQTTDGK",
    "AVLSAEQLRDEEVHAGLGELL"
  ), stringsAsFactors = FALSE
)

library(data.table)

x <- dcast(cbind(stack(as.list(df)), x = TRUE), 
           values ~ ind, 
           value.var = "x", 
           fill = FALSE)[, -1]    

limma::vennCounts(x)
#   TGEClass.known TGEClass.uknown Counts
# 1              0               0      0
# 2              0               1      1
# 3              1               0      1
# 4              1               1      3

limma::vennDiagram(x)

[written 12 months ago by zx8754]
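For comparison, a similar membership matrix can be sketched with base R only, without data.table: stack() gives the long format, table() counts how often each peptide appears in each column, and > 0 turns those counts into TRUE/FALSE. One consequence of this design is that a peptide repeated within a column simply becomes TRUE, with no aggregation step. The limma call is guarded with requireNamespace() so the block also runs where limma is not installed.

```r
df <- data.frame(
  TGEClass.known = c("GVVEVTHDLQK", "LFYADHPFIFLVR",
                     "SALQSINEWAAQTTDGK", "AVLSAEQLRDEEVHAGLGELLR"),
  TGEClass.uknown = c("GVVEVTHDLQK", "LFYADHPFIFLVR",
                      "SALQSINEWAAQTTDGK", "AVLSAEQLRDEEVHAGLGELL"),
  stringsAsFactors = FALSE
)

s <- stack(as.list(df))          # long format: columns 'values' and 'ind'
tab <- table(s$values, s$ind)    # occurrences of each peptide per column
x <- unclass(tab) > 0            # logical membership matrix (rows = peptides)

colSums(x)                       # unique peptides per column
sum(rowSums(x) == ncol(x))       # peptides present in every column

if (requireNamespace("limma", quietly = TRUE)) {
  print(limma::vennCounts(x))
}
```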

Hi, I ran the code you gave me, but it gives me an error:

df = read.csv("FN1.csv")
FN1 = as.vector(df)

library(data.table)

x <- dcast(cbind(stack(as.list(FN1)), x = TRUE), 
           values ~ ind, 
           value.var = "x", 
           fill = FALSE)[, -1]    
limma::vennCounts(x)

Error in stack.default(as.list(FN1)) : 
  at least one vector element is required

What am I doing wrong here, please?

[written 12 months ago by ishackm]

You need to share your example CSV: FN1.csv, so that we can reproduce the problem.

[written 12 months ago by zx8754]
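A likely explanation for the stack.default error, assuming an R version before 4.0, where read.csv() defaults to stringsAsFactors = TRUE: the columns come back as factors, and stack() only keeps list elements for which is.vector() is TRUE. A factor carries levels and class attributes, so is.vector() is FALSE for it, and with nothing left to stack you get "at least one vector element is required". The sketch below simulates such factor columns (FN1.csv itself is not available here) and converts them to character first; reading the file with read.csv(..., stringsAsFactors = FALSE) should have the same effect.

```r
# Simulated stand-in for read.csv("FN1.csv") in R < 4.0: factor columns
FN1 <- data.frame(
  T2 = factor(c("QHDMGHMMR", "DQCIVDDITYNVNDTFHK")),
  T3 = factor(c("QHDMGHMMR", "YYRITYGETGGNSPVQEFTVPGSK"))
)
is.vector(FN1$T2)                  # FALSE, so stack() would skip this column

# Convert every column to character while keeping the data.frame structure
FN1[] <- lapply(FN1, as.character)
s <- stack(as.list(FN1))           # now works: columns 'values' and 'ind'
s
```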

Sorry for the late reply. This is the CSV I am using:

T2  T3
QHDMGHMMR   QHDMGHMMR
RPGGEPSPEGTTGQSYNQYSQR  RPGGEPSPEGTTGQSYNQYSQR
KTDELPQLVTLPHPNLHGPEILDVPSTVQK  KTDELPQLVTLPHPNLHGPEILDVPSTVQK
HRPRPYPPNVGEEIQIGHIPR   HRPRPYPPNVGEEIQIGHIPR
QHDMGHMMR   QHDMGHMMR
DQCIVDDITYNVNDTFHK  DQCIVDDITYNVNDTFHK
YYRITYGETGGNSPVQEFTVPGSK    YYRITYGETGGNSPVQEFTVPGSK

The code:

test = read.csv("test.csv", stringsAsFactors = FALSE)

library(gplots)
library(data.table)

x <- dcast(cbind(stack(as.list(test)), x = TRUE), 
           values ~ ind, 
           value.var = "x", 
           fill = FALSE)[, -1]    

limma::vennCounts(x)
limma::vennDiagram(x)

The error:

Aggregation function missing: defaulting to length
Error in vapply(indices, fun, .default) : values must be type 'logical',
 but FUN(X[[1]]) result is type 'integer'

How can I fix this please?

[written 12 months ago by ishackm; modified 12 months ago]

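Yes. QHDMGHMMR appears twice in each column, so dcast has to aggregate duplicated cells (it defaults to length, hence the "Aggregation function missing" message), and length returns integers that clash with the logical TRUE/FALSE fill. Replace TRUE/FALSE with 1/0 in dcast, as in this example: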

# example data
df <-read.table(text = "
T2  T3
QHDMGHMMR   QHDMGHMMR
RPGGEPSPEGTTGQSYNQYSQR  RPGGEPSPEGTTGQSYNQYSQR
KTDELPQLVTLPHPNLHGPEILDVPSTVQK  KTDELPQLVTLPHPNLHGPEILDVPSTVQK
HRPRPYPPNVGEEIQIGHIPR   HRPRPYPPNVGEEIQIGHIPR
QHDMGHMMR   QHDMGHMMR
DQCIVDDITYNVNDTFHK  DQCIVDDITYNVNDTFHK
YYRITYGETGGNSPVQEFTVPGSK    YYRITYGETGGNSPVQEFTVPGSK", stringsAsFactors = FALSE, header = TRUE)

library(data.table)

x <- dcast(cbind(stack(as.list(df)), x = 1), 
           values ~ ind, 
           value.var = "x", 
           fill = 0)[, -1]

limma::vennCounts(x)

#   T2 T3 Counts
# 1  0  0      0
# 2  0  1      0
# 3  1  0      0
# 4  1  1      6
# attr(,"class")
# [1] "VennCounts"

[written 12 months ago by zx8754]
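A base-R sketch for spotting peptides repeated within a column (in the posted CSV, QHDMGHMMR occurs twice in both T2 and T3) and for dropping such repeats with unique() before reshaping, so that dcast() should have nothing left to aggregate. This de-duplication step is an added suggestion, not part of the answer above, and has not been run here against data.table itself.

```r
df <- read.table(text = "
T2  T3
QHDMGHMMR   QHDMGHMMR
RPGGEPSPEGTTGQSYNQYSQR  RPGGEPSPEGTTGQSYNQYSQR
KTDELPQLVTLPHPNLHGPEILDVPSTVQK  KTDELPQLVTLPHPNLHGPEILDVPSTVQK
HRPRPYPPNVGEEIQIGHIPR   HRPRPYPPNVGEEIQIGHIPR
QHDMGHMMR   QHDMGHMMR
DQCIVDDITYNVNDTFHK  DQCIVDDITYNVNDTFHK
YYRITYGETGGNSPVQEFTVPGSK    YYRITYGETGGNSPVQEFTVPGSK", stringsAsFactors = FALSE, header = TRUE)

# Peptides that occur more than once within the same column
dups <- lapply(df, function(x) unique(x[duplicated(x)]))
dups

# One entry per peptide per column, ready for stack()/dcast()
l <- lapply(as.list(df), unique)
lengths(l)
```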

Thanks very much for your quick response.

[written 12 months ago by ishackm]

Hi, unfortunately I have just found that I can't draw a Venn diagram for more than 5 categories.

Could you please help me create a data frame that looks like this?

TGE-Class     Count
T1              1
T2              1
Both            6

Thanks very much

[written 12 months ago by ishackm; modified 12 months ago]
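Since a Venn diagram stops being an option beyond five categories, one alternative is to skip the diagram and tabulate only the membership patterns that actually occur, labelling each by the categories it spans (with two sets, T1&T2 plays the role of Both). pattern_counts() is a hypothetical base-R helper, and the three small sets below are illustrative only:

```r
# Hypothetical helper: counts unique items per observed membership pattern,
# labelling each pattern with the names of the sets it belongs to
pattern_counts <- function(sets) {
  sets <- lapply(sets, function(x) unique(as.character(x)))
  items <- unique(unlist(sets, use.names = FALSE))
  labels <- vapply(items, function(p) {
    in_set <- vapply(sets, function(s) p %in% s, logical(1))
    paste(names(sets)[in_set], collapse = "&")
  }, character(1))
  counts <- sort(table(labels), decreasing = TRUE)
  data.frame(`TGE-Class` = names(counts), Count = as.vector(counts),
             check.names = FALSE, stringsAsFactors = FALSE)
}

sets <- list(
  T1 = c("GVVEVTHDLQK", "LFYADHPFIFLVR", "SALQSINEWAAQTTDGK"),
  T2 = c("GVVEVTHDLQK", "LFYADHPFIFLVR", "SALQSINEWAAQTTDGK"),
  T3 = c("GVVEVTHDLQK", "AVLSAEQLRDEEVHAGLGELLR")
)
res <- pattern_counts(sets)
print(res)
```

Unlike a full 2^k pattern table, this only grows with the number of patterns present in the data, which keeps it readable for many categories.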
lieven.sterck (VIB, Ghent, Belgium) wrote (12 months ago):

If you are looking for exact matches (so a peptide that is a subset of another does not count as a match), you can use your lists as-is as input for DrawVenn, an online tool for drawing Venn diagrams.

[written 12 months ago by lieven.sterck]
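To illustrate the exact-match point with two peptide pairs from the question: AVLSAEQLRDEEVHAGLGELL is a prefix of AVLSAEQLRDEEVHAGLGELLR, so it counts as shared under substring matching but not under exact matching. The set operations used in this thread (intersect, setdiff, %in%) compare whole strings. A small base-R sketch of the difference:

```r
known  <- c("GVVEVTHDLQK", "AVLSAEQLRDEEVHAGLGELLR")
uknown <- c("GVVEVTHDLQK", "AVLSAEQLRDEEVHAGLGELL")

# Exact matching: shared only if the identical string is present
exact <- known %in% uknown

# Substring matching: shared if any 'uknown' peptide occurs inside the 'known' one
contains <- vapply(known, function(p) {
  any(vapply(uknown, function(q) grepl(q, p, fixed = TRUE), logical(1)))
}, logical(1))

rbind(exact = exact, substring = unname(contains))
```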

Thanks to all for your help and support. This is exactly what I was looking for.

[written 12 months ago by ishackm]
Powered by Biostar version 2.3.0