How to calculate overlap of peptides between different categories to create a Venn diagram
3.9 years ago
ishackm ▴ 110

Hi all,

I have the following dataset:

          TGEClass.known       TGEClass.uknown
1            GVVEVTHDLQK           GVVEVTHDLQK
3      SALQSINEWAAQTTDGK     SALQSINEWAAQTTDGK
4 AVLSAEQLRDEEVHAGLGELLR AVLSAEQLRDEEVHAGLGELL


I would like to calculate the number of peptides that are present in both categories and the number that are present in only one.

I have tried the vennCounts function from limma, but it only accepts numerical values:

a <- vennCounts(c3)
a
     hw hm hr Counts
[1,]  0  0  0    113
[2,]  0  0  1     18
[3,]  0  1  0      8
[4,]  0  1  1      8
[5,]  1  0  0     12
[6,]  1  0  1      8
[7,]  1  1  0     11
[8,]  1  1  1     22


How can I convert my peptide dataset into a table like the one above so that I can make a Venn diagram? I have searched everywhere I can but have not found a solution.

I would really appreciate it if someone could help me solve this problem.

Many Thanks,

Ishack

ven diagram peptide venn count r • 2.9k views
3.9 years ago
AK ★ 2.2k

Hi Ishack,

Try this:

df <-
  data.frame(
    TGEClass.known = c(
      "GVVEVTHDLQK",
      "SALQSINEWAAQTTDGK",
      "AVLSAEQLRDEEVHAGLGELLR"
    ),
    TGEClass.uknown = c(
      "GVVEVTHDLQK",
      "SALQSINEWAAQTTDGK",
      "AVLSAEQLRDEEVHAGLGELL"
    )
  )

# Present in both TGEClass.known and TGEClass.uknown
length(intersect(df$TGEClass.known, df$TGEClass.uknown))

# TGEClass.known only
length(setdiff(df$TGEClass.known, df$TGEClass.uknown))

# TGEClass.uknown only
length(setdiff(df$TGEClass.uknown, df$TGEClass.known))


Hi SMK, thanks very much for your answer. How can I get a table like this automatically? It takes quite long to do manually.

     hw hm hr Counts
[1,]  0  0  0    113
[2,]  0  0  1     18
[3,]  0  1  0      8
[4,]  0  1  1      8
[5,]  1  0  0     12
[6,]  1  0  1      8
[7,]  1  1  0     11
[8,]  1  1  1     22


What are hw, hm, and hr?


Sorry, those were meant to say TGEClass.known and TGEClass.uknown. Please ignore hw, hm and hr; I want a table like that for TGEClass.known and TGEClass.uknown.


Perhaps:

> df.venn <- data.frame(
+   TGEClass.known = c(1, 1, 0),
+   TGEClass.unknown = c(1, 0, 1),
+   Counts = c(length(
+     intersect(df$TGEClass.known, df$TGEClass.uknown)
+   ), length(
+     setdiff(df$TGEClass.known, df$TGEClass.uknown)
+   ), length(
+     setdiff(df$TGEClass.uknown, df$TGEClass.known)
+   ))
+ )
> df.venn
  TGEClass.known TGEClass.unknown Counts
1              1                1      2
2              1                0      1
3              0                1      1
> as.matrix(df.venn)
     TGEClass.known TGEClass.unknown Counts
[1,]              1                1      2
[2,]              1                0      1
[3,]              0                1      1


Hi SMK, thanks a lot, that's what I was looking for. Just one final question, if you don't mind.

I have a lot of data frames like the one above, but each one has a different number of categories and different category names. Would it be possible to run intersect and setdiff between all the different columns automatically?


I got an idea from the venn function in gplots; here it is demonstrated with two sets and with three sets:

> library(gplots)
> # Two sets
> df1 <-
+   data.frame(
+     TGEClass.known = c(
+       "GVVEVTHDLQK",
+       "SALQSINEWAAQTTDGK",
+       "AVLSAEQLRDEEVHAGLGELLR"
+     ),
+     TGEClass.uknown = c(
+       "GVVEVTHDLQK",
+       "SALQSINEWAAQTTDGK",
+       "AVLSAEQLRDEEVHAGLGELL"
+     )
+   )
> venn.tab1 <- venn(as.list(df1), show.plot = FALSE)
> attr(venn.tab1, "intersections") <- NULL
> attr(venn.tab1, "class") <- NULL
> print(venn.tab1)
   num TGEClass.known TGEClass.uknown
00   0              0               0
01   1              0               1
10   1              1               0
11   2              1               1
> # Three sets
> df2 <-
+   data.frame(
+     TGEClass.set1 = c(
+       "GVVEVTHDLQK",
+       "SALQSINEWAAQTTDGK",
+       "AVLSAEQLRDEEVHAGLGELLR"
+     ),
+     TGEClass.set2 = c(
+       "GVVEVTHDLQK",
+       "SALQSINEWAAQTTDGK",
+       "AVLSAEQLRDEEVHAGLGELL"
+     ),
+     TGEClass.set3 = c(
+       "GVVEVTHDLQK",
+       "SALQSINEWAAQTTDGKK",
+       "AVLSAEQLRDEEVHAGLGELLR"
+     )
+   )
> venn.tab2 <- venn(as.list(df2), show.plot = FALSE)
> attr(venn.tab2, "intersections") <- NULL
> attr(venn.tab2, "class") <- NULL
> print(venn.tab2)
    num TGEClass.set1 TGEClass.set2 TGEClass.set3
000   0             0             0             0
001   1             0             0             1
010   1             0             1             0
011   0             0             1             1
100   0             1             0             0
101   1             1             0             1
110   1             1             1             0
111   1             1             1             1
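
For data frames with an arbitrary number of columns, the intersect and setdiff calls from the first answer can also be looped over every pair of columns in base R. The following is only a sketch under stated assumptions: the column names A, B and C and the way the peptides are arranged are made up for illustration, and pairwise_shared and unique_to are hypothetical variable names, not part of gplots.

```r
# Made-up example data; the column names A, B and C are illustrative only.
df <- data.frame(
  A = c("QHDMGHMMR", "DQCIVDDITYNVNDTFHK", "HRPRPYPPNVGEEIQIGHIPR"),
  B = c("QHDMGHMMR", "DQCIVDDITYNVNDTFHK", "YYRITYGETGGNSPVQEFTVPGSK"),
  C = c("QHDMGHMMR", "HRPRPYPPNVGEEIQIGHIPR", "RPGGEPSPEGTTGQSYNQYSQR"),
  stringsAsFactors = FALSE
)

# One de-duplicated character vector per column
sets <- lapply(as.list(df), unique)

# Matrix of pairwise overlaps: entry [i, j] is the number of peptides
# shared by columns i and j (the diagonal is the size of each set)
pairwise_shared <- sapply(sets, function(a) {
  sapply(sets, function(b) length(intersect(a, b)))
})

# Number of peptides found in one column and in none of the others
unique_to <- sapply(names(sets), function(n) {
  others <- unlist(sets[names(sets) != n], use.names = FALSE)
  length(setdiff(sets[[n]], others))
})

print(pairwise_shared)
print(unique_to)
```

Because everything is derived from names(sets), the same lines should work unchanged for data frames with different numbers of columns and different column names.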


Hi SMK, unfortunately I have just found that I can't draw a Venn diagram for more than 5 categories.

Can you help me create a df that looks like this please?

TGE-Class     Count
T1              1
T2              1
Both            6


Thanks very much

> library(gplots)
> df <-
+   data.frame(
+     T1 = c(
+       "GVVEVTHDLQK",
+       "SALQSINEWAAQTTDGK",
+       "SALQSINEWAAQTTDGLL",
+       "SALQSINEWAAQTTDGTT",
+       "SALQSINEWAAQTTDGQQ",
+       "AVLSAEQLRDEEVHAGLGELLR"
+     ),
+     T2 = c(
+       "GVVEVTHDLQK",
+       "SALQSINEWAAQTTDGK",
+       "SALQSINEWAAQTTDGLL",
+       "SALQSINEWAAQTTDGTT",
+       "SALQSINEWAAQTTDGQQ",
+       "AVLSAEQLRDEEVHAGLGELL"
+     )
+   )
> venn.tab <- venn(as.list(df), show.plot = FALSE)
> t(t(unlist(lapply(attr(venn.tab, "intersections"), length))))
      [,1]
T1       1
T2       1
T1:T2    5
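
When there are too many categories to draw, a summary table of this kind can also be built in base R without any Venn package. The sketch below is an illustration, not the gplots method: count_combinations is a hypothetical helper, the T1 to T3 data are made up, and the colon-separated labels simply imitate the naming seen in the gplots output above.

```r
# Hypothetical helper (not part of gplots or limma): for any number of
# columns, count the peptides that occur in each exact combination of
# categories.
count_combinations <- function(df) {
  # One cleaned set per column: drop NA/empty cells and duplicates
  sets <- lapply(as.list(df), function(x) {
    x <- as.character(x)
    unique(x[!is.na(x) & nzchar(x)])
  })
  long <- stack(sets)  # long format with columns 'values' and 'ind'
  # For each peptide, join the names of all categories that contain it
  combo <- vapply(
    split(as.character(long$ind), long$values),
    paste, character(1), collapse = ":"
  )
  counts <- table(combo)
  data.frame(
    TGE.Class = names(counts),
    Count = as.vector(counts),
    stringsAsFactors = FALSE
  )
}

# Made-up example with three categories
df <- data.frame(
  T1 = c("QHDMGHMMR", "DQCIVDDITYNVNDTFHK", "HRPRPYPPNVGEEIQIGHIPR",
         "KTDELPQLVTLPHPNLHGPEILDVPSTVQK"),
  T2 = c("QHDMGHMMR", "HRPRPYPPNVGEEIQIGHIPR", "RPGGEPSPEGTTGQSYNQYSQR",
         "KTDELPQLVTLPHPNLHGPEILDVPSTVQK"),
  T3 = c("QHDMGHMMR", "DQCIVDDITYNVNDTFHK", "YYRITYGETGGNSPVQEFTVPGSK",
         "KTDELPQLVTLPHPNLHGPEILDVPSTVQK"),
  stringsAsFactors = FALSE
)

res <- count_combinations(df)
print(res)
```

Each peptide is counted exactly once, under the full set of categories it belongs to, so the Count column sums to the number of distinct peptides. Combinations that contain no peptide are not listed.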


Hi SMK,

Thanks very much for your quick response, I have been trying all day to fix this. You are a life saver!


Hi SMK, sorry for the late follow-up. Is there a way to count the unique peptides in each category when some columns contain blank cells, please?

Unfortunately, the length code counts the blank cells as unique peptides.


Hi ishackm,

You can remove the empty elements from the list before you call venn:

l <- as.list(df)
l <- lapply(l, function(x) { x[!x == ""] })
venn.tab <- venn(l, show.plot = FALSE)
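
If the blanks can also be NA values or cells that contain only spaces (an assumption; the thread only mentions empty cells), a slightly more defensive filter may help. This is a sketch: clean_column is a hypothetical helper and the example data frame is made up.

```r
# Hypothetical helper: keep only real peptide strings from one column
clean_column <- function(x) {
  x <- trimws(as.character(x))       # strip surrounding whitespace
  unique(x[!is.na(x) & nzchar(x)])   # drop NA, empty strings, duplicates
}

# Made-up example with an empty cell, an NA and a whitespace-only cell
df <- data.frame(
  T1 = c("GVVEVTHDLQK", "SALQSINEWAAQTTDGK", ""),
  T2 = c("GVVEVTHDLQK", NA, " "),
  stringsAsFactors = FALSE
)

l <- lapply(as.list(df), clean_column)
print(lengths(l))  # number of real peptides per category
```

The cleaned list l can then be passed to venn in the same way as above.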


Hi SMK , thank you again for your quick response. Much Appreciated.


Cool, glad it helps!

3.9 years ago
zx8754 11k

Convert to TRUE/FALSE, then use vennCounts from limma:

# example data
df <- data.frame(
  TGEClass.known = c(
    "GVVEVTHDLQK",
    "SALQSINEWAAQTTDGK",
    "AVLSAEQLRDEEVHAGLGELLR"
  ),
  TGEClass.uknown = c(
    "GVVEVTHDLQK",
    "SALQSINEWAAQTTDGK",
    "AVLSAEQLRDEEVHAGLGELL"
  ), stringsAsFactors = FALSE
)

library(data.table)

x <- dcast(cbind(stack(as.list(df)), x = TRUE),
           values ~ ind,
           value.var = "x",
           fill = FALSE)[, -1]

limma::vennCounts(x)
#   TGEClass.known TGEClass.uknown Counts
# 1              0               0      0
# 2              0               1      1
# 3              1               0      1
# 4              1               1      2

limma::vennDiagram(x)


Hi, I ran the code you gave me but it is giving me an error:

df = read.csv("FN1.csv")
FN1 = as.vector(df)

library(data.table)

x <- dcast(cbind(stack(as.list(FN1)), x = TRUE),
           values ~ ind,
           value.var = "x",
           fill = FALSE)[, -1]
limma::vennCounts(x)

Error in stack.default(as.list(FN1)) :
at least one vector element is required


What am I doing wrong here, please?


You need to share your example CSV: FN1.csv, so that we can reproduce the problem.


Sorry for the late reply,

this is the csv I am using:

T2  T3
QHDMGHMMR   QHDMGHMMR
RPGGEPSPEGTTGQSYNQYSQR  RPGGEPSPEGTTGQSYNQYSQR
KTDELPQLVTLPHPNLHGPEILDVPSTVQK  KTDELPQLVTLPHPNLHGPEILDVPSTVQK
HRPRPYPPNVGEEIQIGHIPR   HRPRPYPPNVGEEIQIGHIPR
QHDMGHMMR   QHDMGHMMR
DQCIVDDITYNVNDTFHK  DQCIVDDITYNVNDTFHK
YYRITYGETGGNSPVQEFTVPGSK    YYRITYGETGGNSPVQEFTVPGSK


The code:

test = read.csv("test.csv", stringsAsFactors = FALSE)

library(gplots)
# example data

library(data.table)

x <- dcast(cbind(stack(as.list(df2)), x = TRUE),
values ~ ind,
value.var = "x",
fill = FALSE)[, -1]

limma::vennCounts(x)
limma::vennDiagram(x)


The error:

Aggregation function missing: defaulting to length
Error in vapply(indices, fun, .default) : values must be type 'logical',
but FUN(X[[1]]) result is type 'integer'


How can I fix this please?


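Yes, your columns contain a duplicated peptide (QHDMGHMMR appears twice in each), so dcast has to aggregate and defaults to length, which returns integers and clashes with the logical fill = FALSE. Replace TRUE/FALSE with 1/0 in dcast, as in the example below: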

# example data
df <-read.table(text = "
T2  T3
QHDMGHMMR   QHDMGHMMR
RPGGEPSPEGTTGQSYNQYSQR  RPGGEPSPEGTTGQSYNQYSQR
KTDELPQLVTLPHPNLHGPEILDVPSTVQK  KTDELPQLVTLPHPNLHGPEILDVPSTVQK
HRPRPYPPNVGEEIQIGHIPR   HRPRPYPPNVGEEIQIGHIPR
QHDMGHMMR   QHDMGHMMR
DQCIVDDITYNVNDTFHK  DQCIVDDITYNVNDTFHK
YYRITYGETGGNSPVQEFTVPGSK    YYRITYGETGGNSPVQEFTVPGSK", stringsAsFactors = FALSE, header = TRUE)

library(data.table)

x <- dcast(cbind(stack(as.list(df)), x = 1),
values ~ ind,
value.var = "x",
fill = 0)[, -1]

limma::vennCounts(x)

#   T2 T3 Counts
# 1  0  0      0
# 2  0  1      0
# 3  1  0      0
# 4  1  1      6
# attr(,"class")
# [1] "VennCounts"
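
The "Aggregation function missing: defaulting to length" message suggests that at least one peptide occurs more than once within a column, and QHDMGHMMR does appear twice in this CSV. As an alternative sketch, the duplicates can be detected and dropped in base R before building the 0/1 table. The names dup_pos, sets and membership are hypothetical, and the limma call is only attempted if limma is installed.

```r
df <- read.table(text = "
T2  T3
QHDMGHMMR   QHDMGHMMR
RPGGEPSPEGTTGQSYNQYSQR  RPGGEPSPEGTTGQSYNQYSQR
KTDELPQLVTLPHPNLHGPEILDVPSTVQK  KTDELPQLVTLPHPNLHGPEILDVPSTVQK
HRPRPYPPNVGEEIQIGHIPR   HRPRPYPPNVGEEIQIGHIPR
QHDMGHMMR   QHDMGHMMR
DQCIVDDITYNVNDTFHK  DQCIVDDITYNVNDTFHK
YYRITYGETGGNSPVQEFTVPGSK    YYRITYGETGGNSPVQEFTVPGSK",
  stringsAsFactors = FALSE, header = TRUE)

# Position of the first repeated peptide in each column (0 = no duplicates)
dup_pos <- sapply(df, anyDuplicated)
print(dup_pos)

# De-duplicate each column, then cross-tabulate peptide x category.
# Every entry is 0 or 1 because each peptide now occurs at most once per set.
sets <- lapply(as.list(df), unique)
membership <- unclass(table(stack(sets)))

# Optional: pass the 0/1 matrix to limma if the package is available
if (requireNamespace("limma", quietly = TRUE)) {
  print(limma::vennCounts(membership))
}
```

With the duplicates removed first, no aggregation step is needed, so the table stays strictly 0/1 regardless of how often a peptide is repeated in the input.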


Thanks very much for your quick response


Hi, unfortunately I have just found that I can't draw a Venn diagram for more than 5 categories.

Can you help me create a df that looks like this please?

TGE-Class     Count
T1              1
T2              1
Both            6


Thanks very much

3.9 years ago

If you are looking for exact matches (so no peptide can be a subset of another), you can use your lists as they are as input for DrawVenn. It's an online tool for drawing Venn diagrams.


Thanks to all for your help and support. This is exactly what I was looking for.