Question: plotting interactions in R with two data sets
1
frymor10 wrote:

Hi all,

I have a data set of pairs of positions on the genome, with a third value giving the number of interactions between them. I would like to plot this data set so I can see how many interactions there are at each position.

The data set looks like this (only a subset of the complete, very long list):

```
partner1    partner2    Interactions
1           10001       11
1           15001       1
1           20001       1
1           25001       4
1           30001       8
5001        20001       1
5001        40001       3
5001        45001       15
5001        50001       1
10001       15001       3
10001       20001       3
10001       25001       6
10001       30001       12
15001       70001       2
15001       90001       6
15001       95001       5
15001       100001      1
20001       4195001     30
20001       4200001     62
20001       4205001     81
20001       4210001     3
25001       30001       5
25001       40001       22
25001       45001       13
4200001     4210001     318
4200001     4215001     2
4205001     4210001     308
4205001     4215001     2
4210001     4215001     1
```

I would like to have the column 'partner1' on the x-axis, the column 'partner2' on the y-axis, and the number of interactions (3rd column) shown in the plot, with the option to display it either as a point, as the number itself, or as a colored gradient like in heatmaps.

Does anyone know of an R package for creating such plots, or for that matter, any other way of doing it?

thanks

Assa

interactions scatterplot R genome • 3.4k views
modified 19 months ago by theobroma221.1k • written 5.1 years ago by frymor10
1

I think the best way to represent this sort of data would be with a heatmap. Is there a directionality between partner1 and partner2? E.g., are the values "1    5000    8" different from "5000    1    8" in your table?

Yes, there is a difference. The values in the two partner columns are genomic positions, so it makes a difference whether the first or the second partner is at a specific position. Doesn't it?

How would you put the data into a heatmap?

4
Irsan6.9k wrote:

There are many possibilities; one of them is the R library ggplot2:

`library(ggplot2)`

`ggplot(data) + geom_tile(aes(x=factor(partner1),y=factor(partner2),fill=Interactions))`

I have tried with ggplot.

```
require(ggplot2)
pl1 <- ggplot(subset, aes(y = factor(partner1), x = factor(partner2))) +
  geom_tile(aes(fill = Interactions)) +
  scale_fill_continuous(low = "blue", high = "green") +
  scale_size(range = c(1, 200))
```

With the small subset I get a similar plot to the one you posted. But with the complete data set I get a different picture.

Is there a simple explanation for that? Does the order of the two partner columns make a difference?

thanks

1

`data$partner1 <- factor(data$partner1, levels=sort(unique(data$partner1)))`

(and do the same for partner2), then plot without the factor() part.

That still didn't change anything. I still get the plot on only half of the window. I can't figure out why, as both columns have almost the same number of factor levels (842 vs. 843).
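One possible explanation, judging only from the subset posted in the question: every row there has partner1 smaller than partner2, so each pair is listed once and only one triangle of the partner1 × partner2 grid can ever receive a tile. Re-levelling the factors would not change that. A minimal base-R sketch to check this on the full table (the data frame `dat` is a hypothetical stand-in built from a few of the posted rows; replace it with the real data):

```r
# Hypothetical stand-in for the full table: a few rows copied from the question
dat <- data.frame(
  partner1     = c(1, 1, 5001, 10001, 20001, 4205001),
  partner2     = c(10001, 15001, 20001, 15001, 4195001, 4210001),
  Interactions = c(11, 1, 1, 3, 30, 308)
)

# TRUE if every pair is stored with the smaller position first,
# i.e. only one triangle of the grid can be filled
one_triangle <- all(dat$partner1 < dat$partner2)
print(one_triangle)

# Number of pairs that also occur in the reversed orientation
fwd_key <- paste(dat$partner1, dat$partner2)
rev_key <- paste(dat$partner2, dat$partner1)
n_reversed <- sum(fwd_key %in% rev_key)
print(n_reversed)
```

If `one_triangle` is TRUE and `n_reversed` is 0 for the complete data, a half-empty plot is what the data describe rather than a plotting problem.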

Is it possible to make the legend a bit more comprehensive? I want to have more than just 5 different categories. I need a much finer separation, something like 20 or 25 different colors.
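One way to get more colour classes in the legend is to bin the counts yourself and map the bins to a manual palette. The following is only a sketch under assumptions: the data frame `dat` is a hypothetical stand-in built from a few of the posted rows, `n_bins` is an arbitrary choice, and equal-width bins from `cut()` may not suit strongly skewed counts.

```r
library(ggplot2)

# Hypothetical stand-in for the full table: a few rows copied from the question
dat <- data.frame(
  partner1     = c(1, 1, 5001, 10001, 20001, 20001, 4200001, 4205001),
  partner2     = c(10001, 30001, 45001, 30001, 4200001, 4205001, 4210001, 4210001),
  Interactions = c(11, 8, 15, 12, 62, 81, 318, 308)
)

n_bins <- 20  # assumed number of colour classes; adjust as needed

# Split the interaction counts into n_bins equal-width classes
dat$bin <- cut(dat$Interactions, breaks = n_bins)

# One colour per class, interpolated from blue to green
pal <- colorRampPalette(c("blue", "green"))(n_bins)
names(pal) <- levels(dat$bin)

# drop = FALSE and show.legend = TRUE ask ggplot2 to keep legend
# entries for classes that contain no data
p <- ggplot(dat, aes(x = factor(partner1), y = factor(partner2), fill = bin)) +
  geom_tile(show.legend = TRUE) +
  scale_fill_manual(values = pal, drop = FALSE, name = "Interactions")
print(p)
```

With counts ranging from 1 to several hundred, binning `log10(Interactions)` instead of the raw counts may spread the colours more evenly.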

4
dariober10k wrote:

```
## Dummy data
dat <- data.frame(partner1 = 1:100, partner2 = 1:100, Interactions = 1:100)

## One colour per distinct interaction count, from blue (low) to red (high)
ncols <- length(unique(dat$Interactions))
cols <- data.frame(
    colour = colorRampPalette(c("blue", "red"))(ncols),
    Interactions = sort(unique(dat$Interactions)),
    stringsAsFactors = FALSE)

## Attach the colour to each row (joins on the shared 'Interactions' column)
dat <- merge(dat, cols)

## Uncomment to make the colours transparent, it might look better
# transp <- '80'
# dat$colour <- paste(dat$colour, transp, sep = '')

## Plot symbol
plot(x = dat$partner1, y = dat$partner2, pch = 19, col = dat$colour, cex = 2)

## As text
plot(x = dat$partner1, y = dat$partner2, type = 'n')
text(x = dat$partner1, y = dat$partner2, labels = dat$Interactions, col = dat$colour, cex = 0.5)
```

Thanks I will give it a try...

1
t.candelli60 wrote:

I'm going to use the "pheatmap" package to draw a heatmap of your data. The code below generates a matrix from your data frame so that it can be passed to pheatmap.

`library(pheatmap)`

```
# All positions that occur in either partner column
positions <- unique(c(data[,1], data[,2]))
mat <- matrix(data=0, nrow=length(positions), ncol=length(positions))
rownames(mat) <- sort(positions)
colnames(mat) <- sort(positions)
```

```
# Fill the matrix one row of the table at a time, indexing by position name
for (i in 1:nrow(data)) {
  partner1 <- as.character(data[i,1])
  partner2 <- as.character(data[i,2])
  interactions <- data[i,3]

  mat[partner1, partner2] <- interactions
}
```

`pheatmap(mat, cluster_cols=FALSE, cluster_rows=FALSE)`
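For a very long table, the row-by-row loop can be slow. The same matrix can be filled in one step with a two-column index matrix. This is a minimal base-R sketch, not the original poster's code; `dat` is a hypothetical stand-in built from a few of the posted rows.

```r
# Hypothetical stand-in for the full table: a few rows copied from the question
dat <- data.frame(
  partner1     = c(1, 1, 5001, 10001, 20001, 4205001),
  partner2     = c(10001, 15001, 20001, 15001, 4195001, 4210001),
  Interactions = c(11, 1, 1, 3, 30, 308)
)

# All positions seen in either column, in numeric order
positions <- sort(unique(c(dat$partner1, dat$partner2)))

# Square matrix of zeros with positions as row and column names
mat <- matrix(0, nrow = length(positions), ncol = length(positions),
              dimnames = list(as.character(positions), as.character(positions)))

# Two-column index: row from partner1, column from partner2
idx <- cbind(match(dat$partner1, positions), match(dat$partner2, positions))
mat[idx] <- dat$Interactions

print(mat["1", "10001"])
```

The resulting `mat` can be passed to `pheatmap()` exactly as above. As with the loop, if a pair occurs more than once the last value wins.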

0
Manu Prestat3.9k wrote:

A good solution to such a problem is to draw a network representation where:

- partners are nodes

- column 3 is the thickness of the link

The software for that is Cytoscape.
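Cytoscape can import a delimited text table as a network, so one way to get the data there is to write it out as a tab-separated edge list. This is a minimal sketch; the column names `source`, `target` and `weight` and the file name are my own assumptions, and `dat` is a hypothetical stand-in built from a few of the posted rows.

```r
# Hypothetical stand-in for the full table: a few rows copied from the question
dat <- data.frame(
  partner1     = c(1, 5001, 10001, 20001, 4205001),
  partner2     = c(10001, 45001, 30001, 4205001, 4210001),
  Interactions = c(11, 15, 12, 81, 308)
)

# Edge list: one row per link, with the interaction count as the edge weight
# (column names are illustrative, not required by Cytoscape)
edges <- data.frame(source = dat$partner1,
                    target = dat$partner2,
                    weight = dat$Interactions)

# Write a tab-separated file to a temporary location (hypothetical file name)
out_file <- file.path(tempdir(), "interactions_edges.tsv")
write.table(edges, out_file, sep = "\t", quote = FALSE, row.names = FALSE)
```

During import, the weight column can then be mapped to edge width in the style settings.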

0
theobroma221.1k wrote:

I would use a circle plot and have the ribbon thickness represent the strength of the interaction.