Question: Circos or circlize plot for overlapping values/colors
4.4 years ago by
User 7754230
United Kingdom
User 7754230 wrote:

Hi,

I have a file with many genes across the genome, each with a color that depends on which phenotype a variant within the gene has been associated with. I would like to create a plot using circos or circlize that shows stacked layers where the genes overlap, with colors assigned by phenotype (a gene associated with only one phenotype would have a single layer, so no stacking). The purpose is to see at a glance which genes are associated with multiple phenotypes (from the stacking) and which phenotypes those are (from the colors), e.g. cancer versus diabetes.

I was thinking of using the "tiles" plot in circos, with color-coded tiles. Is there an option to color-code the tiles based on another value? I have also tried 'highlights' and 'heatmap' (using the phenotype colors as the factor levels), but I don't think this is the way to go because these plots do not show the overlaps, which is what I am mostly interested in.
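
To make the stacking idea concrete, here is a minimal base R sketch that gives each gene a layer number so that overlapping genes on the same chromosome end up in different layers. The function name `assign_layers` and the greedy "first free layer" rule are assumptions chosen for illustration, not something circos or circlize provides; the three rows are taken from the example data.

```r
# Hypothetical helper: greedy layer assignment for overlapping intervals.
# Within each chromosome, genes are visited in order of start position and
# placed in the lowest layer whose last interval ends before the gene starts.
assign_layers <- function(chr, start, end) {
  layer <- integer(length(start))
  for (idx in split(seq_along(start), chr)) {
    idx <- idx[order(start[idx])]
    layer_end <- numeric(0)  # right-most end currently occupying each layer
    for (i in idx) {
      free <- which(layer_end < start[i])
      k <- if (length(free)) free[1] else length(layer_end) + 1L
      layer_end[k] <- end[i]
      layer[i] <- k
    }
  }
  layer
}

# Three rows from the example data: CA11 and SPHK2 overlap on chr19.
chr   <- c("chr1", "chr19", "chr19")
start <- c(10678425, 48616438, 48616760)
end   <- c(11678425, 49616438, 49616760)
layers <- assign_layers(chr, start, end)
print(layers)  # 1 1 2: SPHK2 is pushed to a second layer
```

The resulting layer number could then serve as the y position of each gene in a circlize track, or simply as a check on how many layers a circos tile track would need.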

If I use circlize, I am trying to plot overlapping regions like the plot here:
http://jokergoo.github.io/circlize/example/gene_model.html
But using different colors already specified in the data file. 
If there is a way, could you please direct me to the right function? 

This is an example of the data file in R:

df = structure(list(Chr = c("chr1", "chr1", "chr1", "chr1", "chr1", 
"chr2", "chr2", "chr2", "chr3", "chr3", "chr4", "chr4", "chr6", 
"chr6", "chr6", "chr7", "chr7", "chr7", "chr8", "chr8", "chr9", 
"chr9", "chr10", "chr11", "chr12", "chr13", "chr13", "chr19", 
"chr19", "chr20", "chr21", "chr22"), pos.start = c(10678425L, 
159391160L, 109318306L, 154509258L, 229805966L, 26989551L, 202937054L, 
16209774L, 142169092L, 8925911L, 113873068L, 78144140L, 29882328L, 
31321038L, 2754229L, 91908370L, 149706362L, 4754575L, 105108497L, 
81375712L, 107169073L, 95049590L, 117466805L, 125738394L, 123893076L, 
73886275L, 29029377L, 48616438L, 48616760L, 16070165L, 18529136L, 
19608500L), pos.end = c(11678425L, 160391160L, 110318306L, 155509258L, 
230805966L, 27989551L, 203937054L, 17209774L, 143169092L, 9925911L, 
114873068L, 79144140L, 30882328L, 32321038L, 3754229L, 92908370L, 
150706362L, 5754575L, 106108497L, 82375712L, 108169073L, 96049590L, 
118466805L, 126738394L, 124893076L, 74886275L, 30029377L, 49616438L, 
49616760L, 17070165L, 19529136L, 20608500L), Gene = c("ANGPTL7", 
"CCDC19", "CELSR2", "DCST1", "GALNT2", "ATRAID", "BMPR2", "FAM49A", 
"PAQR9", "THUMPD3-AS1", "CAMK2D", "CNOT6L", "ABCF1", "RDBP", 
"SLC22A23", "CDK6", "GIMAP7", "WIPI2", "LRP12", "ZNF704", "ABCA1", 
"ASPN", "GFRA1", "ST3GAL4", "CCDC92", "KLF12", "MTUS2", "CA11", 
"SPHK2", "KIF16B", "BTG3", "TRMT2A"), color = c("moccasin", "navy", 
"moccasin", "yellow", "moccasin", "moccasin", "yellow", "cyan", 
"yellow", "green", "goldenrod4", "magenta", "navy", "moccasin", 
"moccasin", "yellow", "moccasin", "yellow", "moccasin", "yellow", 
"moccasin", "cyan", "navy", "moccasin", "navy", "moccasin", "yellow", 
"moccasin", "moccasin", "moccasin", "cyan", "moccasin")), .Names = c("Chr", 
"pos.start", "pos.end", "Gene.name", "color"), row.names = c(917L, 
953L, 956L, 1005L, 1087L, 1997L, 2003L, 2077L, 2534L, 2560L, 
2937L, 2956L, 3495L, 5182L, 4625L, 6612L, 6642L, 6491L, 7060L, 
7124L, 7487L, 7501L, 7991L, 8468L, 8897L, 9424L, 9471L, 11476L, 
11226L, 11786L, 12117L, 12279L), class = "data.frame")

The part of the configuration file for the plot in circos is this:

<plots>
<plot>
      type             = tile
      file             = data/data1.txt
      r0               = 0.98r
      r1               = conf(.,r0)+0.03r
      orientation      = center
      layers           = 24
      margin           = 0.02u
      thickness        = 24
      padding          = 8
      stroke_thickness = 0.001
      stroke_color     = vlgrey
</plot>
</plots>

However, I can't get the colors in the tiles plot to show up correctly. Some colors do not appear at all (maybe because those tiles overlap with too many others? Is there a way to prioritise which color is plotted first?), and I get black lines even though none of the phenotypes uses the color 'black' (again, maybe these are regions with too many overlaps?). I have tried adjusting layers and stroke_thickness, but the black lines remain and the correct colors still do not show. I am attaching the plot I am getting now.
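
On coloring tiles from a value in the data: as far as I know, each line of a Circos data file can carry an options field after the coordinates, such as `color=...`, which would let every tile take its color from the data frame. The sketch below writes such lines from R. The helper names `to_circos_tiles` and `to_circos_colors` are hypothetical, and the `hs` chromosome prefix is an assumption based on the human karyotype naming. R color names such as "moccasin" or "goldenrod4" may not be defined in the Circos color configuration, which might be one reason for unexpected colors; the second helper builds `name = r,g,b` definitions that could go into a `<colors>` block for any names that turn out to be missing.

```r
# Hypothetical helper: one Circos tile line per gene, "chr start end color=<name>".
# chr_prefix = "hs" is an assumption (naming used by the human karyotype files).
to_circos_tiles <- function(df, chr_prefix = "hs") {
  chr <- sub("^chr", chr_prefix, df$Chr)
  sprintf("%s %d %d color=%s", chr, df$pos.start, df$pos.end, df$color)
}

# Hypothetical helper: "<name> = r,g,b" definitions for a Circos <colors> block,
# looked up in R's own color table.
to_circos_colors <- function(cols) {
  cols <- unique(cols)
  channels <- grDevices::col2rgb(cols)
  sprintf("%s = %d,%d,%d", cols, channels[1, ], channels[2, ], channels[3, ])
}

# First two rows of the example data (ANGPTL7 and CCDC19).
example <- data.frame(
  Chr       = c("chr1", "chr1"),
  pos.start = c(10678425L, 159391160L),
  pos.end   = c(11678425L, 160391160L),
  color     = c("moccasin", "navy"),
  stringsAsFactors = FALSE
)

tile_lines  <- to_circos_tiles(example)
color_lines <- to_circos_colors(example$color)

# Written to a temporary directory here; the real target would be data/data1.txt.
writeLines(tile_lines, file.path(tempdir(), "data1.txt"))
print(tile_lines)
print(color_lines)
```

This only prepares the input files; whether the black lines come from undefined colors or from tiles that do not fit into the available layers would still need to be checked against the Circos output.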

I really appreciate any suggestions!

Thank you in advance for your help!

Fra

circlize circos • 2.5k views

Pgibas kindly suggested using circlize in R. Can I get something like this?

http://jokergoo.github.io/circlize/example/gene_model.html

I did this so far, but it gives me an error ("Error in n - i : non-numeric argument to binary operator")

circos.genomicTrackPlotRegion(ylim = c(0.5, n + 0.5), panel.fun = function(region, value, ...) {
    gi = get.cell.meta.data("sector.index")
    tr = data.1$Gene[data.1$Chr == gi]
    for(i in tr) {
        region = data.frame(data.1$pos.start[all.hm$Gene==tr], data.1$pos.end[data.1$Gene.name==tr])
        circos.lines(c(min(data.1$pos.start[all.hm$Gene==tr]), max(data.1$pos.end[data.1$Gene.name==tr])), c(n-i, n-i), col = data.1$color[data.1$Gene==tr])
    }
}, bg.border = NA, track.height = 0.3)
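
A likely cause of the error: `for(i in tr)` loops over gene names, so `i` is a character string and `n - i` cannot be computed. The snippet also mixes `data.1` with `all.hm` (not defined here), compares against `tr` rather than the current gene, and never shows where `n` comes from. Below is a minimal sketch of one way to restructure it, looping over row indices instead. It assumes `n` is meant to be the largest number of genes on any chromosome, initialises the sectors from the gene ranges themselves rather than from an ideogram, and uses four rows of the example data.

```r
# Four rows of the example data, with the column names of the original data frame.
genes <- data.frame(
  Chr       = c("chr1", "chr1", "chr19", "chr19"),
  pos.start = c(10678425, 159391160, 48616438, 48616760),
  pos.end   = c(11678425, 160391160, 49616438, 49616760),
  Gene.name = c("ANGPTL7", "CCDC19", "CA11", "SPHK2"),
  color     = c("moccasin", "navy", "moccasin", "moccasin"),
  stringsAsFactors = FALSE
)

# Assumption: n is the largest number of genes on one chromosome (track rows).
n <- max(table(genes$Chr))

if (requireNamespace("circlize", quietly = TRUE)) {
  library(circlize)
  # One sector per chromosome, spanning the genes found on it.
  lo <- tapply(genes$pos.start, genes$Chr, min)
  hi <- tapply(genes$pos.end, genes$Chr, max)
  circos.genomicInitialize(data.frame(chr = names(lo),
                                      start = as.numeric(lo),
                                      end = as.numeric(hi)))
  circos.genomicTrackPlotRegion(ylim = c(0.5, n + 0.5),
    panel.fun = function(region, value, ...) {
      gi <- get.cell.meta.data("sector.index")
      on_chr <- genes[genes$Chr == gi, ]
      # Loop over row indices (numeric), so the y position can be computed.
      for (i in seq_len(nrow(on_chr))) {
        circos.lines(c(on_chr$pos.start[i], on_chr$pos.end[i]),
                     c(n - i + 1, n - i + 1),
                     col = on_chr$color[i], lwd = 3)
      }
    }, bg.border = NA, track.height = 0.3)
  circos.clear()
}
```

With the full data frame, `circos.initializeWithIdeogram()` could replace the `circos.genomicInitialize()` call to get whole chromosomes. At that scale a 1 Mb gene is a very short arc, so `circos.rect()` or a thicker `lwd` may be needed to make the colors visible.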

 
