Tutorial: Creating a karyotype plot with genes using karyoploteR
9
gravatar for bernatgel
6 months ago by
bernatgel1.2k
Barcelona, Spain
bernatgel1.2k wrote:

Note: This post was an answer to another Tutorial doing the same with ggplot. Moved here as suggested by others.

I had to create whole genome plots with data on them more than than a few times, and it's not easy to get it right! So I finally decided to create a package to make my life easier.

The package is called karyoploteR. It's in Bioconductor so you can find it in the package landing page where you'll find the vignette and other documentation. If you want more extensive documentation and a few more complex examples you can go to karyoploteR's examples and tutorial page. It uses R base graphics instead of ggplot so the way of building the plots is a bit different, but it shouldn't be a problem.

Since most of the code is encapsulated in the package functions you can skip most of the code and data you needed but I think it's almost as customizable as plots created from scratch.

To reproduce your example we'll only need the CNVkit data (in this case converted to a GenomicRanges with toGRanges):

sample_cns <- toGRanges(
                data.frame(
                  chromosome = c("chr5", "chr19", "chr5", "chr9", "chr1", "chr11", "chr4", "chr17", "chr11", "chr1", "chr17", "chr5", "chr17", "chr17", "chr19", "chr1", "chr1", "chr22", "chr22", "chr17",  "chr10", "chr11", "chr19", "chr2", "chr5"),
                  start = c(131893016L, 4183350L, 112043414L, 15465517L, 27022894L, 108098351L, 13571634L, 41197694L, 108180886L, 6166339L, 48262862L, 112162804L, 78326739L, 11501815L, 2164183L, 97544531L, 97564044L, 41489008L, 41572250L, 37844086L, 123239094L, 118307227L, 36208920L, 140990754L, 56111400L), 
                  end = c(131978056L, 4224502L, 112179823L, 15465578L, 27107247L, 108236235L, 13629211L, 41276113L, 108236235L, 6240083L, 48278874L, 112179823L, 78367298L, 11872844L, 2229791L, 98386478L, 97771853L, 41574960L, 41574960L, 37884297L, 123353331L, 118392887L, 36229458L, 142888298L, 56189507L),
                  cn = c(1L, 1L, 1L, 7L, 1L, 1L, 3L, 3L, 1L, 1L, 3L, 1L, 3L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 1L, 1L, 4L, 1L, 1L),
                  gene = c("AC116366.7", "ANKRD24", "APC", "SNAPC3", "ARID1A", "ATM", "BOD1L1", "BRCA1", "C11orf65", "CHD5", "COL1A1", "CTC-554D6.1", "CTD-2047H16.4", "DNAH9", "DOT1L", "DPYD", "DPYD-AS1", "EP300", "EP300-AS1", "ERBB2", "FGFR2", "KMT2A", "KMT2B", "LRP1B", "MAP3K1"), stringsAsFactors=FALSE))
seqlevels(sample_cns) <- paste0("chr", c(1:22, "X", "Y"))
sample_cns <- sort(sample_cns)

We don't need chromosome lengths or centromere positions, since it's included in the package for human (and a bunch of other organisms), but you can use it to plot custom genomes if you need it, for example for non-model organisms of with unfinished genomes.

To create the plot, we first create an empty plot with plotKaryotypeand after that we just add data and non-data graphical elements such as base numbers, etc... In this case, we would need only four lines of code (the last one a bit long to show some of the customization options).

kp <- plotKaryotype()
kpAddBaseNumbers(kp)
cn_col <- ifelse(sample_cns$cn>2, "red", "blue")
kpPlotMarkers(kp, data=sample_cns, labels=sample_cns$gene, label.color = cn_col, text.orientation = "horizontal", r1=0.7, cex=0.8, marker.parts = c(0.2, 0.7, 0.1))

And would get something like this

Plot of the human karyotype with gained and lost genes plotted

As you can see the cytobands are automatically added (this can be deactivated too) and we don't need to worry about basic chromosome data (length, names, etc...).

Once we have this we can change if in many different ways. For example, we can decide to plot the gains above the ideograms and the losses below them just changing a single parameter (plot.type=2 to get two plotting areas) and adding a second call to kpPlotMarkers.

kp <- plotKaryotype(plot.type=2)
kpAddBaseNumbers(kp)
gains <- sample_cns[sample_cns$cn>2]
losses <- sample_cns[sample_cns$cn<2]
kpPlotMarkers(kp, data=gains, labels=gains$gene, label.color = "red", text.orientation = "horizontal", r1=0.8, cex=0.8, marker.parts = c(0.2, 0.7, 0.1), data.panel=1)
kpPlotMarkers(kp, data=losses, labels=losses$gene, label.color = "blue", text.orientation = "horizontal", r1=0.8, cex=0.8, marker.parts = c(0.2, 0.7, 0.1), data.panel=2)

Human karyotype with gained genes above the ideogram and lost genes below it

There are many other options and customizations possible, and it's actually capable of plotting almost anything on any genome.

ADD COMMENTlink modified 6 days ago • written 6 months ago by bernatgel1.2k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 633 users visited in the last hour