Question: genome coverage visualization
0
gravatar for gubrins
9 weeks ago by
gubrins60
gubrins60 wrote:

Heys,

I'm working with whole-genome sequencing data and I already did the alignment and I have my bam files. I calculated the coverage per genome position of those bam files with bedtools:

bedtools genomecov -ibam file.bam -d > coverage.txt

My problem is that coverage.txt is 50GB. I tried to visualize this with R in the cluster but it runs out of memory. So my doubts: do you know how could I reduce the size of this file? Maybe there is a better way to calculate and visualize genome coverage?

Any help would be appreciated!

coverage bam bedtools genome • 232 views
ADD COMMENTlink modified 9 weeks ago by bernatgel2.8k • written 9 weeks ago by gubrins60
1

You can try to divide and compute mean coverage for regions instead of trying to visualize individual positions. You can find some other ideas here.

ADD REPLYlink written 9 weeks ago by newbio17320

Thanks for your answer, the problem is that I don't have identified regions yet, so it would be problematic

ADD REPLYlink written 8 weeks ago by gubrins60
4
gravatar for bernatgel
9 weeks ago by
bernatgel2.8k
Barcelona, Spain
bernatgel2.8k wrote:

Hi Gabri,

You should be able to do that with karyoploteR.

If you want a full genome view of the coverage, I'd recommend you to plot a binned version of the coverage instead of a per base value since you won't have enough pixels in your image for that kind of resolution anyway. With karyoploteR I'd use the kpPlotBAMDensity function, that will take a BAM file directly. compute the read density with the window sizes you specify (defaults to 1 megabase windows), so 3000 data points for a human genome, and plot them on the karyotype. For example, to plot the coverage in a full genome with default 1Mb windows you can do something like

library(karyoploteR)

kp <- plotKaryotype(genome="hg38")
kpPlotBAMDensity(kp, data="yourBAMfile.bam")

and that's it. It will take a while (that'a lot of data to process) but it will do it sequentially and never load the whole dataset into memory, so eventually it will end and generate an image like this:

WGS full genome coverage plot directly from the BAM file

You can also restrict the plot to a single chromosome or zoom in to a small region in the genome. or add other data onto it. You can find more information in the karyoploteR tutorial although kpPlotBAMDensity is still missing from it.

ADD COMMENTlink written 9 weeks ago by bernatgel2.8k

Thank you very much Bernat for your complete answer, your package is awesome! The problem for me is that I'm not working with human data, and my reference genome has a huge amount of contigs, so visualizing like this would be problematic. For this reason I was trying to get a simple plot with coverage on X and number of bases on Y. I already did that for individual contigs, but I have too many. Any ideas of how could I plot it? I tried in a computer with 100GB of RAM but didn't made it... Thanks!

ADD REPLYlink written 8 weeks ago by gubrins60
2

Hi Gabri, karyoploteR can work with any genome, or even a collection of contigs, if needed. I agree that this representation does not work for a large number of chromosomes/contigs, but it can plot them one next to the other in a horizontal line. To do that, you'll need to set the plot.type=4 in plotKaryotype.

Since computing the density from BAM files is sequential, it should not take a huge amount of RAM.

As an example, I created a fake genome with 1000 contings oz sizes between 10Kb and 100Kb.

library(karyoploteR)
cont.length <- sort(runif(n = 1000, min = 10000, max=100000), decreasing = TRUE)
gg <- toGRanges(data.frame(paste0("c", 1:1000), rep(1, 1000), cont.length))

and we can plot them with plot.type=4. We should adjust a few plotting parameters to get what we want.

pp <- getDefaultPlotParams(plot.type = 4)
pp$ideogramlateralmargin <- 0
pp$leftmargin <- 0.07

kp <- plotKaryotype(genome = gg,  plot.type=4, ideogram.plotter = NULL, labels.plotter = NULL, plot.params = pp)
kpAddCytobandsAsLine(kp)
kpDataBackground(kp)
kpText(kp, chr="c350", x=1, y=0.5, labels = "Your data goes here")

karyoplot with many small contigs and no data

and we can plot the density on top of it (in this case the density of a set of random regions)

rr <- createRandomRegions(nregions = 30000, length.mean = 5000, length.sd = 2000, genome=gg, non.overlapping = FALSE)

kp <- plotKaryotype(genome = gg,  plot.type=4, ideogram.plotter = NULL, labels.plotter = NULL, plot.params = pp)
kpAddCytobandsAsLine(kp)
kpAddLabels(kp, "Contigs", data.panel = "ideogram")
kpDataBackground(kp)
kpAddLabels(kp, "Data Density", srt=90)
kpPlotDensity(kp, data=rr, col="#BB22FF", window.size = 1000, border=NA)

karyoplot with many contigs and random data

and you can then show a handful of contigs if needed for better analysis

kp <- plotKaryotype(genome = gg, chromosomes=c("c1", "c2", "c3", "c10"), plot.type=4, ideogram.plotter = NULL, labels.plotter = NULL, plot.params = pp)
kpAddCytobandsAsLine(kp)
kpAddChromosomeNames(kp, srt=45)
kpAddLabels(kp, "Contigs", data.panel = "ideogram")
kpDataBackground(kp)
kpAddLabels(kp, "Data Density", srt=90)
kpPlotDensity(kp, data=rr, col="#BB22FF", window.size = 1000, border=NA)
kpAddChromosomeSeparators(kp, col="blue")

karyoplot with only 4 contigs and the density of random data

ADD REPLYlink written 8 weeks ago by bernatgel2.8k
1

Thanks again for your great answer! Although is not exactly what I need right now, I'm sure I'll use it in the future for checking the coverage of specific regions of my interest! Really interesting package, thanks again!

ADD REPLYlink written 8 weeks ago by gubrins60
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1714 users visited in the last hour
_