Question: Calculate Gene Density Per Kb And Plot Density Over Position For All Scaffolds Of A Draft Genome Using R
gravatar for Saima
6.4 years ago by
United States
Saima10 wrote:

I am trying to calculate gene density per 1kb for all scaffolds of a draft genome with IRanges package in R. Based on related posts in biostar and other forums, I have come up with this so far:

test_scaffolds.bed :
scaffold_1 0 2500
scaffold_2 0 6001
scaffold_1 5 900 geneA . -
scaffold_1 950 1850 geneB . -
scaffold_1 1100 2600 geneC . + scaffold_2 200 3200 geneD . +

s<-import("test_scaffolds.bed",asRangedData=FALSE)#bed file with scaffold names, start and end cooordinates 
bins <- IRangesList(lapply(seqlengths(s),

I am not sure how to get no. of each genes per each bin after this, I tried countOverlaps(bins,genes), which gives me

scaffold_1 scaffold_2 2 2

which is not what I wanted. I will really appreciate any suggestions about improving the code to get the desired plot, and also about the best way to calculate the gene density in this case. Thanks!

gene R • 9.2k views
ADD COMMENTlink modified 4.4 years ago by archie90 • written 6.4 years ago by Saima10
gravatar for Irsan
6.4 years ago by
Irsan7.0k wrote:

Make your life easy and learn to use the popular ggplot2 R-package. You only need a flat text file with column1 = chromosome name and column2 = start position of the gene. Then do this:

# check if ggplot2 is installed, if so, load it, 
# if not, install and load it
if("ggplot2" %in% rownames(installed.packages())){
} else {

# import a text file with gene positions
# columns should be: chr, position (no end or gene name required)
genes <- read.table("genes.txt",sep="\t",header=T)

# make sure the chromosomes are ordered in the way you want
# them to appear in the plot
genes$chr <- with(genes, factor(chr, levels=paste("chr",c(1:22,"X","Y"),sep=""), ordered=TRUE))

# make a density plot of genes over the provided chromosomes (or scaffolds ...)
plottedGenes <- ggplot(genes) + geom_histogram(aes(x=pos),binwidth=1000000) + facet_wrap(~chr,ncol=2) + ggtitle("RefSeq genes density over human genome 19") + xlab("Genomic position (bins 1 Mb)") + ylab("Number of genes")

# save it to an image

See resulting image here (can anyone tell me why the attach image button in the rich text editor does not work for me? What can I do? Paste Images In Questions And Answers)

ADD COMMENTlink modified 6.4 years ago • written 6.4 years ago by Irsan7.0k

That's a pretty cool approach, thanks Irsan. Since, you have the gene density plotted, how do you manage to get (extract) the plotted data into a dataframe? I mean, I would like to save the values for each bin.

ADD REPLYlink written 6.4 years ago by Bioch'Ti1000

The ggplot()-function performs a statistical transformation (in this case binning) on the provided data-frame internally but does not save the transformed data. When you still want to get it out you can do this

allPlottingData <- ggplot_build(plottedGenes)
df <- allPlottingData[["data"]][[1]]

But I am not sure if this is what you are looking for. If you have problems with binning genomic data you can also put your data in BED-format and use a combination of bedtools makewindows and bedtools intersect to calculate the amount of genes on certain genomic intervals

ADD REPLYlink modified 6.4 years ago • written 6.4 years ago by Irsan7.0k

Thanks for your help, that's exactly what I was looking for ! This library is definitely a must !

ADD REPLYlink written 6.4 years ago by Bioch'Ti1000

Thanks a lot for the ggplot solution, it is much better and easier than what I was attempting, and its great that I can extract the plotted data!

ADD REPLYlink written 6.4 years ago by Saima10
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1748 users visited in the last hour