Question

Calculate Gene Density Per Kb And Plot Density Over Position For All Scaffolds Of A Draft Genome Using R

1

Entering edit mode

12.2 years ago

Saima ▴ 10

I am trying to calculate gene density per 1kb for all scaffolds of a draft genome with IRanges package in R. Based on related posts in biostar and other forums, I have come up with this so far:

test_scaffolds.bed :
scaffold_1 0 2500
scaffold_2 0 6001
test_genes.bed:
scaffold_1 5 900 geneA . -
scaffold_1 950 1850 geneB . -
scaffold_1 1100 2600 geneC . + scaffold_2 200 3200 geneD . +

library(IRanges)
library(rtracklayer)
genes<import("test_genes.bed",asRangedData=FALSE)
s<-import("test_scaffolds.bed",asRangedData=FALSE)#bed file with scaffold names, start and end cooordinates 
seqlengths(s)<-width(s)
bins <- IRangesList(lapply(seqlengths(s),
                           function(seqlen)
                             IRanges(breakInChunks(seqlen,1000))))

I am not sure how to get no. of each genes per each bin after this, I tried countOverlaps(bins,genes), which gives me

scaffold_1 scaffold_2 2 2

which is not what I wanted. I will really appreciate any suggestions about improving the code to get the desired plot, and also about the best way to calculate the gene density in this case. Thanks!

r gene • 15k views

ADD COMMENT • link updated 10.2 years ago by Omics data mining ▴ 260 • written 12.2 years ago by Saima ▴ 10

score 23 · Answer 1 · 2013-04-23

23

Entering edit mode

12.2 years ago

Irsan ★ 7.8k

Make your life easy and learn to use the popular ggplot2 R-package. You only need a flat text file with column1 = chromosome name and column2 = start position of the gene. Then do this:

# check if ggplot2 is installed, if so, load it, 
# if not, install and load it
if("ggplot2" %in% rownames(installed.packages())){
    library(ggplot2)
} else {
    install.packages("ggplot2")
    library(ggplot2)
}

# import a text file with gene positions
# columns should be: chr, position (no end or gene name required)
genes <- read.table("genes.txt",sep="\t",header=T)

# make sure the chromosomes are ordered in the way you want
# them to appear in the plot
genes$chr <- with(genes, factor(chr, levels=paste("chr",c(1:22,"X","Y"),sep=""), ordered=TRUE))

# make a density plot of genes over the provided chromosomes (or scaffolds ...)
plottedGenes <- ggplot(genes) + geom_histogram(aes(x=pos),binwidth=1000000) + facet_wrap(~chr,ncol=2) + ggtitle("RefSeq genes density over human genome 19") + xlab("Genomic position (bins 1 Mb)") + ylab("Number of genes")

# save it to an image
png("genes.png",width=1000,height=1500)
print(plottedGenes)
dev.off()

See resulting image here (can anyone tell me why the attach image button in the rich text editor does not work for me? What can I do? Paste Images In Questions And Answers)

ADD COMMENT • link 12.2 years ago by Irsan ★ 7.8k

0

Entering edit mode

That's a pretty cool approach, thanks Irsan. Since, you have the gene density plotted, how do you manage to get (extract) the plotted data into a dataframe? I mean, I would like to save the values for each bin.

ADD REPLY • link 12.2 years ago by Bioch'Ti ★ 1.1k

3

Entering edit mode

The ggplot()-function performs a statistical transformation (in this case binning) on the provided data-frame internally but does not save the transformed data. When you still want to get it out you can do this

allPlottingData <- ggplot_build(plottedGenes)
df <- allPlottingData[["data"]][[1]]

But I am not sure if this is what you are looking for. If you have problems with binning genomic data you can also put your data in BED-format and use a combination of bedtools makewindows and bedtools intersect to calculate the amount of genes on certain genomic intervals

ADD REPLY • link 12.2 years ago by Irsan ★ 7.8k

0

Entering edit mode

Thanks for your help, that's exactly what I was looking for ! This library is definitely a must !

ADD REPLY • link 12.2 years ago by Bioch'Ti ★ 1.1k

0

Entering edit mode

Thanks a lot for the ggplot solution, it is much better and easier than what I was attempting, and its great that I can extract the plotted data!

ADD REPLY • link 12.2 years ago by Saima ▴ 10