Whole genome pie chart
6.3 years ago
nash.claire ▴ 460

Hi everyone

I have a really easy one for you today, and it's annoying me that I haven't found the answer myself yet.

My PI would like me to create a pie chart of the types of genomic locations that occur in the whole genome. For example, what percentage of the whole genome is intronic, exonic, intergenic, 5' UTR, etc.? I'm wondering which file I would use to create this, and with what tool. I'm thinking of some sort of BED file of the whole genome that I could then annotate with HOMER, but I'm not sure exactly which file and format to go with. I have to do this for the hg19 UCSC genome as well as the newest rat Rnor_6.0 Ensembl genome.

genome sequence • 2.3k views

I'd download the GTF and use GenomicFeatures in R, but that's me.


No need to do that; just install the Homo.sapiens package from Bioconductor. It's the same data.

6.3 years ago
> library(dplyr)
> library(Homo.sapiens)

# Get the human TxDb object and restrict it to standard chromosomes (no random or Un chromosomes).
# Assign the result so the restricted object is the one used below.
> Tx.human = keepStandardChromosomes(TxDb.Hsapiens.UCSC.hg19.knownGene)

# Total number of bases in the human genome. 
> tot.wholegenome = sum(as.numeric(seqlengths(exons(Tx.human))))
[1] 3095693983

# Total bases covered by exons
> tot.exons = exons(Tx.human) %>% 
    reduce %>%    # merge overlapping exons to avoid double-counting
    width %>%     # get width of each exon
    sum
[1] 85928932

Now you have both the total number of bases in the genome and the number of bases covered by exons. You can plot them with your library of preference (e.g. ggplot2).
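As a minimal sketch of the plotting step, here is a base R pie chart built from the two totals printed above. The "Non-exonic" label and the chart title are my own choices for illustration, and base `pie()` is used only to avoid extra dependencies; ggplot2 would work just as well.

```r
# Totals reported above for hg19 (standard chromosomes only)
tot.wholegenome <- 3095693983
tot.exons <- 85928932

# Two slices for now: exonic bases and everything else
bases <- c(Exonic = tot.exons, `Non-exonic` = tot.wholegenome - tot.exons)

# Percentage of the genome in each slice, rounded for the labels
pct <- round(100 * bases / sum(bases), 1)

pie(bases,
    labels = paste0(names(bases), " (", pct, "%)"),
    main = "hg19 genome composition")
```

Once more categories are available (introns, intergenic, UTRs), they can be added to the `bases` vector in the same way.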

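To get introns, intergenic regions, etc., use genes(), cds(), and the other TxDb accessor functions (e.g. transcripts(), fiveUTRsByTranscript(), threeUTRsByTranscript()), then combine the resulting ranges with set operations such as intersect() and setdiff().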
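Here is a rough sketch of how that could look for three non-overlapping categories. The definitions are assumptions on my part, not the only reasonable ones: "intronic" is taken as transcribed bases that are not in any exon, "intergenic" as bases outside every transcript, and strand is ignored so that each base is counted once. UTRs could be split out of the exonic set in the same way.

```r
library(GenomicFeatures)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)

txdb <- keepStandardChromosomes(TxDb.Hsapiens.UCSC.hg19.knownGene)

# Merge overlapping features and ignore strand so each base is counted once
exonic      <- reduce(exons(txdb), ignore.strand = TRUE)
transcribed <- reduce(transcripts(txdb), ignore.strand = TRUE)

# Assumed definition: intronic = inside a transcript but not in any exon
intronic <- setdiff(transcribed, exonic)

# Whole chromosomes as ranges; intergenic = everything outside transcripts
chroms     <- as(seqinfo(txdb), "GRanges")
intergenic <- setdiff(chroms, transcribed)

# Sum widths as numeric to avoid integer overflow on genome-sized totals
n_bases <- function(gr) sum(as.numeric(width(gr)))
counts <- c(Exonic     = n_bases(exonic),
            Intronic   = n_bases(intronic),
            Intergenic = n_bases(intergenic))

round(100 * counts / sum(counts), 2)
```

Because the three sets partition the standard chromosomes, `sum(counts)` should equal the genome total computed earlier, which is a handy sanity check before plotting.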

6.3 years ago
cbio ▴ 450

I'm just going to throw out an easy way to do this using the ChIPseeker R package from Bioconductor. You would first annotate your peaks, and then use the annoPie function to achieve your desired results automatically.

EDIT: I also misread the question. Whoops.

6.3 years ago
nash.claire ▴ 460

Thanks guys. I'm going to give Giovanni's R-based solution a try later. I'm not that familiar with R, but it looks easy and my colleague is going to help me. I'll let you know how it goes.
