Question: Construction Of Average Gene Profile
gravatar for Dataminer
7.0 years ago by
Dataminer2.7k wrote:


I want to construct a average gene profile of my ChIP-seq data (BAM file). What I am aiming at is to get the normalised length of all genes in refseq or ensembl and bin them in say 10 bins and construct the profile, this will assist to visualise if the binding is on TSS, Genebody, TES etc.

Is their already a ready made solution in offering ...... by someone who has done this before?

As an example figure 2C

Thank you

perl python chip-seq R • 4.7k views
ADD COMMENTlink modified 6.6 years ago by Biostar ♦♦ 20 • written 7.0 years ago by Dataminer2.7k
gravatar for Madelaine Gogol
7.0 years ago by
Madelaine Gogol5.1k
Kansas City
Madelaine Gogol5.1k wrote:

Not in the category of ready-made, but...

I have done this in R, but my code isn't really pretty. I'm sure there's better ways to do it in R (probably not involving slow for loops), but I'm still learning to use all these sequencing packages... Here's some crappy combination of R code and pseudocode as a start. Maybe someone else will be so annoyed by this that they post actual nice R code.

   #First, I don't actually know if I'm using all these packages, I just include them in case.

   #read in a bam file
   #summarize into a coverage Rle
   cvg <- coverage(temp)
   #may want to scale coverage here based on number of reads

   #then I loop through a bed file of genes pulling out the coverage values for each and sticking them in a data frame
   for each gene i
      #use approx to create the bins, in this case 20... and save it into some data frame...

Additionally you need to remember to use rev() on the coverage values for genes on the C/- strand... Also if you're adding bases up and downstream of a gene you need to check that you don't go past the start or end of the chromosome.

You may also take a look at the R package Repitools for a possibly more ready-made solution, but I haven't tried it...

ADD COMMENTlink written 7.0 years ago by Madelaine Gogol5.1k

@Madelaine: could you please provide the complete code that you used, thank you.

ADD REPLYlink written 7.0 years ago by Dataminer2.7k

Errr... This should be enough to understand how to basically set it up if you know R.

The code I summarized this from is doing a lot more involving summarizing up and downstream regions of genes in addition to the gene body, checking for overlapping genes, etc., so I feel it would be too confusing and messy to really be a big help in its entirety.

ADD REPLYlink written 7.0 years ago by Madelaine Gogol5.1k

I agree with Madelaine that it would probably clutter things up to add more. One comment though. She's using bioconductor so you would need to install that. If you are just starting R (and probably for a lot of people that know R but don't do bioinformatics with it), that might not be obvious. Here is how you install it, All the packages she's using are on the bioconductor website.

ADD REPLYlink written 7.0 years ago by KCC4.0k

I followed the link from seqanwser. actually I found ngsplot can do exactly the same thing you (I) want to do.

ADD REPLYlink written 6.7 years ago by Ming Tang2.5k

Looking at this code, I was wondering if it is sensitive to genes on the forward vs. the reverse strand. One needs to reverse the order of traversal of the gene based on the direction in which the gene is transcribed to characterize the TSS and TES correctly.

ADD REPLYlink written 6.6 years ago by KCC4.0k

Yep, that's why I said "Additionally you need to remember to use rev() on the coverage values for genes on the C/- strand... "

ADD REPLYlink written 6.4 years ago by Madelaine Gogol5.1k
gravatar for Istvan Albert
7.0 years ago by
Istvan Albert ♦♦ 82k
University Park, USA
Istvan Albert ♦♦ 82k wrote:

There are libraries to assist you in this, although each would probably need to be tuned to your problem set. For example in Python you can user the HTSeq package like so:

ADD COMMENTlink written 7.0 years ago by Istvan Albert ♦♦ 82k
gravatar for KCC
7.0 years ago by
Cambridge, MA
KCC4.0k wrote:

Have you looked at CEAS from the Liu Lab, ? It gives you information on how the profile of your signal is changing in the TSS, TTS and the Genebody. It can also do this for specific lists of genes.

Also, if you have a BED file then Cistrome (also from the Liu lab) can give you a profile of your reads going into the BED regions. It's as simple as entering your wig, your bed file and clicking for the algorithm to run. The output is a PDF file. I think you will have to sign up to use it though.

For both of these, you need to find a way to translate your bam into a wig I think.

ADD COMMENTlink written 7.0 years ago by KCC4.0k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1149 users visited in the last hour