Question

Pol II ChIP-seq data analysis

7.6 years ago

devi.dash ▴ 10

I know ChIP-seq analysis, but I have no experience with Pol II ChIP-seq.

The one thing I know from the paper is that the pausing index is calculated as the ratio of the normalized read counts in the promoter region to those in the gene body region.
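The ratio described above can be written as a small function. This is a minimal sketch, assuming both counts come from the same sample (so a library-size factor such as RPM cancels out of the ratio) and that "normalized" means reads per bp of window; the function name is illustrative only.

```python
def pausing_index(promoter_reads: float, promoter_len: int,
                  body_reads: float, body_len: int) -> float:
    """Promoter read density divided by gene-body read density (reads per bp)."""
    if promoter_len <= 0 or body_len <= 0:
        raise ValueError("window lengths must be positive")
    promoter_density = promoter_reads / promoter_len
    body_density = body_reads / body_len
    if body_density == 0:
        # No gene-body signal: the ratio is unbounded, so flag it explicitly.
        return float("inf")
    return promoter_density / body_density


# Example: 400 reads in a 400 bp promoter window vs 5,000 reads over a 20 kb gene body.
print(pausing_index(400, 400, 5000, 20000))  # 4.0
```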

But I have a question:

If a gene has multiple TSSs, how can we get a pausing index for that gene? Should I first use H3K4me3 ChIP-seq to identify the TSS and then calculate pausing? I do not know whether this is a good idea.

Could anyone refer me to a good paper on calculating Pol II pausing?

Another question: which is better for calculating the pausing index, Pol II ChIP-seq or GRO-seq?

ChIP-Seq • 5.2k views

updated 7.2 years ago by xiaodli • written 7.6 years ago by devi.dash ▴ 10

I am new to this group. I did some Pol II ChIP-seq and want to perform a Pol II traveling ratio/pausing index calculation to see how it turns out in my data. However, although I can use bowtie, MACS, and samtools, my programming skills are very limited, and I don't know how to write a program/script for calculating the pausing index/traveling ratio. Does anyone have a script for this purpose? Thanks a lot.

7.2 years ago by xiaodli
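For the scripting part, here is a minimal pure-Python sketch of the counting step. It assumes (hypothetically) that one position per read (for example the 5' end or the fragment center) has already been extracted from the BAM with a tool such as samtools or bedtools and stored per chromosome in sorted order, and that each gene's promoter and gene-body windows are 0-based half-open intervals. The input layout, gene name, and function names are made up for illustration.

```python
from bisect import bisect_left
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open [start, end)


def count_in(positions: List[int], window: Interval) -> int:
    """Number of sorted read positions that fall inside the half-open window."""
    start, end = window
    return bisect_left(positions, end) - bisect_left(positions, start)


def traveling_ratio(positions: List[int], promoter: Interval, body: Interval) -> float:
    """Promoter read density divided by gene-body read density (reads per bp)."""
    promoter_density = count_in(positions, promoter) / (promoter[1] - promoter[0])
    body_density = count_in(positions, body) / (body[1] - body[0])
    return promoter_density / body_density if body_density > 0 else float("inf")


# Hypothetical input: sorted read positions per chromosome and per-gene windows.
reads: Dict[str, List[int]] = {"chr1": [105, 110, 150, 180, 220, 390, 900, 1500, 2000]}
genes = {"geneA": ("chr1", (100, 300), (300, 2300))}

print("gene\ttraveling_ratio")
for name, (chrom, promoter, body) in genes.items():
    print(f"{name}\t{traveling_ratio(reads[chrom], promoter, body):.2f}")
```

Counting with `bisect` keeps each window lookup logarithmic in the number of reads, so looping over all genes stays cheap; how the windows themselves are defined is the part that labs disagree on.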

I've never done this analysis before, but I think that computeMatrix from deepTools has an option for that:

--unscaled5prime: "Number of bases at the 5-prime end of the region to exclude from scaling. By default, each region is scaled to a given length (see the --regionBodyLength option). In some cases it is useful to look at unscaled signals around region boundaries, so this setting specifies the number of unscaled bases on the 5-prime end of each boundary."

For more information, see http://deeptools.readthedocs.io/en/latest/content/tools/computeMatrix.html

7.2 years ago by Lila M ★ 1.2k

score 3 · Answer 1 · 2016-09-20

We are currently performing a pausing index analysis and based our promoter definition on the one described in this PAF1 paper.

However, as Ryan Dale mentioned, different labs use different definitions. In our case we used -100 to +300 bp for the promoter and +301 bp to the end of the transcript for the gene body. Our reasoning was that the size of the "promoter region" can vary with a gene's size. Especially for long genes with isoforms that have multiple start sites, we observed that using a +300 bp to +2 kb window for the gene body sometimes covered a region made up entirely of alternative promoters, thus inflating the gene body coverage values.
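One way to check for the problem described here is to measure how much of a fixed gene-body window is covered by the promoter windows of alternative TSSs. A minimal sketch, assuming 0-based half-open coordinates; the TSS positions and window sizes in the example are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open [start, end)


def covered_fraction(window: Interval, others: List[Interval]) -> float:
    """Fraction of `window` covered by the union of `others`."""
    w_start, w_end = window
    # Clip overlapping intervals to the window, then merge them left to right.
    clipped = sorted(
        (max(s, w_start), min(e, w_end)) for s, e in others if s < w_end and e > w_start
    )
    covered = 0
    cur_start, cur_end = None, None
    for s, e in clipped:
        if cur_end is None or s > cur_end:  # gap: close the current merged run
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = s, e
        else:  # overlaps or touches the current run: extend it
            cur_end = max(cur_end, e)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / (w_end - w_start)


# Hypothetical plus-strand gene with TSS at 10,000: fixed body window +300 to +2 kb,
# and two alternative TSSs (10,800 and 11,600) with -100..+300 promoter windows.
body = (10_300, 12_000)
alt_promoters = [(10_700, 11_100), (11_500, 11_900)]
print(round(covered_fraction(body, alt_promoters), 2))  # 0.47
```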

As Devon Ryan mentioned, it is tough to figure out the exact TSS being used (often several are used). In our analysis, we combined multiple Pol II initiation markers (mainly different Pol II ChIP-seq datasets) and defined the promoter with the MOST coverage as the active TSS. We then selected the isoform with that TSS as the representative of the gene. Obviously, there are some serious potential drawbacks, but it is very difficult to quantify coverage in an active gene body if it overlaps another active TSS.
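The selection rule described here (take the TSS whose promoter has the most Pol II signal, and keep that isoform) could be sketched as follows. Summing normalized counts across datasets is an assumption, since the answer does not say exactly how the markers were combined; the isoform IDs and counts are invented.

```python
from typing import Dict, List


def pick_active_isoform(promoter_counts: Dict[str, List[float]]) -> str:
    """Isoform whose promoter window has the highest Pol II signal summed over datasets.

    Ties go to the first isoform listed (max() keeps the first maximum).
    """
    return max(promoter_counts, key=lambda iso: sum(promoter_counts[iso]))


# Hypothetical isoforms of one gene: normalized promoter counts from two Pol II ChIP-seq sets.
counts = {"tx1": [12.0, 9.5], "tx2": [48.0, 40.2], "tx3": [3.1, 0.0]}
print(pick_active_isoform(counts))  # tx2
```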

We now have PRO-seq data, and it looks very good (it correlates with Pol II positioning/abundance). However, the peaks at promoters are super sharp, which suggests it might be smart to opt for a more refined promoter definition (~75 bp).

We have H3K4me3 data, and the peak at the promoter is very broad; we are NOT using it to calculate pausing. As others suggested, Pol II ChIP-seq is likely ideal. In our case we have the PRO-seq data, but IMO our Rpb3 ChIP-seq is a pretty close approximation (although some people have had success with Ser2 ChIP-seq).

score 2 · Answer 2 · 2016-09-20

I'm not sure Pol II ChIP would be cleaner. For example, say you have these two transcripts for a gene:

|->
---------------- A
    |->
    ------------- B

If you calculate a pausing index per transcript, then any PolII in B will decrease the calculated pausing index for A. Maybe this means you should exclude any PolII reads in B (and the corresponding bp used to calculate density) from the calculation of A's pausing index? But then how good is your estimate if you're tossing most of the downstream info? And how would you calculate a pausing index for B if all of it overlaps A?
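The masking idea in the paragraph above (drop B's bases and reads when computing A's gene-body density) can be sketched as an interval subtraction followed by counting. A minimal sketch, assuming sorted per-read positions and 0-based half-open intervals; the names and coordinates are hypothetical. When B covers all of A's gene body, nothing is left to measure and the function returns `None`, which is exactly the concern raised above.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # 0-based, half-open [start, end)


def subtract(window: Interval, masks: List[Interval]) -> List[Interval]:
    """Pieces of `window` not covered by any interval in `masks`."""
    pieces = [window]
    for m_start, m_end in masks:
        remaining = []
        for s, e in pieces:
            if m_end <= s or m_start >= e:  # mask misses this piece entirely
                remaining.append((s, e))
                continue
            if s < m_start:  # part left of the mask survives
                remaining.append((s, m_start))
            if m_end < e:  # part right of the mask survives
                remaining.append((m_end, e))
        pieces = remaining
    return pieces


def masked_density(positions: List[int], window: Interval,
                   masks: List[Interval]) -> Optional[float]:
    """Reads per bp over the unmasked part of `window`; None if nothing is left."""
    pieces = subtract(window, masks)
    n_bp = sum(e - s for s, e in pieces)
    if n_bp == 0:
        return None
    n_reads = sum(bisect_left(positions, e) - bisect_left(positions, s) for s, e in pieces)
    return n_reads / n_bp


# Hypothetical: A's gene body is [300, 2000); transcript B spans [800, 2000).
reads = [400, 600, 900, 1500]
print(subtract((300, 2000), [(800, 2000)]))  # [(300, 800)]
print(masked_density(reads, (300, 2000), [(800, 2000)]))  # 0.004
```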

It seems like this will get tricky, and I'm not aware of any papers that handle this, or that handle replicates robustly either. Plus, everyone seems to disagree on what "promoter" and "downstream" should be. Anyway, here are some options for further reading:

promoter +/-250 bp; downstream +500 bp to end (PMID: 21074046)
promoter -50 to +300 bp; downstream +301 bp to 3 kb after the transcript; transcripts selected by highest H3K4me3 occupancy (PMCID: PMC4893286)
promoter -100 to +300 bp; downstream +301 bp to +2 kb (PMID: 26279188)
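The three definitions listed above can be encoded as TSS-relative offsets and turned into strand-aware genomic windows. A minimal sketch: the offsets come from the list above, but the encoding (inclusive offsets, TSS and transcript end given as the first and last transcribed base in 0-based coordinates, "3 kb after the transcript" read as 3 kb past the transcript end) is an assumption, and the example transcript is hypothetical.

```python
from typing import Dict, Optional, Tuple

Interval = Tuple[int, int]  # 0-based, half-open [start, end) genomic interval

# TSS-relative offsets in transcript orientation (negative = upstream), inclusive.
# body_end is ("tss", k): k bp past the TSS, or ("tes", k): k bp past the transcript end.
DEFINITIONS: Dict[str, dict] = {
    "PMID:21074046": {"promoter": (-250, 250), "body_start": 500, "body_end": ("tes", 0)},
    "PMC4893286": {"promoter": (-50, 300), "body_start": 301, "body_end": ("tes", 3000)},
    "PMID:26279188": {"promoter": (-100, 300), "body_start": 301, "body_end": ("tss", 2000)},
}


def offset_window(tss: int, strand: str, lo: int, hi: int) -> Interval:
    """Convert inclusive TSS-relative offsets lo..hi into a genomic interval."""
    if strand == "+":
        return (tss + lo, tss + hi + 1)
    return (tss - hi, tss - lo + 1)  # minus strand: offsets run toward lower coordinates


def windows(tss: int, tes: int, strand: str, d: dict) -> Tuple[Interval, Optional[Interval]]:
    """Promoter and gene-body windows; body is None if the transcript is too short."""
    promoter = offset_window(tss, strand, *d["promoter"])
    anchor, extra = d["body_end"]
    tes_offset = abs(tes - tss)  # offset of the last transcribed base
    body_hi = extra if anchor == "tss" else tes_offset + extra
    if body_hi < d["body_start"]:
        return promoter, None
    return promoter, offset_window(tss, strand, d["body_start"], body_hi)


# A hypothetical 5 kb minus-strand transcript: TSS at 15999, last transcribed base at 11000.
for name, d in DEFINITIONS.items():
    print(name, windows(15999, 11000, "-", d))
```

Windows are not clipped to chromosome bounds, and a transcript too short for the gene-body window gets `None` for its body; both would need handling on real annotations.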

score 1 · Answer 3 · 2016-09-20

Regarding genes with multiple TSSs: the pausing index is per-transcript.
Regarding H3K4me3: nah, just use the Pol II data, it'll be cleaner.

Regarding Pol II ChIP-seq vs. GRO-seq (there's also PRO-seq), I'm not really sure. In theory, GRO-seq or PRO-seq more directly measures what you want, but in practice I haven't compared them to see whether one really yields better data (though I have somewhat limited experience with these).