I have H3K4me3 and H3K27me3 datasets (with replicates) from mouse ESCs. I generated lists of:
- active promoters (K4me3+ K27me3-)
- bivalent promoters (K4me3+ K27me3+)
- nonActive promoters (K4me3- K27me3-)
I am working with protein-coding Ensembl annotations, keeping each unique TSS, and built a promoter set from them. I also have expression profiling from mouse ESCs and appended the isoform-level expression values (from the Tuxedo pipeline, Cufflinks/Cuffdiff) to the above promoter sets. I see a discrepancy and would like your input.
64% of the promoters in my active list (according to histone marks) have a significant H3K4me3 peak but are not expressed according to the expression dataset. So, my questions are:
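To make the setup concrete, here is a minimal sketch of how the promoter classes and the "marked but not expressed" fraction could be computed. The field names (`k4me3`, `k27me3`, `fpkm`), the helper names, the toy records, and the 1 FPKM cutoff are all assumptions for illustration, not my actual pipeline.

```python
def classify_promoter(has_k4me3, has_k27me3):
    """Assign a chromatin class from peak-overlap flags (hypothetical helper)."""
    if has_k4me3 and has_k27me3:
        return "bivalent"
    if has_k4me3:
        return "active"
    if has_k27me3:
        return "K27me3_only"  # not one of the three lists described above
    return "nonActive"


def fraction_not_expressed(promoters, fpkm_cutoff=1.0):
    """Fraction of 'active' promoters whose isoform FPKM is below the cutoff."""
    active = [
        p for p in promoters
        if classify_promoter(p["k4me3"], p["k27me3"]) == "active"
    ]
    if not active:
        return 0.0
    silent = [p for p in active if p["fpkm"] < fpkm_cutoff]
    return len(silent) / len(active)


# Toy records, invented for illustration only
toy = [
    {"id": "tx1", "k4me3": True, "k27me3": False, "fpkm": 12.0},
    {"id": "tx2", "k4me3": True, "k27me3": False, "fpkm": 0.2},
    {"id": "tx3", "k4me3": True, "k27me3": True, "fpkm": 0.5},
    {"id": "tx4", "k4me3": False, "k27me3": False, "fpkm": 0.0},
]
print(fraction_not_expressed(toy))
```

Note that the result depends directly on the FPKM cutoff chosen, so the percentage is only as meaningful as that threshold.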
1. How does Cufflinks/Cuffdiff calculate isoform-level expression, and how well does it perform, considering the many overlapping transcripts with nearby TSSs from the same or different genes? For example, of the 7 protein-coding transcripts of the gene Lypla1, 3 are expressed but 4 are not.
chr1 4806788 4808788 Lypla1 ENSMUST00000134384 + 1.7341
chr1 4806823 4808823 Lypla1 ENSMUST00000027036 + 66.4909
chr1 4806830 4808830 Lypla1 ENSMUST00000150971 + 0.6529 #this one
chr1 4806896 4808896 Lypla1 ENSMUST00000119612 + 18.9053
chr1 4806898 4808898 Lypla1 ENSMUST00000137887 + 3.5388
chr1 4806911 4808911 Lypla1 ENSMUST00000115529 + 15.4683
chr1 4807237 4809237 Lypla1 ENSMUST00000131119 + 3.6185
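One alternative to trusting each isoform value on its own would be to collapse transcripts whose promoters start close together and sum their FPKM, since a single H3K4me3 peak would cover all of them. A rough sketch using the Lypla1 rows above; treating the start coordinate as a stand-in for TSS position, the 500 bp merge gap, and the function name `cluster_by_start` are assumptions chosen for illustration.

```python
# (chrom, start, gene, transcript, strand, FPKM) taken from the rows above
rows = [
    ("chr1", 4806788, "Lypla1", "ENSMUST00000134384", "+", 1.7341),
    ("chr1", 4806823, "Lypla1", "ENSMUST00000027036", "+", 66.4909),
    ("chr1", 4806830, "Lypla1", "ENSMUST00000150971", "+", 0.6529),
    ("chr1", 4806896, "Lypla1", "ENSMUST00000119612", "+", 18.9053),
    ("chr1", 4806898, "Lypla1", "ENSMUST00000137887", "+", 3.5388),
    ("chr1", 4806911, "Lypla1", "ENSMUST00000115529", "+", 15.4683),
    ("chr1", 4807237, "Lypla1", "ENSMUST00000131119", "+", 3.6185),
]


def cluster_by_start(rows, max_gap=500):
    """Group rows on the same chromosome and strand whose start lies
    within max_gap bp of the previous row's start (hypothetical helper)."""
    clusters = []
    for row in sorted(rows, key=lambda r: (r[0], r[4], r[1])):
        last = clusters[-1][-1] if clusters else None
        same_locus = (
            last is not None
            and last[0] == row[0]
            and last[4] == row[4]
            and row[1] - last[1] <= max_gap
        )
        if same_locus:
            clusters[-1].append(row)
        else:
            clusters.append([row])
    return clusters


clusters = cluster_by_start(rows)
for cluster in clusters:
    total_fpkm = sum(r[5] for r in cluster)
    ids = ",".join(r[3] for r in cluster)
    print(f"{cluster[0][0]}:{cluster[0][1]}-{cluster[-1][1]} "
          f"n={len(cluster)} summed_FPKM={total_fpkm:.4f} [{ids}]")
```

With this gap all seven rows fall into one cluster, so the promoter region as a whole would count as expressed even though one isoform is below 1 FPKM. A smaller gap would split the last transcript into its own cluster.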
2. In this screenshot I loaded the Ensembl transcript ID highlighted in red. According to the expression data, this ID has a value of <1 FPKM (definitely not expressed), but there is a significant K4me3 peak here. So again, how reliable are these isoform-level calculations? With standard ChIP protocols we certainly cannot tell which transcript this K4me3 peak belongs to, but by combining ChIP with expression profiling we could, depending on how robust and credible the values are.
Thanks for your time