ChromHMM: chromatin states to genome % coverage?
4.5 years ago
Sy80 ▴ 10

Hi all,

I'm trying to wrap my head around ChromHMM. Short version: I can't gather from the ChromHMM/ENCODE papers how one goes from chromatin states to the % genome coverage of each state across different cell lines.

Long version: We have three conditions (wild-type, het, and KO cells) with ChIP-seq data for 6 histone marks per condition.

We are trying to create a joint model using all the data from our peak calls (18 ChIP-seq datasets in total: 6 marks x 3 conditions) and then use the jointly learned chromatin states to determine genome coverage per chromatin state for each of our 3 conditions, so that we can see the similarities and differences between conditions.

First, we merged our peaks across all conditions (each histone mark separately), created a virtual chromosome per histone mark, and used this virtual genome to learn a 12-state model.

My question is: how can we use the joint chromatin states to get genome coverage per state for each cell line separately?

I hope my question was clear. I appreciate any input. Thanks!

ChIP-Seq ChromHMM • 2.1k views

Hi @Sy80, I am also looking for an answer to a similar question. Were you able to get this done? I'd appreciate it if you could share your experience and the strategy you followed.



If you set up your tissue marks file to define each cell line separately, then ChromHMM should have produced segmentation files specific for each cell line. There should also be a file called CellLine_12_coverage.txt for each cell line, which has a genome coverage column.
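If you would rather compute the numbers yourself from the per-cell-line segmentation files, the idea can be sketched in R roughly as below. This is only a sketch under assumptions: `coverage_by_cell` is a hypothetical helper (not part of ChromHMM), each segmentation is assumed to be a 4-column BED table (chromosome, start, end, state), and the toy tables stand in for files you would read yourself with read.table().

```r
# Hypothetical helper: percent of segmented bp per state for each cell line.
# `seg_list` is a named list of 4-column tables (V1 = chrom, V2 = start,
# V3 = end, V4 = state), one table per cell line.
coverage_by_cell <- function(seg_list) {
  # Union of all state labels, so every cell line gets the same rows
  states <- sort(unique(unlist(lapply(seg_list, function(s) as.character(s$V4)))))
  sapply(seg_list, function(s) {
    bp <- tapply(s$V3 - s$V2, factor(s$V4, levels = states), sum)
    bp <- setNames(as.numeric(bp), states)
    bp[is.na(bp)] <- 0              # state absent in this cell line
    100 * bp / sum(bp)              # percent of all segmented bp
  })
}

# Toy data standing in for the real per-cell-line segment files
# (assumption: with real output you would read each file via read.table()).
wt <- read.table(text = "chr1\t0\t600\tE1\nchr1\t600\t1000\tE2", sep = "\t")
ko <- read.table(text = "chr1\t0\t200\tE1\nchr1\t200\t1000\tE2", sep = "\t")

pct <- coverage_by_cell(list(WT = wt, KO = ko))
print(round(pct, 1))
```

The result is a states-by-cell-lines matrix, so the columns can be compared directly between conditions. Note that state labels are sorted alphabetically, so with 12 states E10 comes before E2.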

5 months ago
Yussuf • 0

You can do this in RStudio; here is some code I wrote.

Read the file:

    x <- read.table("segments.bed", sep = "\t")

Keep only the rows with the same state. I had 7 states; the segments BED file should have 4 columns.

    E1 <- x[x$V4 == "E1", ]
    E2 <- x[x$V4 == "E2", ]
    E3 <- x[x$V4 == "E3", ]
    E4 <- x[x$V4 == "E4", ]
    E5 <- x[x$V4 == "E5", ]
    E6 <- x[x$V4 == "E6", ]
    E7 <- x[x$V4 == "E7", ]

Subtract V2 from V3 (end minus start) to get the size of each interval in bp:

    E1['interval_size'] <- (E1$V3 - E1$V2)
    E2['interval_size'] <- (E2$V3 - E2$V2)
    E3['interval_size'] <- (E3$V3 - E3$V2)
    E4['interval_size'] <- (E4$V3 - E4$V2)
    E5['interval_size'] <- (E5$V3 - E5$V2)
    E6['interval_size'] <- (E6$V3 - E6$V2)
    E7['interval_size'] <- (E7$V3 - E7$V2)

Take the total of the new column for each state:

    sum(E1[, 'interval_size'])
    sum(E2[, 'interval_size'])
    sum(E3[, 'interval_size'])
    sum(E4[, 'interval_size'])
    sum(E5[, 'interval_size'])
    sum(E6[, 'interval_size'])
    sum(E7[, 'interval_size'])

The sum of all the interval sizes across states should equal the genome size in bp; divide each state's total by it to get the percentages.
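The same steps can be written more compactly so they work for any number of states. A minimal sketch, assuming the same 4-column layout as above; `state_coverage` is a hypothetical helper name, and the toy table stands in for the result of reading segments.bed:

```r
# Hypothetical helper: per-state coverage from a 4-column segments table
# (V1 = chromosome, V2 = start, V3 = end, V4 = state label).
state_coverage <- function(segments) {
  interval_size <- segments$V3 - segments$V2       # BED interval size: end - start
  bp <- tapply(interval_size, segments$V4, sum)    # total bp per state
  data.frame(
    state   = names(bp),
    bp      = as.numeric(bp),
    percent = 100 * as.numeric(bp) / sum(bp),      # share of all segmented bp
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Toy input standing in for read.table("segments.bed", sep = "\t")
x <- read.table(text = "chr1\t0\t200\tE1
chr1\t200\t1000\tE2
chr1\t1000\t1200\tE1
chr2\t0\t600\tE3", sep = "\t", stringsAsFactors = FALSE)

res <- state_coverage(x)
print(res)
```

Here the percentages are relative to the total length of the segments in the file, which matches the genome size only if the segmentation covers the whole genome.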

I know this is a rather old question, but I'm posting the code here in case someone needs it.

