ChromHMM Output Descriptions
2
1
5.0 years ago
Sinji ★ 3.1k

I've been doing some work that involves characterizing potential chromatin states in an HCT116 cell model. I've successfully run ChromHMM to identify chromatin states using a variety of histone marks, and then overlapped them with other datasets to double-check their annotation.

However, I am having trouble understanding some of the output files that ChromHMM automatically generates, specifically _emissions.txt and _*.bed. I know there are a couple of people here who are really familiar with the software and could probably help me out.

I have already searched Google and read the ChromHMM manuscript, but neither provided answers.

ChromHMM • 4.0k views
3
5.0 years ago
Ryan Dale 4.9k

The _emissions.txt file holds the values that go into the _emissions.png figure. Each row is a state and each column is an input data file (a "mark" or histone mark in ChromHMM terminology). Darker blue indicates a higher likelihood of finding that mark in that state. These values, combined with running OverlapEnrichment on biologically meaningful datasets, are critical for figuring out how to interpret the states.
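To make the row/column layout concrete, here is a minimal Python sketch that reads such a table and reports the strongest mark per state. It assumes the file is tab-delimited with one header row of mark names and the state identifier in the first column; check that against your own file. The function name `read_emissions` and the example values are made up for illustration.

```python
import csv
import io


def read_emissions(handle):
    """Parse a tab-delimited emissions table into (marks, {state: [values]}).

    Assumed layout (verify against your file): one header row whose first
    cell labels the state column, then one row per state.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    marks = header[1:]  # skip the state-column label
    table = {}
    for row in reader:
        if not row:
            continue  # tolerate blank lines
        table[row[0]] = [float(v) for v in row[1:]]
    return marks, table


# Made-up two-state, two-mark example standing in for a real file.
example = "state\tH3K4me3\tH3K27ac\n1\t0.02\t0.85\n2\t0.91\t0.10\n"
marks, table = read_emissions(io.StringIO(example))

for state, values in table.items():
    strongest = marks[values.index(max(values))]
    print(f"state {state}: strongest mark is {strongest}")
```

For a real run you would pass an open file handle (for example `open("model_emissions.txt")`, a placeholder name) instead of the `io.StringIO` example.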

The _segments.bed file partitions the genome into contiguous segments, and the name of each feature in that file (E1, E2, etc.) corresponds to a state (1, 2, etc.) in the _emissions.png.

A typical workflow is to figure out what to label each state. Then choose some colors and post-process the BED file with labels and names to get something more useful for downstream analysis.
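As a rough sketch of that post-processing step, the Python below rewrites each segment line as a 9-column BED record with a readable label and an RGB color. It assumes the segments file has four tab-separated columns (chromosome, start, end, state name such as E1). The labels and colors in `STATE_INFO` are hypothetical placeholders; the real ones come from your own interpretation of the states.

```python
# Hypothetical state -> (label, "R,G,B") mapping; replace with your own calls.
STATE_INFO = {
    "E1": ("ActivePromoter", "255,0,0"),
    "E2": ("Enhancer", "255,195,77"),
}


def relabel_segment(line, state_info=STATE_INFO):
    """Turn 'chrom<TAB>start<TAB>end<TAB>E#' into a labeled, colored BED9 line."""
    chrom, start, end, state = line.rstrip("\n").split("\t")[:4]
    # Unknown states keep their original name and get a neutral gray.
    label, rgb = state_info.get(state, (state, "128,128,128"))
    # BED9 columns: chrom, start, end, name, score, strand,
    # thickStart, thickEnd, itemRgb
    return "\t".join([chrom, start, end, label, "0", ".", start, end, rgb])


def relabel_lines(lines, state_info=STATE_INFO):
    """Relabel every non-empty segment line."""
    return [relabel_segment(l, state_info) for l in lines if l.strip()]


# Made-up segments standing in for a real _segments.bed file.
example = [
    "chr1\t0\t10000\tE2\n",
    "chr1\t10000\t12000\tE1\n",
    "chr1\t12000\t15000\tE3\n",
]
for out in relabel_lines(example):
    print(out)
```

Genome browsers generally only honor the color column when the track line sets `itemRgb="On"`, so add a track header if you plan to upload the result.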

0

Appreciate the information!

Do the emission values go directly on the png, or do they first have to be modified in some way? I have some values of 0.02, for example, but a 6 in others. Would the 0.02 be treated as a 0?

0

Not sure if they're normalized in some way. To figure that out, you need to read the source code or try to reproduce the png given the txt file (and see what, if any, normalization needs to happen). Given the lack of a colormap though, my guess would be that each emissions.txt file is divided by the max of that file.

0

How do I figure out the label of each state? I got the ChromHMM output, but I can't find the annotation information for each state. Thanks.

0

The label of each state is subjective. Coming up with good labels requires looking carefully at the enrichments (from running OverlapEnrichment) and emissions heatmaps to decide what you want to name them.

1
3.7 years ago
Roman Hillje ▴ 40

I'm currently studying how ChromHMM produces its output with a particular interest in the relationship between the enrichment values found in the _overlap.txt file and the colors in the heatmap (since I need to reproduce them). I went through the ChromHMM code and found that the heatmap is produced using the JHeatChart library found here: http://www.javaheatmap.com/documentation

The library itself does not scale or normalize the input values; ChromHMM does this beforehand. Unless specified otherwise, each column gets its own color scale: ChromHMM subtracts the minimum value in the column and then divides by the maximum column value. The alternative is a single scale based on the values across all columns (activated through -uniformscale in the OverlapEnrichment command).
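Here is a small Python sketch of that scaling, written from the description above rather than from the ChromHMM source. It takes the description literally, so the divisor is the raw column maximum; whether ChromHMM actually divides by the raw maximum or by the range (max minus min) is worth checking in the code. The `uniform=True` branch is my guess at how a shared scale would work (same formula with the global min and max), and `scale_columns` is a made-up name.

```python
def scale_columns(rows, uniform=False):
    """Scale enrichment values to heatmap intensities.

    Per-column mode (default): (value - column_min) / column_max.
    Uniform mode (assumed):    (value - global_min) / global_max.
    """
    ncols = len(rows[0])
    if uniform:
        flat = [v for row in rows for v in row]
        mins = [min(flat)] * ncols
        maxs = [max(flat)] * ncols
    else:
        cols = list(zip(*rows))  # transpose rows into columns
        mins = [min(c) for c in cols]
        maxs = [max(c) for c in cols]
    return [
        [(v - mins[j]) / maxs[j] if maxs[j] != 0 else 0.0
         for j, v in enumerate(row)]
        for row in rows
    ]


# Made-up enrichment values: 3 states (rows) x 2 annotations (columns).
rows = [[1.0, 10.0], [2.0, 20.0], [4.0, 40.0]]

per_column = scale_columns(rows)
shared = scale_columns(rows, uniform=True)
print(per_column)
print(shared)
```

With per-column scaling both columns end up with the same intensities even though their raw values differ tenfold, which is why colors cannot be compared across columns unless a uniform scale is used.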

I hope this helps to understand the connection between values and heatmap colors. Yet, I'm still not sure how to interpret the enrichment values in the _overlap.txt file.