Question

Encode Histone Modification Scoring

3

13.1 years ago

Emma ▴ 140

Hi all,

I have downloaded the wgEncodeBroadHistone files from the UCSC downloads and I want to create a binary score for each region: signal enrichment vs. no enrichment. There are two kinds of files I could get my data from:

1) The bigWig files with a density score, but I'm not sure what threshold I could use to call the signal of a region enriched (or not). And I guess it's different for each marker (?).

2) The broadPeak files with the columns (correct me if I'm wrong): "chr", "start", "end", "name", "score", "strand", "signalValue", "pValue", "qValue", but I'm not sure what those values are and how I could use them for creating my variable.

Any help would be much appreciated!

Thanks, Emma
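For illustration, here is a minimal Python sketch of one way the broadPeak route could work: read the nine columns listed above and mark a region as 1 if it overlaps any called peak, 0 otherwise. The helper names (`read_broadpeak`, `is_enriched`) and the two example rows are made up for this sketch, and it assumes tab-separated lines with BED-style half-open coordinates.

```python
import csv
import io
from collections import namedtuple

# Column names as listed in the question (hypothetical field names for this sketch).
BROADPEAK_COLUMNS = ["chrom", "start", "end", "name", "score", "strand",
                     "signalValue", "pValue", "qValue"]
Peak = namedtuple("Peak", BROADPEAK_COLUMNS)


def read_broadpeak(handle):
    """Parse tab-separated broadPeak lines into Peak tuples."""
    peaks = []
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and any header-like lines.
        if not row or row[0].startswith(("#", "track", "browser")):
            continue
        peaks.append(Peak(row[0], int(row[1]), int(row[2]), row[3], int(row[4]),
                          row[5], float(row[6]), float(row[7]), float(row[8])))
    return peaks


def is_enriched(region, peaks):
    """Return 1 if the (chrom, start, end) region overlaps any peak, else 0."""
    chrom, start, end = region
    return int(any(p.chrom == chrom and p.start < end and start < p.end
                   for p in peaks))


# Made-up example rows, not real ENCODE data.
example = io.StringIO(
    "chr19\t1000\t2000\t.\t500\t.\t12.5\t8.1\t-1\n"
    "chr19\t5000\t5600\t.\t300\t.\t4.2\t3.0\t-1\n"
)
peaks = read_broadpeak(example)
print(is_enriched(("chr19", 1500, 1800), peaks))  # region inside the first peak
print(is_enriched(("chr19", 3000, 4000), peaks))  # region between the two peaks
```

The linear scan over all peaks is fine for a handful of regions; for many regions an interval index or a dedicated tool would be a better fit.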

encode • 3.5k views

updated 13.1 years ago by Gjain 5.8k • written 13.1 years ago by Emma ▴ 140

score 2 · Answer 1 · 2011-09-15

2

13.1 years ago

Gjain 5.8k

You can rank them and take, say, the top 25% or top 30% as a cutoff.

The peaks have already passed a certain cutoff in the peak-calling pipeline, so you can rank them and use them accordingly.
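As a rough sketch of the ranking idea, the Python snippet below sorts the values, finds the cutoff for the top fraction, and turns each value into a 0/1 flag. The function name `top_fraction_cutoff` and the numbers are invented for illustration; which column to rank on (for example signalValue) is left as an assumption.

```python
def top_fraction_cutoff(values, fraction=0.25):
    """Return the smallest value that still falls in the top `fraction`."""
    if not values:
        raise ValueError("no values to rank")
    ranked = sorted(values, reverse=True)
    # Keep at least one value, even for very small inputs.
    keep = max(1, int(len(ranked) * fraction))
    return ranked[keep - 1]


# Made-up signal values, not real data.
signal_values = [1.0, 2.0, 2.0, 3.0, 4.0, 6.0, 15.0, 40.0]
cutoff = top_fraction_cutoff(signal_values, 0.25)
binary = [int(v >= cutoff) for v in signal_values]
print(cutoff, binary)
```

Note that ties at the cutoff value are all kept, so slightly more than the requested fraction can end up flagged as 1.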

13.1 years ago by Gjain 5.8k

0

Can you add some key references related to your answer? Thanks!

13.1 years ago by Khader Shameer 18k

0

Thanks Gjain, but I don't think that would do in my case. Firstly, because I'm not going to be looking genome-wide, but in specific regions which are more likely to be regulatory, so I would like a more consistent cut-off. And secondly, because the distribution of the signal of the marker H3K4me3, for example (for chr19), is like:

Min. 1st Qu. Median Mean 3rd Qu. Max.
0.040 1.000 2.000 4.254 3.000 3828.000

which means that the 25% cut-off is going to include lots of regions without a signal of enrichment.

13.1 years ago by Emma ▴ 140

0

Sorry, the distribution came out illegible, so I'll retype it: Min: 0.040, 1st Qu.: 1.000, Median: 2.000, Mean: 4.254, 3rd Qu.: 3.000, Max: 3828.000

13.1 years ago by Emma ▴ 140

0

What you can do then is run SnowsPenultimateNormalityTest (from the R package TeachingDemos) to check whether the score distribution is normal. If it is normal, you can calculate z-scores and, for a particular p-value, say 0.005, use the corresponding z-score as the cutoff threshold.
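The z-score idea could look roughly like the Python sketch below. It assumes the scores really are approximately normal and uses a one-sided p-value; with a strongly right-skewed distribution (as the summary quoted earlier in the thread suggests), the scores might first need a transformation such as a log, which is not shown here. The function name `zscore_cutoff` and the example scores are made up.

```python
from statistics import NormalDist, mean, stdev


def zscore_cutoff(scores, p_value=0.005):
    """Score above which a value has a one-sided p-value below `p_value`,
    assuming the scores are approximately normally distributed."""
    # z such that P(Z > z) = p_value for a standard normal Z.
    z = NormalDist().inv_cdf(1.0 - p_value)
    return mean(scores) + z * stdev(scores)


# Made-up scores: mostly low background plus one strong signal.
scores = [1.0, 2.0, 3.0] * 10 + [50.0]
cutoff = zscore_cutoff(scores)
enriched = [int(s > cutoff) for s in scores]
print(round(cutoff, 2), sum(enriched))
```

Because extreme values inflate both the mean and the standard deviation, a cutoff derived this way can be quite conservative on skewed data.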

13.1 years ago by Gjain 5.8k