What are the 4th and 5th column in ENCODE DNase enrichment files?
2.4 years ago
hivemind

Hello, I'm having trouble figuring out how to interpret the DNase enrichment files (BED format) downloaded from ENCODE. The first three columns are straightforward (chrom, start, stop), but I don't understand what the values in the 4th and 5th columns are or how to interpret them.

The bed files look similar to this:

chr1    181393  181399  i   0.000805238
chr1    181399  181401  i   0.000517216
chr1    181401  181403  i   0.000211336
chr1    181403  181408  i   0.000134738
chr1    181408  181411  i   8.5816e-05
chr1    181411  181412  i   3.45057e-05
chr1    181412  181415  i   2.18045e-05
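
If you want to inspect these files programmatically, here is a minimal Python sketch that splits each line into the three standard BED coordinates plus the two unknown columns. The field names `name` and `value` are my own placeholders, not anything ENCODE defines. It assumes whitespace-separated lines like the sample above and the usual BED convention of 0-based, half-open coordinates, so `end - start` is the interval length in bases.

```python
import io
from dataclasses import dataclass
from typing import Iterator, TextIO


@dataclass
class BedRecord:
    chrom: str
    start: int    # 0-based, inclusive (standard BED convention)
    end: int      # exclusive
    name: str     # 4th column; placeholder name, meaning unknown
    value: float  # 5th column; placeholder name, meaning unknown

    @property
    def length(self) -> int:
        return self.end - self.start


def read_bed5(handle: TextIO) -> Iterator[BedRecord]:
    """Yield one BedRecord per data line of a 5-column BED file."""
    for line in handle:
        fields = line.split()
        # Skip blank lines and UCSC-style comment/header lines.
        if not fields or fields[0].startswith(("#", "track", "browser")):
            continue
        chrom, start, end, name, value = fields[:5]
        yield BedRecord(chrom, int(start), int(end), name, float(value))


sample = """chr1    181393  181399  i   0.000805238
chr1    181399  181401  i   0.000517216
chr1    181408  181411  i   8.5816e-05
"""
for rec in read_bed5(io.StringIO(sample)):
    print(rec.chrom, rec.start, rec.end, rec.length, rec.name, rec.value)
```

Note that the intervals in the sample are only a few bases long and abut each other, so it is worth looking at `length` before deciding what kind of track this is.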


Does anybody know how to interpret these last columns?

Thanks.

Is the 4th column always "i"? The 5th column is almost certainly p-values or FDRs from peak calling.

That seems to be the case, at least for the files I have inspected so far.
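
Rather than eyeballing a few files, you can confirm this over every line by tallying the distinct values in the 4th column. A minimal standard-library sketch follows; the `.gz` handling is there only because ENCODE downloads are often gzip-compressed, and the file name in the commented-out usage is hypothetical.

```python
import gzip
from collections import Counter
from typing import Iterable, TextIO


def column_value_counts(lines: Iterable[str], column: int = 4) -> Counter:
    """Count distinct values in a 1-based column of whitespace-separated lines."""
    counts: Counter = Counter()
    for line in lines:
        fields = line.split()
        # Ignore blank lines and UCSC-style comment/header lines.
        if not fields or fields[0].startswith(("#", "track", "browser")):
            continue
        if len(fields) >= column:
            counts[fields[column - 1]] += 1
    return counts


def open_maybe_gzipped(path: str) -> TextIO:
    """Open a plain or gzip-compressed text file in text mode."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path)


sample = """chr1    181393  181399  i   0.000805238
chr1    181399  181401  i   0.000517216
chr1    181401  181403  i   0.000211336
"""
counts = column_value_counts(sample.splitlines())
print(counts)  # a single key means the 4th column is constant in this input

# With a real download (hypothetical file name):
# with open_maybe_gzipped("your_encode_file.bed.gz") as fh:
#     print(column_value_counts(fh))
```

A column that holds one constant value across the whole file carries no information for filtering, so in practice only the 5th column matters here.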

I was wondering if I could use the values in the 5th column for further filtering. I guess for that I need to know what I'm dealing with.
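
Once you know what the 5th column means, the filtering itself is simple, but the direction matters. If the values turn out to be p-values or FDRs, you would keep rows *below* a cutoff; if they are an enrichment or signal score, you would keep rows *above* it. The sketch below makes that choice explicit with a `keep_below` flag. The function name, the flag, and the `0.0005` threshold are all hypothetical placeholders, not recommended values.

```python
from typing import Iterable, Iterator


def filter_by_score(lines: Iterable[str], threshold: float,
                    keep_below: bool = True, column: int = 5) -> Iterator[str]:
    """Yield BED lines whose score column passes the threshold.

    keep_below=True suits p-value/FDR-like scores (smaller = stronger);
    keep_below=False suits signal-like scores (larger = stronger).
    """
    for line in lines:
        fields = line.split()
        # Skip short, blank, and UCSC-style comment/header lines.
        if len(fields) < column or fields[0].startswith(("#", "track", "browser")):
            continue
        score = float(fields[column - 1])
        passes = score <= threshold if keep_below else score >= threshold
        if passes:
            yield line


sample = [
    "chr1\t181393\t181399\ti\t0.000805238",
    "chr1\t181399\t181401\ti\t0.000517216",
    "chr1\t181401\t181403\ti\t0.000211336",
    "chr1\t181408\t181411\ti\t8.5816e-05",
]
# Placeholder threshold; pick one only after confirming what column 5 means.
for line in filter_by_score(sample, threshold=0.0005, keep_below=True):
    print(line)
```

Keeping the direction as an explicit argument, rather than hard-coding `<=`, avoids silently keeping the weakest rows if the column turns out to be a signal value instead of a p-value.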

They are likely already filtered to some degree, especially since ENCODE tends to use IDR (irreproducible discovery rate) whenever possible.