The question below turned out to be completely faulty. There is nothing wrong with the DNase data for GRCh38. I asked because the difference in file counts between hg38 and hg19 (GRCh37) seemed too big to me. For hg38 there are 95 *Peak.txt.gz files. For hg19 there are 236 *narrowPeak.gz files, but after merging the PkRep1 & PkRep2 pairs (probably FASTQ (SE/PE) replicates) we get only 123 files. In the end, this difference (123 vs. 95) no longer seems big, and the hg38 situation is even cleaner without PkRep1 & PkRep2.
Once again: there is no problem with the DNase data for the GRCh38 assembly; only my question was misleading. I apologise for the confusion I caused.
I'm interested in transcriptional activity, so I want to use DNase hypersensitivity sites to detect regions where transcription factors are able to bind.
With the previous genome assembly, GRCh37 / hg19, I used narrow peak files from these two sources (University of Washington and Duke University, respectively) (files with the suffix .narrowPeak.gz):
For the most recent assembly, GRCh38, some annotations are also available (files ending in Peak.txt.gz): http://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/
And here four complementary questions arise:
Consider only the datasets that come from the University of Washington. For GRCh37 / hg19 I counted 236 narrow peak files, whereas for the newer GRCh38 there are only 95 files. How can this difference be explained? Do the datasets represent exactly the same coverage, but with much lower granularity / precision (datasets from several tissue lines merged into fewer files)?
With GRCh37 / hg19 we have both narrow peaks and broad peaks, whereas GRCh38 comes with only one type of file, *Peak.txt.gz. Does this mean that with the newest version we have only narrow peaks? Are the broad peaks hidden somewhere else?
With GRCh37 / hg19 we have two separate sources of DNase data: UofW and Duke. For GRCh38, it seems that only the UofW datasets are available. Is any other source of DNase data available, maybe stored separately (Duke or another lab)?
Suppose you were in my place and wanted to determine cis-regulatory areas. What type of data could be used to do so? Maybe DNase datasets from another source, or even a completely different type of data (NOT DNase)?
Thank you in advance for your answer.