10 months ago
kimkes25 ▴ 30

Hello. I am new to the subject of DNA accessibility and have a project involving DNase-seq results.

My task is to find accessibility data for 5 different cell types.

I know how to work with the narrowPeak file format (more about it here).

What I currently do with it is:

  • index the file with tabix
  • search it for each segment in a list of queries (format: chrom chromStart chromEnd)
  • check whether any peak overlaps the query
  • if so, store the score and signalValue of that peak
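
To make the steps above concrete, here is a minimal sketch of the lookup in plain Python. It is an in-memory stand-in for the tabix query, not the tabix workflow itself: the records, the query list, and the names `Peak`, `parse_narrowpeak`, and `query` are made up for illustration, and it assumes tab-separated narrowPeak lines with score in column 5 and signalValue in column 7.

```python
from typing import NamedTuple


class Peak(NamedTuple):
    chrom: str
    start: int
    end: int
    score: int
    signal_value: float


def parse_narrowpeak(lines):
    """Parse tab-separated narrowPeak lines, skipping blank and header lines."""
    peaks = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        f = line.rstrip("\n").split("\t")
        # Columns (1-based): 1 chrom, 2 chromStart, 3 chromEnd, 5 score, 7 signalValue
        peaks.append(Peak(f[0], int(f[1]), int(f[2]), int(f[4]), float(f[6])))
    return peaks


def query(peaks, chrom, start, end):
    """Return peaks overlapping the half-open interval [start, end)."""
    return [p for p in peaks
            if p.chrom == chrom and p.start < end and p.end > start]


# Made-up records and queries, for illustration only.
example_lines = [
    "chr1\t100\t250\t.\t0\t.\t12.5\t-1\t-1\t-1",
    "chr1\t900\t1100\t.\t0\t.\t3.2\t-1\t-1\t-1",
    "chr2\t50\t80\t.\t0\t.\t7.0\t-1\t-1\t-1",
]
queries = [("chr1", 200, 300), ("chr1", 400, 500), ("chr2", 10, 60)]

peaks = parse_narrowpeak(example_lines)
results = {}
for q in queries:
    hits = query(peaks, *q)
    # An empty list means no peak overlaps the query region.
    results[q] = [(p.score, p.signal_value) for p in hits]
```

The linear scan is only reasonable for small files; on a real file the tabix index does the same overlap lookup without loading everything into memory.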

The goal is to return an answer for each query that says whether it lies in an open region.

One of the cell types is the CD4+ T cell. I found several different result sets for it on ENCODE here.

I looked at different narrowPeak files from ENCODE and they all have this in common: there is no data in most columns (4, 5, 6, 8, 9 and 10, according to the description of the narrowPeak file format here).
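
To show what I mean by "no data", here is a small sketch that lists which columns of a narrowPeak line hold only a placeholder. It assumes the placeholders given in the format description: `.` for name and strand, and `-1` for pValue, qValue and peak. Score is left out because I am not sure a placeholder is defined for it. The function name `unset_columns` and both example lines are made up.

```python
NARROWPEAK_COLUMNS = [
    "chrom", "chromStart", "chromEnd", "name", "score",
    "strand", "signalValue", "pValue", "qValue", "peak",
]


def unset_columns(line):
    """Return the names of columns that contain only a placeholder value."""
    fields = dict(zip(NARROWPEAK_COLUMNS, line.rstrip("\n").split("\t")))
    unset = []
    # Text columns use "." when nothing was assigned.
    for col in ("name", "strand"):
        if fields[col] == ".":
            unset.append(col)
    # Numeric columns use -1 when nothing was assigned.
    for col in ("pValue", "qValue", "peak"):
        if float(fields[col]) == -1:
            unset.append(col)
    return unset


# Made-up lines: one with placeholders only, one fully populated.
sparse_line = "chr1\t100\t250\t.\t0\t.\t12.5\t-1\t-1\t-1"
full_line = "chr1\t100\t250\tpeak1\t870\t+\t12.5\t5.2\t3.1\t75"

sparse_unset = unset_columns(sparse_line)
full_unset = unset_columns(full_line)
```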

Why is that?

When I searched for data on UCSC (where I found only 1 of the cell types I need), all the fields were filled in.

Another question: how can I tell whether the signalValue in different files is on the same scale?

If anyone understands the subject and could contact me, it would help a lot. Thank you.

