Question: What are these files?
ebs1524210 wrote:

I am interested in data deposited in GEO under GSE113047. I was hoping for a table of gene symbols and counts. Instead, there is a tar archive of .bed and .bedGraph files.

I have all sorts of questions.

Why would someone deposit bed and bedGraph files instead of a counts table? Is it more useful, or is it just easier?

Can I get a counts table out of these files?

Why are there two files? What does one do that the other doesn't?

BED files apparently contain a 'score' field. What is the meaning of that score?

bedGraph files apparently contain a 'dataValue' field. What is the meaning of that value?

What are "peaks"?

Do I need an external annotated genome to get counts of symbols? How would I know which reference genome to use?

Where can I find documentation that would explain these things?

Thanks so much, I'm finding this pretty confusing.

We may be able to help you even more if you tell us what exactly you are trying to address with that data set, i.e. why are you interested in it in the first place?

jared.andrews07 wrote:

This is ATAC-seq data, not RNA-seq data. So no, you will not be getting a gene counts file out of this data.

Reading up on ATAC-seq will provide a full explanation of the assay and how it works. It's a genome-wide assay used to identify accessible regions. There are many reviews and commercial pages that explain it.

The bedGraph format is a fairly popular one for data that is continuous along the genome, like ATAC-seq and ChIP-seq signal. These files are used to visualize read pileups in genome browsers such as IGV or the UCSC Genome Browser. "Peaks" are regions with many more reads than the surrounding background, indicating that the region is accessible. They are typically stored as BED files because they are discrete regions.
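A minimal Python sketch of how the two file types fit together, and of the closest thing to a "counts table" you can build from them: signal summarized per peak, not per gene. It assumes 4-column bedGraph lines (chrom, start, end, dataValue) and BED lines with at least chrom/start/end, both in 0-based half-open coordinates. The function names (`parse_bedgraph`, `parse_bed`, `signal_in_peak`) and the toy values are hypothetical and are not taken from GSE113047. Whether dataValue is raw coverage or a normalized value depends on how the submitters generated the files, so check the GEO sample descriptions before comparing numbers across samples.

```python
# Hypothetical sketch: summarize bedGraph signal inside BED peaks.
# Assumes 4-column bedGraph and BED with >= 3 columns, 0-based half-open.
from typing import List, NamedTuple, Optional


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based start, inclusive
    end: int    # end, exclusive
    value: Optional[float] = None  # bedGraph dataValue or BED score, if present


def is_header(line: str) -> bool:
    # Blank lines, comments, and UCSC "track"/"browser" directives are not data.
    return not line.strip() or line.startswith(("track", "browser", "#"))


def parse_bedgraph(lines: List[str]) -> List[Interval]:
    out = []
    for line in lines:
        if is_header(line):
            continue
        chrom, start, end, value = line.split()[:4]
        out.append(Interval(chrom, int(start), int(end), float(value)))
    return out


def parse_bed(lines: List[str]) -> List[Interval]:
    out = []
    for line in lines:
        if is_header(line):
            continue
        fields = line.split()
        # Column 5, when present and not ".", is the BED score field.
        score = float(fields[4]) if len(fields) > 4 and fields[4] != "." else None
        out.append(Interval(fields[0], int(fields[1]), int(fields[2]), score))
    return out


def signal_in_peak(track: List[Interval], peak: Interval) -> float:
    """Sum of bedGraph values overlapping `peak`, weighted by overlap length in bp."""
    total = 0.0
    for iv in track:
        if iv.chrom != peak.chrom:
            continue
        overlap = min(iv.end, peak.end) - max(iv.start, peak.start)
        if overlap > 0:
            total += iv.value * overlap
    return total


if __name__ == "__main__":
    # Toy data for illustration only.
    track = parse_bedgraph([
        "track type=bedGraph",
        "chr1\t100\t150\t2.0",
        "chr1\t150\t200\t5.0",
        "chr1\t200\t300\t1.0",
    ])
    peaks = parse_bed(["chr1\t120\t220\tpeak1\t50"])
    for p in peaks:
        total = signal_in_peak(track, p)
        print(p.chrom, p.start, p.end, total, total / (p.end - p.start))
```

Weighting each bedGraph value by its overlap length makes the result independent of how the bedGraph happens to split a region into intervals; dividing by the peak width gives mean signal. The linear scan is only for illustration and would be too slow on genome-wide files.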

Together, these files can be used to show differences between samples: whether or not a peak is called at a given region, and how the signal there differs visually.
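Comparing peak calls between two samples comes down to interval overlap. Below is a minimal Python sketch, assuming each sample's peaks are BED lines with chrom/start/end in the first three columns. `parse_peaks`, `split_shared_unique` and the toy coordinates are hypothetical, and any overlap of at least 1 bp is treated as "shared".

```python
# Hypothetical sketch: which peaks in sample A are also called in sample B?
from typing import List, Tuple

Peak = Tuple[str, int, int]  # (chrom, start, end), BED half-open coordinates


def parse_peaks(lines: List[str]) -> List[Peak]:
    peaks = []
    for line in lines:
        # Skip blank lines, comments, and UCSC track/browser directives.
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue
        chrom, start, end = line.split()[:3]
        peaks.append((chrom, int(start), int(end)))
    return peaks


def overlaps(a: Peak, b: Peak) -> bool:
    # Half-open intervals overlap if each one starts before the other ends.
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def split_shared_unique(peaks_a: List[Peak], peaks_b: List[Peak]):
    """Split A's peaks into those overlapping any B peak and those that do not."""
    shared, unique = [], []
    for p in peaks_a:
        if any(overlaps(p, q) for q in peaks_b):
            shared.append(p)
        else:
            unique.append(p)
    return shared, unique


if __name__ == "__main__":
    # Toy data for illustration only.
    sample_a = parse_peaks(["chr1\t100\t200", "chr1\t500\t600", "chr2\t50\t150"])
    sample_b = parse_peaks(["chr1\t180\t260", "chr2\t300\t400"])
    shared, unique = split_shared_unique(sample_a, sample_b)
    print("shared:", shared)
    print("only in A:", unique)
```

The all-pairs check is quadratic and only meant to show the logic. For real peak files, `bedtools intersect -a A.bed -b B.bed -u` reports peaks in A that overlap B, and `-v` reports those that do not.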

GenoMax wrote:

That is an ATAC-seq dataset. Read more about the technique here if you are not familiar with it. Here is one more resource. This review explains how the data are analyzed and should help you understand the files associated with that dataset.

