Question: What are these files?
6 months ago by
ebs15242 wrote:

I am interested in data deposited in GEO under GSE113047. I was hoping for a table of gene symbols and counts. Instead, there is a tar of .bed and .bedGraph files.

I have all sorts of questions.

Why would someone deposit bed and bedGraph files instead of a counts table? Is it more useful, or is it just easier?

Can I get a counts table out of these files?

Why are there two files? What does one do that the other doesn't?

BED files apparently contain a 'score' field. What is the meaning of that score?

bedGraph files apparently contain a 'dataValue' field. What is the meaning of that value?

What are "peaks"?

Do I need an external annotated genome to get counts of symbols? How would I know which reference genome to use?

Where can I find documentation that would explain these things?

Thanks so much, I'm finding this pretty confusing.

bedgraph rnaseq bed geo • 213 views
modified 6 months ago by jared.andrews07 • written 6 months ago by ebs15242

We may be able to help you even more if you tell us what exactly you are trying to address with that data set, i.e. why are you interested in it in the first place?

written 6 months ago by Friederike
6 months ago by
Memphis, TN
jared.andrews07 wrote:

This is ATAC-seq data, not RNA-seq data. So no, you will not be getting a gene counts file out of this data.

Reading up on ATAC-seq will give you a full explanation of the assay and how it works. In short, it's a genome-wide assay used to identify accessible (open) chromatin regions. There are many reviews and commercial pages that explain it.

The bedGraph format is a fairly popular one for data that is continuous along the genome, like ATAC-seq and ChIP-seq signal. Those files are used for visualizing the read pileups in genome browsers like IGV or the UCSC Genome Browser. "Peaks" are regions where many reads pile up, which indicates that the region is accessible. They are typically stored as BED files since they are discrete regions.
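To make the two formats concrete, here is a minimal Python sketch of how one line from each might be read. The example lines are made up, not taken from GSE113047, and the helper names (`Interval`, `parse_bedgraph_line`, `parse_bed_line`) are just for illustration. It assumes plain whitespace-separated lines with no track or browser header lines, and that the BED file carries a score in its fifth column. What that score means depends on the tool that produced the peaks, so check the depositor's methods.

```python
from typing import NamedTuple, Optional


class Interval(NamedTuple):
    chrom: str
    start: int              # 0-based, inclusive
    end: int                # exclusive (half-open interval)
    value: Optional[float]  # bedGraph dataValue, or BED score if present


def parse_bedgraph_line(line: str) -> Interval:
    """bedGraph: chrom, start, end, dataValue (signal over that interval)."""
    chrom, start, end, value = line.split()[:4]
    return Interval(chrom, int(start), int(end), float(value))


def parse_bed_line(line: str) -> Interval:
    """BED: chrom, start, end are required; name and score are optional."""
    fields = line.split()
    has_score = len(fields) > 4 and fields[4] != "."
    score = float(fields[4]) if has_score else None
    return Interval(fields[0], int(fields[1]), int(fields[2]), score)


# Made-up example lines, for illustration only.
BEDGRAPH_EXAMPLE = "chr1\t1000\t1050\t3.5"
BED_EXAMPLE = "chr1\t980\t1400\tpeak_1\t250"

signal = parse_bedgraph_line(BEDGRAPH_EXAMPLE)
peak = parse_bed_line(BED_EXAMPLE)
print(signal)  # one stretch of continuous signal
print(peak)    # one discrete called region
```

The bedGraph line describes signal height over a small stretch of the genome, while the BED line describes a whole called region. Neither is a per-gene count.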

Together, these files can be used to show differences between samples: the BED files tell you whether a peak was called in a given sample, and the bedGraph files let you visually compare the signal over a given region.
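As a rough illustration of that kind of comparison, the sketch below checks which peaks in one sample overlap a peak in another, and computes the mean bedGraph signal over a region. All data here are made up, the function names are hypothetical, and it assumes the bedGraph intervals within a sample do not overlap each other. For real data, dedicated tools such as bedtools are the usual choice. This is only meant to show the idea.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals share any base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def shared_and_unique(peaks_a, peaks_b):
    """Split sample A's peaks into those that overlap a peak in B and those that do not."""
    shared = [p for p in peaks_a if any(overlaps(p, q) for q in peaks_b)]
    unique = [p for p in peaks_a if p not in shared]
    return shared, unique


def mean_signal(bedgraph, region):
    """Length-weighted mean of bedGraph values over a (chrom, start, end) region.

    Bases in the region that no bedGraph interval covers count as zero.
    """
    chrom, start, end = region
    total = 0.0
    for c, s, e, value in bedgraph:
        if c != chrom:
            continue
        covered = min(e, end) - max(s, start)
        if covered > 0:
            total += covered * value
    return total / (end - start)


# Made-up peaks (BED-like) and signal (bedGraph-like) for two samples.
peaks_a = [("chr1", 100, 200), ("chr1", 500, 600)]
peaks_b = [("chr1", 150, 250)]
bedgraph_a = [("chr1", 100, 150, 2.0), ("chr1", 150, 200, 4.0)]
bedgraph_b = [("chr1", 100, 200, 1.0)]

shared, unique = shared_and_unique(peaks_a, peaks_b)
region = ("chr1", 100, 200)
print("shared:", shared)
print("only in A:", unique)
print("mean signal A:", mean_signal(bedgraph_a, region))
print("mean signal B:", mean_signal(bedgraph_b, region))
```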

6 months ago by
United States
GenoMax wrote:

That is an ATAC-seq dataset. Read more about the technique here if you are not familiar with it. One more resource. This review tells you how the data is analyzed and should help you understand the files associated with that dataset.
