What are these files?
23 months ago
ebs15242 ▴ 10

I am interested in data deposited in GEO under GSE113047. I was hoping for a table of gene symbols and counts. Instead, there is a tar of .bed and .bedGraph files.

I have all sorts of questions.

Why would someone deposit bed and bedGraph files instead of a counts table? Is it more useful, or is it just easier?

Can I get a counts table out of these files?

Why are there two files? What does one do that the other doesn't?

BED files apparently contain a 'score' field. What is the meaning of that score?

bedGraph files apparently contain a 'dataValue' field. What is the meaning of that value?

What are "peaks"?

Do I need an external annotated genome to get counts of symbols? How would I know which reference genome to use?

Where can I find documentation that would explain these things?

Thanks so much, I'm finding this pretty confusing.

geo bed bedgraph rnaseq • 526 views

We may be able to help you even more if you tell us what exactly you are trying to address with that data set, i.e. why are you interested in it in the first place?

23 months ago

This is ATAC-seq data, not RNA-seq data. So no, you will not be getting a gene counts file out of this data.

Reading up on ATAC-seq will provide a full explanation of the assay and how it works. It's a genome-wide assay used to identify accessible (open) chromatin regions. There are many reviews and commercial pages that explain it.

The bedGraph format is a fairly popular one for data that is continuous along the genome, like ATAC-seq and ChIP-seq signal. Those files are used for visualizing the read pileups in genome browsers like IGV or the UCSC Genome Browser. "Peaks" are regions with many reads, denoting that the region is accessible. They are typically stored as BED files since they are discrete regions.
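To make the two formats concrete, here is a minimal Python sketch that reads both. It assumes plain-text, whitespace-delimited files that follow the UCSC layout: BED has chrom, start, end and then optional name and score columns, while bedGraph has exactly chrom, start, end, dataValue. The names `parse_bed` and `parse_bedgraph` are made up for this illustration, and what the score and dataValue actually mean depends on the tool the depositor used, so check the GEO record's processing description rather than trusting this sketch.

```python
from typing import NamedTuple, Optional


class BedRecord(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive
    name: Optional[str]
    score: Optional[float]


class BedGraphRecord(NamedTuple):
    chrom: str
    start: int
    end: int
    value: float  # the "dataValue" column: signal over this interval


def _is_data_line(line):
    """Skip blank lines, comments, and UCSC track/browser header lines."""
    stripped = line.strip()
    return bool(stripped) and not stripped.startswith(("#", "track", "browser"))


def parse_bed(lines):
    """Parse BED lines; name and score are optional columns 4 and 5."""
    records = []
    for line in lines:
        if not _is_data_line(line):
            continue
        fields = line.split()
        name = fields[3] if len(fields) > 3 else None
        # Some files use "." as a placeholder for a missing score.
        has_score = len(fields) > 4 and fields[4] != "."
        score = float(fields[4]) if has_score else None
        records.append(BedRecord(fields[0], int(fields[1]), int(fields[2]), name, score))
    return records


def parse_bedgraph(lines):
    """Parse bedGraph lines: chrom, start, end, dataValue."""
    records = []
    for line in lines:
        if not _is_data_line(line):
            continue
        chrom, start, end, value = line.split()[:4]
        records.append(BedGraphRecord(chrom, int(start), int(end), float(value)))
    return records
```

In practice you would pass an open file handle, e.g. `parse_bed(open("sample_peaks.bed"))`, where the file name is a placeholder. Each BED record is one discrete peak, while the bedGraph records tile the genome with a signal value per interval.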

Together, these files can be used to show differences between samples (whether a peak is called or not and to visually show differences in signal for a given region).
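As a rough illustration of that kind of comparison, the sketch below works on intervals already loaded into plain tuples. It assumes 0-based, half-open coordinates and that both samples were aligned to the same reference genome. The function names are hypothetical, and dedicated tools such as bedtools do this far more efficiently on real data.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def peaks_unique_to(peaks_a, peaks_b):
    """Peaks called in sample A with no overlapping peak in sample B."""
    return [p for p in peaks_a if not any(overlaps(p, q) for q in peaks_b)]


def mean_signal(region, bedgraph):
    """Length-weighted mean bedGraph value over a (chrom, start, end) region.

    Bases not covered by any bedGraph interval count as zero signal.
    """
    chrom, start, end = region
    total = 0.0
    for b_chrom, b_start, b_end, value in bedgraph:
        if b_chrom != chrom:
            continue
        covered = min(end, b_end) - max(start, b_start)
        if covered > 0:
            total += covered * value
    return total / (end - start)


# Toy data, invented purely for illustration.
peaks_a = [("chr1", 100, 200), ("chr1", 500, 600)]
peaks_b = [("chr1", 150, 250)]
signal_a = [("chr1", 0, 100, 1.0), ("chr1", 100, 150, 4.0), ("chr1", 150, 200, 2.0)]

only_in_a = peaks_unique_to(peaks_a, peaks_b)
avg = mean_signal(("chr1", 100, 200), signal_a)
```

The first function answers "is a peak called here in one sample but not the other", and the second summarizes the bedGraph signal over a region so it can be compared between samples, provided the signal was normalized the same way in both.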

23 months ago
GenoMax 115k

That is an ATAC-seq dataset. Read more about the technique here if you are not familiar with it. One more resource. This review tells you how the data is analyzed and should help you understand the files you see associated with that dataset.

