What are these files?
3.8 years ago
ebs15242 ▴ 10

I am interested in data deposited in GEO under GSE113047. I was hoping for a table of gene symbols and counts. Instead, there is a tar of .bed and .bedGraph files.

I have all sorts of questions.

Why would someone deposit bed and bedGraph files instead of a counts table? Is it more useful, or is it just easier?

Can I get a counts table out of these files?

Why are there two files? What does one do that the other doesn't?

BED files apparently contain a 'score' field. What is the meaning of that score?

bedGraph files apparently contain a 'dataValue' field. What is the meaning of that value?

What are "peaks"?

Do I need an external annotated genome to get counts of symbols? How would I know which reference genome to use?

Where can I find documentation that would explain these things?

Thanks so much, I'm finding this pretty confusing.
-Ed

geo bed bedgraph rnaseq • 917 views

We may be able to help you even more if you tell us what exactly you are trying to address with that data set, i.e. why are you interested in it in the first place?

3.8 years ago

This is ATAC-seq data, not RNA-seq data. So no, you will not be getting a gene counts file out of this data.

Reading up on ATAC-seq will provide a full explanation of the assay and how it works. It's a genome-wide assay used to identify accessible regions. There are many reviews and commercial pages that explain it.

The bedgraph format is a fairly popular one for data that is continuous along the genome, like ATAC-seq and ChIP-seq. Those files are used for visualizing the pileups in genome browsers like IGV or the UCSC genome browser. "Peaks" are regions with many reads, denoting that the region is accessible. They are typically stored as BED files since they are discrete regions.
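To make the two formats concrete, here is a minimal Python sketch that reads both and summarizes the bedGraph signal under a peak. The toy lines, the record classes, and the `mean_signal` helper are all made up for illustration and are not taken from GSE113047. Per the UCSC format descriptions, both formats use 0-based, half-open coordinates; the BED score column is nominally an integer from 0 to 1000, but peak callers often put their own statistic there, so check how the depositors generated the files before interpreting it.

```python
from typing import Iterable, Iterator, List, NamedTuple, Optional


class BedRecord(NamedTuple):
    chrom: str
    start: int              # 0-based, inclusive
    end: int                # exclusive
    name: Optional[str]     # optional 4th column
    score: Optional[float]  # optional 5th column


class BedGraphRecord(NamedTuple):
    chrom: str
    start: int    # 0-based, inclusive
    end: int      # exclusive
    value: float  # the 'dataValue' column: signal over this interval


def data_fields(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield whitespace-split fields, skipping blank, comment, track and browser lines."""
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith(("track", "browser", "#")):
            continue
        yield stripped.split()


def parse_bed(lines: Iterable[str]) -> List[BedRecord]:
    records = []
    for f in data_fields(lines):
        name = f[3] if len(f) > 3 else None
        score = float(f[4]) if len(f) > 4 and f[4] != "." else None
        records.append(BedRecord(f[0], int(f[1]), int(f[2]), name, score))
    return records


def parse_bedgraph(lines: Iterable[str]) -> List[BedGraphRecord]:
    return [BedGraphRecord(f[0], int(f[1]), int(f[2]), float(f[3]))
            for f in data_fields(lines)]


def mean_signal(signal: List[BedGraphRecord], peak: BedRecord) -> float:
    """Length-weighted mean bedGraph value over the peak; uncovered bases count as 0."""
    total = 0.0
    for rec in signal:
        if rec.chrom != peak.chrom:
            continue
        overlap = min(rec.end, peak.end) - max(rec.start, peak.start)
        if overlap > 0:
            total += rec.value * overlap
    return total / (peak.end - peak.start)


# Toy data (hypothetical, for illustration only).
bed_text = "chr1\t150\t250\tpeak_1\t870\n"
bedgraph_text = "track type=bedGraph\nchr1\t100\t200\t2.0\nchr1\t200\t300\t6.0\n"

peaks = parse_bed(bed_text.splitlines())
signal = parse_bedgraph(bedgraph_text.splitlines())
peak_mean = mean_signal(signal, peaks[0])
print(peaks[0], peak_mean)
```

For real files you would pass an open file handle instead of `splitlines()`. Whether the dataValue is a raw read pileup or a normalized signal depends on how the depositors made the files, so it is worth checking the GEO sample records.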

Together, these files can be used to show differences between samples: the BED files tell you whether a peak was called in a given region, and the bedGraph files let you see how the signal differs across that region.
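As a rough sketch of the "peak called or not" comparison, the Python snippet below splits one sample's peaks into those that overlap a peak in another sample and those that do not. The coordinates and the function names are invented for illustration; for real data a dedicated interval tool is the usual choice, and this quadratic loop is only meant to show the logic.

```python
from typing import List, Tuple

Peak = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open as in BED


def overlaps(a: Peak, b: Peak) -> bool:
    """True if two half-open intervals on the same chromosome share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def split_shared_and_unique(peaks_a: List[Peak],
                            peaks_b: List[Peak]) -> Tuple[List[Peak], List[Peak]]:
    """Partition peaks_a into peaks that overlap any peak in peaks_b and peaks that do not."""
    shared, unique = [], []
    for peak in peaks_a:
        if any(overlaps(peak, other) for other in peaks_b):
            shared.append(peak)
        else:
            unique.append(peak)
    return shared, unique


# Hypothetical peak calls for two samples.
sample_a = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 150)]
sample_b = [("chr1", 150, 250), ("chr2", 300, 400)]

shared_peaks, unique_peaks = split_shared_and_unique(sample_a, sample_b)
print("called in both:", shared_peaks)
print("called only in sample A:", unique_peaks)
```

A peak that is called in only one sample is a candidate difference in accessibility, but it is worth looking at the bedGraph signal in a genome browser before reading much into it, since a region can fall just under the calling threshold in the other sample.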

3.8 years ago
GenoMax 141k

That is an ATAC-seq dataset. Read more about the technique here if you are not familiar with it. One more resource. This review tells you how the data is analyzed and should help you understand the files you see associated with that dataset.
