Correlation for ChIP-seq Data in R: Input Data Format
9.9 years ago
Bcw ▴ 60

Hello

I want to estimate the correlation between two sets of peaks from histone modification ChIP-seq data using R. I have a BED file. How can I get the mean and standard deviation so that I can compute the correlation coefficient between several data sets and plot it?

How do I get a numeric vector from a BED file?

Thanks, BCW

correlation chip-seq heatmap • 8.3k views
9.9 years ago

Check out this recent publication by Zhao and Sandelin (2012). They provide an R package called 'GMD' for calculating the "similarity between spatial distributions of read-based sequencing data such as ChIP-seq and RNA-seq". The package on CRAN also comes with a detailed set of case studies (vignettes) that use ChIP-seq data as illustration.


Thanks a lot. This helped.


Hi,

I installed the GMD package and worked through the example datasets. However, the authors do not explain how they obtained the data (in fact, they refer to another paper). ChIP-seq data usually comes with chromosome, start, and end positions, but they started from matrices that already held the distance coefficients for all 5 histone modification files, in a data set called chipseq_mES.

My question is: how do I convert my data (a BED file) into vectors of distance/correlation coefficients? Please share any comments you have on a workflow for getting started with correlation between histone ChIP-seq data sets.

9.9 years ago
Bcw ▴ 60


Thanks for your help.


(Many thanks to Obi for forwarding the message.) Hi BCW. The "chipseq_mES" data provided in the GMD package gives binned signal intensities for a set of six epi-marks, saved as numeric vectors. We use those vectors as INPUT to compute the pairwise distance/dissimilarity. If you already have the signal in "bed" format, one way to construct the binned signal is to use tools that convert bed files into wig files, which are effectively histograms along the genome. Be aware that a histogram should include empty bins (if there are any), whereas wig files leave them out.
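If you would rather build the binned vector directly in R, the idea can be sketched in base R as below. This is only an illustration, not the GMD authors' pipeline: the helper name `bin_bed`, the column names `chrom`/`start`/`end`, the 1 kb bin size, and the toy intervals are all assumptions made for the example. It counts how many base pairs of each fixed-width bin are covered by BED intervals and keeps empty bins as zeros.

```r
# Hypothetical helper (not part of GMD): bin BED intervals on one chromosome.
# BED coordinates are 0-based and half-open: [start, end).
bin_bed <- function(bed, chrom, chrom_length, bin_size = 1000) {
  n_bins <- ceiling(chrom_length / bin_size)
  signal <- numeric(n_bins)                 # empty bins stay at 0
  rows <- bed[bed$chrom == chrom, ]
  for (i in seq_len(nrow(rows))) {
    s <- rows$start[i]
    e <- min(rows$end[i], chrom_length)     # clip to the chromosome end
    if (e <= s) next
    first <- s %/% bin_size + 1             # 1-based index of first bin touched
    last  <- (e - 1) %/% bin_size + 1       # 1-based index of last bin touched
    for (b in first:last) {
      bin_start <- (b - 1) * bin_size
      bin_end   <- b * bin_size
      # add the number of bases of this interval that fall inside bin b
      signal[b] <- signal[b] + (min(e, bin_end) - max(s, bin_start))
    }
  }
  signal
}

# Toy intervals (made up for illustration)
bed <- data.frame(
  chrom = c("chr1", "chr1", "chr2"),
  start = c(100, 900, 0),
  end   = c(300, 1200, 500)
)

chr1_signal <- bin_bed(bed, "chr1", chrom_length = 5000, bin_size = 1000)
print(chr1_signal)
```

For a real file, the data frame could come from `read.table()` on the BED file, with the first three columns named to match. If the BED file carries a score or read count per interval, the same loop could accumulate that value instead of covered bases; which quantity is appropriate depends on what the file actually contains.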


I've forwarded your question to the GMD authors to see if they can help with this specific point. But looking at chipseq_mES, I don't think those are already distance coefficients. I think they represent peak heights at 1000 positions/bins for the 5 samples. The authors then calculate the pairwise correlations between those peak heights using the gmdm function. So maybe you just need to define some coordinate bins and sum or average the signal from your bed files within each bin. What exactly is in your bed files? Could you provide a sample above?
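Once every sample has been reduced to a vector over the same bins, the mean, standard deviation, and pairwise correlation are a few lines of base R. The sketch below uses made-up numbers and hypothetical mark names (`markA`, `markB`, `markC`), and it computes ordinary Pearson and Spearman correlations with `cor()`, which is a simpler stand-in and not the GMD distance itself.

```r
# Toy binned signals (made-up numbers): one column per mark,
# rows are the same genomic bins in the same order for every mark.
binned <- cbind(
  markA = c(0, 5, 10, 20, 0, 0, 3, 8),
  markB = c(1, 4, 12, 18, 0, 1, 2, 9),
  markC = c(9, 0, 0, 1, 15, 12, 0, 0)
)

# Per-mark summary statistics
bin_mean <- colMeans(binned)
bin_sd   <- apply(binned, 2, sd)

# Pairwise correlation matrices between marks
pearson  <- cor(binned, method = "pearson")
spearman <- cor(binned, method = "spearman")

print(round(pearson, 2))

# Uncomment to draw a clustered heatmap of the correlation matrix:
# heatmap(pearson, symm = TRUE, scale = "none")
```

Spearman is often worth a look alongside Pearson here, because a handful of very tall peaks can dominate a Pearson coefficient.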


I have a question about correlation-type analyses.

Why use coverage (are we referring to genome coverage?) to calculate the correlation coefficient of ChIP-seq experiments, rather than coordinates plus read counts as in the MACS bed output?
If I only have the bed files, can I use your proposed method?

Is there a tutorial that explains the difference between using coverage and using read counts per coordinate?

Thank you.

P.S. I hope I am still on topic.

 
