Question: Pearson Correlation test between two Chip-Seq data sets
0
gravatar for dally
4.4 years ago by
dally190
United States
dally190 wrote:

I was wondering how to run a pearson correlation test between two chip-seq data sets. I have narrowpeak files for both which are essentially bed files.

Is there any bioconductor packages in R that can do this? I've tried a variety of bedtools, and programs such as CHANCE, but they haven't given me what I required.

I used MACS2 to call peaks.

I used Bowtie to align to hg19.

Thanks!

pearsons chip-seq R • 5.7k views
ADD COMMENTlink modified 4.4 years ago by Constantine280 • written 4.4 years ago by dally190

Do you know what Pearson correlation is? It measures the collinearity between two one dimensional measures.

ADD REPLYlink modified 4 months ago by RamRS26k • written 4.4 years ago by karl.stamm3.6k

I am only vaguely familiar with the test. I've seen it on a large amount of papers recently while reading up on a new project. Essentially what I am trying to do as quoted from a paper is "find the number of uniquely aligned reads per Pol II gene for two complete biological replicates generated from * insert cell type * cells (Pearson's correlation, R = 0.##)

ADD REPLYlink written 4.4 years ago by dally190

There's no test. There's no hypothesis, and there's no p-value. The correlation is just a measurement.

ADD REPLYlink written 4.4 years ago by karl.stamm3.6k

You can window the data along the genome and calculate a correlation for the values in each window, but you definitely want to filter low windows. This post had some additional ideas.

ADD REPLYlink written 4.4 years ago by Madelaine Gogol5.1k
3
gravatar for Constantine
4.4 years ago by
Constantine280
USA
Constantine280 wrote:

Try the bamCorrelate function from the deeptools software

ADD COMMENTlink modified 4 months ago by RamRS26k • written 4.4 years ago by Constantine280
3

We have updated to deepTools 2.0 and now `bamCorrelate` is called `multiBamCoverage`. The output of that program can be used to compute a correlation (using `plotCorrelation`) of a PCA (using `plotPCA`). The new documentation is here:

 http://deeptools.readthedocs.org/en/latest/content/tools/multiBamCoverage.html 

ADD REPLYlink written 4.2 years ago by Fidel1.9k

Thank you, this was exactly what I needed.

ADD REPLYlink written 4.4 years ago by dally190
1
gravatar for karl.stamm
4.4 years ago by
karl.stamm3.6k
United States
karl.stamm3.6k wrote:

The "number of uniquely aligned reads per PolII gene" is really just a row of a table of integers.

Sample    Gene1    Gene2    Gene3
Rep1      10       20       30
Rep2      12       18       40

For that table, you can compute the correlation in R like

> cor(c(10,20,30),c(12,18,40))
[1] 0.9496528

and say they are 95% correlated.

So your first step is to generate the table, specifying gene column headers to help you count reads at each site.

ADD COMMENTlink modified 4 months ago by RamRS26k • written 4.4 years ago by karl.stamm3.6k

This makes much more sense. Is there any software that will generate this table for me? Or is this a sort of "gotta build your own custom script" thing? I do mostly wet lab work, so something with custom scripts is probably out of my ballpark.

ADD REPLYlink written 4.4 years ago by dally190

There's a lot of different Biostars questions asked and answered for counting reads. Look for "how to count reads". The most popular tool is probably "htseq-count"

I like coverageBed from bedtools. You give it the BAM and a BED annotating your regions of interest, and it spits out how many reads were in each site.

ADD REPLYlink modified 4 months ago by RamRS26k • written 4.4 years ago by karl.stamm3.6k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1225 users visited in the last hour