Question: how to get correlation between the counts over each gene at the same timepoint (two replicates)
20 months ago, Lila M 830 (UK) wrote:

Hi everybody, I have the counts (obtained by HTSeq) for a lot of genes (~58,000) at different time points, with replicates.

```
gene                           t1_S1    t1_S2
ENSG00000000003.14              0        0
ENSG00000000005.5               0        0
ENSG00000000419.12              1        3
[...]
```

I would like to calculate the correlation between the counts over each gene at the same timepoint, to understand how reproducible the replication timing and progression are for each repeat. Any suggestions?

modified 20 months ago by Nicolas Rosewick 9.0k • written 20 months ago by Lila M 830

Check out the `cor` function in R. Different kinds of correlation measures are available, including Spearman and Pearson.
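As a minimal sketch of that suggestion, the snippet below computes both measures for two replicate columns. The counts are invented toy values, and the column names simply mirror the ones in the question.

```r
# Toy counts for two replicates (invented values, for illustration only)
counts <- data.frame(
  t1_S1 = c(0, 0, 1, 25, 310, 12),
  t1_S2 = c(0, 0, 3, 30, 280, 9)
)

# Pearson measures linear association; Spearman works on ranks
pearson  <- cor(counts$t1_S1, counts$t1_S2, method = "pearson")
spearman <- cor(counts$t1_S1, counts$t1_S2, method = "spearman")

print(c(pearson = pearson, spearman = spearman))
```

Each call returns a single number summarizing all genes at once, not one value per gene.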


This is what I am doing, but as I have a huge number of genes, R gets stuck. This is what I'm trying:

```
xx <- read.table(file = "matrix_count", sep = "\t", header = TRUE)
cor(t(xx), method = "pearson")
```

Any other suggestions?

written 20 months ago by Lila M 830

Do I understand correctly that you aim to calculate 58000 correlation coefficients?


Read count correlation between samples

20 months ago, Nicolas Rosewick 9.0k (Belgium, Brussels) wrote:

Do you want to test the correlation between the different timepoints or between the different genes?

Let's say you have 10 timepoints and 58,000 genes.

To test the different timepoints:

```
cor(xx, method = "pearson")
```

This will give you a 10 x 10 matrix, i.e. 100 correlations (or only 45 distinct pairs, if the `cor` function is smart enough not to compute both col A vs. col B and col B vs. col A).

To test the different genes (in a pairwise manner):

```
cor(t(xx), method = "pearson")
```

Here you get a 58,000 x 58,000 matrix, i.e. 3.364e+09 correlations (or 1,681,971,000 distinct pairs if the `cor` function is smart). That's why R crashes: it takes too long to compute so many correlations.
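The numbers above can be checked with a few lines of arithmetic. The memory estimate is an added assumption, not something stated in the thread: it supposes the full matrix is held as 8-byte doubles, which suggests memory may be as much of a problem as run time.

```r
n_genes <- 58000

# Entries in the full gene-by-gene correlation matrix
n_entries <- n_genes^2

# Distinct gene pairs (upper triangle, diagonal excluded)
n_pairs <- choose(n_genes, 2)

# Memory for the full matrix stored as 8-byte doubles, in gigabytes
mem_gb <- n_entries * 8 / 1e9

print(c(entries = n_entries, pairs = n_pairs, memory_gb = mem_gb))
```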

Edit based on OP comments

Use the coefficient of variation (https://en.wikipedia.org/wiki/Coefficient_of_variation):

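```
# dat: numeric count matrix, genes in rows and replicates in columns
# Note: genes with a mean of 0 give NaN (0/0)
dat.coeff.var <- apply(dat, 1, function(x) sd(x) / mean(x))
```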
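To make the zero-count case concrete, here is a small sketch on invented counts. It assumes `dat` is a numeric matrix with genes in rows and replicates in columns; the gene names are hypothetical. Genes with no reads in either replicate give `NaN`, so the sketch marks them as `NA` explicitly.

```r
# Toy count matrix: genes in rows, replicates in columns (invented values)
dat <- matrix(
  c(0, 0,
    1, 3,
    100, 100),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"), c("t1_S1", "t1_S2"))
)

# Coefficient of variation per gene: sd / mean
cv <- apply(dat, 1, function(x) sd(x) / mean(x))

# Genes with no reads give 0/0 = NaN; flag them as missing instead
cv[is.nan(cv)] <- NA

print(cv)
```

With only two replicates, the value for each gene depends solely on the difference between the two counts relative to their sum, so low-count genes such as `geneB` tend to show large values.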
modified 20 months ago • written 20 months ago by Nicolas Rosewick 9.0k

Maybe I explained badly what I want. I want to know the correlation for, let's say, gene ENSG00000000003.14 between the two replicates, to see whether each gene differs between replicates. I'm not interested in the correlation between ENSG00000000003.14 and ENSG00000000005.5. Does that make more sense?

written 20 months ago by Lila M 830

OK, so you want to check the correlation between replicates. Then use `cor(xx, method="pearson")`.

modified 20 months ago • written 20 months ago by Nicolas Rosewick 9.0k

Not exactly, because that gives me the correlation between replicates, and what I want to know is whether the counts for the gene ENSG00000000003.14 differ between t1_S1 and t1_S2 (and likewise for the other genes).

written 20 months ago by Lila M 830

Maybe use the coefficient of variation (https://en.wikipedia.org/wiki/Coefficient_of_variation): `dat.coeff.var <- apply(dat,1,function(x){sd(x)/mean(x)})`

modified 20 months ago • written 20 months ago by Nicolas Rosewick 9.0k

That's exactly what I want! Thanks!

written 20 months ago by Lila M 830

OK, great. I modified my answer to record the right solution. If the answer suits you, you can accept it.


There is no correlation for a single pair of measures. The correlation between samples will give you a general view of how similar samples are, and you can plot the values to check outliers. However, you have to take into account sample sequencing depth.
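One way to account for sequencing depth is to scale each sample by its library size before comparing replicates. The sketch below assumes that simple counts-per-million scaling is adequate for a first look (dedicated packages offer more robust normalizations); the counts and gene names are invented.

```r
# Toy counts: replicate 2 is sequenced about twice as deeply (invented values)
counts <- matrix(
  c(10, 20,
    50, 105,
    0, 0,
    200, 390),
  ncol = 2, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), c("t1_S1", "t1_S2"))
)

# Counts per million: divide each column by its library size
lib_size <- colSums(counts)
cpm <- sweep(counts, 2, lib_size, "/") * 1e6

# Log-transform (pseudocount of 1) so highly expressed genes do not dominate
log_cpm <- log2(cpm + 1)

# Sample-to-sample correlation on the normalized, log-scaled values
r <- cor(log_cpm, method = "pearson")
print(r)

# Scatter plot of the two replicates to spot outlying genes
plot(log_cpm[, "t1_S1"], log_cpm[, "t1_S2"],
     xlab = "t1_S1 (log2 CPM + 1)", ylab = "t1_S2 (log2 CPM + 1)")
abline(0, 1, lty = 2)
```

Genes that fall far from the dashed diagonal are the ones whose counts differ most between the two replicates after depth is taken into account.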