Question: How can I create a similarity matrix from gene expression and RNA-seq data?
samsara wrote (6.0 years ago):

I have gene expression data (log2, lowess-normalized) for several samples, as follows:

genes    sample1    sample2    sample3    sample4
g1    -0.25    -0.91275    -0.641    0.37
g2    1.3245    -2.126    7.495    -1.151
g3    0.31775    0.731    -1.151    0.182

I also have RNA-seq gene quantification data with RPKM values:

genes    sample1    sample2    sample3    sample4
g1    0.834890179247665    11.2357774823452    6.39239374912979    0.504388295468584
g2    0.1332993726877    1.09436685773882    0.00311332856051572    3.82123310407725
g3    0.764239475051307    0.609334844909887    0.107913669064748    2.71814202633155

How can I create a similarity matrix (among samples) based on the data above?


If you want to use correlation as the similarity measure, this is easy to do in R. Otherwise, you have to define what 'similarity' means for you. For example, you can compute a norm of the difference between the values of the same gene i in two samples j and k, i.e. ||x_ij - x_ik||². This gives very small values for genes with similar expression in the two samples and large values for genes whose expression differs. Summing it over all genes gives the squared Euclidean distance between samples j and k, i.e. one dissimilarity value per pair of samples.
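A minimal R sketch of both ideas, using the log2 expression table from the question (genes in rows, samples in columns). cor() correlates columns, so it directly returns a sample-by-sample matrix; dist() compares rows, so the matrix is transposed first. Aggregating the per-gene squared differences by summing them over genes is an assumption about how you want one number per pair of samples.

```r
# Log2 lowess-normalized expression from the question: genes in rows, samples in columns
expr <- matrix(
  c(-0.25,   -0.91275, -0.641,  0.37,
     1.3245, -2.126,    7.495, -1.151,
     0.31775, 0.731,   -1.151,  0.182),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("g1", "g2", "g3"), paste0("sample", 1:4))
)

# Pearson correlation between every pair of columns (samples): a 4 x 4 similarity matrix
sim_cor <- cor(expr, method = "pearson")

# Squared Euclidean distance between samples, sum over genes i of (x_ij - x_ik)^2.
# dist() works on rows, so transpose to put the samples in rows.
d2 <- as.matrix(dist(t(expr), method = "euclidean"))^2

print(round(sim_cor, 3))
print(round(d2, 3))
```

Note that d2 is a dissimilarity (0 on the diagonal, small means similar), whereas sim_cor is a similarity (1 on the diagonal).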

(comment by Phil S., 6.0 years ago)

When you say "among samples", are you referring to both of your data matrices? That is, do you want to

a) calculate one similarity matrix for the samples in your log2 lowess-normalized expression data and a separate one for the samples in your RNA-seq quantification data, OR b) calculate the similarity among samples both within and across the two data sets?

In either case, you can compute a correlation coefficient (for example, Pearson's correlation coefficient) for any two columns. You may also want the p-value associated with each coefficient, which tells you whether that correlation is statistically significant.
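A minimal sketch of the pairwise approach, assuming Pearson correlation and using base R's cor.test() on every pair of columns of the question's expression table. The matrix names r_mat and p_mat are only illustrative. With only three genes, as in this toy table, the p-values carry almost no information; on a real data set each column would have thousands of genes.

```r
# Log2 expression from the question (genes x samples)
expr <- matrix(
  c(-0.25,   -0.91275, -0.641,  0.37,
     1.3245, -2.126,    7.495, -1.151,
     0.31775, 0.731,   -1.151,  0.182),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("g1", "g2", "g3"), paste0("sample", 1:4))
)

samples <- colnames(expr)
n <- length(samples)

# Correlation matrix (diagonal = 1) and p-value matrix (diagonal left as NA)
r_mat <- matrix(1, n, n, dimnames = list(samples, samples))
p_mat <- matrix(NA_real_, n, n, dimnames = list(samples, samples))

# Test each unordered pair of samples once and fill both triangles
for (j in seq_len(n - 1)) {
  for (k in (j + 1):n) {
    ct <- cor.test(expr[, j], expr[, k], method = "pearson")
    r_mat[j, k] <- r_mat[k, j] <- unname(ct$estimate)
    p_mat[j, k] <- p_mat[k, j] <- ct$p.value
  }
}

print(round(r_mat, 3))
print(signif(p_mat, 3))
```

For option b), the same loop works on a matrix that holds the columns of both data sets side by side.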

(comment by vj420, 6.0 years ago)
User 1933 wrote (6.0 years ago):

You need to choose a similarity measure. There are different methods, such as correlation, information-theoretic measures, and kernel methods. For example, in R:

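Let's say you have a dataset like this, with 4 rows and 3 columns of random values (4 samples by 3 genes, i.e. the transpose of the tables in the question):

dt <- replicate(3, rnorm(4))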

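There are a bunch of kernel functions in the kernlab package.

Say we want to measure the similarity using the RBF (Gaussian) kernel. kernelMatrix() compares rows, so the rows of the matrix should be the samples (transpose the question's genes-by-samples tables with t() first):

library(kernlab)
rbf <- rbfdot(sigma = 0.05)
kernelMatrix(rbf, dt)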
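To see what the RBF similarity actually computes, here is a base-R sketch that needs no extra package, applied to the question's expression table. It assumes the kernel form documented for kernlab's rbfdot(), k(x, y) = exp(-sigma * ||x - y||^2); rbf_similarity is a hypothetical helper name, not part of kernlab. If kernlab is installed, kernelMatrix(rbfdot(sigma = 0.05), t(expr)) should give the same values.

```r
# Log2 expression from the question (genes x samples)
expr <- matrix(
  c(-0.25,   -0.91275, -0.641,  0.37,
     1.3245, -2.126,    7.495, -1.151,
     0.31775, 0.731,   -1.151,  0.182),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("g1", "g2", "g3"), paste0("sample", 1:4))
)

# Hypothetical helper: Gaussian RBF similarity between the rows of m
rbf_similarity <- function(m, sigma = 0.05) {
  d2 <- as.matrix(dist(m))^2   # squared Euclidean distances between rows
  exp(-sigma * d2)             # 1 for identical rows, towards 0 for distant rows
}

# Samples are columns in the question's layout, so transpose to put them in rows
K <- rbf_similarity(t(expr), sigma = 0.05)
print(round(K, 3))
```

Because the kernel depends on raw squared distances, a sigma that suits log2 values will generally not suit RPKM values on a very different scale.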

Or, to use correlation instead, you can simply use R's built-in cor() function.

You can build a single, combined correlation matrix in different ways. Either merge the two data sets in the first step (however, they seem to have a scaling issue, log2 values vs. RPKM, so you might need to replace the values with ranks, or normalize them first), or compute the correlation matrices separately and then combine them, for example using a dot product.
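A minimal sketch of the "merge first, then use ranks" option, assuming both tables share gene IDs and sample names as in the question. Spearman correlation is Pearson correlation computed on within-column ranks, so it sidesteps the log2 vs. RPKM scale difference. The "_expr" and "_rnaseq" column suffixes are hypothetical labels added only to keep the two versions of each sample apart.

```r
# Log2 lowess-normalized expression and RPKM values from the question (genes x samples)
expr <- matrix(
  c(-0.25,   -0.91275, -0.641,  0.37,
     1.3245, -2.126,    7.495, -1.151,
     0.31775, 0.731,   -1.151,  0.182),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("g1", "g2", "g3"), paste0("sample", 1:4))
)
rpkm <- matrix(
  c(0.834890179247665, 11.2357774823452, 6.39239374912979,    0.504388295468584,
    0.1332993726877,   1.09436685773882, 0.00311332856051572, 3.82123310407725,
    0.764239475051307, 0.609334844909887, 0.107913669064748,  2.71814202633155),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("g1", "g2", "g3"), paste0("sample", 1:4))
)

# Hypothetical suffixes so each sample appears once per data set in the merged matrix
colnames(expr) <- paste0(colnames(expr), "_expr")
colnames(rpkm) <- paste0(colnames(rpkm), "_rnaseq")

# Merge on the genes present in both tables, in the same row order
genes <- intersect(rownames(expr), rownames(rpkm))
combined <- cbind(expr[genes, ], rpkm[genes, ])

# 8 x 8 rank-based similarity: within-data-set and across-data-set blocks
sim_all <- cor(combined, method = "spearman")

# Across-data-set block: expression samples (rows) vs. RNA-seq samples (columns)
print(round(sim_all[1:4, 5:8], 3))
```

The normalization route would instead transform RPKM onto a log scale (for example log2(RPKM + 1), a common but here assumed choice) before using Pearson correlation.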

If you explain your downstream analysis, i.e. what you want to do after making these similarity matrices, I might be able to make more concrete suggestions.
