I have gene expression data (log2, lowess normalized) for different samples, as follows:

```
genes sample1 sample2 sample3 sample4
g1 -0.25 -0.91275 -0.641 0.37
g2 1.3245 -2.126 7.495 -1.151
g3 0.31775 0.731 -1.151 0.182
...
```

I also have RNA-seq (gene quantification) data with RPKM values:

```
genes sample1 sample2 sample3 sample4
g1 0.834890179247665 11.2357774823452 6.39239374912979 0.504388295468584
g2 0.1332993726877 1.09436685773882 0.00311332856051572 3.82123310407725
g3 0.764239475051307 0.609334844909887 0.107913669064748 2.71814202633155
...
```

How can I create a similarity matrix (among samples) based on the distributions above?

If you want to use correlation as the similarity measure, this can easily be done in R. Otherwise, you have to define what 'similarity' means to you. For example, you can calculate a norm between the two values representing the same gene in different samples, i.e. ||x_ij - x_ik||², where x_ij is the expression of gene i in sample j. That gives very small values for genes with similar expression in the two samples and large values for genes whose expression differs. Summing these terms over all genes then gives a single distance between samples j and k.
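As a rough illustration of the distance idea above, here is a minimal Python sketch that sums the squared differences over genes for every pair of samples. It uses only the three genes shown in the question as toy input; the names `expr`, `squared_distance` and `dist` are made up for this example, and summing over genes is just one possible way to turn the per-gene terms into a sample-level matrix.

```python
# Toy input: only the three genes (g1, g2, g3) visible in the question.
# The real table has more rows; these values are copied from the post.
expr = {
    "sample1": [-0.25, 1.3245, 0.31775],
    "sample2": [-0.91275, -2.126, 0.731],
    "sample3": [-0.641, 7.495, -1.151],
    "sample4": [0.37, -1.151, 0.182],
}


def squared_distance(x, y):
    """Sum over genes of (x_ij - x_ik)^2 for two sample columns."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


samples = list(expr)

# Nested dict used as a symmetric sample-by-sample matrix.
dist = {
    j: {k: squared_distance(expr[j], expr[k]) for k in samples}
    for j in samples
}

# Print the matrix row by row.
for j in samples:
    print(j, " ".join(f"{dist[j][k]:9.3f}" for k in samples))
```

Note that this is a distance, so smaller values mean more similar samples; if a similarity is needed, it has to be derived from it by whatever convention suits the analysis.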

When you say "among samples", are you referring to both of the data matrices that you have? What I mean is, do you want to

a) calculate one similarity matrix for the samples in your gene expression (log2 lowess normalized) data and a separate one for the RNA-seq (gene quantification) data, OR

b) calculate the similarity among samples both within and across the two data sets you have?

In both cases you can compute a correlation coefficient (for example, Pearson's correlation coefficient) for any two columns. You would probably also be interested in the p-value associated with the coefficient, which tells you how statistically significant that value is.
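To make the column-wise correlation concrete, here is a small Python sketch that computes Pearson's r for every pair of columns, pooling the columns of both tables so that the within-table blocks correspond to option (a) and the cross-table blocks to option (b). It only uses the three genes shown in the question, so the numbers themselves are not meaningful; the `expr_`/`rpkm_` prefixes and the function name `pearson` are invented for this example, and the RPKM values are used raw (whether to log-transform them first is a choice left open here).

```python
import math

# Toy input: the three genes shown in the question, one list per sample column.
columns = {
    "expr_sample1": [-0.25, 1.3245, 0.31775],
    "expr_sample2": [-0.91275, -2.126, 0.731],
    "expr_sample3": [-0.641, 7.495, -1.151],
    "expr_sample4": [0.37, -1.151, 0.182],
    "rpkm_sample1": [0.834890179247665, 0.1332993726877, 0.764239475051307],
    "rpkm_sample2": [11.2357774823452, 1.09436685773882, 0.609334844909887],
    "rpkm_sample3": [6.39239374912979, 0.00311332856051572, 0.107913669064748],
    "rpkm_sample4": [0.504388295468584, 3.82123310407725, 2.71814202633155],
}


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


names = list(columns)

# 8 x 8 matrix: expr/expr and rpkm/rpkm blocks are option (a),
# the expr/rpkm blocks are the cross-data comparison of option (b).
corr = {
    j: {k: pearson(columns[j], columns[k]) for k in names}
    for j in names
}

for j in names:
    print(f"{j:13s}", " ".join(f"{corr[j][k]:6.2f}" for k in names))
```

This sketch leaves out the p-value. In R, `cor()` on a numeric matrix returns the full column-by-column correlation matrix, and `cor.test()` reports the coefficient together with its p-value for one pair of vectors; in Python, `scipy.stats.pearsonr` does the same for a pair.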