Question: expression analysis of miRNAs

Hey all,

I am confused about one part of my analysis.

I need to extract the up-regulated and down-regulated miRNAs from my data frame. The data frame has 5 samples: A, B, C, D and E. A is the parent (reference) sample and the rest are from patients. Each row represents a miRNA, and the value in each column is the background-subtracted value of that miRNA in that sample. On this basis I want to extract the miRNAs that are up-regulated and down-regulated in each sample. Since I have no replicates, there really aren't any statistical tests that make sense, so I want to divide B, C, D and E by A. This gives me the fold change for each sample with respect to sample A, the parent. Then I can filter my rows (UP is > 1 and DOWN is < 1). I am able to do this for two columns but not for 5 columns.

My data look like this:

                                          A        B         C         D        E
hsa-miR-199a-3p, hsa-miR-199b-3p         NA 13.13892  5.533703  25.67405       NA
hsa-miR-365a-3p, hsa-miR-365b-3p   15.70536 52.86558 18.467540 223.51424 31.93503
hsa-miR-3689a-5p, hsa-miR-3689b-5p       NA 21.41597  5.964772        NA 24.26073
hsa-miR-3689b-3p, hsa-miR-3689c     9.58696 44.56490 10.102051  13.26785       NA
hsa-miR-4520a-5p, hsa-miR-4520b-5p 18.06865 28.06991        NA        NA       NA
hsa-miR-516b-3p, hsa-miR-516a-3p         NA 10.77471  8.039662        NA       NA
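
The plain fold-change step described above (divide every patient column by A, then label values above 1 as UP and below 1 as DOWN) could be sketched roughly as follows. This is only a minimal illustration on the six rows shown in the post; the object names `fc`, `status` and `up_in_B` are made up for the example, and it leaves NA ratios as NA rather than applying the special NA rules discussed later.

```r
# Example data copied from the post; NA means no value for that sample.
d <- data.frame(
  A = c(NA, 15.70536, NA, 9.58696, 18.06865, NA),
  B = c(13.13892, 52.86558, 21.41597, 44.56490, 28.06991, 10.77471),
  C = c(5.533703, 18.467540, 5.964772, 10.102051, NA, 8.039662),
  D = c(25.67405, 223.51424, NA, 13.26785, NA, NA),
  E = c(NA, 31.93503, 24.26073, NA, NA, NA),
  row.names = c("hsa-miR-199a-3p, hsa-miR-199b-3p",
                "hsa-miR-365a-3p, hsa-miR-365b-3p",
                "hsa-miR-3689a-5p, hsa-miR-3689b-5p",
                "hsa-miR-3689b-3p, hsa-miR-3689c",
                "hsa-miR-4520a-5p, hsa-miR-4520b-5p",
                "hsa-miR-516b-3p, hsa-miR-516a-3p")
)

# Divide every patient column by the reference column A in one step.
# Any ratio involving an NA stays NA here.
fc <- d[, c("B", "C", "D", "E")] / d$A

# Label each cell: "UP" if fold change > 1, "DOWN" if < 1, NA otherwise.
fc_m <- as.matrix(fc)
status <- ifelse(fc_m > 1, "UP", ifelse(fc_m < 1, "DOWN", NA))

# Names of the miRNAs that are up-regulated in sample B relative to A.
up_in_B <- rownames(fc)[which(fc$B > 1)]
```

`which()` drops the NA comparisons, so only rows with a real ratio are returned.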

Now I first want to divide B/C/D/E by A,

but I have to take care of these conditions:

if B, C, D and E are all NA ---> result is NA

Now I take B and C (expression of C with respect to B, i.e. C/B):

if the numerator (C) is NA ---> result = NA

if the denominator (B) is NA ---> result = value of C (the numerator). Why? Because when I compare C with respect to B, if the miRNA was expressed in B but not in C then the result should be NA, and if the miRNA was not expressed in B but is expressed in C then the result should be C (the updated value of that miRNA).

Otherwise I simply divide (C/B) and store the value in result. Next, D is divided by result

(D/result) with the same NA conditions for numerator and denominator, and result is updated again. Then E is divided by the updated result, again with the same NA conditions.

Let's suppose one row is:

A            B            C     D         E
18.06865     28.06991     NA    441.00    NA

B/C/D/E:

C/B ------> result = NA

D/result ------> result = 441.00 (updated)

E/441.00 ------> result = NA

Now I can divide that result by A ----> result/A = NA
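
The step-by-step rule above can be written down as a small fold over one row. The following is only a sketch of the rules as stated in the question (numerator NA gives NA, denominator NA keeps the numerator, otherwise divide); `na_ratio` and `cumulative_ratio` are hypothetical helper names invented for this illustration, not functions from any package.

```r
# Hypothetical helper: divide num by den using the NA rules from the post.
na_ratio <- function(num, den) {
  if (is.na(num)) return(NA_real_)  # numerator missing -> result is NA
  if (is.na(den)) return(num)       # denominator missing -> keep the numerator
  num / den                         # both present -> ordinary ratio
}

# Hypothetical helper: apply the rule left to right over one row,
# i.e. C/B, then D/result, then E/result.
cumulative_ratio <- function(x) {
  Reduce(function(result, nxt) na_ratio(nxt, result), x[-1], x[[1]])
}

# The example row from the post (B, C, D, E), then the final division by A.
res <- cumulative_ratio(c(28.06991, NA, 441.00, NA))  # C/B, D/result, E/result
final <- na_ratio(res, 18.06865)                      # result/A
```

`Reduce()` passes the running result as the first argument and the next sample as the second, which is why the arguments are swapped when calling `na_ratio()`.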

I would really appreciate your help

Best

R • 1.8k views
modified 4.9 years ago • written 4.9 years ago by adnanjaved198860

Something/NA==NA

Aside from that, it's really unclear what your question is.

Hey Devon Ryan,

Sorry, I know it's a bit confusing, or maybe the way I explained it confused you.

Below is the code for two columns. I want to check up- and down-regulation of miRNAs. The possibilities are:

compare between two samples, or compare between all samples.

Suppose A is the parent and B is the disease sample, and you want to see whether a miRNA is up-regulated or down-regulated in the patient. With no replicates you would compute

B/A (expression of B with respect to A). But when there are NA values in the data you have to deal with different conditions. I have to keep the NA values in the data; otherwise I would have replaced them with 0 or removed them. That is why I listed the different conditions in my post for comparing 4 columns at the same time.

This is the code for two columns; now I want to compare 4 columns:

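# List the input files. Reading the first one with read.delim() is an
# assumption: the original post never shows how `d` was loaded.
file <- list.files(pattern = "\\.txt$")
d <- read.delim(file[1], stringsAsFactors = FALSE)
d <- data.frame(d)
rnames <- as.matrix(d[1:2019, 1])
d1 <- as.matrix(d[1:2019, c(4, 12, 20, 28, 36)])
rownames(d1) <- rnames
d1 <- data.frame(d1)

colnames(d1) <- c("A", "B", "C", "D", "E")

tem <- data.frame(tem = d1[, 2])   # numerator: sample B
div <- data.frame(div = d1[, 1])   # denominator: reference sample A
C <- data.frame(matrix(NA, nrow = 2019, ncol = 1))   # holds the result
for (i in 1:nrow(tem)) {
  for (j in 1:ncol(tem)) {
    if (is.na(tem[i, j]) && is.na(div[i, j])) {
      C[i, j] <- NA                      # both values missing
    } else if (is.na(tem[i, j])) {
      C[i, j] <- div[i, j]               # only the reference is present
    } else if (is.na(div[i, j])) {
      C[i, j] <- tem[i, j]               # only the sample is present
    } else {
      C[i, j] <- tem[i, j] / div[i, j]   # ordinary fold change
    }
  }
}
colnames(C) <- c("Regulation")
ab <- cbind(div, tem, C)
colnames(ab) <- c("A", "B", "res")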
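
For comparison, the same two-column rules (both NA gives NA, one NA keeps the value that is present, otherwise divide) can be applied to whole columns at once without the nested loops. This is a hedged sketch rather than a drop-in replacement: `ratio_or_present` is a hypothetical name, and the two vectors are just the A and B columns from the six example rows in the question.

```r
# Hypothetical helper mirroring the loop's rules, applied to whole vectors.
ratio_or_present <- function(num, den) {
  out <- num / den                       # NA wherever either value is missing
  only_num <- !is.na(num) & is.na(den)   # only the numerator is present
  only_den <- is.na(num) & !is.na(den)   # only the denominator is present
  out[only_num] <- num[only_num]
  out[only_den] <- den[only_den]
  out                                    # stays NA where both are missing
}

# Columns A (reference) and B (patient) from the example data in the post.
A <- c(NA, 15.70536, NA, 9.58696, 18.06865, NA)
B <- c(13.13892, 52.86558, 21.41597, 44.56490, 28.06991, 10.77471)

ab <- data.frame(A = A, B = B, res = ratio_or_present(B, A))
```

Because the function works on vectors, it does not depend on the hard-coded 2019 rows used in the loop version.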

I want to do that for 4 columns:

B/C/D/E

I am really not a good programmer :-/ and I am getting more confused. Could you write code that gives me the final result of B/C/D/E in a Regulation column, without considering A? Let's say I have one data frame and I add a new column, result, to it. I want to divide B, C, D and E in turn and store the value in result, row by row. Forget about the previous explanations :) maybe this makes it clearer.

So after applying the conditions, my result values should look like this. For the first row:

C/B ---> result = `0.4211688`
`25.67405/0.4211688 = 60.95905` (D/result)
`NA/60.95905 = NA` (E/result), so the final value in result should be NA.

For the second row:

`18.467540/52.86558 = 0.3493301`
`223.51424/0.3493301 = 639.8368`
`31.93503/639.8368 = 0.04991121`

Similarly for the fourth row:

`                                    A        B         C          D         E`
`hsa-miR-3689b-3p, hsa-miR-3689c     9.58696  44.56490  10.102051  13.26785  NA`
`10.102051/44.56490 = 0.2266818`
`13.26785/0.2266818 = 58.53072`
`NA/58.53072 = NA`
```
> d
                                          A        B         C         D        E
hsa-miR-199a-3p, hsa-miR-199b-3p         NA 13.13892  5.533703  25.67405       NA
hsa-miR-365a-3p, hsa-miR-365b-3p   15.70536 52.86558 18.467540 223.51424 31.93503
hsa-miR-3689a-5p, hsa-miR-3689b-5p       NA 21.41597  5.964772        NA 24.26073
hsa-miR-3689b-3p, hsa-miR-3689c     9.58696 44.56490 10.102051  13.26785       NA
hsa-miR-4520a-5p, hsa-miR-4520b-5p 18.06865 28.06991        NA        NA       NA
hsa-miR-516b-3p, hsa-miR-516a-3p         NA 10.77471  8.039662        NA       NA
```

Thank you so much for your help I really appreciate your time :)

Why was I writing those conditions? Because I am looking for fold change.

If a miRNA was not expressed in one sample but is expressed in the next, then I have to report its new value:

```
A     B         C         D     E
NA    10.77471  8.039662  NA    6.22
```
`8.039662/10.77471 = 0.7461604`

`NA/0.7461604 = NA`

But now it is expressed in E, so if I do the same:

`6.22/NA` would give NA, which is not right; the result should be `6.22`, which shows that the miRNA is expressed in sample E.

Ah, so you want some sort of cumulative ratio. I'd have to think about the best way to do that, since it's such an uncommon thing to want. I suppose one could `apply()` a function to subset your initial matrix into a list of submatrices and then `lapply()` a function that `apply()`s the cumulative ratio to the rows using a for loop. You might just give that a try.
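
One possible way to get such a cumulative ratio for all rows at once, without an explicit loop over rows, is to fold over the columns instead. The sketch below assumes the NA rules stated in the question (next sample NA gives NA, running result NA takes the next sample's value, otherwise divide); `step` is a hypothetical helper name and the data frame is rebuilt from the B to E columns of the rows posted above.

```r
# Columns B to E from the example rows in the question.
d <- data.frame(
  B = c(13.13892, 52.86558, 21.41597, 44.56490, 28.06991, 10.77471),
  C = c(5.533703, 18.467540, 5.964772, 10.102051, NA, 8.039662),
  D = c(25.67405, 223.51424, NA, 13.26785, NA, NA),
  E = c(NA, 31.93503, 24.26073, NA, NA, NA)
)

# Hypothetical helper: one update of the running result, vectorized over rows.
step <- function(result, nxt) {
  ifelse(is.na(nxt), NA_real_,              # next sample missing -> NA
         ifelse(is.na(result), nxt,         # running result missing -> take next value
                nxt / result))              # otherwise next / running result
}

# Start from column B, then fold in C, D and E from left to right.
d$result <- Reduce(step, d[c("C", "D", "E")], d$B)
```

With these rules the third row ends at the value of E, because D is NA and E is then compared against a missing running result.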

Devon Ryan92k wrote:

OK, so I'll restate your problem in a single sentence: "In R when computing the ratio between values in a dataframe and a vector, is there a way to replace resulting NA values with either the vector or dataframe values when one of the latter is not NA?"

This, then, becomes a simple data processing problem. Let us suppose that your values are in a dataframe named `d`:

```
> d
                                          A        B         C         D        E
hsa-miR-199a-3p, hsa-miR-199b-3p         NA 13.13892  5.533703  25.67405       NA
hsa-miR-365a-3p, hsa-miR-365b-3p   15.70536 52.86558 18.467540 223.51424 31.93503
hsa-miR-3689a-5p, hsa-miR-3689b-5p       NA 21.41597  5.964772        NA 24.26073
hsa-miR-3689b-3p, hsa-miR-3689c     9.58696 44.56490 10.102051  13.26785       NA
hsa-miR-4520a-5p, hsa-miR-4520b-5p 18.06865 28.06991        NA        NA       NA
hsa-miR-516b-3p, hsa-miR-516a-3p         NA 10.77471  8.039662        NA       NA
```

So we could simply do the following:

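```
# l: for each column x, the ratios of all other columns to column x
l <- lapply(c(1:5), function(x) as.matrix(d[,-x]/d[,x])) #There has to be nicer way to do this!
# l2: ratios that are NA are replaced by the numerator value
l2 <- mapply(function(x, y) {x[is.na(x)] <- as.matrix(d[,-y])[is.na(x)]; x}, l, c(1:5), SIMPLIFY=FALSE)
# l3: ratios that are still NA are replaced by the denominator value
l3 <- mapply(function(x, y) {x[is.na(x)] <- rep(d[,y], ncol(d[,-y]))[is.na(x)]; x}, l2, c(1:5), SIMPLIFY=FALSE)
```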

I kept the various steps of creating the lists (l, l2, and l3) so you can follow along. I've not heavily tested that.

modified 4.9 years ago • written 4.9 years ago by Devon Ryan92k

Hey Devon Ryan, thanks for your effort. But I am sorry, I really didn't understand what you did. In my code I was checking the condition and updating the result according to it. I used two loops, one for rows and one for columns, and when comparing two values I replaced the value in the Regulation column with either the numerator or the denominator. C is the data frame where my result values are stored.

The list `l` contains the five possible sets of fold changes (all except A vs. A, all except B vs. B, etc.). In `l2`, the `NA` values from these ratios are replaced by values from the numerator. In `l3`, values that are still `NA` are replaced by whatever was the denominator. Perhaps those steps should be swapped, I'd have to look.

In general, you should avoid `for` loops in R and other functional programming languages where a vectorized alternative exists; element-by-element loops over a data frame tend to perform poorly.