Question: expression analysis of miRNAs
Asked 4.9 years ago by adnanjaved1988 (Germany):

Hey all,

I am confused about one part of my analysis.

What I need to do is extract up-regulated and down-regulated miRNAs from my data frame. I have a data frame with 5 samples: A, B, C, D and E. A is the parent (reference) sample and the rest are patient samples. Each row represents a miRNA, and the value in each column is the background-subtracted value of that miRNA in that sample. Based on this, I want to extract the miRNAs that are up-regulated and down-regulated in each sample. Since I have no replicates, there really aren't any statistical tests that make sense, so I want to divide B, C, D and E by A. This gives me the fold change of each sample with respect to sample A, the parent. Then I can filter my rows (UP where the fold change is > 1 and DOWN where it is < 1). I am able to do this for two columns, but not for all 5 columns.

My data look like this:

                                          A        B         C         D        E
hsa-miR-199a-3p, hsa-miR-199b-3p         NA 13.13892  5.533703  25.67405       NA
hsa-miR-365a-3p, hsa-miR-365b-3p   15.70536 52.86558 18.467540 223.51424 31.93503
hsa-miR-3689a-5p, hsa-miR-3689b-5p       NA 21.41597  5.964772        NA 24.26073
hsa-miR-3689b-3p, hsa-miR-3689c     9.58696 44.56490 10.102051  13.26785       NA
hsa-miR-4520a-5p, hsa-miR-4520b-5p 18.06865 28.06991        NA        NA       NA
hsa-miR-516b-3p, hsa-miR-516a-3p         NA 10.77471  8.039662        NA       NA
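A minimal sketch of the fold-change step described above, in R like the rest of the thread. It assumes the table is loaded as a data frame `d` with columns A to E and the miRNA names as row names (the name `d` and the labels "up"/"down"/"unchanged" are illustrative). It uses ordinary NA propagation, so it does not yet implement the NA rules listed below.

```r
# Values from the table above (NA = not detected in that sample)
d <- data.frame(
  A = c(NA, 15.70536, NA, 9.58696, 18.06865, NA),
  B = c(13.13892, 52.86558, 21.41597, 44.56490, 28.06991, 10.77471),
  C = c(5.533703, 18.467540, 5.964772, 10.102051, NA, 8.039662),
  D = c(25.67405, 223.51424, NA, 13.26785, NA, NA),
  E = c(NA, 31.93503, 24.26073, NA, NA, NA),
  row.names = c("hsa-miR-199a-3p, hsa-miR-199b-3p",
                "hsa-miR-365a-3p, hsa-miR-365b-3p",
                "hsa-miR-3689a-5p, hsa-miR-3689b-5p",
                "hsa-miR-3689b-3p, hsa-miR-3689c",
                "hsa-miR-4520a-5p, hsa-miR-4520b-5p",
                "hsa-miR-516b-3p, hsa-miR-516a-3p")
)

# Fold change of each patient sample relative to the parent A.
# Dividing a data frame by a vector divides each column element-wise.
fc <- d[, c("B", "C", "D", "E")] / d$A

# Plain NA propagation: an NA in A or in the sample gives NA here.
# fc > 1 is a logical matrix, so direction keeps the row and column names.
direction <- ifelse(fc > 1, "up", ifelse(fc < 1, "down", "unchanged"))

# miRNAs up-regulated in sample B
rownames(d)[which(direction[, "B"] == "up")]
```

The same `which()` filter on `direction[, "C"]` and so on gives the lists for the other samples, and `"down"` instead of `"up"` gives the down-regulated miRNAs.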

Now I want to first compute B/C/D/E and then divide the result by A,

but I have to take care of these conditions:

If B, C, D and E are all NA, the result is NA.

Now I take B and C (expression of C with respect to B, i.e. C/B):

If the numerator (C) is NA, the result is NA.

If the denominator (B) is NA, the result is the value of C (the numerator). Why? When I compare C with respect to B: if the miRNA was expressed in B but not in C, the result should be NA; if it was not expressed in B but is expressed in C, the result should be C (the updated value of that miRNA).

Otherwise I simply divide C/B and store it in result. Then D is divided by the result,

i.e. D/result, with the same NA conditions for numerator and denominator. The result is updated again, and E is divided by the updated result with the same NA conditions.
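The single-comparison rule can be written as a small vectorized R helper. This is a sketch under the conditions as stated above; the name `na_ratio` is hypothetical. A missing numerator gives NA, a missing denominator gives the numerator, and otherwise it is an ordinary ratio.

```r
# Hypothetical helper: ratio num / den with the NA rules from the question
#   numerator NA (with or without denominator) -> NA
#   denominator NA, numerator present          -> numerator
#   both present                               -> num / den
na_ratio <- function(num, den) {
  out <- num / den                    # NA wherever either side is NA
  keep <- is.na(den) & !is.na(num)    # denominator missing, numerator present
  out[keep] <- num[keep]
  out
}

na_ratio(c(5, NA, 6.22, NA), c(10, 3, NA, NA))
# returns 0.5, NA, 6.22, NA
```

Because a row that is NA in every sample stays NA at every step, the first condition (B, C, D and E all NA gives NA) needs no separate check.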

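For example, suppose a row has these values:

        A         B    C       D    E
 18.06865  28.06991   NA  441.00   NA

B/C/D/E:

C/B: result = NA (the numerator C is NA)

D/result = 441.00 (the denominator is NA, so the result is updated to D)

E/441.00 = NA (the numerator E is NA)

Finally I divide that result by A: result/A = NA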
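To check the chained calculation against the worked example, a helper with the same NA rules (hypothetical name `na_ratio`) can be folded over C, D and E with base R's `Reduce()`, using the running result as the denominator and the next sample as the numerator. A sketch:

```r
# Hypothetical helper implementing the NA rules
na_ratio <- function(num, den) {
  out <- num / den                    # NA wherever either side is NA
  keep <- is.na(den) & !is.na(num)    # running result missing, sample present
  out[keep] <- num[keep]
  out
}

# Example row from above
row <- c(A = 18.06865, B = 28.06991, C = NA, D = 441.00, E = NA)

# Start from B; each step divides the next sample by the running result:
# C/B, then D/result, then E/result. accumulate = TRUE keeps every step.
steps <- Reduce(function(acc, x) na_ratio(x, acc),
                row[c("C", "D", "E")], row[["B"]], accumulate = TRUE)
steps
# values: 28.06991, NA, 441, NA

# Last step: result / A
final <- na_ratio(steps[length(steps)], row[["A"]])
final
# NA
```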

I would really appreciate your help.

Best

Adnan Javed

Tag: R

Something / NA == NA (in R, dividing anything by NA gives NA).

Aside from that, it's really unclear what your question is.

(reply written 4.9 years ago by Devon Ryan)

Hey Devon Ryan,

Sorry, I know it's a bit confusing, or maybe the way I explained it is confusing you.

Below is the code for two columns. I want to check the up- and down-regulation of miRNAs. The possibilities are:

comparing two samples, or comparing all samples.

Suppose A is the parent and B is a disease sample, and you want to see whether a miRNA is up-regulated or down-regulated in the patient. If you have no replicates, you would compute

B/A (expression of B with respect to A). But when the data contain NA values, you have to handle different conditions. I have to keep the NA values in the data; otherwise I would have replaced them with 0 or removed them. That is why I listed the different conditions in my post for comparing 4 columns at the same time.

This is the code for two columns, and now I want to compare 4 columns:

files <- list.files(pattern = "\\.txt$")          # all tab-separated sample files
d <- lapply(files, function(x) read.table(x, header = TRUE, sep = "\t"))
d <- data.frame(d)                                # bind the files side by side
rnames <- as.character(d[1:2019, 1])              # miRNA names from the first column
d1 <- as.matrix(d[1:2019, c(4, 12, 20, 28, 36)])  # value column of each sample
rownames(d1) <- rnames
d1 <- data.frame(d1)

colnames(d1) <- c("A", "B", "C", "D", "E")

tem <- data.frame(tem = d1[, 2])   # numerator: sample B
div <- data.frame(div = d1[, 1])   # denominator: sample A
C <- data.frame(matrix(NA, nrow = 2019, ncol = 1))
for (i in 1:nrow(tem)) {
  for (j in 1:ncol(tem)) {
    if (is.na(tem[i, j]) && is.na(div[i, j])) {
      C[i, j] <- NA                       # both missing
    } else if (is.na(tem[i, j])) {
      C[i, j] <- div[i, j]                # B missing: keep the A value
    } else if (is.na(div[i, j])) {
      C[i, j] <- tem[i, j]                # A missing: keep the B value
    } else {
      C[i, j] <- tem[i, j] / div[i, j]    # both present: fold change B/A
    }
  }
}
colnames(C) <- c("Regulation")
ab <- cbind(div, tem, C)
colnames(ab) <- c("A", "B", "res")

I want to do the same for the 4 columns:

B/C/D/E

(reply written 4.9 years ago by adnanjaved1988)
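The two-column loop above can also be written without loops, using logical indexing. This sketch uses a small made-up stand-in for `d1` and reproduces the loop's branches, including the one the loop actually implements for a missing B (it keeps the A value, which differs from the "numerator NA gives NA" rule stated in the question).

```r
# Small stand-in for d1 (only the columns the loop uses)
d1 <- data.frame(A = c(NA, 15.70536, NA, 18.06865, 9.58696),
                 B = c(13.13892, 52.86558, NA, 28.06991, NA))

res <- d1$B / d1$A                       # plain ratio, NA if either side is NA
res[is.na(d1$B)] <- d1$A[is.na(d1$B)]    # B missing -> A (stays NA if A missing too)
res[is.na(d1$A)] <- d1$B[is.na(d1$A)]    # A missing -> B

ab <- data.frame(A = d1$A, B = d1$B, res = res)
ab
```

Run on the full `d1`, this should give the same `res` column as the loop, without indexing each cell separately.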

I am really not a good programmer :-/ and I am getting more confused. Could you please write code that gives me the final result of B/C/D/E in a Regulation column? Don't consider A for now. Let's say I have one data frame and I add a new result column to it. What I am trying to do is chain the division across B, C, D and E, store it in result, and then move on to the next row. Forget about my previous explanations :) Maybe this example explains it better.

So, after applying the conditions, my result values should look like this. For the first row:

C/B: result = 5.533703/13.13892 = 0.4211688
D/result = 25.67405/0.4211688 = 60.95905
E/result = NA/60.95905 = NA, so the final value in result should be NA.

For the second row:

18.467540/52.86558 = 0.3493301
223.51424/0.3493301 = 639.8368
31.93503/639.8368 = 0.04991121

Similarly, for the 4th row:

                                          A        B         C         D  E
hsa-miR-3689b-3p, hsa-miR-3689c     9.58696 44.56490 10.102051  13.26785 NA

10.102051/44.56490 = 0.2266818
13.26785/0.2266818 = 58.53072
NA/58.53072 = NA

> d
                                          A        B         C         D
hsa-miR-199a-3p, hsa-miR-199b-3p         NA 13.13892  5.533703  25.67405
hsa-miR-365a-3p, hsa-miR-365b-3p   15.70536 52.86558 18.467540 223.51424
hsa-miR-3689a-5p, hsa-miR-3689b-5p       NA 21.41597  5.964772        NA
hsa-miR-3689b-3p, hsa-miR-3689c     9.58696 44.56490 10.102051  13.26785
hsa-miR-4520a-5p, hsa-miR-4520b-5p 18.06865 28.06991        NA        NA
hsa-miR-516b-3p, hsa-miR-516a-3p         NA 10.77471  8.039662        NA
                                          E       
hsa-miR-199a-3p, hsa-miR-199b-3p         NA
hsa-miR-365a-3p, hsa-miR-365b-3p   31.93503
hsa-miR-3689a-5p, hsa-miR-3689b-5p 24.26073
hsa-miR-3689b-3p, hsa-miR-3689c          NA
hsa-miR-4520a-5p, hsa-miR-4520b-5p       NA
hsa-miR-516b-3p, hsa-miR-516a-3p         NA

Thank you so much for your help, I really appreciate your time :)

(reply written 4.9 years ago by adnanjaved1988)
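A sketch of the cumulative ratio described in this comment, for all rows at once. It folds over the columns C, D and E with `Reduce()`, starting from column B, so each step divides the next sample by the running result. The helper name `na_ratio` is hypothetical, its NA handling follows the rules from the question, and the row names are left out to keep the block short.

```r
# Hypothetical vectorized helper with the NA rules from the question
na_ratio <- function(num, den) {
  out <- num / den                    # NA wherever either side is NA
  keep <- is.na(den) & !is.na(num)    # running result missing, sample present
  out[keep] <- num[keep]
  out
}

d <- data.frame(
  A = c(NA, 15.70536, NA, 9.58696, 18.06865, NA),
  B = c(13.13892, 52.86558, 21.41597, 44.56490, 28.06991, 10.77471),
  C = c(5.533703, 18.467540, 5.964772, 10.102051, NA, 8.039662),
  D = c(25.67405, 223.51424, NA, 13.26785, NA, NA),
  E = c(NA, 31.93503, 24.26073, NA, NA, NA)
)

# Fold over the columns C, D, E starting from column B:
# result <- C / B, then D / result, then E / result, for every row at once
d$Regulation <- Reduce(function(acc, x) na_ratio(x, acc), d[c("C", "D", "E")], d$B)
d$Regulation
```

The second value comes out at about 0.0499112, matching the hand calculation; the third row gives 24.26073 because D is missing there and E is then taken as the new value.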

Why I wrote these conditions: I am looking for fold changes.

If a miRNA was not expressed in one sample but is expressed in the next sample, I have to report its new value. For example:

        A         B         C    D     E

       NA  10.77471  8.039662   NA  6.22

8.039662/10.77471 = 0.7461604

NA/0.7461604 = NA

But now the miRNA is expressed in E, so if I do the same division,

6.22/NA, the result would be NA, which is not right. The result should be
6.22, which shows that the miRNA is expressed in sample E.
(reply written 4.9 years ago by adnanjaved1988)
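The case described here, where a miRNA reappears after a missing value, can be checked with a scalar version of the same rule (hypothetical helper `na_ratio`): when the running result is NA and the next sample has a value, that value becomes the new result. A sketch:

```r
# Hypothetical scalar helper with the NA rules from the question
na_ratio <- function(num, den) {
  if (is.na(num)) NA_real_       # sample not expressed -> NA
  else if (is.na(den)) num       # running result missing -> take the new value
  else num / den
}

row <- c(B = 10.77471, C = 8.039662, D = NA, E = 6.22)

# Start from B, then C/B, D/result, E/result; keep every intermediate step
steps <- Reduce(function(acc, x) na_ratio(x, acc),
                row[c("C", "D", "E")], row[["B"]], accumulate = TRUE)
steps
# values: 10.77471, about 0.7461604, NA, 6.22
```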

Ah, so you want some sort of cumulative ratio. I'd have to think of the best way to do that, since it's such an uncommon thing to want to do. I suppose one could apply() a function to subset your initial matrix into a list of submatrices and then lapply() a function to just apply() the cumulative ratio to the rows using a for loop. You might just give that a try.

(reply written 4.9 years ago by Devon Ryan)
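A sketch of the row-wise idea with `apply()`, assuming the values for B to E are in a numeric matrix with the miRNA names as row names. Each row is reduced with `Reduce()` rather than an explicit for loop; the helper names `na_ratio` and `cumulative_ratio` are hypothetical.

```r
# Hypothetical scalar helper with the NA rules from the question
na_ratio <- function(num, den) {
  if (is.na(num)) NA_real_ else if (is.na(den)) num else num / den
}

# Fold one row left to right: start with B, then C/B, D/result, E/result
cumulative_ratio <- function(v) {
  Reduce(function(acc, x) na_ratio(x, acc), v[-1], v[[1]])
}

# Columns B to E from the data in this thread, one row per miRNA
m <- rbind(
  "hsa-miR-199a-3p, hsa-miR-199b-3p"   = c(13.13892, 5.533703, 25.67405, NA),
  "hsa-miR-365a-3p, hsa-miR-365b-3p"   = c(52.86558, 18.467540, 223.51424, 31.93503),
  "hsa-miR-3689a-5p, hsa-miR-3689b-5p" = c(21.41597, 5.964772, NA, 24.26073),
  "hsa-miR-3689b-3p, hsa-miR-3689c"    = c(44.56490, 10.102051, 13.26785, NA),
  "hsa-miR-4520a-5p, hsa-miR-4520b-5p" = c(28.06991, NA, NA, NA),
  "hsa-miR-516b-3p, hsa-miR-516a-3p"   = c(10.77471, 8.039662, NA, NA)
)
colnames(m) <- c("B", "C", "D", "E")

result <- apply(m, 1, cumulative_ratio)   # one value per miRNA
result
```

`apply(m, 1, ...)` passes each row as a named numeric vector and returns one value per miRNA, named by the row names, so `result` can be added to the data frame as the Regulation column.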
Answer by Devon Ryan (Freiburg, Germany), 4.9 years ago:

OK, so I'll restate your problem in a single sentence: "In R when computing the ratio between values in a dataframe and a vector, is there a way to replace resulting NA values with either the vector or dataframe values when one of the latter is not NA?"

This, then, becomes a simple data processing problem. Let us suppose that your values are in a dataframe named d:

> d
                                          A        B         C         D
hsa-miR-199a-3p, hsa-miR-199b-3p         NA 13.13892  5.533703  25.67405
hsa-miR-365a-3p, hsa-miR-365b-3p   15.70536 52.86558 18.467540 223.51424
hsa-miR-3689a-5p, hsa-miR-3689b-5p       NA 21.41597  5.964772        NA
hsa-miR-3689b-3p, hsa-miR-3689c     9.58696 44.56490 10.102051  13.26785
hsa-miR-4520a-5p, hsa-miR-4520b-5p 18.06865 28.06991        NA        NA
hsa-miR-516b-3p, hsa-miR-516a-3p         NA 10.77471  8.039662        NA
                                          E
hsa-miR-199a-3p, hsa-miR-199b-3p         NA
hsa-miR-365a-3p, hsa-miR-365b-3p   31.93503
hsa-miR-3689a-5p, hsa-miR-3689b-5p 24.26073
hsa-miR-3689b-3p, hsa-miR-3689c          NA
hsa-miR-4520a-5p, hsa-miR-4520b-5p       NA
hsa-miR-516b-3p, hsa-miR-516a-3p         NA

So we could simply do the following:

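# l: for each column x, divide all other columns by column x (five sets of ratios)
l <- lapply(1:5, function(x) as.matrix(d[, -x] / d[, x]))  # There has to be a nicer way to do this!
# l2: where a ratio is NA, use the numerator value
l2 <- mapply(function(x, y) {x[is.na(x)] <- as.matrix(d[, -y])[is.na(x)]; x}, l, 1:5, SIMPLIFY = FALSE)
# l3: where a ratio is still NA, use the denominator value
l3 <- mapply(function(x, y) {x[is.na(x)] <- rep(d[, y], ncol(d[, -y]))[is.na(x)]; x}, l2, 1:5, SIMPLIFY = FALSE)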

I kept the various steps of creating the lists (l, l2, and l3) so you can follow along. I've not heavily tested that.


Hey Devon Ryan, thanks for your effort, but I'm sorry, I really didn't understand what you did. In my code I was checking the conditions and updating the results accordingly. I used two loops, one over rows and one over columns, and when comparing two values I put either the numerator or the denominator into the Regulation column. C is the data frame where my result values are stored.

(reply written 4.9 years ago by adnanjaved1988)

The list l contains the five possible sets of fold changes (all except A vs. A, all except B vs. B, etc.). In l2, the NA values from these ratios are replaced by values from the numerator. In l3, values that are still NA are replaced by whatever was the denominator. Perhaps those steps should be swapped; I'd have to look.

In general, you should avoid explicit for loops in R where a vectorized operation can do the job; looping cell by cell over a data frame performs poorly.

(reply written 4.9 years ago by Devon Ryan)
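To see what the l2 and l3 steps do, and whether their order matters, here is a toy example on plain vectors (the values are made up). For inputs that are NA, the order makes no difference: each NA position is filled by whichever side is present. Order would only matter when both sides are present and the ratio is still flagged by `is.na()`, as with 0/0, which gives NaN. Note also that filling with the denominator when the numerator is NA differs from the rule in the question, where a missing numerator should give NA.

```r
num <- c(5, NA, 6, NA)    # numerator values
den <- c(10, 3, NA, NA)   # denominator values

# Order used in the answer: numerator first (l2), then denominator (l3)
r1 <- num / den
r1[is.na(r1)] <- num[is.na(r1)]
r1[is.na(r1)] <- den[is.na(r1)]

# Swapped order: denominator first, then numerator
r2 <- num / den
r2[is.na(r2)] <- den[is.na(r2)]
r2[is.na(r2)] <- num[is.na(r2)]

r1   # values: 0.5, 3, 6, NA
r2   # values: 0.5, 3, 6, NA
```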